
Hadoop Client Environment (hdenv)

🏷️ #distributed   #hadoop   #inactive   #map_reduce   #nix   #oss   #project   #shell  

Type: OSS | Status: Inactive

Hadoop Environment (hdenv) was a set of bash helper scripts for Hadoop that made switching between clusters and running common commands easier.

http://code.google.com/p/hadoopenv/ 

The project was a set of utilities meant to make it easier to work with multiple remote Hadoop clusters from the command line. It relied primarily on features available in bash 3.1 and later.

It allowed you to interact with a remote Hadoop cluster without SSHing to the primary name node and using its Hadoop client install. This was very useful: it avoided having to manually copy your job jars from your desktop computer to a node in the cloud just to run a job.

Key Features:

  1. Maintains a list of clusters you connect to and lets you pick from it
  2. Remembers the last cluster used
  3. Provides remote interaction with the cluster, including HDFS with auto-completion (thanks to bash's programmable completion and these settings for hdfs: http://blog.rapleaf.com/dev/2009/11/17/command-line-auto-completion-for-hadoop-dfs-commands/)
  4. Uses the standard hadoop client script shipped with all Hadoop 0.20.1+ releases
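The auto-completion in feature 3 builds on bash's programmable completion. A minimal sketch of that technique, assuming a working hadoop client on the PATH (this is illustrative, not the project's actual code; the function name and parsing are my own):

```shell
# Hypothetical completion hook for an hdfs wrapper command.
# Asks the cluster for entries matching the partial path and offers
# them as completions (hadoop fs -ls prints the path in the last column).
_hdfs_complete() {
    local cur="${COMP_WORDS[COMP_CWORD]}"
    COMPREPLY=( $(hadoop fs -ls "${cur}*" 2>/dev/null | awk '{print $NF}') )
}

# Register the hook; -o nospace keeps the cursor at the end of a
# directory name so you can keep descending into the path.
complete -o nospace -F _hdfs_complete hdfs
```

The linked Rapleaf post describes the same idea in more detail.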

Presentation

See: Presentation Slides

Hadoop Environment

Bash $hell Helper Scripts for Hadoop

http://code.google.com/p/hadoopenv/


What Does it Do?

  • Just a wrapper around the hadoop utility
    • e.g. hadoop-0.20.1/bin/hadoop
    • Requires a working hadoop script
  • Remotely access clusters
    • submit and manage jobs
  • Access HDFS, auto-complete on remote paths
  • Manages information about clusters
  • Facilitates easy switching between clusters

Setup

  • Download from http://code.google.com/p/hadoopenv/
  • See README for install
  • ~/.hdconf/cluster.csv — lists clusters you know about
  • Pick a cluster: $ . hdenv.sh cluster-name
  • Source hdenv.sh in your bash profile and it will set your environment to the last cluster picked
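The leading dot matters: hdenv.sh has to be *sourced*, because only a sourced script can change the calling shell's environment. A toy stand-in (the real hdenv.sh is not reproduced here) makes the point:

```shell
# Toy stand-in for hdenv.sh: it just records the chosen cluster
# in an environment variable (the variable name is illustrative).
cat > /tmp/pickcluster.sh <<'EOF'
export HD_CLUSTER="$1"
EOF

# Sourcing (the leading dot) runs the script in the current shell,
# so the exported variable survives; plain execution would not.
. /tmp/pickcluster.sh prod-east
echo "$HD_CLUSTER"    # prints: prod-east
```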

General Usage

  • Commands:
    • hdclusters - lists known clusters to pick from
    • hdwhich - shows the currently selected cluster
    • hdfs - hadoop fs with auto-completion on remote paths
    • hdjar - submits a job jar to the selected cluster
    • hdjob - hadoop job
    • doJob.sh - submits a job jar, collects the job conf, reports exit status, etc.
    • jobrpt.sh - downloads a completed job's task logs
    • hdweb - opens the job tracker web site for the selected cluster
    • hdwebf - opens the name node web site for the selected cluster
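Because everything delegates to the stock hadoop script, wrappers like these can be very thin. A plausible sketch (not the project's actual code; HD_HADOOP_HOME and HD_NAMENODE are hypothetical variables that would be set when a cluster is picked):

```shell
# Thin wrappers over the standard hadoop client script.
# -fs is hadoop's generic option for pointing a command at a
# specific filesystem/name node.
hdfs() {
    "${HD_HADOOP_HOME}/bin/hadoop" fs -fs "hdfs://${HD_NAMENODE}" "$@"
}

hdjob() {
    "${HD_HADOOP_HOME}/bin/hadoop" job "$@"
}
```

With the cluster variables exported, `hdfs -ls /` would then list the root of the selected cluster's HDFS without any SSH session.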

Demo

  • Demo some sample usage…
  • hdclusters, hdwhich, hdweb*, hdfs, hdjar, doJob.sh, jobrpt.sh,…

Gotchas

  • Your jobs will inherit your local settings
    • adjust your local settings accordingly
      • E.g. conf/hdfs-site.xml (dfs.replication)
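The dfs.replication example works like this: Hadoop reads client-side configuration when writing files, so a local conf/hdfs-site.xml override keeps files you upload from getting a desktop-oriented replication factor. A sketch (the value 3 is illustrative):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Client-side override: files written from this machine use the
       cluster's replication factor instead of a local default. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```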