Hadoop Environment (hdenv) was a bash shell helper script for Hadoop that made switching between clusters and executing common commands easier.
http://code.google.com/p/hadoopenv/
The project was a set of utilities meant to make working with multiple remote Hadoop clusters from the command line easier. It relied primarily on features of later versions (3.1+) of the bash shell environment.
It allowed you to interact with a remote Hadoop cluster without SSHing to the primary NameNode and using its Hadoop client install. This was very useful: it avoided having to manually transfer your jars from your desktop to a node in the cloud just to run a job.
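As a rough sketch of what "remote without SSH" means here: Hadoop's standard client accepts generic options (and honors HADOOP_CONF_DIR), so a locally installed client can point at a remote cluster. The host names and the conf-directory layout below are placeholders, not hdenv's actual scheme.

```shell
# Assumed addresses for illustration only:
NN="hdfs://namenode.example.com:8020"   # remote NameNode
JT="jobtracker.example.com:8021"        # remote JobTracker

# FsShell goes through ToolRunner, so generic options work, e.g.:
#   hadoop fs -fs "$NN" -ls /user
# A job jar whose driver uses ToolRunner can be submitted the same way:
#   hadoop jar target/myjob.jar MyJob -fs "$NN" -jt "$JT" in out

# hdenv's approach amounts to pointing the client at a per-cluster
# config directory instead (hypothetical layout shown):
export HADOOP_CONF_DIR="$HOME/.hdconf/clusters/prod"
echo "client configured via $HADOOP_CONF_DIR"
```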
Key Features:
- Maintain a list of clusters you connect to, pick from a list
- Remembers last cluster used
- Provides remote interaction with cluster - including HDFS with auto-completion (thanks to the bash completion feature and these settings for hdfs: http://blog.rapleaf.com/dev/2009/11/17/command-line-auto-completion-for-hadoop-dfs-commands/)
- Uses the standard hadoop client script provided in all Hadoop 0.20.1+ releases
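The "remembers last cluster used" feature can be sketched in a few lines of bash. The ~/.hdconf directory comes from the project's README; the state-file name used here is hypothetical, not hdenv's actual file.

```shell
# Minimal sketch of remembering the last selected cluster.
# HDCONF defaults to ~/.hdconf (the directory hdenv actually uses);
# "last_cluster" is a hypothetical state-file name.
HDCONF="${HDCONF:-$HOME/.hdconf}"
mkdir -p "$HDCONF"

save_cluster() { printf '%s\n' "$1" > "$HDCONF/last_cluster"; }
last_cluster() { cat "$HDCONF/last_cluster" 2>/dev/null; }

save_cluster prod-east
echo "last cluster: $(last_cluster)"
```

Sourcing a script like this from your bash profile is what lets a new shell come up already pointed at the cluster you used last.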
Presentation
See: Presentation Slides
Hadoop Environment
Bash $hell Helper Scripts for Hadoop
http://code.google.com/p/hadoopenv/
What Does it Do?
- Just a wrapper for hadoop utility
- e.g. hadoop-0.20.1/bin/hadoop
- Requires a working hadoop script
- Remotely access clusters
- submit and manage jobs
- Access HDFS, auto-complete on remote paths
- Manages information about clusters
- Facilitates easy switching between clusters
Setup
- Download [http://code.google.com/p/hadoopenv/]
- See README for install
- ~/.hdconf/cluster.csv - lists clusters you know about
- Pick a cluster:
  $ . hdenv.sh
  cluster-name
- Source hdenv.sh in your bash profile and it will set your environment to the last cluster picked
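To make the setup concrete, here is a sketch of what ~/.hdconf/cluster.csv might contain and how the cluster names could be listed from it. The column layout (name, NameNode, JobTracker) is an assumption for illustration; the real format is documented in the project's README.

```shell
# Hypothetical cluster.csv layout: name,namenode,jobtracker
mkdir -p "$HOME/.hdconf"
cat > "$HOME/.hdconf/cluster.csv" <<'EOF'
dev,hdfs://dev-nn:8020,dev-jt:8021
prod,hdfs://prod-nn:8020,prod-jt:8021
EOF

# List the cluster names, roughly what a cluster picker would show:
cut -d, -f1 "$HOME/.hdconf/cluster.csv"
```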
General Usage
- Commands:
  hdclusters - lists known clusters to pick from
  hdwhich - lists the currently selected cluster
  hdfs - hadoop fs with autocompletion
  hdjar - submit a job jar to the selected cluster
  hdjob - hadoop job
  doJob.sh - submits job jar, collects job conf, reports exit status, etc.
  jobrpt.sh - downloads completed job's task logs
  hdweb - opens tracker web site for selected cluster
  hdwebf - opens namenode of selected cluster
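Several of these commands are thin wrappers around the standard hadoop script. The sketch below shows the shape of that delegation; it is illustrative only (the real implementations live in the project), and `hadoop` is stubbed with a local function so the sketch runs without a cluster.

```shell
# Stub standing in for bin/hadoop, so this sketch runs anywhere:
hadoop() { echo "hadoop $*"; }

# Plausible wrapper shapes (not hdenv's actual source):
hdfs()  { hadoop fs "$@"; }
hdjar() { hadoop jar "$@"; }

hdfs -ls /user
hdjar myjob.jar MyMainClass in out
```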
Demo
- Demo some sample usage…
- hdclusters, hdwhich, hdweb*, hdfs, hdjar, doJob.sh, jobrpt.sh,…
Gotchas
- Your jobs will inherit your local settings
- adjust your local settings accordingly
- E.g. conf/hdfs-site.xml (dfs.replication)
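For example, dfs.replication set in your local client's conf/hdfs-site.xml travels with jobs you submit. The snippet below shows the standard Hadoop property syntax; the value 3 is the common default, not a recommendation.

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

If your desktop config carries a value tuned for a small test cluster, check it before submitting to a larger one.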