Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Configuration of hadoop-user-env.sh

When you run a Hadoop daemon or job, a number of scripts are executed as part of the initialization process. The script that runs when you enter hadoop at the command line is a shell script located at /home/hadoop/bin/hadoop. This script sets up the Java classpath, configures the Java memory settings, determines which main class to run, and executes the actual Java process.

As part of the Hadoop configuration, the hadoop script executes a file called conf/hadoop-env.sh, which can set various environment variables. Keeping these settings in conf/hadoop-env.sh means that the main bin/hadoop script remains unmodified. Amazon Elastic MapReduce (Amazon EMR) creates a hadoop-env.sh script on every node in a cluster to configure the amount of memory for every Hadoop daemon launched.

Additionally, Amazon EMR provides a user-customizable script, conf/hadoop-user-env.sh, that lets you override the default Hadoop settings that Amazon EMR configures.

Put your custom overrides for the Hadoop environment variables in conf/hadoop-user-env.sh. Custom overrides could include changes to Java memory settings or additional JAR files in the classpath. This script is also where Amazon EMR writes data when you use a bootstrap action to configure memory or specify additional Java arguments.

Examples of environment variables that you can specify in hadoop-user-env.sh include:

  • export HADOOP_DATANODE_HEAPSIZE="128"

  • export HADOOP_JOBTRACKER_HEAPSIZE="768"

  • export HADOOP_NAMENODE_HEAPSIZE="256"

  • export HADOOP_OPTS="-server"

  • export HADOOP_TASKTRACKER_HEAPSIZE="512"
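The variables listed above could be collected into a single hadoop-user-env.sh file. The following is a minimal sketch, not an Amazon EMR tool: it writes the example values to a scratch file (OUT is a placeholder chosen for illustration) and uses bash -n to check the file for syntax errors before it would be copied to a node.

```shell
#!/bin/bash
# Sketch: write a sample hadoop-user-env.sh and syntax-check it.
# OUT is a scratch path used for illustration; on a cluster node the real
# file is /home/hadoop/conf/hadoop-user-env.sh.
OUT="$(mktemp)"

cat > "$OUT" <<'EOF'
# Per-daemon heap sizes (values taken from the examples above)
export HADOOP_NAMENODE_HEAPSIZE="256"
export HADOOP_DATANODE_HEAPSIZE="128"
export HADOOP_JOBTRACKER_HEAPSIZE="768"
export HADOOP_TASKTRACKER_HEAPSIZE="512"
# Additional Java options
export HADOOP_OPTS="-server"
EOF

# bash -n parses the file without executing it, which catches quoting
# mistakes before the file reaches a node.
if bash -n "$OUT"; then
  echo "syntax OK: $OUT"
fi
```

Checking the syntax first is a design choice for this sketch: a malformed environment file is easier to diagnose on your own machine than during Hadoop startup.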

Bootstrap actions run before Hadoop starts and before any steps are run. This makes a bootstrap action the appropriate place to configure the Hadoop environment variables referenced in the Hadoop launch script, when that is necessary.

If the script /home/hadoop/conf/hadoop-user-env.sh exists when Hadoop launches, Amazon EMR executes it and passes any options it sets on to bin/hadoop.
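The conditional behavior described here can be pictured with a short sketch. This is not the actual code that Amazon EMR or bin/hadoop uses; load_user_env is a hypothetical helper, and the sketch assumes the file is sourced so that the variables it sets are visible to the launch script.

```shell
#!/bin/bash
# Illustrative sketch only; this is not the actual code in bin/hadoop.

# load_user_env is a hypothetical helper: source hadoop-user-env.sh from the
# given conf directory if the file exists, otherwise do nothing.
load_user_env() {
  local conf_dir="$1"
  local user_env="$conf_dir/hadoop-user-env.sh"
  if [ -f "$user_env" ]; then
    # Sourcing (rather than running a child process) makes the variables
    # set in the file visible to the rest of this script.
    . "$user_env"
  fi
}

load_user_env "${HADOOP_CONF_DIR:-/home/hadoop/conf}"
echo "HADOOP_OPTS=${HADOOP_OPTS:-<unset>}"
```

If the file is absent, the helper does nothing and the defaults remain in effect.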

For example, if you want to add a JAR file to the Hadoop daemon classpath, you can use a bootstrap action such as:

#!/bin/bash
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh
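Because the bootstrap action above appends with >>, running it more than once adds the same line repeatedly, and the assignment replaces any classpath already set. The following variation is a sketch under stated assumptions, not an Amazon EMR feature: add_line_once is a hypothetical helper that appends a line only if it is not already present, and the written line keeps any existing HADOOP_CLASSPATH value. The JAR path is the same placeholder used above.

```shell
#!/bin/bash
# Sketch of a bootstrap action that appends a setting at most once.
# USER_ENV may be overridden for testing; the default is the path from the text.
USER_ENV="${USER_ENV:-/home/hadoop/conf/hadoop-user-env.sh}"

# Single quotes keep the expansion literal so it is evaluated when the file
# is read at Hadoop launch, not when this bootstrap action runs.
# ${VAR:+...} avoids a leading ":" when HADOOP_CLASSPATH is empty or unset.
LINE='export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}/path/to/my.jar"'

# add_line_once FILE LINE (hypothetical helper): append LINE to FILE unless
# an identical line is already there.
add_line_once() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Only write when the conf directory exists (that is, on a cluster node).
if [ -d "$(dirname "$USER_ENV")" ]; then
  add_line_once "$USER_ENV" "$LINE"
else
  echo "conf directory not found; nothing written" >&2
fi
```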

For more information on using bootstrap actions, refer to Bootstrap Actions.