When you run a Hadoop daemon or job, a number of scripts are executed as part of the initialization process.
The script that runs when you enter hadoop at the command line is a shell script located at
/home/hadoop/bin/hadoop. This script is responsible for setting up the Java
classpath, configuring the Java memory settings, determining which main class to run, and
executing the actual Java process.
As part of the Hadoop configuration, the hadoop script executes a file called
conf/hadoop-env.sh. The hadoop-env.sh script can set various environment
variables. The conf/hadoop-env.sh script is used so that the main bin/hadoop script
remains unmodified. Amazon Elastic MapReduce (Amazon EMR) creates a hadoop-env.sh script on every node in a cluster
in order to configure the amount of memory for every Hadoop daemon launched.
Additionally, Amazon EMR provides a user customizable script, conf/hadoop-user-env.sh,
to allow you to override the default Hadoop settings that Amazon EMR configures.
You should put your custom overrides for the Hadoop environment variables in
conf/hadoop-user-env.sh. Custom overrides could include changes to the Java memory settings or
additional JAR files on the classpath. This script is also where Amazon EMR writes
its settings when you use a bootstrap action to configure memory or specify additional Java arguments.
Examples of environment variables that you can specify in hadoop-user-env.sh include:
export HADOOP_DATANODE_HEAPSIZE="128"
export HADOOP_JOBTRACKER_HEAPSIZE="768"
export HADOOP_NAMENODE_HEAPSIZE="256"
export HADOOP_OPTS="-server"
export HADOOP_TASKTRACKER_HEAPSIZE="512"
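Before relying on overrides like these, it can help to check that the file parses and sets the values you expect. The following is a minimal sketch, not an Amazon EMR tool: it writes two of the example settings to a scratch file (the temporary directory is a stand-in for /home/hadoop/conf so that the check can be tried off-cluster), verifies the syntax, and sources the file. The variable names CONF_DIR and USER_ENV are illustrative only.

```shell
#!/bin/bash
# Sketch: validate a hadoop-user-env.sh style file before using it on a cluster.
# CONF_DIR is a scratch stand-in for /home/hadoop/conf (assumption for testing).
CONF_DIR="$(mktemp -d)"
USER_ENV="$CONF_DIR/hadoop-user-env.sh"

# Write two of the example overrides from the list above.
cat > "$USER_ENV" <<'EOF'
export HADOOP_NAMENODE_HEAPSIZE="256"
export HADOOP_OPTS="-server"
EOF

# Parse the file without executing it; a non-zero status means a syntax error.
bash -n "$USER_ENV" && echo "syntax ok"

# Source the file into the current shell and show what it set.
. "$USER_ENV"
echo "HADOOP_NAMENODE_HEAPSIZE=$HADOOP_NAMENODE_HEAPSIZE"
echo "HADOOP_OPTS=$HADOOP_OPTS"
```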
Bootstrap actions run before Hadoop starts and before any steps are run. In some cases it is necessary to configure the Hadoop environment variables referenced in the Hadoop launch script.
If the script /home/hadoop/conf/hadoop-user-env.sh exists when Hadoop launches,
Amazon EMR executes it and passes any options it sets on to bin/hadoop.
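The "run it only if it exists" behavior can be pictured with a short sketch. This is an illustration of the general pattern, not the actual Amazon EMR launch code: it assumes the overrides are loaded by sourcing the file, and the function name load_user_env is hypothetical.

```shell
#!/bin/bash
# Sketch (assumed mechanism): source the user overrides only when the file exists.
# load_user_env is a hypothetical helper; it takes the conf directory as $1.
load_user_env() {
  user_env_file="$1/hadoop-user-env.sh"
  if [ -f "$user_env_file" ]; then
    # Sourcing (rather than running in a child process) lets the exported
    # variables reach whatever is launched afterwards from this shell.
    . "$user_env_file"
    return 0
  fi
  return 1
}

# Demonstration in a scratch directory instead of /home/hadoop/conf.
demo_dir="$(mktemp -d)"
load_user_env "$demo_dir" || echo "no overrides found, defaults stay in effect"

echo 'export HADOOP_OPTS="-server"' > "$demo_dir/hadoop-user-env.sh"
load_user_env "$demo_dir" && echo "overrides loaded: HADOOP_OPTS=$HADOOP_OPTS"
```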
For example, if you want to add a JAR file to the Hadoop daemon classpath, you can use a
bootstrap action such as:
#!/bin/bash
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh
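The bootstrap action above assigns HADOOP_CLASSPATH outright, so it may replace a value set earlier, and running it twice appends the same line twice. A slightly more defensive variant is sketched below. The helper name append_classpath is hypothetical, and the demonstration writes to a temporary file; on a cluster the target would be /home/hadoop/conf/hadoop-user-env.sh.

```shell
#!/bin/bash
# Sketch: append a classpath override that keeps any existing HADOOP_CLASSPATH.
# append_classpath is a hypothetical helper: $1 = env file, $2 = JAR path.
append_classpath() {
  env_file="$1"
  jar="$2"
  # The written line expands HADOOP_CLASSPATH when the file is sourced; the
  # ${VAR:+...} form avoids a leading ':' when the variable is unset or empty.
  line=$(printf 'export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}%s"' "$jar")
  # Do nothing if the exact line is already present, so reruns are harmless.
  if [ -f "$env_file" ] && grep -qxF -- "$line" "$env_file"; then
    return 0
  fi
  echo "$line" >> "$env_file"
}

# Demonstration against a scratch file.
demo_env="$(mktemp)"
append_classpath "$demo_env" /path/to/my.jar
append_classpath "$demo_env" /path/to/my.jar   # second call is a no-op
cat "$demo_env"
```

The existence check matters mainly when a bootstrap action is reused or retried; without it the file simply grows by one identical line per run.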
For more information on using bootstrap actions, refer to Bootstrap Actions.