Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your
job flow. You can specify a step that runs a script when you create your job flow, or you
can add such a step later if your job flow is in the WAITING state. For more information
about adding steps, go to Add Steps to a Job Flow. For more information about running an interactive
job flow, go to Interactive and Batch Modes.
If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to Bootstrap Actions.
If you want to run a script immediately before job flow shutdown, use a shutdown action. For more information on shutdown actions, go to Shutdown Actions.
You can run multi-step job flows only from the CLI and the API. The Amazon EMR console does not support multiple steps.
This section describes how to add a step to run a script. The
script-runner.jar file takes as its arguments the path to a script and
any additional arguments for the script. The JAR file runs the script with the passed
arguments. script-runner.jar is located at
s3://elasticmapreduce/libs/script-runner/script-runner.jar.
The job flow containing a step that runs a script looks similar to the following:
./elastic-mapreduce --create --alive --name "My Development Jobflow" \
  --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
  --args "s3://myawsbucket/script-path/my_script.sh"
This job flow runs the script my_script.sh when the step is
processed.
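The argument order that script-runner.jar expects (the script location first, then any arguments for the script itself) can be sketched as a small Python helper. This is only an illustration, not part of Amazon EMR: the function name script_runner_step and the extra script arguments are hypothetical, and the dictionary layout simply mirrors the Name / HadoopJarStep / Jar / Args shape used for step definitions in this guide.

```python
import json

# Location of script-runner.jar as given in this section.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def script_runner_step(name, script_uri, script_args=()):
    """Build a step definition that runs a script through script-runner.jar.

    The first argument is the script's location; everything after it is
    passed through to the script. (Hypothetical helper, for illustration.)
    """
    return {
        "Name": name,
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            "Args": [script_uri, *script_args],
        },
    }


step = script_runner_step(
    "My Script Step",
    "s3://myawsbucket/script-path/my_script.sh",
    ["--verbose", "2012-01-01"],  # hypothetical arguments for my_script.sh
)

# Print the step list as JSON so it can be inspected or saved to a file.
print(json.dumps([step], indent=2))
```

Keeping the script path as the first element of Args and appending the script's own arguments after it matches the description of script-runner.jar above.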
This section describes the Amazon EMR API Query request needed to add a step
to run a script. The response includes a
<JobFlowID>.
The Amazon EMR JSON sample below contains two steps: a streaming step, followed by
a step that specifies script-runner.jar as its JAR and passes
the location and file name of the script as the argument.
[
  {
    "Name": "streaming job flow",
    "HadoopJarStep":
    {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
      "Args":
      [
        "-input", "s3n://elasticmapreduce/samples/wordcount/input",
        "-output", "s3n://myawsbucket",
        "-mapper", "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
        "-reducer", "aggregate"
      ]
    }
  },
  {
    "Name": "My Script Step",
    "HadoopJarStep":
    {
      "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
      "Args":
      [
        "s3://myawsbucket/script-path/my_script.sh"
      ]
    }
  }
]
This job flow runs the script my_script.sh when the step is
processed.
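Before submitting a step list like this one, it can be useful to check which steps run a script and which script each one runs. The following Python sketch parses a step list and reports those steps. It is an illustration only: the function name scripts_run_by_steps is hypothetical, and it assumes that a script step is any step whose Jar value ends in /script-runner.jar and whose first argument is the script location.

```python
import json

# A step list in the same shape as the JSON sample in this section.
steps_json = """
[
  {"Name": "streaming job flow",
   "HadoopJarStep": {
     "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
     "Args": ["-input", "s3n://elasticmapreduce/samples/wordcount/input",
              "-output", "s3n://myawsbucket",
              "-mapper", "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
              "-reducer", "aggregate"]}},
  {"Name": "My Script Step",
   "HadoopJarStep": {
     "Jar": "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar",
     "Args": ["s3://myawsbucket/script-path/my_script.sh"]}}
]
"""


def scripts_run_by_steps(steps):
    """Return (step name, script location, script arguments) for each
    step that uses script-runner.jar. (Hypothetical helper.)"""
    found = []
    for step in steps:
        jar_step = step.get("HadoopJarStep", {})
        jar = jar_step.get("Jar", "")
        args = jar_step.get("Args", [])
        # Assumption: the first argument is the script, the rest go to it.
        if jar.endswith("/script-runner.jar") and args:
            found.append((step.get("Name", ""), args[0], args[1:]))
    return found


scripts = scripts_run_by_steps(json.loads(steps_json))
for name, script, extra in scripts:
    print(f"{name}: runs {script} with arguments {extra}")
```

Parsing the list with json.loads also catches malformed JSON, such as a missing bracket or brace, before the request is sent.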