Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Running a Script in a Job Flow

Amazon Elastic MapReduce (Amazon EMR) enables you to run a script at any time during step processing in your job flow. You can specify a step that runs a script when you create your job flow, or you can add the step later if your job flow is in the WAITING state. For more information about adding steps, go to Add Steps to a Job Flow. For more information about running an interactive job flow, go to Interactive and Batch Modes.

If you want to run a script before step processing begins, use a bootstrap action. For more information on bootstrap actions, go to Bootstrap Actions.

If you want to run a script immediately before job flow shutdown, use a shutdown action. For more information on shutdown actions, go to Shutdown Actions.

You can run multi-step job flows only from the CLI and the API. The Amazon EMR console does not support multiple steps.

CLI

This section describes how to add a step to run a script. The script-runner.jar file takes as arguments the path to a script and any additional arguments for the script, and it runs the script with the arguments passed to it. The script-runner.jar file is located at s3://elasticmapreduce/libs/script-runner/script-runner.jar.

The command to create a job flow containing a step that runs a script looks similar to the following:

./elastic-mapreduce --create --alive --name "My Development Jobflow" \
--jar s3://elasticmapreduce/libs/script-runner/script-runner.jar \
--args "s3://myawsbucket/script-path/my_script.sh"

This job flow runs the script my_script.sh when the step is processed.

API

This section describes the Amazon EMR API Query request needed to add a step to run a script. The response includes a <JobFlowID>.

The following Amazon EMR JSON sample defines two steps: a streaming step, and a step that specifies the JAR s3://elasticmapreduce/libs/script-runner/script-runner.jar and passes the location and file name of the script.

[
  {
    "Name": "streaming job flow",
    "HadoopJarStep": {
      "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
      "Args": [
        "-input",   "s3n://elasticmapreduce/samples/wordcount/input",
        "-output",  "s3n://myawsbucket",
        "-mapper",  "s3://elasticmapreduce/samples/wordcount/wordSplitter.py",
        "-reducer", "aggregate"
      ]
    }
  },
  {
    "Name": "My Script Step",
    "HadoopJarStep": {
      "Jar": "s3://elasticmapreduce/libs/script-runner/script-runner.jar",
      "Args": [
        "s3://myawsbucket/script-path/my_script.sh"
      ]
    }
  }
]

This job flow runs the streaming step first, and then runs the script my_script.sh when the script step is processed.
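
As a rough illustration of how the Args array of a script step is laid out, the following Python sketch builds a step definition like the one shown above. It assumes, based on the description of script-runner.jar in this guide, that the first entry in Args is the script location and that any later entries are passed to the script as its own arguments. The helper name make_script_step and the extra script arguments are hypothetical and exist only for this example. The sketch prints the JSON and does not call Amazon EMR.

```python
import json

# Location of script-runner.jar as given in this guide.
SCRIPT_RUNNER_JAR = "s3://elasticmapreduce/libs/script-runner/script-runner.jar"


def make_script_step(name, script_path, script_args=()):
    """Return a step definition that runs a script through script-runner.jar.

    Assumption: the first entry in Args is the script location, and any
    remaining entries are handed to the script as its own arguments.
    """
    return {
        "Name": name,
        "HadoopJarStep": {
            "Jar": SCRIPT_RUNNER_JAR,
            "Args": [script_path, *script_args],
        },
    }


if __name__ == "__main__":
    steps = [
        make_script_step(
            "My Script Step",
            "s3://myawsbucket/script-path/my_script.sh",
            # Hypothetical script arguments, for illustration only.
            ["first-arg", "second-arg"],
        )
    ]
    # Print the step list in the same JSON shape as the sample above.
    print(json.dumps(steps, indent=2))
```

Building the step from a helper keeps the script path and its arguments in one ordered list, which makes it harder to place a script argument ahead of the script location by mistake.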