Developer Guide (API Version 2009-03-31)

Add Steps to a Job Flow

This section describes the methods for adding steps to a job flow.

You can add steps to a running job flow only if you set the KeepJobFlowAliveWhenNoSteps parameter to True when you create the job flow. This setting keeps the Hadoop cluster running after the job flow has completed all of its steps.

The Amazon Elastic MapReduce (Amazon EMR) console does not support adding steps to a job flow.

Example using the CLI

The following example creates a simple job flow and then adds a step to the job flow.


To add a step to a job flow

  1. Create a job flow:

    Linux or UNIX:
    $ ./elastic-mapreduce --create --alive --stream

    Microsoft Windows:
    c:\> ruby elastic-mapreduce --create --alive --stream

    The --stream parameter adds a streaming step using default parameters. The default parameters are those of the word count example that is available in the Amazon EMR console.

    The output looks similar to the following.

    Created job flow JobFlowID
  2. Add a step:

    Linux or UNIX:
    $ ./elastic-mapreduce -j JobFlowID \
        --jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar \
        --main-class org.myorg.WordCount \
        --arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br \
        --arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br \
        --arg hdfs:///cloudburst/output/1 \
        --arg 36 --arg 3 --arg 0 --arg 1 --arg 240 --arg 48 --arg 24 \
        --arg 24 --arg 128 --arg 16

    Microsoft Windows:
    c:\> ruby elastic-mapreduce -j JobFlowID --jar s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar --main-class org.myorg.WordCount --arg s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br --arg s3n://elasticmapreduce/samples/cloudburst/input/100k.br --arg hdfs:///cloudburst/output/1 --arg 36 --arg 3 --arg 0 --arg 1 --arg 240 --arg 48 --arg 24 --arg 24 --arg 128 --arg 16

This command runs an example job flow step that downloads and runs the JAR file. The arguments are passed to the main function in the JAR file.

If your JAR file has a manifest that names the main class, you do not need to specify the main class with --main-class. The preceding example specifies it explicitly.
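The long --arg list in the preceding command is easy to mistype by hand. As a minimal sketch, the command can be assembled from a list of arguments. The helper name build_add_step_command is hypothetical and not part of the Amazon EMR CLI; it only uses the -j, --jar, --main-class, and --arg options shown above, and assumes Python 3.8 or later for shlex.join.

```python
import shlex


def build_add_step_command(job_flow_id, jar, args, main_class=None):
    """Return the argument list for adding one JAR step (hypothetical helper)."""
    cmd = ["./elastic-mapreduce", "-j", job_flow_id, "--jar", jar]
    # --main-class is only needed when the JAR manifest does not name a main class.
    if main_class is not None:
        cmd += ["--main-class", main_class]
    # Each value passed to the JAR's main function gets its own --arg, in order.
    for arg in args:
        cmd += ["--arg", str(arg)]
    return cmd


command = build_add_step_command(
    "JobFlowID",
    "s3n://elasticmapreduce/samples/cloudburst/cloudburst.jar",
    [
        "s3n://elasticmapreduce/samples/cloudburst/input/s_suis.br",
        "s3n://elasticmapreduce/samples/cloudburst/input/100k.br",
        "hdfs:///cloudburst/output/1",
        36, 3, 0, 1, 240, 48, 24, 24, 128, 16,
    ],
    main_class="org.myorg.WordCount",
)

# Print a shell-quoted, single-line form of the command for review.
print(shlex.join(command))
```

The sketch only builds and prints the command. Running it against a real job flow requires replacing JobFlowID with the ID returned when the job flow was created.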

Example using the API

The steps parameter defines the location and input parameters for the Hadoop JAR steps that perform the processing on the input data. Each step is identified by a member number, for example, Steps.member.1.

Typically, you specify all job flow steps in a RunJobFlow request. The advantage of AddJobFlowSteps is that you can add steps to a job flow that is already loaded onto the Amazon EC2 instances. You typically add steps to modify the data processing or to help debug a job flow while you work with it interactively, that is, while job flow execution is paused.

The name parameter helps you distinguish step results, so it is best to make each name unique. Amazon EMR does not check for the uniqueness of step names.
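Because Amazon EMR does not check step names for uniqueness, a client-side check before sending the request can catch accidental duplicates. The following is a minimal sketch; the function name duplicate_step_names is hypothetical and not part of any Amazon EMR tool.

```python
from collections import Counter


def duplicate_step_names(names):
    """Return the step names that appear more than once, sorted alphabetically."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Example: the second "MyStep1" would make results hard to tell apart.
planned = ["MyStep1", "MyStep2", "MyStep1"]
duplicates = duplicate_step_names(planned)
if duplicates:
    print("Rename these steps before submitting:", ", ".join(duplicates))
```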

The remainder of the steps parameter specifies the JAR file and the input parameters used to process the data.

When you debug a job flow, you must set the RunJobFlow parameter KeepJobFlowAliveWhenNoSteps to True and ActionOnFailure to CANCEL_AND_WAIT.
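As a rough sketch, the two debugging settings can be applied to a dictionary of RunJobFlow request parameters before the request is built. The helper name apply_debug_settings is hypothetical. The Instances. prefix on KeepJobFlowAliveWhenNoSteps is an assumption about where the parameter sits in the RunJobFlow request; confirm it against the RunJobFlow reference before relying on it.

```python
def apply_debug_settings(params, step_count):
    """Return a copy of RunJobFlow parameters with debug-friendly settings.

    Hypothetical helper: the "Instances." prefix is an assumption, and
    step_count is the number of Steps.member.N entries in the request.
    """
    debug = dict(params)
    # Keep the cluster running after the last step so more steps can be added.
    debug["Instances.KeepJobFlowAliveWhenNoSteps"] = "true"
    # On failure, cancel the remaining steps and wait instead of terminating.
    for i in range(1, step_count + 1):
        debug[f"Steps.member.{i}.ActionOnFailure"] = "CANCEL_AND_WAIT"
    return debug


base = {"Operation": "RunJobFlow", "Steps.member.1.Name": "MyStep1"}
debug_params = apply_debug_settings(base, step_count=1)
for key in sorted(debug_params):
    print(f"{key}={debug_params[key]}")
```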


Note

The maximum number of steps allowed in a job flow is 256. For ways to overcome this limitation, go to the section called “Add More than 256 Steps to a Job Flow”.

To add steps to a job flow

  • Send a request similar to the following.

    https://elasticmapreduce.amazonaws.com?
    JobFlowId=JobFlowID&
    Steps.member.1.Name=MyStep2&
    Steps.member.1.ActionOnFailure=CONTINUE&
    Steps.member.1.HadoopJarStep.Jar=s3://mybucket/MySecondJar&
    Steps.member.1.HadoopJarStep.MainClass=MainClass&
    Steps.member.1.HadoopJarStep.Args.member.1=arg1&
    Operation=AddJobFlowSteps&
    AWSAccessKeyId=AccessKeyID&
    SignatureVersion=2&
    SignatureMethod=HmacSHA256&
    Timestamp=2009-01-28T21%3A51%3A51.000Z&
    Signature=calculated value

    For more information about the parameters unique to AddJobFlowSteps, see AddJobFlowSteps. For more information about the generic parameters in the request, see Common Request Parameters.

The response contains the request ID.
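The member-numbered parameters in the preceding request can be generated from a list of step descriptions. The following is a minimal sketch: the function name add_job_flow_steps_params and the dictionary keys (name, action_on_failure, jar, main_class, args) are hypothetical. It deliberately omits the authentication parameters (AWSAccessKeyId, SignatureVersion, SignatureMethod, Timestamp, and Signature), so the resulting URL is not a complete, signed request.

```python
from urllib.parse import urlencode


def add_job_flow_steps_params(job_flow_id, steps):
    """Flatten step descriptions into AddJobFlowSteps query parameters."""
    params = {"Operation": "AddJobFlowSteps", "JobFlowId": job_flow_id}
    # Steps are numbered from 1, matching Steps.member.1 in the request above.
    for i, step in enumerate(steps, start=1):
        prefix = f"Steps.member.{i}"
        params[f"{prefix}.Name"] = step["name"]
        params[f"{prefix}.ActionOnFailure"] = step["action_on_failure"]
        params[f"{prefix}.HadoopJarStep.Jar"] = step["jar"]
        # MainClass is optional; include it only when one is given.
        if step.get("main_class"):
            params[f"{prefix}.HadoopJarStep.MainClass"] = step["main_class"]
        # Arguments are numbered from 1 within each step.
        for j, arg in enumerate(step.get("args", []), start=1):
            params[f"{prefix}.HadoopJarStep.Args.member.{j}"] = arg
    return params


params = add_job_flow_steps_params(
    "JobFlowID",
    [
        {
            "name": "MyStep2",
            "action_on_failure": "CONTINUE",
            "jar": "s3://mybucket/MySecondJar",
            "main_class": "MainClass",
            "args": ["arg1"],
        }
    ],
)

# urlencode percent-encodes values such as the s3:// location.
query = urlencode(params)
print("https://elasticmapreduce.amazonaws.com?" + query)
```

Keeping the flattening separate from signing means the same parameter dictionary can be handed to whatever signing code the application already uses.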