This section describes the methods for adding steps to a job flow.
You can add steps to a running job flow only if you set the
KeepJobFlowAliveWhenNoSteps parameter to True when you
create the job flow. This setting keeps the Hadoop cluster running after the job flow has
completed all of its steps.
The Amazon Elastic MapReduce (Amazon EMR) console does not support adding steps to a job flow.
Example using the CLI
The following example creates a simple job flow and then adds a step to the job flow.
To add a step to a job flow
Create a job flow:
| If you are using... | Enter the following... |
|---|---|
| Linux or UNIX | $ ./elastic-mapreduce --create --active --stream |
| Microsoft Windows | c:\ruby elastic-mapreduce --create --active --stream |
The --stream parameter adds a streaming step using default
parameters. The default parameters run the word count example that is available in the
Amazon EMR console.
The output looks similar to the following.
Created job flow JobFlowID
Add a step:
| If you are using... | Enter the following... |
|---|---|
| Linux or UNIX | $ ./elastic-mapreduce -j |
| Microsoft Windows | $ ./elastic-mapreduce -j |
This command runs an example job flow step that downloads and runs the JAR file. The arguments are passed to the main function in the JAR file.
If your JAR file has a manifest that names the main class, you do not need to specify the main class using
--main-class, as shown in the preceding example.
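One way to check whether a JAR declares its main class is to read the `Main-Class` entry from its manifest before you decide whether to pass --main-class. The following is a minimal Python sketch, not part of Amazon EMR or its CLI. The function names are hypothetical, the demo JAR is built in memory purely for illustration, and the parser assumes a short manifest without wrapped continuation lines.

```python
import io
import zipfile


def manifest_main_class(jar_bytes):
    """Return the Main-Class value from a JAR's manifest, or None if absent."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        try:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
        except KeyError:
            return None  # the archive has no manifest at all
    for line in manifest.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "main-class":
            return value.strip()
    return None  # a manifest exists but does not name a main class


def build_demo_jar(manifest_text):
    """Build a tiny in-memory JAR (a ZIP archive) for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", manifest_text)
    return buffer.getvalue()


if __name__ == "__main__":
    demo = build_demo_jar("Manifest-Version: 1.0\nMain-Class: com.example.WordCount\n")
    print(manifest_main_class(demo))
```

If the function returns `None`, the JAR does not name a main class, so the step would need the main class specified explicitly.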
Example using the API
The steps parameter defines the location and input parameters for
the Hadoop JAR steps that perform the processing on the input data. Each step is identified
by a member number, for example Steps.member.1.
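The member numbering can be illustrated by flattening a list of step definitions into query parameters. This is a sketch under stated assumptions: the function name and the dictionary keys (`name`, `action_on_failure`, `jar`, `main_class`, `args`) are hypothetical. The emitted parameter names follow the sample AddJobFlowSteps request in this section.

```python
def flatten_steps(steps):
    """Flatten a list of step dicts into Steps.member.N query parameters."""
    params = {}
    for n, step in enumerate(steps, start=1):  # member numbers start at 1
        prefix = "Steps.member.%d" % n
        params[prefix + ".Name"] = step["name"]
        params[prefix + ".ActionOnFailure"] = step["action_on_failure"]
        params[prefix + ".HadoopJarStep.Jar"] = step["jar"]
        if step.get("main_class"):
            # Only sent when the caller names a main class explicitly.
            params[prefix + ".HadoopJarStep.MainClass"] = step["main_class"]
        for i, arg in enumerate(step.get("args", []), start=1):
            params["%s.HadoopJarStep.Args.member.%d" % (prefix, i)] = arg
    return params


if __name__ == "__main__":
    example = flatten_steps([{
        "name": "MyStep2",
        "action_on_failure": "CONTINUE",
        "jar": "s3://mybucket/MySecondJar",
        "main_class": "MainClass",
        "args": ["arg1"],
    }])
    for key in sorted(example):
        print(key, "=", example[key])
```

Each additional step in the list takes the next member number, and each argument within a step gets its own `Args.member` index.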
Typically, you specify all job flow steps in a RunJobFlow request.
AddJobFlowSteps lets you add steps to a job flow
that is already loaded onto the Amazon EC2 instances. You typically add steps to modify
the data processing or to aid in debugging when you are working interactively
with the job flow, that is, when you add steps while the job flow
execution is paused.
The name parameter helps you distinguish step results, so it is
best to make each name unique. Amazon EMR does not check for the uniqueness of step
names.
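Because the service does not enforce unique names, a client can check for duplicates before it sends a request. The following Python sketch shows one possible check. The function name is hypothetical and the check is not part of the Amazon EMR API.

```python
from collections import Counter


def duplicate_step_names(names):
    """Return the sorted list of step names that appear more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # An empty result means every name is unique.
    print(duplicate_step_names(["MyStep1", "MyStep2", "MyStep1"]))
```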
The remainder of the steps parameter specifies the JAR file and
the input parameters used to process the data.
When you debug a job flow, you must set the RunJobFlow parameter
KeepJobFlowAliveWhenNoSteps to True and
ActionOnFailure to CANCEL_AND_WAIT.
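As a rough illustration, the debugging settings might be expressed as query parameters like this. The sketch assumes that KeepJobFlowAliveWhenNoSteps is nested under `Instances` and that ActionOnFailure is set on each step. Verify both names against the RunJobFlow reference. The function name is hypothetical, and all other parameters that RunJobFlow requires are omitted.

```python
def debug_run_job_flow_params(step_count):
    """Return only the RunJobFlow parameters that matter for debugging."""
    params = {
        "Operation": "RunJobFlow",
        # Assumed parameter path: keep the cluster up after the last step.
        "Instances.KeepJobFlowAliveWhenNoSteps": "true",
    }
    for n in range(1, step_count + 1):
        # On failure, cancel the remaining steps and wait instead of terminating.
        params["Steps.member.%d.ActionOnFailure" % n] = "CANCEL_AND_WAIT"
    return params


if __name__ == "__main__":
    for key, value in sorted(debug_run_job_flow_params(2).items()):
        print(key, "=", value)
```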
| Note |
|---|
| The maximum number of steps allowed in a job flow is 256. For ways to overcome this limitation, go to the section called “Add More than 256 Steps to a Job Flow”. |
To add steps to a job flow
Send a request similar to the following.
https://elasticmapreduce.amazonaws.com?
JobFlowId=JobFlowID&
Steps.member.1.Name=MyStep2&
Steps.member.1.ActionOnFailure=CONTINUE&
Steps.member.1.HadoopJarStep.Jar=s3://mybucket/MySecondJar&
Steps.member.1.HadoopJarStep.MainClass=MainClass&
Steps.member.1.HadoopJarStep.Args.member.1=arg1&
Operation=AddJobFlowSteps&
AWSAccessKeyId=AccessKeyID&
SignatureVersion=2&
SignatureMethod=HmacSHA256&
Timestamp=2009-01-28T21%3A51%3A51.000Z&
Signature=calculated value
For more information about the parameters unique to
AddJobFlowSteps, see AddJobFlowSteps. For more information about the generic parameters in the
request, see Common Request Parameters.
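The `Signature` value in the request is calculated from the other parameters. The following Python sketch shows how a Signature Version 2 value with HmacSHA256 is generally computed. It assumes a GET request to the host shown in the sample request with the path `/`. The function names and the secret key in the usage line are hypothetical placeholders. The params passed in must not include `Signature` itself.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def canonical_query(params):
    """Sort parameters by name and percent-encode names and values (RFC 3986)."""
    return "&".join(
        "%s=%s" % (quote(name, safe="-_.~"), quote(value, safe="-_.~"))
        for name, value in sorted(params.items())
    )


def sign_v2(params, secret_key, host="elasticmapreduce.amazonaws.com", path="/"):
    """Return the base64-encoded HMAC-SHA256 signature for a GET request."""
    string_to_sign = "\n".join(["GET", host, path, canonical_query(params)])
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    request = {
        "Operation": "AddJobFlowSteps",
        "SignatureVersion": "2",
        "SignatureMethod": "HmacSHA256",
        "Timestamp": "2009-01-28T21:51:51.000Z",
    }
    # "hypothetical-secret-key" is a placeholder, not a real credential.
    print(sign_v2(request, "hypothetical-secret-key"))
```

The returned value must itself be percent-encoded when it is appended to the query string as the `Signature` parameter.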
The response contains the request ID.
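A client might extract the request ID from the response as follows. This is a sketch only. The sample XML is an assumed response shape with a placeholder ID, not output captured from the service, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed response shape, for illustration only.
SAMPLE_RESPONSE = """<AddJobFlowStepsResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
  <ResponseMetadata>
    <RequestId>hypothetical-request-id</RequestId>
  </ResponseMetadata>
</AddJobFlowStepsResponse>"""


def request_id(xml_text):
    """Return the text of the first RequestId element, or None if not found."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare the local tag name so the lookup ignores the namespace URI.
        if element.tag.rsplit("}", 1)[-1] == "RequestId":
            return element.text
    return None


if __name__ == "__main__":
    print(request_id(SAMPLE_RESPONSE))
```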