This section provides general information on how to create and manage job flows using the Amazon EMR command line interface (CLI).
Amazon Elastic MapReduce (Amazon EMR) takes care of provisioning an Amazon EC2 cluster, terminating it, moving the data between it and Amazon S3, and optimizing Hadoop. Amazon EMR handles most of the details of setting up the hardware and networking required by the server cluster, such as monitoring the setup, configuring Hadoop, and executing the job flow.
Using the Amazon EMR CLI, you can construct a job flow that will continue to run until you terminate it. This process is useful for debugging. When a step fails, you can add another step to your active job flow without having to incur the shutdown and startup cost of a new job flow.
A job flow is a sequence of one or more steps. A step corresponds roughly to one algorithm that manipulates the data, and it typically performs relatively simple operations on very large amounts of data. The output of one step often becomes the input of the next.
The following command starts a job flow that consumes resources until you terminate it.
To create a job flow
Enter the following commands from the command-line prompt:
Linux and UNIX users:
$ ./elastic-mapreduce --create --alive
Windows users:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --create --alive
The output will look similar to:
Created job flow JobFlowID
This command launches a job flow running on a single m1.small instance. The --alive option tells the job flow to keep running even when it has finished all its steps.
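If you script around the CLI, you will usually want to capture the job flow ID from the create output. The following is a minimal Python sketch that assumes the output contains a line of the form "Created job flow <ID>", as shown above. The function name and the sample ID in the usage note are hypothetical and are not part of the Amazon EMR CLI.

```python
import re


def extract_job_flow_id(cli_output: str) -> str:
    """Return the ID from a 'Created job flow <ID>' line.

    Assumes the output format shown in this guide; raises ValueError
    if no such line is present.
    """
    match = re.search(r"^Created job flow\s+(\S+)\s*$", cli_output, re.MULTILINE)
    if match is None:
        raise ValueError(f"no job flow ID found in: {cli_output!r}")
    return match.group(1)
```

For example, passing the text "Created job flow j-EXAMPLE123" (a made-up ID) returns "j-EXAMPLE123", which you can then supply to the --describe or --terminate commands.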
A unique job flow ID is assigned to each newly created job flow. You use the job flow ID to identify and manage your job flow.
This section presents several methods to identify and manage your job flows.
You can use the --help parameter to list all of the commands available in the Amazon EMR CLI.
To list all Amazon EMR commands
Enter the following commands from the command-line prompt:
Linux and UNIX users:
$ ./elastic-mapreduce --help
Windows users:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --help
For more information on each of the Amazon EMR commands, see the Amazon Elastic MapReduce Developer Guide.
You can use the --list parameter to list all of your job
flows for the past two weeks.
To list all job flows
Enter the following commands from the command-line prompt:
Linux and UNIX users:
$ ./elastic-mapreduce --list
Windows users:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --list
The response looks similar to the following:
JobFlowID STARTING
Development Job Flow (requires manual termination)
For details on job flow states and additional methods to list job flows, see the Amazon Elastic MapReduce Developer Guide.
You can get information about a job flow using the
--describe option and the associated job flow ID.
To get information about your job flow
Enter the following commands from the command-line prompt:
Linux and UNIX users:
$ ./elastic-mapreduce --describe --jobflow [JobFlowID]
Windows users:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --describe --jobflow [JobFlowID]
The response looks similar to the following:
{
"JobFlows": [
{
"Name": "Development Job Flow (requires manual termination)",
"LogUri": "s3n:\/\/myawsbucket\/FileName\/",
"ExecutionStatusDetail": {
"StartDateTime": null,
"EndDateTime": null,
"LastStateChangeReason": "Starting instances",
"CreationDateTime": DateTimeStamp,
"State": "STARTING",
"ReadyDateTime": null
},
"Steps": [],
"Instances": {
"MasterInstanceId": null,
"Ec2KeyName": "KeyName",
"NormalizedInstanceHours": 0,
"InstanceCount": 5,
"Placement": {
"AvailabilityZone": "us-east-1a"
},
"SlaveInstanceType": "m1.small",
"HadoopVersion": "0.20",
"MasterPublicDnsName": null,
"KeepJobFlowAliveWhenNoSteps": true,
"InstanceGroups": [
{
"StartDateTime": null,
"SpotPrice": null,
"Name": "Master Instance Group",
"InstanceRole": "MASTER",
"EndDateTime": null,
"LastStateChangeReason": "",
"CreationDateTime": DateTimeStamp,
"LaunchGroup": null,
"InstanceGroupId": "InstanceGroupID",
"State": "PROVISIONING",
"Market": "ON_DEMAND",
"ReadyDateTime": null,
"InstanceType": "m1.small",
"InstanceRunningCount": 0,
"InstanceRequestCount": 1
},
{
"StartDateTime": null,
"SpotPrice": null,
"Name": "Task Instance Group",
"InstanceRole": "TASK",
"EndDateTime": null,
"LastStateChangeReason": "",
"CreationDateTime": DateTimeStamp,
"LaunchGroup": null,
"InstanceGroupId": "InstanceGroupID",
"State": "PROVISIONING",
"Market": "ON_DEMAND",
"ReadyDateTime": null,
"InstanceType": "m1.small",
"InstanceRunningCount": 0,
"InstanceRequestCount": 2
},
{
"StartDateTime": null,
"SpotPrice": null,
"Name": "Core Instance Group",
"InstanceRole": "CORE",
"EndDateTime": null,
"LastStateChangeReason": "",
"CreationDateTime": DateTimeStamp,
"LaunchGroup": null,
"InstanceGroupId": "InstanceGroupID",
"State": "PROVISIONING",
"Market": "ON_DEMAND",
"ReadyDateTime": null,
"InstanceType": "m1.small",
"InstanceRunningCount": 0,
"InstanceRequestCount": 2
}
],
"MasterInstanceType": "m1.small"
},
"BootstrapActions": [],
"JobFlowId": "JobFlowID"
}
]
}
For details on job flow parameter names and values, see the Amazon Elastic MapReduce Developer Guide and the Amazon Elastic MapReduce API Reference.
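Because the --describe response is JSON, it can be summarized with a few lines of code. The sketch below is an illustration, not part of the CLI: the function name is hypothetical, and the sample is a trimmed copy of the response shown above that keeps only the fields the function reads. A real response contains actual values where this guide shows placeholders such as DateTimeStamp.

```python
import json


def summarize_job_flow(describe_output: str) -> dict:
    """Summarize the first job flow in a --describe response."""
    job_flow = json.loads(describe_output)["JobFlows"][0]
    instances = job_flow["Instances"]
    # Map each instance role (MASTER, CORE, TASK) to its requested count.
    requested = {
        group["InstanceRole"]: group["InstanceRequestCount"]
        for group in instances["InstanceGroups"]
    }
    return {
        "id": job_flow["JobFlowId"],
        "state": job_flow["ExecutionStatusDetail"]["State"],
        "requested_by_role": requested,
        "total_requested": sum(requested.values()),
        "instance_count": instances["InstanceCount"],
    }


# Trimmed sample containing only fields that appear in the response above.
SAMPLE = """
{"JobFlows": [{
  "JobFlowId": "JobFlowID",
  "ExecutionStatusDetail": {"State": "STARTING"},
  "Instances": {
    "InstanceCount": 5,
    "InstanceGroups": [
      {"InstanceRole": "MASTER", "InstanceRequestCount": 1},
      {"InstanceRole": "TASK", "InstanceRequestCount": 2},
      {"InstanceRole": "CORE", "InstanceRequestCount": 2}
    ]
  }
}]}
"""
```

Calling summarize_job_flow(SAMPLE) reports the STARTING state and shows that the per-group request counts (1 master, 2 task, 2 core) add up to the InstanceCount of 5.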
To use Amazon EMR debugging you must specify an Amazon S3 bucket location
in your credentials.json file. You specified the
log_uri parameter in the file you created as part of the
Configuring Credentials step.
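Before relying on the logs, it can help to confirm that credentials.json actually defines log_uri. The following Python sketch assumes only that the file is JSON and that the setting is stored under the key log_uri, as described above. The function name is hypothetical, and the bucket path in the usage note is the placeholder used elsewhere in this guide.

```python
import json


def get_log_uri(credentials_text: str) -> str:
    """Return the log_uri value from the text of a credentials.json file.

    Raises ValueError if the key is missing or empty, since debugging
    depends on it being set.
    """
    config = json.loads(credentials_text)
    log_uri = config.get("log_uri")
    if not isinstance(log_uri, str) or not log_uri.strip():
        raise ValueError("log_uri is missing or empty in credentials.json")
    return log_uri
```

For example, the text {"log_uri": "s3n://myawsbucket/FileName/"} yields "s3n://myawsbucket/FileName/", while a file without the key raises an error before you start a job flow.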
You access Amazon EMR log files either by using the Amazon EMR console or by viewing them directly from the Amazon S3 console.
Note: A five-minute delay occurs between when the log files stop being written and when they are available on Amazon S3.
Hadoop debugging is also available to identify issues and problems in your job flows. For details on how to enable and configure Hadoop debugging, see the Amazon Elastic MapReduce Developer Guide.
You can add steps to a job flow if the RunJobFlow parameter KeepJobFlowAliveWhenNoSteps is set to True. This value keeps the Amazon EC2 cluster engaged even after the successful completion of a job flow. The default setting for KeepJobFlowAliveWhenNoSteps is True and can be verified using the --describe --jobflow [JobFlowID] command. To identify your job flow ID, refer to the preceding Retrieve Information About a Specific Job Flow section.
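The check described above can be automated against the --describe response. This is a minimal Python sketch under the assumption that the response has the JSON shape shown earlier in this guide, with KeepJobFlowAliveWhenNoSteps nested under Instances. The function name is hypothetical.

```python
import json


def can_add_steps(describe_output: str) -> bool:
    """Return True if the first job flow stays alive when it has no steps.

    Reads Instances.KeepJobFlowAliveWhenNoSteps from a --describe
    response; a missing field is treated as False.
    """
    job_flow = json.loads(describe_output)["JobFlows"][0]
    return job_flow["Instances"].get("KeepJobFlowAliveWhenNoSteps") is True
```

A script could call this function before issuing a command that adds a step, and skip the command when the result is False.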
To add a step using default parameter values to a job flow
Enter the following commands from the command-line prompt:
Linux and UNIX users:
$ ./elastic-mapreduce -j JobFlowID --stream
Windows users:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce -j JobFlowID --stream
The --stream command adds a streaming step using default parameters. Hadoop streaming is a feature of Hadoop that lets you create and run job flows using any executable program or script as Hadoop mappers and reducers. You can view the step you just added from either the CLI or the Amazon EMR console.
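To make the idea of script-based mappers and reducers concrete, here is a small word-count sketch in Python. It is an illustration of the streaming model only, not the mapper or reducer that the default streaming step runs. All function names are hypothetical, and run_locally simply imitates the map, sort, and reduce phases in one process so the logic can be tried without a cluster.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_counts(sorted_records):
    """Reducer: sum the counts for each key; input must be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate map -> sort -> reduce in a single process for testing."""
    return list(reduce_counts(sorted(map_words(lines))))
```

In an actual streaming step, the mapper and reducer would be separate executables that read records from standard input and write tab-separated records to standard output, and Hadoop would sort the mapper output by key before the reducer sees it.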
To view a job flow from the Amazon EMR console
Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.
Click Refresh.
Click the job flow with the added step.
In the Details pane at the bottom of the window, click the Steps tab.
Information about the step you added is displayed in the Steps tab.
Once you finish working with a job flow, terminate it so that you are no longer charged for using AWS resources.
To terminate a job flow
Enter the following commands from the command-line prompt:
Linux and UNIX users:
$ ./elastic-mapreduce --terminate JobFlowID
Windows users:
C:\ruby\elastic-mapreduce-cli>ruby elastic-mapreduce --terminate JobFlowID
Congratulations! You have successfully created and terminated an Amazon EMR job flow and learned about a few of the options available to you.
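Termination is not instantaneous, so a script may want to confirm that the job flow has actually stopped. The sketch below polls a caller-supplied function until a final state is reported. Everything here is an assumption for illustration: the function name and parameters are hypothetical, get_state stands in for whatever you use to read the State field from a --describe response, and the set of final states (COMPLETED, FAILED, TERMINATED) comes from the Amazon EMR API rather than from this guide, which only shows STARTING.

```python
import time

# Job flow states treated as final (assumption; not listed in this guide).
TERMINAL_STATES = {"COMPLETED", "FAILED", "TERMINATED"}


def wait_for_terminal_state(get_state, interval_seconds=30.0, max_polls=40,
                            sleep=time.sleep):
    """Poll get_state() until it returns a final state, then return it.

    get_state is any zero-argument callable returning the current state
    string. Raises TimeoutError after max_polls unsuccessful polls.
    """
    for attempt in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if attempt < max_polls - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"job flow not finished after {max_polls} polls")
```

Passing the sleep function as a parameter keeps the helper easy to test, because a test can substitute a function that records the calls instead of waiting.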
Now that you know how to create, debug, and terminate a job flow, you are ready to process actual data.

