Amazon Elastic MapReduce (Amazon EMR) uses Amazon Machine Images (AMIs) to initialize the Amazon EC2 instances it launches to run a job flow. The AMIs contain the Linux operating system, Hadoop, and other software used to run the job flow. These AMIs are specific to Amazon EMR and can be used only in the context of running a job flow. Periodically, Amazon EMR updates these AMIs with new versions of Hadoop and other software, so users can take advantage of improvements and new features.
For general information about AMIs, go to Using AMIs in the Amazon Elastic Compute Cloud User Guide. For details about the software versions included in the Amazon EMR AMIs, go to the section called “AMI Versions Supported in Amazon EMR”.
If your application depends on a specific version or configuration of Hadoop, you might want to delay upgrading to a new AMI until you have tested your application on it. AMI versioning lets you specify which AMI version your job flow uses to launch Amazon EC2 instances.
Specifying the AMI version during job flow creation is optional. If you are using the CLI and do not provide an AMI version parameter, your job flows run on the most recent AMI version. This means you always have the latest software running on your job flows, but you must make sure that your application works with new changes as they are released.
If you specify an AMI version when you create a job flow, your instances will be created using that AMI. This provides stability for long-running or mission-critical applications. The trade-off is that your application will not have access to new features on more up-to-date AMI versions.
Note: The default configuration for the Amazon EMR console and copies of the CLI downloaded after 11 December 2011 is the latest AMI version. The default for the SDK, the API, and CLIs downloaded prior to 11 December 2011 is AMI version 1.0, Hadoop 0.18. For details about the configuration and applications available on AMI versions, see AMI Versions Supported in Amazon EMR.
You can specify which AMI version a new job flow should use when you create it.
Note: AMI versioning is not currently supported in the Amazon EMR console. Job flows created through the Amazon EMR console use the most current version available.
To specify an AMI version using the CLI
When creating a job flow using the CLI, add the --ami-version parameter, as shown in the following example. If you do not specify this parameter, or if you specify --ami-version latest, the most recent AMI version is used.
$ ./elastic-mapreduce --create --alive --name "Test AMI Versioning" \
    --ami-version 1.0 --hadoop-version 0.20 \
    --num-instances 5 --instance-type m1.small
To specify an AMI version using the API
When creating a job flow using the API, add the AmiVersion and HadoopVersion parameters to the request string, as shown in the following example. If you do not specify these parameters, Amazon EMR creates the job flow using the version 1.0 AMI and Hadoop 0.18.
For more information, go to RunJobFlow in the Amazon Elastic MapReduce API Reference.
https://elasticmapreduce.amazonaws.com?Operation=RunJobFlow
&Name=MyJobFlowName
&LogUri=s3n%3A%2F%2Fmybucket%2Fsubdir
&AmiVersion=1.0
&HadoopVersion=0.20
&Instances.MasterInstanceType=m1.small
&Instances.SlaveInstanceType=m1.small
&Instances.InstanceCount=4
&Instances.Ec2KeyName=myec2keyname
&Instances.Placement.AvailabilityZone=us-east-1a
&Instances.KeepJobFlowAliveWhenNoSteps=true
&Steps.member.1.Name=MyStepName
&Steps.member.1.ActionOnFailure=CONTINUE
&Steps.member.1.HadoopJarStep.Jar=MyJarFile
&Steps.member.1.HadoopJarStep.MainClass=MyMainClass
&Steps.member.1.HadoopJarStep.Args.member.1=arg1
&Steps.member.1.HadoopJarStep.Args.member.2=arg2
&AuthParams
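If you build the request programmatically, the version parameters are simply additional query-string pairs. The following Python sketch assembles an unsigned RunJobFlow URL with the standard library. The function name and its defaults are hypothetical, only a subset of the parameters from the example is included, and request signing (the AuthParams placeholder in the example) is deliberately left out.

```python
from urllib.parse import urlencode

EMR_ENDPOINT = "https://elasticmapreduce.amazonaws.com"


def build_run_job_flow_url(name, log_uri, ami_version=None, hadoop_version=None,
                           instance_type="m1.small", instance_count=4):
    """Return an unsigned RunJobFlow query URL (hypothetical helper; no AuthParams)."""
    params = {"Operation": "RunJobFlow", "Name": name, "LogUri": log_uri}
    # Leave the version parameters out entirely to get the service default.
    if ami_version is not None:
        params["AmiVersion"] = ami_version
    if hadoop_version is not None:
        params["HadoopVersion"] = hadoop_version
    params.update({
        "Instances.MasterInstanceType": instance_type,
        "Instances.SlaveInstanceType": instance_type,
        "Instances.InstanceCount": str(instance_count),
        "Instances.KeepJobFlowAliveWhenNoSteps": "true",
    })
    # urlencode percent-encodes values, e.g. "s3n://" becomes "s3n%3A%2F%2F".
    return EMR_ENDPOINT + "?" + urlencode(params)


url = build_run_job_flow_url("MyJobFlowName", "s3n://mybucket/subdir",
                             ami_version="1.0", hadoop_version="0.20")
print(url)
```

Because omitted parameters fall back to the service defaults, passing ami_version explicitly is what pins the job flow to a known AMI.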
If you need to find out which AMI version a job flow is running, you can retrieve this information using either the CLI or the API.
Note: AMI versioning is not currently supported in the Amazon EMR console. Job flows created through the Amazon EMR console use the most current version available.
To check the current AMI version using the CLI
Use the --describe parameter to retrieve the AMI version of a job flow. In the following example, JobFlowID is the identifier of the job flow. The AMI version is returned along with other information about the job flow.
$ ./elastic-mapreduce --describe --jobflow JobFlowID
To check the current AMI version using the API
Call DescribeJobFlows to check which AMI version a job flow is using.
The version will be returned as part of the response data, as shown in the following example.
For the complete response syntax, go to DescribeJobFlows in the Amazon Elastic MapReduce API Reference.
<DescribeJobFlowsResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
<DescribeJobFlowsResult>
<JobFlows>
<member>
...
<AmiVersion>
1.0
</AmiVersion>
...
</member>
</JobFlows>
</DescribeJobFlowsResult>
<ResponseMetadata>
<RequestId>
9cea3229-ed85-11dd-9877-6fad448a8419
</RequestId>
</ResponseMetadata>
</DescribeJobFlowsResponse>
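Because the response is namespaced XML, a parser has to qualify element names with the namespace declared on the root element. The following Python sketch uses the standard library's ElementTree to pull AmiVersion out of each job flow in a response shaped like the example above. The function name is hypothetical, and the sample is trimmed to the elements shown in the example.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute on DescribeJobFlowsResponse.
EMR_NS = {"emr": "http://elasticmapreduce.amazonaws.com/doc/2009-03-31"}


def ami_versions(response_xml):
    """Return the AmiVersion string of every job flow in a DescribeJobFlows response."""
    root = ET.fromstring(response_xml)
    path = "emr:DescribeJobFlowsResult/emr:JobFlows/emr:member/emr:AmiVersion"
    # The text node may include surrounding whitespace and newlines, so strip it.
    return [(elem.text or "").strip() for elem in root.findall(path, EMR_NS)]


SAMPLE = """<DescribeJobFlowsResponse xmlns="http://elasticmapreduce.amazonaws.com/doc/2009-03-31">
  <DescribeJobFlowsResult>
    <JobFlows>
      <member>
        <AmiVersion>
          1.0
        </AmiVersion>
      </member>
    </JobFlows>
  </DescribeJobFlowsResult>
</DescribeJobFlowsResponse>"""

print(ami_versions(SAMPLE))  # ['1.0']
```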
An AMI can contain multiple versions of Hadoop. If the AMI you specify has multiple versions of Hadoop available, you can select the version of Hadoop you want to run as described in the section called “Hadoop Configuration”. You cannot specify a Hadoop version that is not available on the AMI. For a list of the versions of Hadoop supported on each AMI, go to AMI Versions Supported in Amazon EMR.
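Before requesting a particular Hadoop version, you can check it against the versions listed for the chosen AMI in AMI Versions Supported in Amazon EMR. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, it covers just AMI 1.0 and 2.0 (the entries whose Hadoop versions that table spells out), and treating 0.20.205 as the 2.0 default is an inference from it being the only version listed.

```python
# Hadoop versions per AMI, copied from the "AMI Versions Supported in Amazon EMR"
# table. Only AMIs whose Hadoop versions the table spells out are included.
HADOOP_BY_AMI = {
    "1.0": {"available": ("0.18", "0.20"), "default": "0.18"},
    # Assumption: the single listed version is also the default.
    "2.0": {"available": ("0.20.205",), "default": "0.20.205"},
}


def resolve_hadoop_version(ami_version, requested=None):
    """Return the Hadoop version a job flow would run, or raise if it is unavailable."""
    if ami_version not in HADOOP_BY_AMI:
        raise ValueError(f"no Hadoop data recorded for AMI {ami_version}")
    entry = HADOOP_BY_AMI[ami_version]
    if requested is None:
        # No explicit request: fall back to the AMI's default Hadoop version.
        return entry["default"]
    if requested not in entry["available"]:
        raise ValueError(
            f"Hadoop {requested} is not available on AMI {ami_version}; "
            f"choose one of {', '.join(entry['available'])}"
        )
    return requested


print(resolve_hadoop_version("1.0"))          # 0.18
print(resolve_hadoop_version("1.0", "0.20"))  # 0.20
```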
Eighteen months after an AMI version is released, the Amazon EMR team might choose to deprecate that AMI version and no longer support it. In addition, the Amazon EMR team might deprecate an AMI before eighteen months have elapsed if a security risk or other issue is identified in the software or operating system of the AMI. If a job flow is running when its AMI is deprecated, the job flow is not affected. You will not, however, be able to create new job flows with the deprecated AMI version. The best practice is to plan for AMI obsolescence and move to new AMI versions as soon as is practical for your application.
Before an AMI is deprecated, the Amazon EMR team will send out an announcement specifying the date on which the AMI version will no longer be supported.
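The eighteen-month window is calendar arithmetic on the release date. The following Python sketch computes the earliest date on which an AMI becomes eligible for routine deprecation. The function names are hypothetical, clamping to the end of a shorter month is an assumption about how "eighteen months" is counted, and the result is not an announced deprecation date: the team may deprecate earlier for security reasons or keep supporting an AMI longer.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def routine_deprecation_eligible(release_date, months=18):
    """Earliest date the eighteen-month policy allows an AMI to be deprecated."""
    return add_months(release_date, months)


# Release dates from the AMI version table.
print(routine_deprecation_eligible(date(2011, 12, 11)))  # AMI 2.0 -> 2013-06-11
print(routine_deprecation_eligible(date(2011, 4, 26)))   # AMI 1.0 -> 2012-10-26
```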
Amazon EMR supports the AMI versions listed in the following table. You can specify the AMI version you want to use when you create a job flow.
If you do not specify an AMI version, the default version is used. For the Amazon EMR console and versions of the CLI released after the AMI versioning release (12-08-11), the default version is the latest version. For the API, the SDK, and versions of the CLI downloaded before AMI versioning was released, the default version is AMI 1.0.
| AMI Version | Description | Release Date |
|---|---|---|
| 2.0.5 | Same as AMI 2.0.4, with the following additions: | 19 April 2012 |
| 2.0.4 | Same as AMI 2.0.3, with the following additions: | 30 January 2012 |
| 2.0.3 | Same as AMI 2.0.2, with the following additions: | 24 January 2012 |
| 2.0.2 | Same as AMI 2.0.1, with the following additions: | 17 January 2012 |
| 2.0.1 | Same as AMI 2.0 except for the following bug fixes: | 19 December 2011 |
| 2.0 | Operating system: Debian 6.0.2 (Squeeze). Applications: Hadoop 0.20.205, Hive 0.7.1, Pig 0.9.1. Languages: Perl 5.10.1, PHP 5.3.3, Python 2.6.6, R 2.11.1, Ruby 1.8.7. File system: ext3 for root, xfs for ephemeral. Kernel: Amazon Linux. Note: Added support for the Snappy compression/decompression library. | 11 December 2011 |
| 1.0.1 | Same as AMI 1.0 except for the following change: | 3 April 2012 |
| 1.0 | Operating system: Debian 5.0 (Lenny). Applications: Hadoop 0.20 and 0.18 (default); Hive 0.5, 0.7 (default), 0.7.1; Pig 0.3 (on Hadoop 0.18), 0.6 (on Hadoop 0.20). Languages: Perl 5.10.0, PHP 5.2.6, Python 2.5.2, R 2.7.1, Ruby 1.8.7. File system: ext3 for root and ephemeral. Kernel: Red Hat. Note: This was the last AMI released before the CLI was updated to support AMI versioning. For backward compatibility, job flows launched with versions of the CLI downloaded before 11 December 2011 use this version. | 26 April 2011 |
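The default-version rules on this page can be summarized as a small lookup. The following Python sketch is a minimal illustration: the client labels ("console", "cli", "api", "sdk") and the function name are hypothetical, the cutoff is the 11 December 2011 date given in the note near the top of this page, and treating a CLI downloaded on the cutoff day itself as version-aware is an assumption.

```python
from datetime import date

# Cutoff from the note earlier on this page: CLI copies downloaded after this
# date default to the latest AMI; older copies, the API, and the SDK use AMI 1.0.
VERSIONING_CLI_CUTOFF = date(2011, 12, 11)


def default_ami_version(client, cli_download_date=None):
    """Return the AMI version used when no version is specified (hypothetical helper)."""
    if client == "console":
        return "latest"
    if client in ("api", "sdk"):
        return "1.0"
    if client == "cli":
        if cli_download_date is None:
            raise ValueError("the CLI default depends on when the CLI was downloaded")
        # Assumption: a copy downloaded on the cutoff day itself already supports versioning.
        return "latest" if cli_download_date >= VERSIONING_CLI_CUTOFF else "1.0"
    raise ValueError(f"unknown client: {client!r}")


print(default_ami_version("api"))                    # 1.0
print(default_ami_version("cli", date(2012, 1, 5)))  # latest
```

Returning the string "latest" mirrors the CLI's --ami-version latest value rather than naming a specific release, because the latest version changes as new AMIs are published.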