Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Upgrading to Hadoop 0.20

This section describes how to upgrade your Amazon Elastic MapReduce (Amazon EMR) deployment to Hadoop 0.20.

Many Hadoop jobs that run successfully on Hadoop 0.18 run without modification on Hadoop 0.20 and later. However, before you engage in a full upgrade, we recommend recompiling your Hadoop jobs against Hadoop 0.20 and testing on small subsets of your data.

Note

Hadoop 0.20.205 is a minor version update to 0.20, and the following information applies to it as well.

Streaming jobs should also work without modification, but we recommend using the new streaming parameters introduced with version 0.20. These are summarized in the following table.

Hadoop 0.18      Hadoop 0.20    Type
-cacheFile       -files         Comma separated URIs
-cacheArchive    -archives      Comma separated URIs
-jobconf         -D             key=value
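
The mapping in the table can be applied mechanically to an existing argument list. The following is a minimal sketch in Python, not part of Amazon EMR or Hadoop: the function name upgrade_streaming_args and the bucket paths are hypothetical. It also assumes that the new parameters, being Hadoop generic options, should be placed before the streaming-specific options; verify this ordering against your Hadoop version.

```python
def upgrade_streaming_args(args):
    """Rewrite Hadoop 0.18 streaming arguments into their 0.20 equivalents.

    Hypothetical helper for illustration only.
    """
    # Old option name -> bucket that collects its values.
    renamed = {"-cacheFile": "files", "-cacheArchive": "archives", "-jobconf": "defines"}
    collected = {"files": [], "archives": [], "defines": []}
    rest = []

    i = 0
    while i < len(args):
        arg = args[i]
        if arg in renamed:
            if i + 1 >= len(args):
                raise ValueError(f"{arg} requires a value")
            collected[renamed[arg]].append(args[i + 1])
            i += 2
        else:
            rest.append(arg)
            i += 1

    # Assumption: generic options go first, ahead of streaming options.
    generic = []
    for key_value in collected["defines"]:
        generic += ["-D", key_value]
    if collected["files"]:
        # -files takes one comma-separated list instead of repeated flags.
        generic += ["-files", ",".join(collected["files"])]
    if collected["archives"]:
        generic += ["-archives", ",".join(collected["archives"])]
    return generic + rest


# Example with hypothetical bucket paths.
old_args = [
    "-input", "s3n://mybucket/input",
    "-cacheFile", "s3n://mybucket/a.txt#a",
    "-jobconf", "mapred.reduce.tasks=2",
    "-cacheFile", "s3n://mybucket/b.txt#b",
]
print(upgrade_streaming_args(old_args))
```

Repeated -cacheFile or -cacheArchive options are merged into a single comma-separated value, while each -jobconf becomes its own -D key=value pair.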


When using Amazon EMR with Hadoop 0.20, we offer the following additional guidance:

  • You should recompile Cascading applications with the Hadoop 0.20 version specified so they can take advantage of the new features available in this version.

  • Hadoop 0.20 fully supports Pig scripts.

  • All Amazon EMR sample applications are compatible with Hadoop 0.20. The Amazon EMR console supports only Hadoop 0.20, so samples default to 0.20 once launched.