This section describes how to upgrade your Amazon Elastic MapReduce (Amazon EMR) deployment to Hadoop 0.20.
Many Hadoop jobs that run successfully on Hadoop 0.18 run without modification on Hadoop 0.20 and later. However, before you engage in a full upgrade, we recommend recompiling your Hadoop jobs against Hadoop 0.20 and testing on small subsets of your data.
Note: Hadoop 0.20.205 is a minor version update to 0.20, and the following information applies to it as well.
Streaming jobs should also work without modification, but we recommend using the new streaming parameters introduced with version 0.20. These are summarized in the following table.
| Hadoop 0.18 | Hadoop 0.20 | Type |
|---|---|---|
| -cacheFile | -files | Comma-separated URIs |
| -cacheArchive | -archives | Comma-separated URIs |
| -jobconf | -D | key=value |
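The mapping in the table can be applied mechanically to an existing streaming argument list. The following Python sketch is illustrative only and is not part of Amazon EMR or Hadoop: the function name `translate_streaming_args` and the bucket paths in the usage example are hypothetical. It assumes that repeated `-cacheFile` or `-cacheArchive` values should be merged into a single comma-separated list, and that the resulting `-D`, `-files`, and `-archives` options should be placed ahead of the other streaming arguments.

```python
# Flag mapping taken from the table above (Hadoop 0.18 -> Hadoop 0.20).
OLD_TO_NEW = {
    "-cacheFile": "-files",
    "-cacheArchive": "-archives",
    "-jobconf": "-D",
}


def translate_streaming_args(args):
    """Rewrite 0.18-style streaming arguments using the 0.20 parameters.

    Hypothetical helper for illustration; it only rewrites the three
    flags listed in the table and passes everything else through.
    """
    merged = {"-files": [], "-archives": []}  # URIs to join with commas
    defines = []    # key=value pairs, one -D option each
    remaining = []  # arguments passed through unchanged
    i = 0
    while i < len(args):
        arg = args[i]
        new_flag = OLD_TO_NEW.get(arg)
        if new_flag is None:
            remaining.append(arg)
            i += 1
            continue
        if i + 1 >= len(args):
            raise ValueError(f"{arg} requires a value")
        value = args[i + 1]
        if new_flag == "-D":
            defines.append(value)
        else:
            merged[new_flag].append(value)
        i += 2

    generic = []
    for pair in defines:
        generic += ["-D", pair]
    for flag, uris in merged.items():
        if uris:
            generic += [flag, ",".join(uris)]
    # Assumption: the translated options go before the remaining arguments.
    return generic + remaining


if __name__ == "__main__":
    # Hypothetical 0.18-style arguments; the bucket and file names are made up.
    old_args = [
        "-input", "s3://mybucket/input",
        "-cacheFile", "s3://mybucket/lookup.txt#lookup",
        "-jobconf", "mapred.reduce.tasks=2",
        "-mapper", "mapper.py",
    ]
    print(" ".join(translate_streaming_args(old_args)))
```

Test the translated arguments on a small job before relying on them, since this sketch does not validate the URIs or the key=value pairs.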
When using Amazon EMR with Hadoop 0.20, we offer the following additional guidance:
Recompile Cascading applications with Hadoop 0.20 specified as the target version so that they can take advantage of the new features available in this version.
Hadoop 0.20 fully supports Pig scripts.
All Amazon EMR sample applications are compatible with Hadoop 0.20. The Amazon EMR console supports only Hadoop 0.20, so samples default to 0.20 when launched.