Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Intermediate Compression (AMI 1.0)

Hadoop sends data from the mappers to the reducers during its shuffle phase. This network transfer is a bottleneck for many job flows, so Amazon Elastic MapReduce (Amazon EMR) enables intermediate data compression by default. We use the LZO codec because it provides a reasonable amount of compression with only a small CPU impact.

You can modify the default compression settings with a bootstrap action. For more information on using bootstrap actions, refer to Bootstrap Actions.

The following table presents the default values for the parameters that affect intermediate compression.

Parameter                              Value
mapred.compress.map.output             true
mapred.map.output.compression.codec    com.hadoop.compression.lzo.LzoCodec
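To check which values a running cluster actually uses, you could read them from the Hadoop configuration files on the master node. The following is a minimal sketch: hadoop_prop is a hypothetical helper, and it matches text line by line rather than parsing XML, so it assumes each <name> and <value> tag sits on a single line. If a property is not listed in the site file, Hadoop falls back to its built-in default, so empty output does not mean the setting is off.

```shell
# hadoop_prop FILE NAME
# Hypothetical helper: print the <value> of property NAME from a Hadoop
# *-site.xml file. Naive line matching, not a real XML parser.
hadoop_prop() {
  file="$1"
  name="$2"
  awk -v want="$name" '
    # Remember whether the most recent <name> is the one we want.
    /<name>/ {
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == want)
    }
    # Print the first <value> that follows a matching <name>, then stop.
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)
      sub(/[ \t]*<\/value>.*/, "", v)
      print v
      exit
    }
    # Do not let a match leak into the next property.
    /<\/property>/ { found = 0 }
  ' "$file"
}
```

For example, `hadoop_prop /home/hadoop/conf/mapred-site.xml mapred.compress.map.output` would print the configured value; the /home/hadoop/conf location is an assumption, so confirm where your AMI keeps mapred-site.xml.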

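Example Configuring intermediate compression using a bootstrap action

The following example creates a job flow whose configure-hadoop bootstrap action sets both parameters. Setting mapred.compress.map.output to false turns intermediate compression off. The codec parameter is used only when compression is enabled, so to switch from LZO to Gzip instead, set mapred.compress.map.output to true.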

$ ./elastic-mapreduce --create --alive --name "Disable compression" \
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
--bootstrap-name "Disable compression" \
--args "-m,mapred.compress.map.output=false" \
--args "-m,mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
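If you want to keep intermediate compression but replace LZO with Gzip, both settings can go into a single --args value. The sketch below builds that value from KEY=VALUE pairs. mapred_args is a hypothetical helper, and the "-m," prefix assumes configure-hadoop's convention of taking mapred-site.xml settings as "-m,KEY=VALUE" items in a comma-separated list.

```shell
# mapred_args KEY=VALUE...
# Hypothetical helper: join pairs into one comma-separated --args value,
# prefixing each pair with "-m" (assumed configure-hadoop flag for
# mapred-site.xml settings).
mapred_args() {
  out=""
  for kv in "$@"; do
    out="${out:+$out,}-m,${kv}"
  done
  printf '%s\n' "$out"
}

# Keep compression on, but switch the codec from LZO to Gzip.
args=$(mapred_args \
  mapred.compress.map.output=true \
  mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec)

# prints: -m,mapred.compress.map.output=true,-m,mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
printf '%s\n' "$args"
```

You would then pass `--args "$args"` to the configure-hadoop bootstrap action instead of listing each setting separately.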