Hadoop sends data between the mappers and reducers in its shuffle process. This network operation is a bottleneck for many job flows. To reduce this bottleneck, Amazon Elastic MapReduce (Amazon EMR) enables intermediate data compression by default. Amazon EMR uses the LZO codec because it provides a reasonable amount of compression with only a small CPU impact.
You can modify the default compression settings with a bootstrap action. For more information on using bootstrap actions, refer to Bootstrap Actions.
The following table presents the default values for the parameters that affect intermediate compression.
| Parameter | Value |
|---|---|
| mapred.compress.map.output | true |
| mapred.map.output.compression.codec | com.hadoop.compression.lzo.LzoCodec |
Example: Enabling or disabling compression using a bootstrap action
    $ ./elasticmapreduce --create --alive --name "Reducer speculative execution" \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
      --bootstrap-name "Disable compression" \
      --args "mapred.compress.map.output=false" \
      --args "mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
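When several settings need to change together, it can help to assemble the command from a list of `parameter=value` pairs and review it before running it. The following is a minimal sketch, not part of the Amazon EMR tooling: `build_compression_command` is a hypothetical helper, it only prints the command, and it uses only the flags that appear in the example above. The call at the end restates the default values from the table explicitly.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not an Amazon EMR command): prints,
# but does not run, an elasticmapreduce command that passes each
# "parameter=value" setting to the configure-hadoop bootstrap action.
#   $1  job flow name
#   $2  bootstrap action name
#   $3+ one or more parameter=value settings
build_compression_command() {
  job_name="$1"
  bootstrap_name="$2"
  shift 2

  cmd="./elasticmapreduce --create --alive --name \"$job_name\""
  cmd="$cmd --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop"
  cmd="$cmd --bootstrap-name \"$bootstrap_name\""

  # Append one --args flag per setting, as in the example above.
  for setting in "$@"; do
    cmd="$cmd --args \"$setting\""
  done

  printf '%s\n' "$cmd"
}

# Restate the documented defaults: compression on, LZO codec.
build_compression_command "Compression settings" "Enable LZO compression" \
  "mapred.compress.map.output=true" \
  "mapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec"
```

Printing the command instead of executing it keeps the sketch safe to run anywhere. Once the output looks right, it can be pasted into a shell where the `elasticmapreduce` client is installed.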