Multipart upload allows you to upload a single file to Amazon S3 as a set of parts. Using the AWS SDK for Java, you can upload these parts incrementally and in any order. Multipart upload can result in faster uploads and shorter retries than uploading a large file in a single operation.
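To make the idea of parts concrete, the following minimal Java sketch computes the byte ranges that a file would be divided into for a given part size. It does not call Amazon S3. The class `PartPlanner`, its `PartRange` type, and the 5 MiB part size are illustrative assumptions and are not part of the AWS SDK for Java.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: PartPlanner and PartRange are hypothetical names,
// not classes from the AWS SDK for Java.
class PartPlanner {

    /** One part of an upload: 1-based part number, start offset, and length in bytes. */
    static final class PartRange {
        final int partNumber;
        final long offset;
        final long length;

        PartRange(int partNumber, long offset, long length) {
            this.partNumber = partNumber;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits fileSize bytes into consecutive ranges of at most partSize bytes each. */
    static List<PartRange> plan(long fileSize, long partSize) {
        if (fileSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and partSize must be > 0");
        }
        List<PartRange> parts = new ArrayList<>();
        long offset = 0;
        int partNumber = 1;
        while (offset < fileSize) {
            // The last part may be shorter than partSize.
            long length = Math.min(partSize, fileSize - offset);
            parts.add(new PartRange(partNumber++, offset, length));
            offset += length;
        }
        return parts;
    }

    public static void main(String[] args) {
        long fileSize = 12L * 1024 * 1024; // assumed 12 MiB file
        long partSize = 5L * 1024 * 1024;  // assumed 5 MiB part size
        for (PartRange p : plan(fileSize, partSize)) {
            System.out.println("part " + p.partNumber
                    + ": offset=" + p.offset + " length=" + p.length);
        }
    }
}
```

Because each range is independent, the parts can be sent in any order, and a failed part can be retried without resending the others.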
Amazon Elastic MapReduce (Amazon EMR) supports multipart upload, but disables the feature by default. If a cluster node fails, the in-progress upload still exists in Amazon S3, and you are charged for the partial data stored there. You must manually remove the failed uploads from Amazon S3. The AWS SDK for Java has a helper method called abortMultipartUploads, which makes it easy to clean up failed uploads.
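As a sketch of the cleanup described above: in version 1 of the AWS SDK for Java, the `abortMultipartUploads` helper is on the `TransferManager` class. It takes a bucket name and a cutoff date, and aborts uploads that were initiated before that date. The example below only computes such a cutoff with the Java standard library. The class name `StaleUploadCutoff`, the seven-day window, and the bucket name in the comment are assumptions for illustration.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Illustrative only: StaleUploadCutoff is a hypothetical helper, not part of the AWS SDK.
class StaleUploadCutoff {

    /** Returns the instant that lies the given number of days before now. */
    static Date cutoff(Date now, int days) {
        if (days < 0) {
            throw new IllegalArgumentException("days must be non-negative");
        }
        return new Date(now.getTime() - TimeUnit.DAYS.toMillis(days));
    }

    public static void main(String[] args) {
        // Assumption: uploads older than seven days are treated as failed.
        Date cutoff = cutoff(new Date(), 7);
        System.out.println("Abort multipart uploads initiated before: " + cutoff);

        // With the AWS SDK for Java (version 1) on the classpath, the cleanup
        // call would look roughly like this ("my-bucket" is a placeholder):
        //   TransferManager tm = new TransferManager(credentials);
        //   tm.abortMultipartUploads("my-bucket", cutoff);
    }
}
```

Choose a window longer than your longest expected upload, so that uploads still in progress are not aborted.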
The Amazon EMR configuration parameters for multipart upload are described in the following table.
| Configuration Parameter Name | Default Value | Description |
|---|---|---|
| fs.s3n.multipart.uploads.enabled | false | A boolean type that indicates whether to enable multipart uploads. |
| fs.s3n.ssl.enabled | true | A boolean type that indicates whether to use HTTPS (true) or HTTP (false). |
You modify the configuration parameters for multipart uploads using a bootstrap action.
This procedure explains how to enable multipart upload using the Amazon EMR console.
To enable multipart uploads with a predefined bootstrap action
1. Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.
2. Click the Create New Job Flow button and fill out the Create a New Job Flow wizard. For more information about creating job flows, see Creating a Job Flow.
3. On the BOOTSTRAP ACTIONS pane of the wizard, select Configure your Bootstrap Actions.
4. For Action Type, select Configure Hadoop.
5. In Optional Arguments, replace the default value with the following: `-c fs.s3n.multipart.uploads.enabled=true -c fs.s3n.multipart.uploads.split.size=524288000`
6. If you have more bootstrap actions to add, click Add another Bootstrap Action. When all of your bootstrap actions are added, click Continue to go to the REVIEW pane of the Create a New Job Flow wizard.
This procedure explains how to enable multipart upload using the CLI. The command creates a job flow in a waiting state with multipart upload enabled.
| If you are using... | Enter the following... |
|---|---|
| Linux or UNIX | `$ ./elastic-mapreduce --create --alive --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --bootstrap-name "enable multipart upload" --args "-c,fs.s3n.multipart.uploads.enabled=true,-c,fs.s3n.multipart.uploads.split.size=524288000"` |
| Microsoft Windows | `c:\> ruby elastic-mapreduce --create --alive --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --bootstrap-name "enable multipart upload" --args "-c,fs.s3n.multipart.uploads.enabled=true,-c,fs.s3n.multipart.uploads.split.size=524288000"` |
This job flow remains in the WAITING state until it is terminated.
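The console and CLI procedures pass the same two settings to the Configure Hadoop bootstrap action. Only the separators differ: spaces in the console's Optional Arguments field, commas in the CLI `--args` value. The following minimal Java sketch builds both forms from one map of settings. It also shows that the split size used above, 524288000, equals 500 × 1024 × 1024, assuming the value is in bytes. The class `ConfigureHadoopArgs` and its method names are hypothetical and exist only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: ConfigureHadoopArgs is a hypothetical helper class.
class ConfigureHadoopArgs {

    /** Console form: "-c key=value -c key=value". */
    static String consoleArgs(Map<String, String> settings) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            joiner.add("-c " + e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** CLI --args form: "-c,key=value,-c,key=value" (no spaces). */
    static String cliArgs(Map<String, String> settings) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : settings.entrySet()) {
            joiner.add("-c");
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        long splitSize = 500L * 1024 * 1024; // 524288000, as used in the procedures above

        // LinkedHashMap keeps the settings in insertion order.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("fs.s3n.multipart.uploads.enabled", "true");
        settings.put("fs.s3n.multipart.uploads.split.size", String.valueOf(splitSize));

        System.out.println("Console: " + consoleArgs(settings));
        System.out.println("CLI:     " + cliArgs(settings));
    }
}
```

Generating both strings from one map keeps the console and CLI settings from drifting apart when you change a value.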
For information on using Amazon S3 multipart uploads programmatically, go to Using the AWS SDK for Java for Multipart Upload in the Amazon S3 Developer Guide.
For more information on the AWS SDK for Java, go to the AWS SDK for Java detail page.