Amazon Elastic MapReduce (Amazon EMR) works with Amazon EC2 to create a Hadoop cluster and with Amazon S3 to store scripts, input data, log files, and output. The following figure and table outline the Amazon EMR process.

Amazon EMR Process
| Step | Description |
| --- | --- |
| 1 | Upload the data you want to process, along with the mapper and reducer executables that process it, to Amazon S3. Then send a request to Amazon EMR to start a job flow. |
| 2 | Amazon EMR starts a Hadoop cluster, which executes any specified bootstrap actions and then starts Hadoop on each node. |
| 3 | Hadoop executes the job flow, downloading data from Amazon S3 to the core and task nodes. Alternatively, mapper tasks load the data dynamically at run time. |
| 4 | Hadoop processes the data and uploads the results from the cluster to Amazon S3. |
| 5 | The job flow completes, and you retrieve the processed data from Amazon S3. |
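Steps 1 and 2 come down to a single request to Amazon EMR. The following is a minimal sketch, not an official example: it assembles keyword arguments shaped like the EMR RunJobFlow API (as accepted by boto3's `run_job_flow`), with a streaming step that points at a mapper and reducer already uploaded to S3 and an optional bootstrap action. The bucket, key prefixes, job-flow and step names, instance types and count, and the streaming jar path are all hypothetical placeholders. The jar path in particular depends on the AMI or release you launch (newer releases use `command-runner.jar` with a leading `hadoop-streaming` argument). A real request usually also needs an AMI version or release label and IAM roles, which are omitted here.

```python
from typing import Any, Dict, Optional

# Hypothetical Hadoop streaming jar location; the actual path varies by
# Amazon EMR AMI/release, so treat this as a placeholder.
STREAMING_JAR = "/home/hadoop/contrib/streaming/hadoop-streaming.jar"


def build_run_job_flow_request(
    bucket: str,
    bootstrap_script: Optional[str] = None,
) -> Dict[str, Any]:
    """Return RunJobFlow-style keyword arguments (sketch, placeholder values)."""
    s3 = f"s3://{bucket}"
    # Step 1 (assumed): input data and the mapper/reducer executables were
    # already uploaded under these hypothetical prefixes.
    step = {
        "Name": "streaming-word-count",
        "ActionOnFailure": "TERMINATE_JOB_FLOW",
        "HadoopJarStep": {
            "Jar": STREAMING_JAR,
            "Args": [
                "-files", f"{s3}/scripts/mapper.py,{s3}/scripts/reducer.py",
                "-mapper", "mapper.py",
                "-reducer", "reducer.py",
                "-input", f"{s3}/input/",
                "-output", f"{s3}/output/",  # step 4: results land here
            ],
        },
    }
    request: Dict[str, Any] = {
        "Name": "example-job-flow",
        "LogUri": f"{s3}/logs/",  # log files are also stored in S3
        "Instances": {
            "MasterInstanceType": "m1.small",
            "SlaveInstanceType": "m1.small",
            "InstanceCount": 3,
            # Let the job flow shut down once its steps finish (step 5).
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [step],
    }
    if bootstrap_script is not None:
        # Step 2: bootstrap actions run on the nodes before Hadoop starts.
        request["BootstrapActions"] = [{
            "Name": "custom-setup",
            "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
        }]
    return request
```

With credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`; building it separately keeps the request easy to inspect before anything is launched.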
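Steps 3 and 4 are where the mapper and reducer executables do their work. The sketch below is a hypothetical word-count pair written in the Hadoop streaming style, where records are tab-separated `key<TAB>value` lines, and it runs the pipeline locally. On a cluster, Hadoop sorts and shuffles mapper output by key before the reducer sees it; here `sorted()` stands in for that phase. The word-count task and all function names are illustrative assumptions, not part of Amazon EMR.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one "word<TAB>1" record per word, as a streaming mapper would
    # write to stdout.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(records: Iterable[str]) -> Iterator[str]:
    # Assumes records arrive sorted by key, which Hadoop guarantees between
    # the map and reduce phases; groupby then sees each word as one run.
    pairs = (record.split("\t", 1) for record in records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    # map -> sort (local stand-in for Hadoop's shuffle) -> reduce
    return list(reducer(sorted(mapper(lines))))
```

For example, `run_locally(["the cat", "The dog"])` yields `["cat\t1", "dog\t1", "the\t2"]`. To use these on a cluster, each function would be wrapped in its own executable that reads `sys.stdin` and prints its output lines.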
For details on mapping legacy job flows to instance groups, see Mapping Legacy Job Flows to Instance Groups.