Amazon Elastic MapReduce (Amazon EMR) is a data analysis tool that simplifies the setup and management of a computer cluster, the source data, and the computational tools that help you implement sophisticated data processing jobs quickly.
Typically, data processing involves performing a series of relatively simple operations on large amounts of data. In Amazon EMR, each operation is called a step and a sequence of steps is a job flow. A job flow that processes encrypted data might look like the following example.
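The step and job flow idea can be sketched in plain Python. This is an illustration only, not the Amazon EMR API: each step is an ordinary function, and the job flow runs them in order. The step names (`decrypt_step`, `process_step`, `encrypt_step`), the `run_job_flow` helper, and the use of base64 as a stand-in for real decryption and encryption are all assumptions made for the example.

```python
import base64
from typing import Callable, List

# A step takes a list of records and returns a new list of records.
Step = Callable[[List[str]], List[str]]


def decrypt_step(records: List[str]) -> List[str]:
    # base64 decoding stands in for real decryption (assumption).
    return [base64.b64decode(r).decode("utf-8") for r in records]


def process_step(records: List[str]) -> List[str]:
    # A deliberately simple operation applied to every record.
    return [r.upper() for r in records]


def encrypt_step(records: List[str]) -> List[str]:
    # base64 encoding stands in for real encryption (assumption).
    return [base64.b64encode(r.encode("utf-8")).decode("ascii") for r in records]


def run_job_flow(steps: List[Step], records: List[str]) -> List[str]:
    # Run each step in sequence, feeding its output to the next step.
    for step in steps:
        records = step(records)
    return records


job_flow = [decrypt_step, process_step, encrypt_step]
source = [base64.b64encode(b"hello").decode("ascii")]
result = run_job_flow(job_flow, source)
```

Each step stays simple on its own; the useful behavior comes from chaining them, which is the role the job flow plays.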
Amazon EMR uses Hadoop to divide up the work among the instances in the cluster, track status, and combine the individual results into one output. For an overview of Hadoop, see What Is Hadoop?.
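The divide-and-combine pattern can be sketched as a word count in plain Python. This runs in a single process and does not use Hadoop; the helper names (`split_work`, `map_chunk`, `combine`) and the round-robin split are assumptions chosen for illustration, with each chunk standing in for the share of work one instance would receive.

```python
from collections import Counter
from typing import Iterable, List


def split_work(lines: List[str], workers: int) -> List[List[str]]:
    # Deal the input lines out round-robin, one chunk per worker.
    return [lines[i::workers] for i in range(workers)]


def map_chunk(chunk: List[str]) -> Counter:
    # Count the words in one chunk, independently of the other chunks.
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def combine(partials: Iterable[Counter]) -> Counter:
    # Merge the per-chunk counts into a single result.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["a b a", "b c", "a"]
chunks = split_work(lines, workers=2)
word_counts = combine(map_chunk(chunk) for chunk in chunks)
```

Because each chunk is counted independently, the per-chunk work could run on separate machines; only the final merge needs to see all the partial results.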
Amazon EMR takes care of provisioning a Hadoop cluster, running the job flow, terminating the job flow, moving the data between Amazon EC2 and Amazon S3, and optimizing Hadoop. It handles most of the cumbersome details of running a Hadoop cluster: setting up the hardware and networking, monitoring the setup, configuring Hadoop, and executing the job flow. Together, Amazon EMR and Hadoop combine the power of Hadoop processing with the ease of use, low cost, and scalability that Amazon S3 and Amazon EC2 offer.