This section covers the fundamentals of creating, managing, and troubleshooting a job flow using Amazon Elastic MapReduce (Amazon EMR). All supported job flow types are described. Information on using the Amazon EMR console, the CLI, SDKs, and API is included.
If you have not signed up to use Amazon EMR, instructions are provided in the Getting Started Guide.
Tip: We strongly recommend that you work through the examples in the Getting Started Guide to get a basic understanding of Amazon EMR.
Amazon EMR offers a variety of interfaces, including a console, a command line interface (CLI), a query API, AWS SDKs, and libraries. Each interface offers a different balance of ease of use and functionality. The interface you choose depends on your knowledge of Hadoop, your programming skills, and the functionality you require:
The Amazon EMR console provides a graphical interface from which you can launch Amazon EMR job flows and monitor their progress.
The CLI offers full compatibility with the Amazon EMR API without requiring a programming environment. The Ruby-based Amazon EMR CLI is available for download at Amazon Elastic MapReduce Ruby Client (http://aws.amazon.com/developertools/2264).
The Amazon EMR API, SDKs, and libraries offer the most flexibility but require a programming environment and software development skills. For more information on using the query API to access Amazon EMR, see Calling the Amazon EMR API in this guide. The AWS SDKs provide support for Java, C#, and .NET. For more information on the AWS SDKs, refer to the list of current AWS SDKs. Libraries are available for Perl and PHP. For more information about the Perl and PHP libraries, see Sample Code & Libraries (http://aws.amazon.com/code/Elastic-MapReduce).
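To give a rough feel for what a query API call looks like, the following Python sketch assembles an unsigned request URL. The endpoint, the API version string, the `build_query_url` helper, and the placeholder job flow ID are assumptions made for illustration and do not come from this guide. A real request must also carry AWS credentials and a signature, which this sketch deliberately leaves out.

```python
from urllib.parse import urlencode

# Assumed values for illustration only; check the API reference for the
# endpoint and version that apply to your account and region.
EMR_ENDPOINT = "https://elasticmapreduce.amazonaws.com/"
API_VERSION = "2009-03-31"


def build_query_url(action, job_flow_ids=()):
    """Return an UNSIGNED query API URL for the given action.

    Hypothetical helper: it only shows how the action and a list of
    job flow IDs are flattened into query parameters.
    """
    params = {"Action": action, "Version": API_VERSION}
    # List-valued parameters are flattened into numbered members.
    for index, job_flow_id in enumerate(job_flow_ids, start=1):
        params[f"JobFlowIds.member.{index}"] = job_flow_id
    # Sort the parameters so the resulting URL is deterministic.
    return EMR_ENDPOINT + "?" + urlencode(sorted(params.items()))


if __name__ == "__main__":
    # "j-EXAMPLE1" is a placeholder, not a real job flow ID.
    print(build_query_url("DescribeJobFlows", ["j-EXAMPLE1"]))
```

The CLI and the SDKs build and sign requests of this kind for you, which is part of the trade-off between ease of use and flexibility described above.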
The following table compares the functionality of the Amazon EMR interfaces.
| Function | Amazon EMR Console | CLI | API/SDK/Libraries |
|---|---|---|---|
| Create multiple job flows | ✓ | ✓ | ✓ |
| Define bootstrap actions in a job flow | ✓ | ✓ | ✓ |
| View logs for Hadoop jobs, tasks, and task attempts using a graphical interface | ✓ | | |
| Implement Hadoop data processing programmatically | ✓ | | |
| Monitor job flows in real time | ✓ | | |
| Provide verbose job flow details | ✓ | ✓ | |
| Resize running job flows | ✓ | ✓ | |
| Run job flows with multiple steps | ✓ | ✓ | |
| Select version of Hadoop, Hive, and Pig | ✓ | ✓ | |
| Specify the MapReduce executable in multiple computer languages | ✓ | ✓ | ✓ |
| Specify the number and type of Amazon EC2 instances that process the data | ✓ | ✓ | ✓ |
| Transfer data to and from Amazon S3 automatically | ✓ | ✓ | ✓ |
| Terminate job flows in real time | ✓ | ✓ | |
The following sections describe how to use Amazon Elastic MapReduce (Amazon EMR) with each of the interface types.