Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Using Amazon EMR

This section covers the fundamentals of creating, managing, and troubleshooting a job flow using Amazon Elastic MapReduce (Amazon EMR). It describes all supported job flow types and explains how to use the Amazon EMR console, the CLI, the SDKs, and the API.

If you have not yet signed up for Amazon EMR, see the Getting Started Guide for instructions.

Tip

We strongly recommend that you work through the examples in the Getting Started Guide to get a basic understanding of Amazon EMR.

Amazon EMR offers a variety of interfaces, including a console, a command line interface (CLI), a query API, AWS SDKs, and libraries. Each interface offers a different balance of ease of use and functionality. The interface you choose depends on your knowledge of Hadoop, your programming skills, and the functionality you require:

  • The Amazon EMR console provides a graphical interface from which you can launch Amazon EMR job flows and monitor their progress.

  • The CLI offers full compatibility with the Amazon EMR API without requiring a programming environment. The Ruby-based Amazon EMR CLI is available for download at Amazon Elastic MapReduce Ruby Client (http://aws.amazon.com/developertools/2264).

  • The Amazon EMR API, SDKs, and libraries offer the most flexibility but require a programming environment and software development skills. For more information on using the query API to access Amazon EMR, see Calling the Amazon EMR API in this guide. The AWS SDKs provide support for Java, C#, and .NET. For more information on the AWS SDKs, refer to the list of current AWS SDKs. Libraries are available for Perl and PHP. For more information about the Perl and PHP libraries, see Sample Code & Libraries (http://aws.amazon.com/code/Elastic-MapReduce).
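To give a sense of what the programmatic route involves, the following is a minimal sketch in Python that only assembles the parameters for a job flow request as a plain dictionary. It does not call Amazon EMR. The field names (Name, LogUri, Instances, Steps, HadoopJarStep, ActionOnFailure) follow the RunJobFlow action as we understand it and should be checked against the API reference. The helper names, bucket names, instance types, and the streaming JAR path are hypothetical placeholders chosen for illustration.

```python
def streaming_step(name, mapper, reducer, input_uri, output_uri):
    """Describe one Hadoop streaming step (hypothetical helper)."""
    return {
        "Name": name,
        # Assumed failure policy; other values exist in the API.
        "ActionOnFailure": "TERMINATE_JOB_FLOW",
        "HadoopJarStep": {
            # Placeholder path to the streaming JAR; verify for your setup.
            "Jar": "/home/hadoop/contrib/streaming/hadoop-streaming.jar",
            "Args": [
                "-mapper", mapper,
                "-reducer", reducer,
                "-input", input_uri,
                "-output", output_uri,
            ],
        },
    }


def build_run_job_flow_request(name, log_uri, instance_count, steps,
                               master_type="m1.small",
                               slave_type="m1.small"):
    """Assemble request parameters as a dict; nothing is sent anywhere."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "Name": name,
        "LogUri": log_uri,
        "Instances": {
            "MasterInstanceType": master_type,
            "SlaveInstanceType": slave_type,
            "InstanceCount": instance_count,
            # Shut the job flow down once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": list(steps),
    }


# Example values only; the bucket "example-bucket" is hypothetical.
request = build_run_job_flow_request(
    name="word-count-example",
    log_uri="s3://example-bucket/logs/",
    instance_count=2,
    steps=[
        streaming_step(
            "count words",
            mapper="s3://example-bucket/code/mapper.py",
            reducer="aggregate",
            input_uri="s3://example-bucket/input/",
            output_uri="s3://example-bucket/output/",
        )
    ],
)
```

A dictionary like this would then be handed to whichever SDK or library you use to submit the request. The console and the CLI collect the same kind of information through forms and command-line options, which is why they need no programming environment.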

The following table compares the functionality of the Amazon EMR interfaces.

Function | Amazon EMR Console | CLI | API/SDK/Libraries
Create multiple job flows
Define bootstrap actions in a job flow
View logs for Hadoop jobs, tasks, and task attempts using a graphical interface
  
Implement Hadoop data processing programmatically  
Monitor job flows in real time
  
Provide verbose job flow details 
Resize running job flows 
Run job flows with multiple steps 
Select version of Hadoop, Hive, and Pig 
Specify the MapReduce executable in multiple computer languages
Specify the number and type of Amazon EC2 instances that process the data
Transfer data to and from Amazon S3 automatically
Terminate job flows in real time
 

The following sections describe how to use Amazon Elastic MapReduce (Amazon EMR) with each of the interface types.