Amazon Elastic MapReduce
Getting Started Guide (API Version 2009-11-30)

Where Do I Go from Here?

Amazon Elastic MapReduce (Amazon EMR) is a rich service offering many features that are not covered in this guide, such as Hadoop logging, Pig and custom JAR job flows, bootstrap actions, and virtual private networking. This section provides links to additional resources that will help you deepen your understanding of Amazon EMR.

Other Ways to Access Amazon EMR

This guide has shown you how to launch and terminate job flows using Amazon EMR. You can continue using Amazon EMR through the command line interface, or try one of the other interfaces.

Continue Using the Command Line Interface

To learn more about the Amazon EMR command line interface, refer to the Amazon Elastic MapReduce Developer Guide. The CLI offers full support of all the Amazon EMR functions without requiring you to code or use the Amazon EMR library.

Use the Amazon EMR Console

The Amazon EMR console includes many functions besides monitoring debug output. To learn more about how to use the Amazon EMR console, go to the Amazon Elastic MapReduce Developer Guide. The console also includes built-in help.

Code Directly to the Web Service API

If you want to write code directly to the Amazon EMR Query API, go to the Amazon Elastic MapReduce Developer Guide. The guide describes how to create and authenticate API requests, and how to use Amazon EMR through the APIs. For a complete description of all the API actions, go to the Amazon Elastic MapReduce API Reference.

Learn More About Amazon EMR

This section lists additional features in Amazon EMR and tells you where to find more information. You can also find additional information about Amazon EMR in the Amazon EMR Articles & Tutorials area of the AWS web site.

Streaming Job Flows

The sample streaming job flow provided in this guide highlights the basic capabilities of Amazon Elastic MapReduce (Amazon EMR). For more information on using streaming job flows with Amazon EMR consider the following tutorial:

Job Flows Using Hive

The sample job flow with Hive provided in this guide highlights the basic capabilities of using Hive with Amazon Elastic MapReduce (Amazon EMR). For more information on using Hive with Amazon EMR consider the following:

Job Flows Using Pig

Pig is an open-source Apache library that runs on top of Hadoop. The library takes SQL-like commands written in a language called Pig Latin and converts these commands into MapReduce job flows. Pig enables you to create queries using familiar SQL-like commands and syntax, avoiding the complexities of writing MapReduce algorithms using a lower-level language, such as Java. While you can execute one Pig Latin command at a time, it is far more common to write a script of Pig Latin commands that accomplish a task. Elastic MapReduce can use such scripts when you upload them to Amazon S3.
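To make the idea of translating a query into MapReduce work more concrete, the following is a minimal Python sketch (not Pig or Hadoop code) of the map, shuffle, and reduce phases that a group-and-count query conceptually breaks down into. The function names and the sample records are hypothetical and chosen only for illustration; how Pig actually plans its jobs is not described in this guide.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (key, 1) pair for each record, keyed by the field being grouped on."""
    for user, _url in records:
        yield user, 1


def shuffle_phase(pairs):
    """Collect all values that share a key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to produce the per-group count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_user(records):
    """Run the three phases in order on an in-memory list of (user, url) records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample data: (user, url) page-view records.
    sample = [("alice", "/home"), ("bob", "/cart"), ("alice", "/checkout")]
    print(count_by_user(sample))
```

On a real cluster these phases run in parallel across many nodes; the sketch only shows the shape of the computation that a higher-level query language spares you from writing by hand.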

For more information on using Pig with Elastic MapReduce, consider the following:

Job Flows Using Custom JAR Files

A custom JAR job flow runs a compiled Java program that you have uploaded to Amazon S3. Compile the program against the version of Hadoop you want to launch, and have it submit Hadoop jobs using the Hadoop JobClient interface.

For more information on using Elastic MapReduce with custom JAR files, consider the following tutorial.

Job Flows Using Cascading

Cascading is an open-source project providing an API for defining and executing complex, scale-free, and fault-tolerant data processing workflows on Hadoop.

For more information on using Cascading with Elastic MapReduce, consider the following tutorial.

Bootstrap Actions

Bootstrap actions are programs that you run on all nodes of a job flow prior to starting Hadoop. With bootstrap actions you can do the following:

  • Install software on the node

  • Modify the default Hadoop site configuration

  • Change the Java parameters that the Hadoop daemons use

You can specify a bootstrap action in the Amazon EMR console or the Amazon EMR command line client when starting job flows. Several predefined bootstrap actions are available, including Configure Hadoop, Configure Daemons, and Run-if.
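As an illustration of the second item in the list above, the following Python sketch shows the kind of logic a custom bootstrap action might use to override one property in a Hadoop site configuration. It is a sketch under stated assumptions, not the implementation of any predefined action: the function name is hypothetical, the property shown is only an example, and the sketch works on an XML string because this guide does not say where the configuration file lives on a node. It relies only on the standard Hadoop layout of property elements with name and value children inside a configuration element.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(config_xml, name, value):
    """Return config_xml with the named property set to value, adding it if absent."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already exists: overwrite its value (create the element if missing).
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> entry.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical starting configuration with a single property.
    site_xml = (
        "<configuration>"
        "<property><name>mapred.tasktracker.map.tasks.maximum</name>"
        "<value>2</value></property>"
        "</configuration>"
    )
    print(set_hadoop_property(site_xml, "mapred.tasktracker.map.tasks.maximum", "4"))
```

For common settings, the predefined Configure Hadoop bootstrap action mentioned above is likely the simpler choice; a custom script like this is only worth writing when the predefined actions do not cover your case.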

For more information on bootstrap actions, see the Amazon Elastic MapReduce Developer Guide or refer to the following tutorial.

Hadoop Debugging

In addition to Amazon EMR logging, you can generate detailed Hadoop logs. Hadoop logging must be enabled when you create a job flow, and it uses Amazon SimpleDB to store the logs.

For more information on Hadoop debugging, see the Amazon Elastic MapReduce Developer Guide.

Learn More About Hadoop

Apache Hadoop is an open-source Java software framework that supports data processing of large data sets using server clusters.

For more information on the Hadoop framework, go to http://hadoop.apache.org/core/.

Amazon EMR Resources

The following table lists related resources that you'll find useful as you work with this service.

Amazon Elastic MapReduce Getting Started Guide

This document. Provides a quick tutorial of the service based on a simple use case. Examples and instructions are included.

Amazon Elastic MapReduce Developer Guide

Provides conceptual information about Amazon EMR and describes how to use Amazon EMR features.

Amazon Elastic MapReduce API Reference

Contains a technical description of all Amazon EMR APIs.

Amazon Elastic MapReduce Quick Reference Card

Describes all of the command line parameters and their options.

Amazon EMR Technical FAQ

Covers the top questions developers have asked about this product.

Amazon EMR Release Notes

Gives a high-level overview of the current release, and notes any new features, corrections, and known issues.

AWS Developer Resource Center

A central starting point to find documentation, code samples, release notes, and other information to help you build innovative applications with AWS.

Amazon EMR console

Enables you to perform most of the functions of Amazon EMR and other AWS products without programming.

Discussion Forums

A community-based forum for developers to discuss technical questions related to Amazon Web Services.

AWS Support Center

The home page for AWS Technical Support, including access to our Developer Forums, Technical FAQs, Service Status page, and AWS Premium Support (if you are subscribed to this program).

AWS Premium Support Information

The primary web page for information about AWS Premium Support, a one-on-one, fast-response support channel to help you build and run applications on AWS Infrastructure Services.

Amazon EMR Product Information

The primary web page for information about Amazon EMR.

Form for questions related to your AWS account: Contact Us

This form is only for account questions. For technical questions, use the Discussion Forums.

Conditions of Use

Detailed information about the copyright and trademark usage at Amazon.com and other topics.
