Amazon Elastic MapReduce (Amazon EMR) is a rich service offering many features that are not covered in this guide, such as Hadoop logging, Pig, Custom JAR job flows, Bootstrap Actions, and virtual private networking. This section provides links to additional resources that will help you deepen your understanding of Amazon EMR.
This guide has shown you how to launch and terminate job flows using Amazon EMR. You can continue using Amazon EMR through the command line interface, or try one of the other interfaces.
To learn more about the Amazon EMR command line interface, refer to the Amazon Elastic MapReduce Developer Guide. The CLI offers full support of all the Amazon EMR functions without requiring you to code or use the Amazon EMR library.
The Amazon EMR console includes many functions besides just monitoring debug output. To learn more about how to use the Amazon EMR console, go to the Amazon Elastic MapReduce Developer Guide. The Amazon EMR console also has help to assist you.
If you want to write code directly to the Amazon EMR Query API, go to the Amazon Elastic MapReduce Developer Guide. The guide describes how to create and authenticate API requests, and how to use Amazon EMR through the APIs. For a complete description of all the API actions, go to the Amazon Elastic MapReduce API Reference.
This section lists additional features in Amazon EMR and tells you where to find more information. You can also find additional information about Amazon EMR in the Amazon EMR Articles & Tutorials area of the AWS website.
The sample streaming job flow provided in this guide highlights the basic capabilities of Amazon Elastic MapReduce (Amazon EMR). For more information on using streaming job flows with Amazon EMR, consider the following tutorial:
Tutorial: Finding Similar Items with Amazon EMR, Python, and Hadoop Streaming http://aws.amazon.com/articles/2294
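As a rough illustration of what a streaming job flow runs, the following sketch shows a word-count mapper and reducer in Python. It relies on the Hadoop Streaming convention that records are exchanged as lines of text, with key and value separated by a tab, and that the reducer receives its input sorted by key. The function names and the `map`/`reduce` command-line argument are assumptions made for this example; they are not part of Amazon EMR or of the sample job flow in this guide.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each word. Input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: pass "map" or "reduce" as the first argument
# to choose which step this script performs on standard input.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

You could try the logic locally, without Hadoop, by piping a text file through the script in `map` mode, sorting the result, and piping it through the script again in `reduce` mode.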
The sample job flow with Hive provided in this guide highlights the basic capabilities of using Hive with Amazon Elastic MapReduce (Amazon EMR). For more information on using Hive with Amazon EMR, consider the following:
Tutorial: Contextual Advertising using Apache Hive and Amazon EMR with High Performance Computing instances http://aws.amazon.com/articles/2855
Video: Getting started with Hive on Amazon EMR http://aws.amazon.com/articles/2862
Pig is an open-source Apache library that runs on top of Hadoop. The library takes SQL-like commands written in a language called Pig Latin and converts these commands into MapReduce job flows. Pig enables you to create queries using familiar SQL-like commands and syntax, avoiding the complexities of writing MapReduce algorithms using a lower-level language, such as Java. While you can execute one Pig Latin command at a time, it is far more common to write a script of Pig Latin commands that accomplish a task. Elastic MapReduce can use such scripts when you upload them to Amazon S3.
For more information on using Pig with Elastic MapReduce, consider the following:
Tutorial: Parsing Logs with Apache Pig and Elastic MapReduce http://aws.amazon.com/articles/2729
Video: Getting Started with Apache Pig on Elastic MapReduce http://aws.amazon.com/articles/2735
A custom JAR job flow runs a compiled Java program that you have uploaded to Amazon S3. The program should be compiled against the version of Hadoop you want to launch and you should submit Hadoop jobs using the Hadoop JobClient interface.
For more information on using Elastic MapReduce with custom JAR files, consider the following tutorial.
Tutorial: How to Create and Debug an Amazon EMR Job Flow http://aws.amazon.com/articles/3938
Cascading is an open-source project providing an API for defining and executing complex, scale-free, and fault-tolerant data processing workflows on Hadoop.
For more information on using Cascading with Elastic MapReduce, consider the following tutorial.
Tutorial: Cascading Multitool http://aws.amazon.com/jobflows/2293
Bootstrap actions are programs that you run on all nodes of a job flow prior to starting Hadoop. With bootstrap actions you can do the following:
Install software on the node
Modify the default Hadoop site configuration
Change the Java parameters that the Hadoop daemons use
You can specify a bootstrap action in the Amazon EMR console or the Amazon EMR command line client when starting job flows. Several predefined bootstrap actions are available, including Configure Hadoop, Configure Daemons, and Run-if.
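To make the idea of modifying the Hadoop site configuration more concrete, here is a minimal sketch of the kind of logic a custom bootstrap action could contain. It assumes only the standard Hadoop configuration layout, a `<configuration>` element holding `<property>` entries with `<name>` and `<value>` children. The function name `set_hadoop_property` is hypothetical, the property used in the usage note is only an illustration, and the sketch does not show how the predefined Configure Hadoop bootstrap action is implemented.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(site_xml, name, value):
    """Return the <configuration> document with property `name` set to `value`.

    An existing property is updated in place; a missing one is appended.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                # Property exists but has no <value> child yet.
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No property with this name was found, so add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

A script built around this function would read the site configuration file on the node, call, for example, `set_hadoop_property(text, "mapred.tasktracker.map.tasks.maximum", "2")`, and write the result back before Hadoop starts. The location of that file depends on the Hadoop version and is not covered here.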
For more information on Bootstrap Actions, see the Amazon Elastic MapReduce Developer Guide or refer to the following tutorial.
Tutorial: How to Create and Debug an Amazon EMR Job Flow http://aws.amazon.com/articles/3938
In addition to Amazon EMR logging, you also have the option to generate detailed Hadoop logs. You must enable Hadoop logging when you create the job flow; it uses Amazon SimpleDB to store the logs.
For more information on Hadoop debugging, see the Amazon Elastic MapReduce Developer Guide.
Apache Hadoop is an open-source Java software framework that supports data processing of large data sets using server clusters.
For more information on the Hadoop framework, go to http://hadoop.apache.org/core/.
The following table lists related resources that you'll find useful as you work with this service.
| Resource | Description |
|---|---|
| Amazon Elastic MapReduce Getting Started Guide | This document. Provides a quick tutorial of the service based on a simple use case. Examples and instructions are included. |
| Amazon Elastic MapReduce Developer Guide | Provides conceptual information about Amazon EMR and describes how to use Amazon EMR features. |
| Amazon Elastic MapReduce API Reference | Contains a technical description of all Amazon EMR APIs. |
| Amazon Elastic MapReduce Quick Reference Card | Describes all of the command line parameters and their options. |
| Amazon EMR Technical FAQ | Covers the top questions developers have asked about this product. |
| Amazon EMR Release Notes | Gives a high-level overview of the current release, and notes any new features, corrections, and known issues. |
| | A central starting point to find documentation, code samples, release notes, and other information to help you build innovative applications with AWS. |
| | Enables you to perform most of the functions of Amazon EMR and other AWS products without programming. |
| Discussion Forums | A community-based forum for developers to discuss technical questions related to Amazon Web Services. |
| AWS Technical Support | The home page for AWS Technical Support, including access to our Developer Forums, Technical FAQs, Service Status page, and AWS Premium Support (if you are subscribed to this program). |
| AWS Premium Support | The primary web page for information about AWS Premium Support, a one-on-one, fast-response support channel to help you build and run applications on AWS Infrastructure Services. |
| Amazon EMR | The primary web page for information about Amazon EMR. |
| Contact Us | Form for questions related to your AWS account. This form is only for account questions. For technical questions, use the Discussion Forums. |
| | Detailed information about the copyright and trademark usage at Amazon.com and other topics. |
