When you write an application that calls the Amazon Elastic MapReduce (Amazon EMR) API, there are several concepts that apply regardless of whether you are calling the API directly using a Query request or are calling one of the wrapper functions of an SDK.
An endpoint is a URL that is the entry point for a web service. Every web service request, whether it originates from a Query or a call to an SDK function, must contain an endpoint. The endpoint specifies the AWS Region where job flows are created, described, or terminated. It has the form
elasticmapreduce.regionname.amazonaws.com. If the Region name is not specified, Amazon EMR uses the default Region, us-east-1.
For a list of the endpoints for Amazon EMR, see the Regions and Endpoints documentation.
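The endpoint form described above can be sketched as a small helper. This is an illustration only: the function name `emr_endpoint` and the constant `DEFAULT_REGION` are hypothetical, and the helper makes the default Region explicit by substituting us-east-1 when no Region name is given.

```python
# Hypothetical helper: builds an Amazon EMR endpoint host name from a Region name.
DEFAULT_REGION = "us-east-1"  # default Region named in the text


def emr_endpoint(region_name=None):
    """Return the endpoint host name for the given Region.

    Falls back to DEFAULT_REGION when no Region name is supplied.
    """
    region = region_name or DEFAULT_REGION
    return f"elasticmapreduce.{region}.amazonaws.com"


if __name__ == "__main__":
    print(emr_endpoint("eu-west-1"))
    print(emr_endpoint())
```

An SDK normally derives the endpoint from its Region setting, so a helper like this is mainly useful when you build Query requests yourself.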
The Instances parameters enable you to configure the type and
number of Amazon EC2 instances that serve as nodes to process the data. Hadoop spreads the
processing of the data across multiple job flow nodes. The master node is responsible for
keeping track of the health of the core and task nodes and polling the nodes for job
result status. The core and task nodes do the actual processing of the data. If you have
a single-node job flow, the node serves as both the master and a core node.
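As a rough sketch, the Instances settings can be assembled as a plain dictionary before they are passed to a RunJobFlow call. The field names `MasterInstanceType`, `SlaveInstanceType`, and `InstanceCount` follow the RunJobFlow API reference; the helper name `build_instances` and the instance type used below are placeholders chosen for illustration, not recommendations.

```python
# Hypothetical helper: assembles the Instances settings for a RunJobFlow request.
def build_instances(master_type, core_type, instance_count):
    """Return an Instances dictionary for a job flow.

    instance_count is the total number of nodes. With a single node,
    that node acts as both the master and a core node, so no separate
    core (slave) instance type is set.
    """
    if instance_count < 1:
        raise ValueError("a job flow needs at least one instance")
    instances = {
        "MasterInstanceType": master_type,
        "InstanceCount": instance_count,
    }
    if instance_count > 1:
        # The remaining instance_count - 1 nodes use this type.
        instances["SlaveInstanceType"] = core_type
    return instances


if __name__ == "__main__":
    # "m5.xlarge" is a placeholder instance type.
    print(build_instances("m5.xlarge", "m5.xlarge", 3))
```

Omitting the core instance type for a single-node job flow is a choice made in this sketch; check the RunJobFlow reference for the combinations your API version accepts.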
The KeepJobAlive parameter in a
RunJobFlow request determines whether to terminate the cluster
when it runs out of job flow steps to execute. Set this value to False when
you know that the job flow is running as expected. When you are troubleshooting the job
flow and adding steps while the job flow execution is suspended, set the value to
True. This saves the time and expense of uploading the
results to Amazon Simple Storage Service (Amazon S3) only to repeat the process after you
modify a step and restart the job flow.
If KeepJobAlive is True, you must send a
TerminateJobFlows request after the job flow
successfully completes its work. Otherwise, the job flow continues to run and
generate AWS charges.
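The keep-alive and terminate pattern can be sketched as follows. Several things here are assumptions: the code targets a boto3-style EMR client, whose `terminate_job_flows(JobFlowIds=[...])` method corresponds to TerminateJobFlows, and it uses `KeepJobFlowAliveWhenNoSteps`, the name under which the RunJobFlow API reference lists the keep-alive setting inside Instances. The helper names, the `RecordingClient` stand-in, and the job flow ID are hypothetical, so the sketch runs without AWS credentials.

```python
# Hypothetical helpers illustrating the keep-alive / terminate pattern.
def keep_alive_setting(debugging):
    """Return the Instances fragment that controls keep-alive.

    True while troubleshooting (the cluster stays up between steps),
    False once the job flow is known to run as expected.
    """
    return {"KeepJobFlowAliveWhenNoSteps": bool(debugging)}


def terminate_job_flow(emr_client, job_flow_id):
    """Send TerminateJobFlows so a kept-alive job flow stops accruing charges."""
    emr_client.terminate_job_flows(JobFlowIds=[job_flow_id])


class RecordingClient:
    """Stand-in for a real EMR client; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def terminate_job_flows(self, JobFlowIds):
        self.calls.append(list(JobFlowIds))


if __name__ == "__main__":
    client = RecordingClient()
    print(keep_alive_setting(debugging=True))
    terminate_job_flow(client, "j-EXAMPLE")  # placeholder job flow ID
    print(client.calls)
```

With a real client, the same `terminate_job_flow` call would be made once the debugging session is finished.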
For more information about parameters that are unique to
RunJobFlow, see RunJobFlow. For more information about the generic parameters in the
request, see Common Request Parameters.
Amazon EMR uses Amazon Elastic Compute Cloud (Amazon EC2) instances as nodes to process job flows. These Amazon EC2 instances have locations composed of Availability Zones (AZ) and Regions. Regions are dispersed and located in separate geographic areas. Availability Zones are distinct locations within a Region insulated from failures in other Availability Zones. Each Availability Zone provides inexpensive, low-latency network connectivity to other Availability Zones in the same Region. Currently, Amazon EMR is available in the following Regions:
US East (Northern Virginia): To call this Region, use the endpoint elasticmapreduce.us-east-1.amazonaws.com.
US West (Oregon): To call this Region, use the endpoint elasticmapreduce.us-west-2.amazonaws.com.
US West (Northern California): To call this Region, use the endpoint elasticmapreduce.us-west-1.amazonaws.com.
EU (Ireland): To call this Region, use the endpoint elasticmapreduce.eu-west-1.amazonaws.com.
Asia Pacific (Singapore): To call this Region, use the endpoint elasticmapreduce.ap-southeast-1.amazonaws.com.
Asia Pacific (Tokyo): To call this Region, use the endpoint elasticmapreduce.ap-northeast-1.amazonaws.com.
South America (Sao Paulo): To call this Region, use the endpoint elasticmapreduce.sa-east-1.amazonaws.com.
The AvailabilityZone parameter specifies the general location
of the job flow. This parameter is optional and, in general, we discourage its use. When
AvailabilityZone is not specified, Amazon EMR automatically picks the
best Availability Zone for the job flow. You might find this
parameter useful if you want to colocate your instances with other running
instances and your job flow needs to read or write data from those instances. For more
information, see the Amazon Elastic Compute Cloud Developer Guide.
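One way to treat AvailabilityZone as optional is to add it to the Instances settings only when a zone is explicitly requested. In the RunJobFlow API reference the zone is nested under `Placement` inside Instances; the helper name `with_placement` and the zone name below are illustrative assumptions.

```python
# Hypothetical helper: adds an Availability Zone only when one is requested.
def with_placement(instances, availability_zone=None):
    """Return a copy of the Instances settings, optionally pinned to a zone.

    When availability_zone is None, no Placement entry is added and
    Amazon EMR picks the Availability Zone itself.
    """
    result = dict(instances)
    if availability_zone:
        result["Placement"] = {"AvailabilityZone": availability_zone}
    return result


if __name__ == "__main__":
    base = {"InstanceCount": 3}
    print(with_placement(base))                 # Amazon EMR chooses the zone
    print(with_placement(base, "us-east-1a"))   # placeholder zone name
```

Leaving the zone unset by default matches the guidance above that the parameter is discouraged unless you need to colocate with existing instances.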
There are times when you might like to use additional files or custom libraries with your mapper or reducer applications. For example, you might like to use a library that converts a PDF file into plain text.
AWS provides tutorials that show you how to create complete applications.
For more Amazon EMR code examples, go to Sample Code & Libraries.