This section walks you through how to set up required resources and permissions to run a job flow. The tasks that follow show you how to create the resources that your job flow uses to process data. Once created, you can reuse these resources for other job flows. Depending on your application, however, it may make operational sense to create new resources for each job flow.
The tasks that must be completed before you create a job flow are as follows:
| 1 | Choose a Region |
| 2 | Create and Configure an Amazon S3 Bucket |
| 3 | Create an Amazon EC2 Key Pair and PEM File |
| 4 | Modify Your PEM File |
| 5 | For CLI and API users only, Get Security Credentials |
| 6 | For CLI users only, optionally Create a Credentials File |
The following sections provide instructions on how to perform each of the tasks.
AWS enables you to place resources in multiple locations. Locations are composed of Regions and Availability Zones within those Regions. Availability Zones are distinct geographical locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region.
All Amazon EC2 instances, key pairs, security groups, and Amazon Elastic MapReduce (Amazon EMR) job flows must be located in the same Region. To optimize performance and reduce latency, the other resources that your job flows use (such as Amazon S3 buckets) should be located in the same Region as the job flows.
For more information about Regions and Availability Zones, go to Using Regions and Availability Zones in the Amazon Elastic Compute Cloud User Guide.
| Note |
|---|
| Not all AWS products offer the same support in all Regions. For example, Cluster Compute instances are available only in the US-East (Northern Virginia) Region. Confirm that you are working in the appropriate Region for the resources you want to use. |
You must ensure that you use the same Region for each resource you create. Use the table below to identify the correct Region name.
| If your Amazon EMR Region is... | The Amazon EMR CLI and API Region is... | The Amazon S3 Region is... | The Amazon EC2 Region is... |
|---|---|---|---|
| US East (Virginia) | us-east-1 | US Standard | US East (Virginia) |
| US West (Oregon) | us-west-2 | Oregon | US West (Oregon) |
| US West (N. California) | us-west-1 | Northern California | US West (N. California) |
| EU West (Ireland) | eu-west-1 | Ireland | EU West (Ireland) |
| Asia Pacific (Singapore) | ap-southeast-1 | Singapore | Asia Pacific (Singapore) |
| Asia Pacific (Tokyo) | ap-northeast-1 | Tokyo | Asia Pacific (Tokyo) |
| South America (Sao Paulo) | sa-east-1 | Sao Paulo | South America (Sao Paulo) |
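
The Region table above can also be kept as a small lookup structure, which helps keep the Region name consistent across services. The following Python sketch is illustrative only: `REGIONS` and `lookup_region` are hypothetical names, not part of the Amazon EMR CLI or any AWS SDK, and the values are copied from the table.

```python
# Values copied from the Region table above. REGIONS and lookup_region are
# hypothetical helper names, not part of the Amazon EMR CLI or an AWS SDK.
REGIONS = {
    "US East (Virginia)": {"cli_api": "us-east-1", "s3": "US Standard"},
    "US West (Oregon)": {"cli_api": "us-west-2", "s3": "Oregon"},
    "US West (N. California)": {"cli_api": "us-west-1", "s3": "Northern California"},
    "EU West (Ireland)": {"cli_api": "eu-west-1", "s3": "Ireland"},
    "Asia Pacific (Singapore)": {"cli_api": "ap-southeast-1", "s3": "Singapore"},
    "Asia Pacific (Tokyo)": {"cli_api": "ap-northeast-1", "s3": "Tokyo"},
    "South America (Sao Paulo)": {"cli_api": "sa-east-1", "s3": "Sao Paulo"},
}


def lookup_region(emr_region):
    """Return the CLI/API, Amazon S3, and Amazon EC2 names for an EMR Region."""
    try:
        entry = REGIONS[emr_region]
    except KeyError:
        raise ValueError(f"Unknown Amazon EMR Region: {emr_region!r}") from None
    # In the table, the Amazon EC2 Region name matches the Amazon EMR Region name.
    return {"cli_api": entry["cli_api"], "s3": entry["s3"], "ec2": emr_region}


if __name__ == "__main__":
    print(lookup_region("EU West (Ireland)"))
```

Looking up every name from one structure makes it harder to mix, for example, an Oregon bucket with a job flow in us-west-1.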
Specify the Region with the --region parameter, as in the following example. If the --region parameter is not specified, the job flow is created in the us-east-1 Region.
$ ./elastic-mapreduce --create --alive --stream --input myawsbucket \
--output myawsbucket --log-uri --region eu-west-1

| Tip |
|---|
| To reduce the number of parameters required each time you issue a command from the CLI, you can store information such as Region in your credentials file. |
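
The --region default described above can be modeled in a few lines. This Python sketch only mirrors the documented behavior (use the value that follows --region, otherwise fall back to us-east-1). It is not the CLI's actual implementation, it ignores any Region stored in a credentials file, and `effective_region` is a hypothetical name.

```python
def effective_region(argv, default="us-east-1"):
    """Return the Region a command line selects, per the documented default.

    Hypothetical helper: models only the --region parameter and the
    us-east-1 fallback described in the text, nothing else.
    """
    if "--region" in argv:
        index = argv.index("--region")
        if index + 1 < len(argv):
            return argv[index + 1]
    return default


if __name__ == "__main__":
    print(effective_region(["--create", "--alive", "--region", "eu-west-1"]))
    print(effective_region(["--create", "--alive"]))
```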
To select a region, configure your application to use that Region's
endpoint. If you are creating a client application using an AWS SDK, you can change
the client endpoint by calling setEndpoint, as shown in the following
example:
client.setEndpoint("eu-west-1.elasticmapreduce.amazonaws.com");
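
The endpoint in the example is the Region code followed by .elasticmapreduce.amazonaws.com. The following Python sketch builds that host name, assuming the same pattern holds for every Region code listed in the table in this section. `emr_endpoint` and `KNOWN_REGION_CODES` are hypothetical names, not SDK calls.

```python
# Region codes copied from the Region table in this section.
KNOWN_REGION_CODES = {
    "us-east-1",
    "us-west-2",
    "us-west-1",
    "eu-west-1",
    "ap-southeast-1",
    "ap-northeast-1",
    "sa-east-1",
}


def emr_endpoint(region_code):
    """Build an endpoint host name in the form used by the setEndpoint example.

    Assumption: every Region follows the "<code>.elasticmapreduce.amazonaws.com"
    pattern shown for eu-west-1.
    """
    if region_code not in KNOWN_REGION_CODES:
        raise ValueError(f"Unknown Region code: {region_code!r}")
    return f"{region_code}.elasticmapreduce.amazonaws.com"


if __name__ == "__main__":
    print(emr_endpoint("eu-west-1"))
```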
Once your application has specified a Region by setting the endpoint, you can set the Availability Zone for your job flow's Amazon EC2 instances with a query request that contains an Instances.Placement.AvailabilityZone parameter, as in the following example. If you do not specify the Availability Zone for your job flow, Amazon EMR launches the job flow instances in the best Availability Zone in that Region based on system health and available capacity.
https://elasticmapreduce.amazonaws.com?
Operation=
...
Instances.Placement.AvailabilityZone=eu-west-1a&
...

For more information about the parameters in an Amazon EMR request, see API Reference.
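
The Availability Zone parameter in the request above can be produced with standard URL encoding. The following Python sketch assumes that an Availability Zone name is the Region code plus a letter, as in eu-west-1a. `placement_parameter` is a hypothetical helper, and the remaining request parameters (elided in the example) are deliberately left out.

```python
from urllib.parse import urlencode


def placement_parameter(region_code, zone_letter):
    """Return the encoded Instances.Placement.AvailabilityZone query parameter.

    Assumption: the zone name is the Region code followed by one letter,
    matching the eu-west-1a value in the example request.
    """
    if len(zone_letter) != 1 or not zone_letter.isalpha():
        raise ValueError("zone_letter must be a single letter, such as 'a'")
    zone = f"{region_code}{zone_letter.lower()}"
    return urlencode({"Instances.Placement.AvailabilityZone": zone})


if __name__ == "__main__":
    print(placement_parameter("eu-west-1", "a"))
```

Deriving the zone from the Region code keeps the Availability Zone inside the Region that the endpoint selects.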
| Note |
|---|
| For more information on specifying Regions from the CLI and API, see Available Region Endpoints for the AWS SDKs. |
Amazon Elastic MapReduce (Amazon EMR) uses Amazon S3 to store input data, log files, and output data. Amazon S3 refers to these storage locations as buckets. To conform with Amazon S3 requirements, DNS requirements, and restrictions in the supported data analysis tools, we recommend the following guidelines for bucket names. Bucket names should:
Be between 3 and 63 characters long
Contain only lowercase letters, numbers, or periods (.)
Not contain a dash (-) or underscore (_)
For additional details on valid bucket names, go to Bucket Restrictions and Limitations in the Amazon Simple Storage Service Developer Guide.
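
The three guidelines above can be checked before you create a bucket. The following Python sketch covers only the rules listed in this section (length, allowed characters, no dash or underscore). It does not implement the full set of Amazon S3 naming restrictions, and `follows_bucket_guidelines` is a hypothetical name.

```python
import re

# 3 to 63 characters, each a lowercase letter, a digit, or a period.
# Dashes and underscores are excluded because they are not in the class.
_BUCKET_NAME = re.compile(r"[a-z0-9.]{3,63}")


def follows_bucket_guidelines(name):
    """Return True if name satisfies the guidelines listed in this section."""
    return _BUCKET_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("myawsbucket", "my_bucket", "MyBucket", "ab"):
        print(candidate, follows_bucket_guidelines(candidate))
```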
This section shows you how to use the AWS Management Console to create and then set permissions for an Amazon S3 bucket. However, you can also create and set permissions for an Amazon S3 bucket using the Amazon S3 API or the third-party Curl command line tool. For information about Curl, go to Amazon S3 Authentication Tool for Curl. For information about using the Amazon S3 API to create and configure an Amazon S3 bucket, go to the Amazon Simple Storage Service API Reference.
To create an Amazon S3 bucket
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
Click Create Bucket.
The Create a Bucket dialog box opens.
Enter a bucket name, such as mylog-uri.
This name must be globally unique; it cannot be the same as the name of any other bucket.
Select the Region for your bucket. To avoid paying cross-Region bandwidth charges, create the Amazon S3 bucket in the same Region as your job flow.
Refer to Choose a Region for guidance on choosing a Region.
Click Create.
You created a bucket with the URI s3n://mylog-uri/.
| Note |
|---|
| If you enable logging in the Create a Bucket wizard, it enables only bucket access logs, not Amazon EMR job flow logs. |
| Note |
|---|
| For more information on specifying Region-specific buckets, refer to Buckets and Regions in the Amazon Simple Storage Service Developer Guide and Available Region Endpoints for the AWS SDKs. |
After you create your bucket, you can set the appropriate permissions on it. Typically, you give yourself (the owner) read and write access and give authenticated users read access.
To set permissions on an Amazon S3 bucket
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.
In the Buckets pane, right-click the bucket you just created.
Select Properties.
In the Properties pane, select the Permissions tab.
Click Add more permissions.
Select Authenticated Users in the Grantee field.
To the right of the Grantee drop-down list, select List.
Click Save.
You have created a bucket and restricted permissions to authenticated users.
Amazon EMR uses an Amazon Elastic Compute Cloud (Amazon EC2) key pair to ensure
that you alone have access to the instances that you launch. The PEM
file associated with this key pair is required to ssh directly to the master
node of the cluster running your job flow.
To create an Amazon EC2 key pair
Sign in to the AWS Management Console and open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
From the Amazon EC2 console, select a Region.
In the Navigation pane, click Key Pairs.
On the Key Pairs page, click Create Key Pair.
In the Create Key Pair dialog box, enter a name for your key pair, such as mykeypair.
Click Create.
Save the resulting PEM file in a safe location.
Your Amazon EC2 key pair and an associated PEM file are created.
Amazon Elastic MapReduce (Amazon EMR) enables you to work interactively with your job flow, allowing you to test job flow steps or troubleshoot your cluster environment. To log in directly to the master node of your running job flow, you can use ssh or PuTTY, and you authenticate to the master node with your PEM file. The PEM file must first be modified, and the modification depends on the tool you use, which in turn depends on your operating system: on Linux or UNIX computers you connect with the CLI, and on Microsoft Windows computers you connect with PuTTY. For more information on how to install the Amazon EMR CLI or how to install PuTTY, go to the Getting Started Guide.
To modify your PEM file
Create a local permissions file:
| If you are using... | Do this... |
|---|---|
| Linux or UNIX | Set the permissions on the PEM file: $ chmod og-rwx mykeypair.pem |
| Microsoft Windows | |
Your PEM file is modified to allow you to log in directly to the master node of your running job flow.
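
The Linux or UNIX command in the table above, chmod og-rwx, removes all group and other permissions from the PEM file and leaves the owner's permissions unchanged. The following Python sketch does the same with the standard library on a POSIX system. `restrict_to_owner` is a hypothetical helper name, and mykeypair.pem is the example file name used in this section.

```python
import os
import stat


def restrict_to_owner(path):
    """Remove group and other permissions, like `chmod og-rwx path`.

    The owner's existing permission bits are left as they are.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current & ~(stat.S_IRWXG | stat.S_IRWXO))


if __name__ == "__main__":
    # Example file name from this section; adjust the path to your own PEM file.
    pem_path = "mykeypair.pem"
    if os.path.exists(pem_path):
        restrict_to_owner(pem_path)
```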
AWS assigns you an Access Key ID and a Secret Access Key to identify you as the sender of your request. AWS uses these security credentials to help protect your data. You include your Access Key ID in all AWS requests made through the CLI or API. The AWS Management Console provides these security credentials automatically.
| Note |
|---|
| Your Secret Access Key is a shared secret between you and AWS. Keep this key secret. Amazon uses this key to bill you for the AWS services you use. Never include your key in your requests to AWS and never email your key to anyone, even if an inquiry appears to originate from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your Secret Access Key. |
To get your Access Key ID and Secret Access Key
Go to the AWS website.
Click My Account to display a list of options.
Click Security Credentials and log in to your AWS Account. Your Access Key ID is displayed in the Access Credentials section. Your Secret Access Key remains hidden as a further precaution.
To display your Secret Access Key, click Show in the Your Secret Access Key area, as shown in the following figure.

You now have your Access Key ID and Secret Access Key to securely identify yourself to AWS. You need this information to create a credentials file, as described in the following section.
You can use an Amazon EMR credentials file to simplify job flow creation and authentication of requests. The credentials file stores information required for many commands, so you don't have to enter it repeatedly.
Your credentials are used to calculate the signature value for every request you make. The Amazon EMR CLI automatically looks for these credentials in the file credentials.json. You can edit the credentials.json file to include your AWS credentials. If you do not have a credentials.json file, you must include your credentials in every request you make.
To create your credentials file
Create a file named credentials.json on your computer.
Add the following lines to your credentials file:
{
"access-id": "AccessKeyID",
"private-key": "PrivateKey",
"key-pair": "KeyName",
"key-pair-file": "location of key pair file",
"region": "Region",
"log-uri": "location of bucket on Amazon S3"
}

The access-id and private-key are the AWS Access Key ID and Secret Access Key described in Get Security Credentials. The key-pair and key-pair-file are the name of the Amazon EC2 key pair and the path and name of the PEM file you created in Create an Amazon EC2 Key Pair and PEM File. The region is the Region you selected in Choose a Region. The log-uri is the path to the bucket you created in Create and Configure an Amazon S3 Bucket, using the format s3n://BucketName/FolderName.
Your credentials.json file is configured.
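
The file described above can also be generated by a script. The following Python sketch writes the six fields shown in this section and builds log-uri in the s3n://BucketName/FolderName format. `build_credentials` and `write_credentials` are hypothetical helper names, treating all six fields as expected is an assumption of this sketch, and the values in the usage example are placeholders.

```python
import json

# Field names copied from the credentials.json example in this section.
EXPECTED_KEYS = (
    "access-id",
    "private-key",
    "key-pair",
    "key-pair-file",
    "region",
    "log-uri",
)


def build_credentials(access_id, private_key, key_pair, key_pair_file,
                      region, bucket, folder):
    """Return a dict with the fields used in credentials.json."""
    return {
        "access-id": access_id,
        "private-key": private_key,
        "key-pair": key_pair,
        "key-pair-file": key_pair_file,
        "region": region,
        "log-uri": f"s3n://{bucket}/{folder}",
    }


def write_credentials(path, credentials):
    """Write credentials to path as JSON, refusing empty or missing fields."""
    missing = [key for key in EXPECTED_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError(f"Missing credential fields: {', '.join(missing)}")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(credentials, handle, indent=2)


if __name__ == "__main__":
    # Placeholder values only; substitute your own before use.
    example = build_credentials(
        "AccessKeyID", "PrivateKey", "mykeypair",
        "/path/to/mykeypair.pem", "eu-west-1", "BucketName", "FolderName",
    )
    print(json.dumps(example, indent=2))
```

Because the file contains your Secret Access Key, store it with the same care as the PEM file.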
Each of the preceding tasks guided you through the steps to set up the objects and permissions required for a job flow. You are now ready to create your job flow. Instructions on how to create a job flow are at Creating a Job Flow.