This section covers the basics of creating a job flow using Amazon Elastic MapReduce (Amazon EMR). You can create a job flow in one of three ways: using the Amazon EMR console, using the Command Line Interface (CLI), which you download and install, or sending a query request with the Query API. The interface-specific details for the Amazon EMR console, the CLI, and the API are covered in the following sections.
For information about creating the objects and setting the permissions needed to create a job flow, see Setting Up Your Environment to Run a Job Flow. For information about the job flow process and how individual steps are processed, see Job Flows and Steps.
Choose one of the supported job flow types. Your choice of job flow type depends on several factors, including the format of the data and your level of programming knowledge. For a comparison of the supported job flow types, see Appendix: Compare Job Flow Types.
Choose how you want to create your job flow. The description of each job flow type in this section includes details on how to create a job flow using the Amazon EMR console, the CLI, or the Query API. The Amazon EMR console provides a graphical interface to launch Elastic MapReduce job flows and monitor their progress. The CLI offers full compatibility with the Elastic MapReduce API without requiring a programming environment. The Elastic MapReduce API, AWS SDK, and libraries offer the most flexibility, but require a programming environment and software development skills.
You need to plan the job flow you want to run and specify where Amazon EMR finds the program and data it needs. Typically, the MapReduce program or script is located in a bucket on Amazon S3. Your job flow input, output, and job flow logs are also typically located on Amazon S3.
Required Amazon S3 buckets must exist before you can create a job flow. You must upload any required scripts or data referenced in the job flow to Amazon S3. The following table describes example data, scripts, and log file locations.
| Information | Example Location on Amazon S3 |
|---|---|
| Script or program | s3://myawsbucket/wordcount/wordSplitter.py |
| Log files | s3://myawsbucket/wordcount/logs |
| Input data | s3://myawsbucket/wordcount/input |
| Output data | s3://myawsbucket/wordcount/output |
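As a planning aid, the example locations above can be split into bucket names and object keys to see which buckets must exist before the job flow is created. The following is a minimal sketch using only the Python standard library. The names `JOB_FLOW_LOCATIONS`, `split_s3_uri`, and `required_buckets` are illustrative and are not part of Amazon EMR or any AWS tool, and the sketch does not contact Amazon S3 or check that the buckets actually exist.

```python
from urllib.parse import urlparse

# Example locations from the table above. The dictionary keys are
# illustrative labels chosen for this sketch, not Amazon EMR parameter names.
JOB_FLOW_LOCATIONS = {
    "script": "s3://myawsbucket/wordcount/wordSplitter.py",
    "logs": "s3://myawsbucket/wordcount/logs",
    "input": "s3://myawsbucket/wordcount/input",
    "output": "s3://myawsbucket/wordcount/output",
}


def split_s3_uri(uri):
    """Split an s3:// URI into a (bucket, key) tuple."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


def required_buckets(locations):
    """Return the sorted, de-duplicated bucket names used by the locations."""
    return sorted({split_s3_uri(uri)[0] for uri in locations.values()})


if __name__ == "__main__":
    for label, uri in JOB_FLOW_LOCATIONS.items():
        bucket, key = split_s3_uri(uri)
        print(f"{label}: bucket={bucket} key={key}")
    print("Buckets that must exist:", required_buckets(JOB_FLOW_LOCATIONS))
```

In this example all four locations share one bucket, so only `myawsbucket` needs to be created ahead of time. A layout that spreads the script, logs, input, and output across several buckets would list each bucket once.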
For information on how to upload objects to Amazon S3, go to Add an Object to Your Bucket in the Amazon Simple Storage Service Getting Started Guide.