AWS Documentation
Developer Guide (API Version 2009-03-31)
Index
bzip2,
How to Process Compressed Files
gzip,
How to Process Compressed Files
LZO,
How to Process Compressed Files
A
add step to job flow,
Add Steps to a Job Flow
additional libraries,
How to Use Additional Files and Libraries in Amazon EMR Job Flows
Amazon EC2,
Amazon EC2 Concepts
Amazon EC2 instance types,
Amazon EC2 Instances
Amazon EC2 Instances,
Amazon EC2 Instances
Amazon EMR concepts,
Amazon EMR Concepts
Amazon S3,
Amazon S3 Concepts
Amazon S3 buckets,
Buckets
Amazon S3 native file system,
Supported File Systems
,
Buckets
,
Data Storage
API Requests
SDK,
Calling the Amazon EMR API
architectural diagram,
Architectural Overview of Amazon EMR
architectural overview,
Architectural Overview of Amazon EMR
Args,
API
,
API
arrested job flow,
Arrested State
arrested state,
Arrested State
availability zones,
Regions
,
Availability Zones in Amazon EMR
AWS concepts,
Associated AWS Product Concepts
B
bootstrap actions,
Bootstrap Actions
,
Bootstrap Actions
custom,
Running Custom Bootstrap Actions from the CLI
,
Running Custom Bootstrap Actions from the Amazon EMR Console
predefined,
Using Predefined Bootstrap Actions
bucket names,
Create and Configure an Amazon S3 Bucket
buckets,
Buckets
C
cascading,
How to Create a Cascading Job Flow
Cascading,
How to Create a Cascading Job Flow
,
Appendix: Compare Job Flow Types
cluster nodes,
Instance Groups
cluster tuning,
Performance Tuning
command
--active,
View Job Flow Details
,
CLI
--alive,
View Job Flow Details
--ami-version,
Specifying the Amazon EMR AMI Version
--bootstrap-action,
Running Custom Bootstrap Actions from the CLI
--create,
Creating a Job Flow
,
How to Create a Streaming Job Flow
,
How to Create a Job Flow Using Hive
,
How to Create a Job Flow Using Pig
,
How to Create a Job Flow Using a Custom JAR
,
How to Create a Cascading Job Flow
--details,
View Job Flow Details
--hadoop-version,
Supported Pig Versions
,
Supported Hadoop Versions
,
Supported Hive Versions
--hive-script,
How to Create a Job Flow Using Hive
--hive-site,
Creating a Metastore Outside the Hadoop Cluster
--hive-versions,
Supported Hive Versions
--list,
View Job Flow Details
,
CLI
--pig-script,
How to Create a Job Flow Using Pig
--steps,
Add Steps to a Job Flow
--stream,
How to Create a Streaming Job Flow
--terminate,
Terminate a Job Flow
-d,
Interactive and Batch Modes
concepts, AWS,
Associated AWS Product Concepts
configuration,
Bootstrap Actions
configure Hadoop,
Configuration of hadoop-user-env.sh
configure hadoop-user-env.sh,
Configuration of hadoop-user-env.sh
core node,
Instance Groups
create job flow,
Creating a Job Flow
custom JAR,
How to Create a Job Flow Using a Custom JAR
customer support,
Appendix: Amazon EMR Resources
D
data compression
intermediate,
Hadoop Data Compression
output,
Hadoop Data Compression
data security,
Secure Data
data storage,
Configurable Data Storage
,
Supported File Systems
debug,
How to Debug Job Flows with Steps
debug using log files,
How to Troubleshoot Using Log Files
debugging
hadoop,
Hadoop and Step Logging
step,
Hadoop and Step Logging
debugging job flow,
Troubleshooting
describe job flow,
View Job Flow Details
distributed cache,
API
document history,
Document History
download log files,
How to View Job Flow Logs
,
How to Download Job Flow Logs from Amazon S3
E
endpoints
Europe,
Endpoints for Amazon EMR
North America,
Endpoints for Amazon EMR
F
failures,
Checking Hadoop Failures
features,
Overview of Amazon EMR
file systems,
Supported File Systems
,
Data Storage
FoxyProxy,
How to Install Foxy Proxy
G
generating signatures,
Using the Java SDK to Sign a Query Request
H
Hadoop,
Hadoop and MapReduce
,
Using the Hadoop Stream Utility
data compression,
Hadoop Data Compression
failures,
Checking Hadoop Failures
process,
Using the Hadoop Stream Utility
user interface,
How to Use the Hadoop User Interface
,
How to Install Foxy Proxy
Hadoop 0.20.205
patches,
Hadoop 0.20.205 Patches
Hadoop configuration,
JSON Configuration Files
hadoop debugging,
Hadoop and Step Logging
HDFS,
Supported File Systems
Hive,
Hive Support
,
How to Create a Job Flow Using Hive
,
Appendix: Compare Job Flow Types
batch,
Interactive and Batch Modes
data sharing,
Sharing Data Between Hive 0.5 and Hive 0.7
interactive,
Interactive and Batch Modes
versioning,
Displaying the Hive Version
Hive version
0.4,
Configuring Hive
0.5,
Configuring Hive
0.7,
Configuring Hive
I
IAM,
Configuring User Permissions
instance types,
Amazon EC2 Instances
interfaces
comparison,
Using Amazon EMR
J
job flow,
Multiple Sequential Steps
,
Job Flows and Steps
,
Creating a Job Flow Using Pig
add steps,
Add Steps to a Job Flow
cascading,
Supports Hadoop Methods
create,
Creating a Job Flow
,
Using the Hadoop Stream Utility
custom JAR,
Supports Hadoop Methods
debug,
How to Debug Job Flows with No Steps
job flow with steps,
How to Debug Job Flows with Steps
job flow without steps,
How to Debug Job Flows with No Steps
details,
View Job Flow Details
download logs from Amazon S3,
How to View Job Flow Logs
,
How to Download Job Flow Logs from Amazon S3
Hive,
Supports Hadoop Methods
,
Running Hive in Interactive Mode
list,
View Job Flow Details
monitoring,
How to View Logs Using SSH
,
Troubleshooting Job Flows
Pig,
Supports Hadoop Methods
resizing,
Resizeable Running Job Flows
states,
View Job Flow Details
status using SSH,
How to Monitor Hadoop on a Master Node
streaming,
Supports Hadoop Methods
terminate,
Terminate a Job Flow
job flow process
example,
Overview of Amazon EMR
JSON files,
JSON Configuration Files
K
key pair,
Amazon EC2 Key Pairs
L
libraries, additional,
How to Use Additional Files and Libraries in Amazon EMR Job Flows
list job flows,
View Job Flow Details
log files,
Troubleshooting
,
How to Use Log Files
,
Log Files
,
How to Process Compressed Files
directories,
Log File Directories
,
Log File Directories
download from Amazon S3,
How to View Job Flow Logs
,
How to Download Job Flow Logs from Amazon S3
step,
How to Check Step Log Files
used to debug,
How to Troubleshoot Using Log Files
M
MapReduce,
What Is MapReduce?
MapReduce process,
What Is MapReduce?
master node,
Instance Groups
monitor job flow,
How to Monitor Hadoop on a Master Node
monitoring
job flows,
How to View Logs Using SSH
,
Troubleshooting Job Flows
multipart upload,
Multipart Upload
,
Multipart Upload
N
new features,
Document History
node,
Instance Groups
P
performance tuning,
Performance Tuning
Pig,
How to Create a Job Flow Using Pig
,
Appendix: Compare Job Flow Types
Pig 0.9.1
patches,
Pig 0.9.1 Patches
policies,
Configuring User Permissions
examples,
Example Policies for Amazon EMR
predefined bootstrap actions,
Using Predefined Bootstrap Actions
ConfigureDaemons,
Configure Daemons
ConfigureHadoop,
Configure Hadoop
Memory-Intensive,
Configure Memory-Intensive Workloads
RunIf,
Run If
Shutdown,
Shutdown Actions
Q
Query,
Components of a Query Request in Amazon EMR
R
regions,
Regions
related resources,
Appendix: Amazon EMR Resources
,
Amazon EMR Documentation
Reserved Instances,
Reserved Instances
S
S3N (see Amazon S3 native file system)
security,
Secure Data
service overview,
Overview of Amazon EMR
signature, generating,
Using the Java SDK to Sign a Query Request
SimpleDB,
Data Storage
SSH,
Secure Data
,
How to View Logs Using SSH
,
How to Monitor Hadoop on a Master Node
,
How to Install Foxy Proxy
Hive,
Running Hive in Interactive Mode
interactive mode,
Interactive and Batch Modes
into master node,
Instance Groups
step
log files,
How to Check Step Log Files
states,
View Job Flow Details
streaming,
How to Create a Streaming Job Flow
utility,
Using the Hadoop Stream Utility
Streaming,
Appendix: Compare Job Flow Types
T
task,
How to Use the Hadoop User Interface
,
Troubleshooting Task Attempts
task node,
Instance Groups
terminate job flow,
Terminate a Job Flow
troubleshooting,
Troubleshooting
U
updates,
Document History
user interface
Hadoop,
How to Use the Hadoop User Interface
V
version,
Document History
W
what's new,
Document History
Z
zones, availability,
Availability Zones in Amazon EMR