Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

Amazon EMR Logging

There are two types of logs that store information about your job flow: step-level logs generated by Amazon Elastic MapReduce (Amazon EMR) and Hadoop job logs generated by Apache Hadoop. You need to examine both log types to have complete information about your job flow.

Amazon EMR step-level logs contain information about the job flow and the results of each step. These logs are useful for debugging problems that occur while the job flow is initializing or running. For example, a step-level log contains status messages such as "Streaming Command Failed!".

Hadoop logs contain information about Hadoop jobs, tasks, and task attempts. They are the standard log files generated by Apache Hadoop.

The following image shows the relationship between Amazon EMR job flow steps and Hadoop jobs, tasks, and task attempts.
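The relationship can also be read as a containment hierarchy. The following Python sketch is illustrative only: it assumes that a step runs one or more Hadoop jobs, that each job is split into tasks, and that each task has one or more attempts. The class names, field names, and identifiers are hypothetical and are not part of any Amazon EMR or Hadoop API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskAttempt:
    attempt_id: str   # hypothetical identifier
    succeeded: bool


@dataclass
class Task:
    task_id: str
    attempts: List[TaskAttempt] = field(default_factory=list)


@dataclass
class HadoopJob:
    job_id: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Step:
    name: str
    jobs: List[HadoopJob] = field(default_factory=list)


def failed_attempts(step: Step) -> List[str]:
    """Return the IDs of every failed task attempt under a step."""
    return [
        attempt.attempt_id
        for job in step.jobs
        for task in job.tasks
        for attempt in task.attempts
        if not attempt.succeeded
    ]


# Example: one step, one job, one task that failed once and then succeeded.
step = Step("word-count", [
    HadoopJob("job_0001", [
        Task("task_0001", [
            TaskAttempt("attempt_0", succeeded=False),
            TaskAttempt("attempt_1", succeeded=True),
        ]),
    ]),
])
print(failed_attempts(step))
```

Under this model, the step-level logs describe the outermost level (the step), and the Hadoop logs describe the job, task, and task attempt levels nested inside it.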

Both step-level logs and Hadoop logs are generated by default and stored on the master node of the job flow. While the job flow is running, you can access them by using SSH to connect to the master node, as described in How to View Logs Using SSH. When the job flow ends, the master node is terminated and you can no longer access those logs using SSH. To keep the log files of a terminated job flow available, direct Amazon EMR to copy the step-level and Hadoop log files to an Amazon S3 bucket, as described in How to Enable Logging and Debugging.

If you specify that the log files are to be copied to an Amazon S3 bucket, you have the option to have Amazon EMR create an index over those log files to generate debugging information and reports. This index is stored in Amazon SimpleDB and can be accessed by clicking the Debug button in the Amazon EMR console.

The options to copy log files to Amazon S3 and to create a debugging index in Amazon SimpleDB can be set only when the job flow is launched. You cannot add them to a job flow that is already running.
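Because the log location must be supplied at launch, it belongs in the launch request itself. The following Python sketch is a minimal illustration under stated assumptions: it only builds a dictionary of request parameters and does not call any service. LogUri is the RunJobFlow parameter that names the Amazon S3 location for log files; the helper function, the job flow name, and the bucket name are hypothetical, and the check on the URI scheme is an assumption made for the example. Enabling the debugging index is not shown.

```python
from typing import Any, Dict, Optional


def build_launch_request(name: str, log_uri: Optional[str] = None) -> Dict[str, Any]:
    """Build launch parameters for a job flow (hypothetical helper).

    The log location is fixed here, at launch time, because it cannot be
    added to a job flow that is already running.
    """
    request: Dict[str, Any] = {"Name": name}
    if log_uri is not None:
        # Assumption for this sketch: logs are copied to an Amazon S3 location.
        if not log_uri.startswith(("s3://", "s3n://")):
            raise ValueError("log_uri must point to an Amazon S3 location")
        request["LogUri"] = log_uri
    return request


# Hypothetical job flow name and bucket name.
params = build_launch_request("my-job-flow", "s3://my-example-bucket/logs/")
print(params)
```

A request built without log_uri has no LogUri entry, which corresponds to a job flow whose logs remain only on the master node and are lost when it terminates.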

While you are developing your job flow, we recommend that you enable debugging and run the job flow on a small but representative subset of your data.