Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

How to Use Log Files

Both Amazon Elastic MapReduce (Amazon EMR) and Hadoop produce log files, which describe the completion status of every step and task within a job flow. Amazon EMR groups the log files from all of the Amazon EC2 instances into one location that you specify in the LogUri parameter in the RunJobFlow operation.

Log File Directories

When you look in Amazon S3 at the bucket you specified with the LogUri parameter, you find folders labeled with job flow IDs. Within each of these folders is a folder labeled Steps, which in turn contains one folder for each step in the job flow. Each step folder contains links to the log files syslog, stdout, controller, and stderr. Hadoop generates the entries logged in syslog, and Amazon EMR generates the entries logged in stdout and stderr, as shown in the following example.

Task Logs: 'task_200807301447_0001_m_000000_0'

stdout logs
map: key = test
map: key = test2

stderr logs

syslog logs
2008-07-30 14:51:16,410 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2008-07-30 14:51:16,507 INFO org.apache.hadoop.mapred.MapTask:
numReduceTasks: 1
2008-07-30 14:51:17,120 INFO org.apache.hadoop.mapred.TaskRunner: Task
'task_200807301447_0001_m_000000_0' done.

Example Log Files

This section contains samples of some of the log files you might inspect. For more information about where in the AWS Management Console you access these log files, see Debugging.

Steps Stderr Example

The following example comes from the stderr link on the Steps panel.

 Streaming Command Failed!	

Steps Syslog Example

The following example comes from the syslog link on the Steps panel. These log entries correspond to the stderr message Streaming Command Failed! shown in the previous example.

 2010-01-19 23:27:26,529 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
 2010-01-19 23:27:30,143 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process: 12
 2010-01-19 23:27:30,397 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process: 12
 2010-01-19 23:27:31,092 INFO org.apache.hadoop.streaming.StreamJob (main): getLocalDirs(): [/mnt/var/lib/hadoop/mapred]
 2010-01-19 23:27:31,093 INFO org.apache.hadoop.streaming.StreamJob (main): Running job: job_201001192327_0001
 2010-01-19 23:27:31,093 INFO org.apache.hadoop.streaming.StreamJob (main): To kill this job, run:
 2010-01-19 23:27:31,093 INFO org.apache.hadoop.streaming.StreamJob (main): UNDEF/bin/hadoop job  -Dmapred.job.tracker=domU-12-31-39-0C-24-54.compute-1.internal:9001 -kill   job_201001192327_0001
 2010-01-19 23:27:31,094 INFO org.apache.hadoop.streaming.StreamJob (main): Tracking URL: http://domU-12-31-39-0C-24-54.compute-1.internal:9100/jobdetails.jsp?jobid=job_201001192327_0001
 2010-01-19 23:27:32,105 INFO org.apache.hadoop.streaming.StreamJob (main):  map 0%  reduce 0%
 2010-01-19 23:27:53,908 INFO org.apache.hadoop.streaming.StreamJob (main):  map 5%  reduce 0%
 2010-01-19 23:27:54,917 INFO org.apache.hadoop.streaming.StreamJob (main):  map 8%  reduce 0%
 2010-01-19 23:28:08,121 INFO org.apache.hadoop.streaming.StreamJob (main):  map 15%  reduce 0%
 2010-01-19 23:28:10,169 INFO org.apache.hadoop.streaming.StreamJob (main):  map 17%  reduce 3%
 2010-01-19 23:28:22,040 INFO org.apache.hadoop.streaming.StreamJob (main):  map 17%  reduce 6%
 2010-01-19 23:28:26,107 INFO org.apache.hadoop.streaming.StreamJob (main):  map 24%  reduce 6%
 2010-01-19 23:28:28,371 INFO org.apache.hadoop.streaming.StreamJob (main):  map 25%  reduce 6%
 2010-01-19 23:28:33,432 INFO org.apache.hadoop.streaming.StreamJob (main):  map 100%  reduce 100%
 2010-01-19 23:28:33,434 INFO org.apache.hadoop.streaming.StreamJob (main): To kill this job, run:
 2010-01-19 23:28:33,434 INFO org.apache.hadoop.streaming.StreamJob (main): UNDEF/bin/hadoop job  -Dmapred.job.tracker=domU-12-31-39-0C-24-54.compute-1.internal:9001 -kill job_201001192327_0001
 2010-01-19 23:28:33,435 INFO org.apache.hadoop.streaming.StreamJob (main): Tracking URL: http://domU-12-31-39-0C-24-54.compute-1.internal:9100/jobdetails.jsp?jobid=job_201001192327_0001
 2010-01-19 23:28:33,435 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful!
 2010-01-19 23:28:33,435 INFO org.apache.hadoop.streaming.StreamJob (main): killJob...
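
In a long syslog such as this one, the entries that matter are usually the few WARN and ERROR lines. The following Python sketch shows one way to pick them out. It assumes every entry follows the layout seen in the example (timestamp, level, logger, thread name in parentheses, message); the names LOG_LINE, LogEntry, parse_line, and problems are hypothetical helpers, not part of Amazon EMR or Hadoop.

```python
import re
from typing import NamedTuple, Optional

# Layout assumed from the example above:
# 2010-01-19 23:28:33,435 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful!
LOG_LINE = re.compile(
    r"^\s*(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+) "
    r"\((?P<thread>[^)]*)\): "
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    thread: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that do not match (for example, stack frames)."""
    match = LOG_LINE.match(line)
    return LogEntry(**match.groupdict()) if match else None


def problems(lines):
    """Yield only the WARN and ERROR entries."""
    for line in lines:
        entry = parse_line(line)
        if entry and entry.level in ("WARN", "ERROR"):
            yield entry


sample = [
    " 2010-01-19 23:27:26,529 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.",
    " 2010-01-19 23:28:33,432 INFO org.apache.hadoop.streaming.StreamJob (main):  map 100%  reduce 100%",
    " 2010-01-19 23:28:33,435 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful!",
]
for entry in problems(sample):
    print(entry.timestamp, entry.level, entry.message)
```

Lines that do not match the pattern, such as Java stack frames, are skipped rather than treated as errors, so the same filter can be run over task attempt syslogs.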

Task Attempt Stderr Example

Entries in the following example contain error messages from both the mapper script and Hadoop. The first error message is a stack trace from the Ruby script, which threw an exception while processing input; this message explains why the script failed. The second error message (prefixed with log4j) is a benign warning from Hadoop, issued while it initializes the logging subsystem, stating that it failed to find appenders.

 /mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:17: Invalid input, refusing to proceed after receiving "work" (RuntimeError)
   from /mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:12:in `each'
   from /mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:12
 log4j:WARN No appenders could be found for logger (org.apache.hadoop.streaming.PipeMapRed).
 log4j:WARN Please initialize the log4j system properly.
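
When a task attempt stderr file mixes script errors with Hadoop start-up warnings, it can help to separate the two before reading. The following Python sketch does that under one assumption taken from this example only: lines that begin with log4j:WARN are treated as benign. The function name split_stderr is hypothetical.

```python
def split_stderr(lines):
    """Separate log4j start-up warnings from everything else.

    Assumption: any line starting with "log4j:WARN" is benign, as in the
    example above. Other warnings may deserve a closer look.
    """
    benign, suspect = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        (benign if text.startswith("log4j:WARN") else suspect).append(text)
    return benign, suspect


sample = [
    ' /mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:17: Invalid input, refusing to proceed after receiving "work" (RuntimeError)',
    " log4j:WARN No appenders could be found for logger (org.apache.hadoop.streaming.PipeMapRed).",
    " log4j:WARN Please initialize the log4j system properly.",
]
benign, suspect = split_stderr(sample)
print(len(benign), "benign line(s)")
for line in suspect:
    print("check:", line)
```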

Task Attempt Syslog Example

The following syslog comes from a job flow where the data submitted to the mapper was in the wrong format.

2010-01-19 23:59:56,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics (main): Initializing JVM Metrics with processName=MAP, sessionId=
2010-01-19 23:59:56,846 INFO org.apache.hadoop.mapred.MapTask (main): Host name: domU-12-31-39-03-7D-E1.compute-1.internal
2010-01-19 23:59:56,848 INFO org.apache.hadoop.mapred.MapTask (main): numReduceTasks: 1
2010-01-19 23:59:56,867 INFO org.apache.hadoop.mapred.MapTask (main): io.sort.mb = 150
2010-01-19 23:59:57,873 INFO org.apache.hadoop.mapred.MapTask (main): data buffer = 119537664/149422080
2010-01-19 23:59:57,873 INFO org.apache.hadoop.mapred.MapTask (main): record buffer = 393216/491520
2010-01-19 23:59:59,380 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Opening 's3n://elasticmapreduce/samples/wordcount/input/0009' for reading
2010-01-19 23:59:59,574 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed exec [/mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192358_0001/attempt_201001192358_0001_m_000000_2/work/./wrong_format_wordcount.rb]
2010-01-19 23:59:59,744 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-19 23:59:59,744 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-19 23:59:59,747 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-19 23:59:59,757 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-20 00:00:00,536 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10000/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-20 00:00:04,235 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-6): Records R/W=90635/1
2010-01-20 00:00:04,359 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-5): MRErrorThread done
2010-01-20 00:00:04,425 INFO org.apache.hadoop.streaming.PipeMapRed (main): mapRedFinished
2010-01-20 00:00:04,425 INFO org.apache.hadoop.mapred.MapTask (main): Starting flush of map output
2010-01-20 00:00:04,425 INFO org.apache.hadoop.mapred.MapTask (main): bufstart = 0; bufend = 69475; bufvoid = 149422080
2010-01-20 00:00:04,425 INFO org.apache.hadoop.mapred.MapTask (main): kvstart = 0; kvend = 7221; length = 491520
2010-01-20 00:00:04,828 WARN org.apache.hadoop.mapred.TaskTracker (main): Error running child
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
  at java.lang.String.substring(String.java:1938)
  at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorCombiner.reduce(ValueAggregatorCombiner.java:55)
  at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorCombiner.reduce(ValueAggregatorCombiner.java:34)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:921)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:802)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:715)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:233)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)

The lines starting with "at" show that the combiner is trying to parse records output by the mapper but the records are in the wrong format.
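
To find the component responsible in a trace like this one, a common approach is to skip the frames that belong to the Java runtime and look at the first frame after them. The following Python sketch illustrates the idea. The helper names and the list of skipped prefixes (java., javax., sun.) are assumptions made for this illustration.

```python
import re

# Matches frames such as:
#   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:233)
FRAME = re.compile(r"^\s*at (?P<method>[\w.$<>]+)\((?P<source>[^)]*)\)\s*$")


def frames(trace_lines):
    """Return (qualified method, source location) pairs for every 'at' line."""
    found = []
    for line in trace_lines:
        match = FRAME.match(line)
        if match:
            found.append((match.group("method"), match.group("source")))
    return found


def first_application_frame(trace_lines, skip_prefixes=("java.", "javax.", "sun.")):
    """Return the first frame that is not part of the Java runtime, or None."""
    for method, source in frames(trace_lines):
        if not method.startswith(skip_prefixes):
            return method, source
    return None


trace = [
    "java.lang.StringIndexOutOfBoundsException: String index out of range: -1",
    "  at java.lang.String.substring(String.java:1938)",
    "  at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorCombiner.reduce(ValueAggregatorCombiner.java:55)",
    "  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:921)",
]
print(first_application_frame(trace))
```

For the trace above, the first non-runtime frame is ValueAggregatorCombiner.reduce, which points at the combiner.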

How to Troubleshoot Using Log Files

Debugging errors in large, distributed applications is difficult. Amazon EMR makes it easier by collecting the log files from the cluster and storing them in a location you specify on Amazon S3. If you do not specify a log URI in the RunJobFlow request, Amazon EMR does not collect logs.

Important

In this section, all relative Amazon S3 paths should be prefixed with your log URI and <JobFlowID> to get the actual log locations.

Log Files

The Amazon EMR job flow provides a JAR or streaming file and initiates the Hadoop application on your Amazon EC2 instances. Both Amazon EMR and Hadoop produce log files, which describe the completion status of every step and task within a job flow. Amazon EMR groups the log files from all of the cluster nodes into one location that you specify in the LogUri parameter in the RunJobFlow action.

How to Check Step Log Files

If you provide a custom JAR file and there is a failure, the first things to check are the step log files. Amazon EMR uploads these log files to steps/<step number>/ every few minutes. Each step creates the following four logs:

  • controller—Contains files generated by Amazon EMR that arise from errors encountered while trying to run your step

    If your step fails while loading, you can find the stack trace in this log.

  • syslog—Contains logs from non-Amazon software, such as Apache and Hadoop

  • stdout—Contains status generated by your mapper and reducer executables

  • stderr—Contains your step's standard error messages
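
The location of the four step logs in Amazon S3 follows from the log URI, the job flow ID, and the step number. The following Python sketch builds the full locations. It assumes the lowercase steps folder name used in this section; the function name step_log_uris and the log URI s3n://myawsbucket/logs/ are hypothetical values chosen for illustration.

```python
STEP_LOG_NAMES = ("controller", "syslog", "stdout", "stderr")


def step_log_uris(log_uri: str, job_flow_id: str, step_number: int) -> dict:
    """Map each step log name to <log URI>/<JobFlowID>/steps/<step number>/<name>."""
    base = log_uri.rstrip("/")
    prefix = f"{base}/{job_flow_id}/steps/{step_number}"
    return {name: f"{prefix}/{name}" for name in STEP_LOG_NAMES}


# Hypothetical log URI; substitute the value you passed as LogUri.
for name, uri in step_log_uris("s3n://myawsbucket/logs/", "j-36U2JMAE73054", 1).items():
    print(f"{name:<10} {uri}")
```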

To debug a job flow using step log files

  1. Use ssh to connect to the master node as the Hadoop user, using the PEM file from the master node key pair, so that you can find the log files associated with the failed step.

    $ ssh -i mykey.pem hadoop@ec2-67-202-20-49.compute-1.amazonaws.com
  2. Use cat to view the log files.

    The following example looks into the syslog files. You can use the same procedure with any of the other three logs, controller, stdout, and stderr.

    $ cat /mnt/var/log/hadoop/steps/1/syslog
       2009-03-25 18:43:27,145 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing 
       the arguments. Applications should implement Tool for the same.
       2009-03-25 18:43:28,828 ERROR org.apache.hadoop.streaming.StreamJob (main): Error Launching job : unknown 
       host: examples
    $ exit

    This error from Hadoop indicates that it was looking for a host called examples. If we look back at our request, we see that the output path was set to hdfs://examples/output, where the part after the double slash is interpreted as a host name. This is incorrect, because we want Hadoop to access the local HDFS file system with the path /examples/output. We instead need to specify hdfs:///examples/output.

  3. Specify the output of the streaming job on the command line and submit another step to the job flow.

    $ ./elastic-mapreduce --jobflow j-36U2JMAE73054 --stream --output hdfs:///examples/output
  4. List the job flows to see whether the new step completes.

    $ ./elastic-mapreduce --list -n 5 
       j-36U2JMAE73054     WAITING        ec2-67-202-20-49.compute-1.amazonaws.com     Example job flow
          FAILED         Example Streaming Step        
          COMPLETED      Example Streaming Step 

    This time the job succeeded. We can run the job again but this time send the output to a bucket in Amazon S3.

  5. Create a bucket in Amazon S3.

    Bucket names in Amazon S3 must be unique across all of Amazon S3, so choose a name that is unlikely to be taken. The following example uses s3cmd. For more information about creating buckets, see the Amazon Elastic MapReduce Getting Started Guide.

    $ s3cmd mb s3://myawsbucket
      Bucket s3://myawsbucket/ created

    s3cmd requires you to specify Amazon S3 paths using the prefix s3://. Amazon EMR requires the prefix s3n:// for files stored in Amazon S3.

  6. Add a step to the job flow to send output to this bucket.

    $ ./elastic-mapreduce -j j-36U2JMAE73054 --stream --output s3n://myawsbucket/output/1
      Added steps to j-36U2JMAE73054

    The protocol of the output URL is s3n. This tells Hadoop to use the Amazon S3 Native File System for the output location. The host part of the URL is the bucket and this is followed by the path.

  7. Terminate the job flow.

    $ ./elastic-mapreduce -j j-36U2JMAE73054 --terminate 
  8. Confirm that the job flow is shutting down.

    $ ./elastic-mapreduce --list -n 5

There are other options that you can specify when creating and adding steps to job flows. Use the --help option to find out what they are.
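
The difference between hdfs://examples/output and hdfs:///examples/output in the procedure above comes from how a URL is split into scheme, host, and path. The following Python sketch shows the split using the standard library's urlsplit. It illustrates the general URL rule only and is not Hadoop's own parser; the function name describe is hypothetical.

```python
from urllib.parse import urlsplit


def describe(url: str) -> str:
    """Show which part of the URL is read as the host and which as the path."""
    parts = urlsplit(url)
    return f"scheme={parts.scheme!r} host={parts.netloc!r} path={parts.path!r}"


for url in (
    "hdfs://examples/output",      # "examples" is taken as a host name
    "hdfs:///examples/output",     # empty host, so the path is /examples/output
    "s3n://myawsbucket/output/1",  # the host part is the bucket name
):
    print(describe(url))
# scheme='hdfs' host='examples' path='/output'
# scheme='hdfs' host='' path='/examples/output'
# scheme='s3n' host='myawsbucket' path='/output/1'
```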

How to View Job Flow Logs

This section describes the methods available for viewing job flow logs.

How to View Logs Using the Command Line Interface

Using the command line interface (CLI), you can run job flows that execute multiple steps. This is useful for developing multi-step streaming jobs and for debugging job flows. With the CLI you can construct a development job flow that continues to run until you terminate it. When a step fails, you can then add another step to the active job flow rather than restart the job flow.

Check the status of the job flow to see whether it has started or whether the cluster nodes are still starting up.

After the job flow transitions into either the WAITING or RUNNING state, you can log on to the master node. You can get the DNS name of the master node from the detail pane in the Amazon EMR console or by listing active job flows on the command line:

$ ./elastic-mapreduce --list --active

With the DNS name of the master node, you can use SSH to connect to the master node of the Hadoop cluster as the Hadoop user, using your Amazon EC2 key pair. In the following command, substitute the PEM file from your own key pair and the public DNS name of your master node:

PROMPT> ssh -i mykey.pem hadoop@ec2-01-001-001-1.compute-1.amazonaws.com

If you receive an error, you might not have set the permissions on the PEM file correctly, you might have specified the wrong PEM file, or you might not have copied the DNS name correctly.

After you log on to the master node, you can inspect the log files. If you specified a log URI, the log files are automatically saved to your Amazon S3 bucket. There is a delay of 5 minutes between the time the log files complete their writes and the time they are saved to your Amazon S3 bucket, so it is often quicker to view the logs directly on the cluster than to wait for the saved files to appear in your bucket. To list the log files for the first step, run ls /mnt/var/log/hadoop/steps/1 on the master node.

This directory contains log files for the first step. The second step is in /mnt/var/log/hadoop/steps/2 and so on. The log files are:

  • controller: the log file of the process that attempts to execute your step

  • syslog: the log output by Hadoop that describes the execution of your Hadoop job by the job flow step

  • stderr: the stderr channel of Hadoop's attempt to execute your job flow

  • stdout: the stdout channel of Hadoop's attempt to execute your job flow

These files do not appear until the step runs for some time, finishes, or fails. While you are logged on to the master node over SSH as the Hadoop user, you can also access the Hadoop UI by entering the following at the command line:

$ lynx http://localhost:9100/
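
Because the step log files on the master node may not exist yet when you look for them, a script that reads them should tolerate missing files. The following Python sketch reads the four logs for one step from /mnt/var/log/hadoop/steps and reports None for any file that has not appeared. The function name read_step_logs and its root parameter are hypothetical, and the sketch assumes Python is available where you run it.

```python
from pathlib import Path

STEP_LOG_NAMES = ("controller", "syslog", "stderr", "stdout")


def read_step_logs(step_number: int, root: str = "/mnt/var/log/hadoop/steps") -> dict:
    """Map each log name to its text, or to None if the file does not exist yet."""
    step_dir = Path(root) / str(step_number)
    logs = {}
    for name in STEP_LOG_NAMES:
        path = step_dir / name
        logs[name] = path.read_text(errors="replace") if path.is_file() else None
    return logs


# Summarize the logs for the first step.
for name, text in read_step_logs(1).items():
    status = "not written yet" if text is None else f"{len(text.splitlines())} line(s)"
    print(f"{name:<10} {status}")
```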

How to Download Job Flow Logs from Amazon S3

Instead of viewing logs on the master node, you can download the logs from a bucket on Amazon S3. You can download the data in a bucket using the Amazon S3 Organizer plug-in for Firefox. To download the plug-in, go to https://addons.mozilla.org/en-US/firefox/addon/3247.

To download log files from Amazon S3

  1. Locate the value you supplied for LogURI in the Amazon EMR console, CLI, or RunJobFlow request.

    The LogURI is a path to a bucket on Amazon S3 of the form s3n://[bucketName]/[path].

  2. Download the logs in the bucket using the Amazon S3 Organizer plug-in with Firefox, or the Amazon S3 GET Bucket operation.

    Amazon S3 downloads the log files in the bucket.
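
Tools that list or download objects from Amazon S3 usually ask for a bucket name and a key prefix rather than a LogURI. The following Python sketch splits a LogURI of the form s3n://[bucketName]/[path] into those two parts and appends the job flow ID. The function name job_flow_log_prefix and the example LogURI are hypothetical, and accepting the s3:// scheme in addition to s3n:// is an assumption.

```python
from urllib.parse import urlsplit


def job_flow_log_prefix(log_uri: str, job_flow_id: str):
    """Return (bucket, key prefix) for the logs of one job flow."""
    parts = urlsplit(log_uri)
    if parts.scheme not in ("s3", "s3n") or not parts.netloc:
        raise ValueError(f"not an Amazon S3 URI: {log_uri!r}")
    path = parts.path.strip("/")
    key_prefix = f"{path}/{job_flow_id}/" if path else f"{job_flow_id}/"
    return parts.netloc, key_prefix


# Hypothetical LogURI; substitute the value you supplied.
bucket, key_prefix = job_flow_log_prefix("s3n://myawsbucket/logs", "j-36U2JMAE73054")
print(bucket)      # myawsbucket
print(key_prefix)  # logs/j-36U2JMAE73054/
```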

How to View Logs Using SSH

You can set up an SSH tunnel between your host and the master node where you can look on the file system for log files or at the job flow statistics published by the Hadoop web server. The master node in the cluster contains summary information of all of the work done by the slave nodes. You can, however, explore the working and error logs on each slave node in an effort to resolve problems occurring in the execution of the job flow.

Elastic MapReduce starts your instances in two security groups: one for the master node and another for the core nodes and task nodes. The master security group opens a port for communication with the service. It also opens the SSH port so that, with the proper credentials, you can connect via SSH as the Hadoop user directly to the cluster nodes. The core and task nodes start in a separate security group that only allows interaction with the master node. Because these security groups are associated with your account, you can reconfigure them using the standard Amazon EC2 interfaces.

You need the DNS of the master node to log in to inspect the log files. This section explains how to identify the DNS of the master node.

To determine the DNS of a master node

  • Use the --list option as follows:

    For Linux/UNIX, enter:

    $ ./elastic-mapreduce --list --jobflow JobFlowID

    For Microsoft Windows, enter:

    $ ruby elastic-mapreduce --list --jobflow JobFlowID

In the response, the third column lists the DNS name of the master node if that node is currently running.

If you do not know the job flow ID, use the --list parameter with the --active parameter to list all active job flows.
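
If you capture the output of the --list option in a script, the master node DNS name can be read from the third column. The following Python sketch does this for output shaped like the --list example earlier in this section. The function name master_dns is hypothetical, and the check that the third column ends with .amazonaws.com is an assumption used to avoid mistaking part of the job flow name for a DNS name when the master node is not running.

```python
def master_dns(list_output: str, job_flow_id: str):
    """Return the master node DNS name for job_flow_id, or None if not listed."""
    for line in list_output.splitlines():
        columns = line.split(None, 3)  # job flow ID, state, DNS name, job flow name
        if len(columns) >= 3 and columns[0] == job_flow_id:
            dns_name = columns[2]
            # Assumption: a public DNS name ends with .amazonaws.com
            return dns_name if dns_name.endswith(".amazonaws.com") else None
    return None


sample_output = """\
   j-36U2JMAE73054     WAITING        ec2-67-202-20-49.compute-1.amazonaws.com     Example job flow
      FAILED         Example Streaming Step
      COMPLETED      Example Streaming Step
"""
print(master_dns(sample_output, "j-36U2JMAE73054"))
```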

To determine the DNS without a job flow ID

  • Use the --list option with the --active option to list all active job flows:

    $ ./elastic-mapreduce --list --active

    In the response, the third column lists the DNS name of the master node of each job flow whose master node is currently running.

How to View Bootstrap Action Log Files

Bootstrap action log files can help you identify and diagnose the results of your bootstrap actions. These log files are on the master node of your Hadoop cluster.

To view bootstrap action log files

  1. Connect to the master node.

    For more details, refer to How to Monitor Hadoop on a Master Node.

  2. From the command prompt, change to the bootstrap action log directory.

    $ cd /mnt/var/log/bootstrap-actions/

    Log files for each bootstrap action are located in a subdirectory. The subdirectory name is based on the order of the bootstrap actions. For example, 1 for the first action, 2 for the second action, and so forth.

    Note

    The bootstrap action logs are also saved to your LogURI if you specify one.

    LogURI/JobFlowID/node/NodeID/bootstrap-actions/ActionNumber
  3. Change to the folder for the bootstrap action log file you want to view. For example, to access the first bootstrap action log file enter the following:

    $ cd 1
  4. View the contents of the log file.

    $ cat stderr

    The contents of the log file should look similar to the following:

    --2010-02-20 01:26:24--  http://elasticmapreduce.s3.amazonaws.com/samples/bootstrap-actions/file.tar.gz
    Resolving elasticmapreduce.s3.amazonaws.com... 72.21.211.147
    Connecting to elasticmapreduce.s3.amazonaws.com|72.21.211.147|:80... connected.
    HTTP request sent, awaiting response...
    HTTP/1.1 200 OK
    x-amz-id-2: W4ggqTgUmMWylfuQksXgBi5hxrgmiHp8LYrZH184CXpUW+s1/jfOvKmhoG/NIFJz
    x-amz-request-id: D673608F6C6114D2
    Date: Sat, 20 Feb 2010 01:26:25 GMT
    x-amz-meta-s3fox-filesize: 153
    x-amz-meta-s3fox-modifiedtime: 1256233644776
    Last-Modified: Thu, 22 Oct 2009 17:47:44 GMT
    ETag: "47a007dae0ff192c166764259246388c"
    Content-Type: application/gzip
    Content-Length: 153
    Connection: Keep-Alive
    Server: AmazonS3
    Length: 153 [application/gzip]
    Saving to: `file.tar.gz'
    
    0K                                                       100% 24.3M=0s
    
    2010-02-20 01:26:24 (24.3 MB/s) - `file.tar.gz' saved [153/153]
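
The two places where a bootstrap action log can be found, the directory on the master node and the copy under your LogURI, can both be derived from the action number. The following Python sketch builds both locations from the patterns given in this procedure. The function name bootstrap_log_locations, the log URI, and the node ID i-0123abcd are hypothetical values used only for illustration.

```python
def bootstrap_log_locations(action_number, log_uri=None, job_flow_id=None, node_id=None):
    """Return (directory on the node, location under the LogURI or None)."""
    local = f"/mnt/var/log/bootstrap-actions/{action_number}"
    remote = None
    if log_uri and job_flow_id and node_id:
        # Pattern: LogURI/JobFlowID/node/NodeID/bootstrap-actions/ActionNumber
        remote = (
            f"{log_uri.rstrip('/')}/{job_flow_id}/node/{node_id}"
            f"/bootstrap-actions/{action_number}"
        )
    return local, remote


local, remote = bootstrap_log_locations(
    1,
    log_uri="s3n://myawsbucket/logs/",  # hypothetical LogURI
    job_flow_id="j-36U2JMAE73054",
    node_id="i-0123abcd",               # hypothetical node ID
)
print(local)
print(remote)
```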