Both Amazon Elastic MapReduce (Amazon EMR) and Hadoop produce log files, which describe the completion
status of every step and task within a job flow. Amazon EMR groups the log files
from all of the Amazon EC2 instances into one location that you specify in the
LogUri parameter in the RunJobFlow
operation.
When you look in Amazon S3 at the bucket you specified with the
LogUri parameter, you find folders labeled with job flow IDs. Within
each folder is a folder labeled Steps, and within
that folder is a folder for each of the steps in the job flow. Each step folder contains
links to log files named syslog,
stdout, controller, and
stderr. Hadoop generates the files logged in
syslog, and Amazon EMR generates the files logged in
stdout and stderr, as shown in the
following example.
Task Logs: 'task_200807301447_0001_m_000000_0'
stdout logs
map: key = test
map: key = test2
stderr logs
syslog logs
2008-07-30 14:51:16,410 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-07-30 14:51:16,507 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2008-07-30 14:51:17,120 INFO org.apache.hadoop.mapred.TaskRunner: Task 'task_200807301447_0001_m_000000_0' done.
This section contains samples of some of the log files you might inspect. For more information about where in the AWS Management Console you access these log files, see Debugging.
The following example comes from the stderr link on the
Steps panel.
Streaming Command Failed!
The following example comes from the syslog link on the
Steps panel. These logs correspond to the stderr message,
Streaming Command Failed!
2010-01-19 23:27:26,529 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2010-01-19 23:27:30,143 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process: 12
2010-01-19 23:27:30,397 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process: 12
2010-01-19 23:27:31,092 INFO org.apache.hadoop.streaming.StreamJob (main): getLocalDirs(): [/mnt/var/lib/hadoop/mapred]
2010-01-19 23:27:31,093 INFO org.apache.hadoop.streaming.StreamJob (main): Running job: job_201001192327_0001
2010-01-19 23:27:31,093 INFO org.apache.hadoop.streaming.StreamJob (main): To kill this job, run:
2010-01-19 23:27:31,093 INFO org.apache.hadoop.streaming.StreamJob (main): UNDEF/bin/hadoop job -Dmapred.job.tracker=domU-12-31-39-0C-24-54.compute-1.internal:9001 -kill job_201001192327_0001
2010-01-19 23:27:31,094 INFO org.apache.hadoop.streaming.StreamJob (main): Tracking URL: http://domU-12-31-39-0C-24-54.compute-1.internal:9100/jobdetails.jsp?jobid=job_201001192327_0001
2010-01-19 23:27:32,105 INFO org.apache.hadoop.streaming.StreamJob (main): map 0% reduce 0%
2010-01-19 23:27:53,908 INFO org.apache.hadoop.streaming.StreamJob (main): map 5% reduce 0%
2010-01-19 23:27:54,917 INFO org.apache.hadoop.streaming.StreamJob (main): map 8% reduce 0%
2010-01-19 23:28:08,121 INFO org.apache.hadoop.streaming.StreamJob (main): map 15% reduce 0%
2010-01-19 23:28:10,169 INFO org.apache.hadoop.streaming.StreamJob (main): map 17% reduce 3%
2010-01-19 23:28:22,040 INFO org.apache.hadoop.streaming.StreamJob (main): map 17% reduce 6%
2010-01-19 23:28:26,107 INFO org.apache.hadoop.streaming.StreamJob (main): map 24% reduce 6%
2010-01-19 23:28:28,371 INFO org.apache.hadoop.streaming.StreamJob (main): map 25% reduce 6%
2010-01-19 23:28:33,432 INFO org.apache.hadoop.streaming.StreamJob (main): map 100% reduce 100%
2010-01-19 23:28:33,434 INFO org.apache.hadoop.streaming.StreamJob (main): To kill this job, run:
2010-01-19 23:28:33,434 INFO org.apache.hadoop.streaming.StreamJob (main): UNDEF/bin/hadoop job -Dmapred.job.tracker=domU-12-31-39-0C-24-54.compute-1.internal:9001 -kill job_201001192327_0001
2010-01-19 23:28:33,435 INFO org.apache.hadoop.streaming.StreamJob (main): Tracking URL: http://domU-12-31-39-0C-24-54.compute-1.internal:9100/jobdetails.jsp?jobid=job_201001192327_0001
2010-01-19 23:28:33,435 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful!
2010-01-19 23:28:33,435 INFO org.apache.hadoop.streaming.StreamJob (main): killJob...
Entries in the following example contain error messages from Hadoop and the mapper script. The first error message is a stack trace from the Ruby script, which threw an exception while processing input. The second error message (prefixed with log4j) is a warning from Hadoop stating that it failed to find appenders. The first message explains why the script failed. The second is a benign message from Hadoop about initializing the logging subsystem.
/mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:17: Invalid input, refusing to proceed after receiving "work" (RuntimeError)
from /mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:12:in `each'
from /mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192327_0001/attempt_201001192327_0001_m_000001_3/work/./raising_wordcount.rb:12
log4j:WARN No appenders could be found for logger (org.apache.hadoop.streaming.PipeMapRed).
log4j:WARN Please initialize the log4j system properly.
The following syslog comes from a job flow where the data submitted to the mapper was in the wrong format.
2010-01-19 23:59:56,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics (main): Initializing JVM Metrics with processName=MAP, sessionId=
2010-01-19 23:59:56,846 INFO org.apache.hadoop.mapred.MapTask (main): Host name: domU-12-31-39-03-7D-E1.compute-1.internal
2010-01-19 23:59:56,848 INFO org.apache.hadoop.mapred.MapTask (main): numReduceTasks: 1
2010-01-19 23:59:56,867 INFO org.apache.hadoop.mapred.MapTask (main): io.sort.mb = 150
2010-01-19 23:59:57,873 INFO org.apache.hadoop.mapred.MapTask (main): data buffer = 119537664/149422080
2010-01-19 23:59:57,873 INFO org.apache.hadoop.mapred.MapTask (main): record buffer = 393216/491520
2010-01-19 23:59:59,380 INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem (main): Opening 's3n://elasticmapreduce/samples/wordcount/input/0009' for reading
2010-01-19 23:59:59,574 INFO org.apache.hadoop.streaming.PipeMapRed (main): PipeMapRed exec [/mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201001192358_0001/attempt_201001192358_0001_m_000000_2/work/./wrong_format_wordcount.rb]
2010-01-19 23:59:59,744 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-19 23:59:59,744 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-19 23:59:59,747 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-19 23:59:59,757 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-20 00:00:00,536 INFO org.apache.hadoop.streaming.PipeMapRed (main): R/W/S=10000/0/0 in:NA [rec/s] out:NA [rec/s]
2010-01-20 00:00:04,235 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-6): Records R/W=90635/1
2010-01-20 00:00:04,359 INFO org.apache.hadoop.streaming.PipeMapRed (Thread-5): MRErrorThread done
2010-01-20 00:00:04,425 INFO org.apache.hadoop.streaming.PipeMapRed (main): mapRedFinished
2010-01-20 00:00:04,425 INFO org.apache.hadoop.mapred.MapTask (main): Starting flush of map output
2010-01-20 00:00:04,425 INFO org.apache.hadoop.mapred.MapTask (main): bufstart = 0; bufend = 69475; bufvoid = 149422080
2010-01-20 00:00:04,425 INFO org.apache.hadoop.mapred.MapTask (main): kvstart = 0; kvend = 7221; length = 491520
2010-01-20 00:00:04,828 WARN org.apache.hadoop.mapred.TaskTracker (main): Error running child
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:1938)
at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorCombiner.reduce(ValueAggregatorCombiner.java:55)
at org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorCombiner.reduce(ValueAggregatorCombiner.java:34)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:921)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:802)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:715)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:233)
at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
The lines starting with "at" show that the combiner is trying to parse records output by the mapper but the records are in the wrong format.
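When a syslog file is long, it can help to pull out only the WARN and ERROR entries together with any stack trace lines that follow them. The following is a minimal sketch in Python, assuming the entry layout seen in the samples in this section (timestamp, level, logger, optional thread name in parentheses, then the message). The names parse_syslog and problems are hypothetical helpers, not part of Amazon EMR or Hadoop.

```python
import re

# Layout assumed from the samples in this section:
#   <date> <time>,<ms> <LEVEL> <logger> (<thread>): <message>
# The "(<thread>)" part is optional because the oldest sample omits it.
LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+?)(?: \((?P<thread>[^)]+)\))?: ?"
    r"(?P<message>.*)$"
)


def parse_syslog(text):
    """Split syslog text into entries; untimestamped lines join the previous entry."""
    entries = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            # Continuation line, for example a Java stack frame starting with "at".
            entries[-1]["message"] += "\n" + line.strip()
    return entries


def problems(entries):
    """Keep only the entries logged at WARN or ERROR level."""
    return [entry for entry in entries if entry["level"] in ("WARN", "ERROR")]
```

Calling problems(parse_syslog(text)) on the wrong-format sample above would keep the single WARN entry, with the exception and its "at" lines attached to the entry's message.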
Debugging errors in large, distributed applications is difficult. Amazon EMR
makes it easier by collecting the log files from the cluster and storing them in a
location you specify on Amazon S3. If you do not specify a log URI in the
RunJobFlow request, Amazon EMR does not collect
logs.
Important: In this section, all relative Amazon S3 paths should be prefixed with your log URI and <JobFlowID> to get the actual log locations.
The Amazon EMR job flow provides a JAR or streaming file and initiates
the Hadoop application on your Amazon EC2 instances. Both Amazon EMR and
Hadoop produce log files, which describe the completion status of every step and task
within a job flow. Amazon EMR groups the log files from all of the cluster
nodes into one location that you specify in the LogUri
parameter in the RunJobFlow action.
If you provide a custom JAR file and there is a failure, the first things to check
are the step log files. Amazon EMR uploads these log files to
steps/<step number>/ every few minutes. Each step
creates the following four logs:
controller: Contains messages generated by Amazon EMR about errors encountered while trying to run your step. If your step fails while loading, you can find the stack trace in this log.
syslog: Contains logs from non-Amazon software, such as Apache and Hadoop.
stdout: Contains status generated by your mapper and reducer executables.
stderr: Contains your step's standard error messages.
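The Amazon S3 locations of these four logs follow from the log URI, the job flow ID, and the step number. The following Python sketch builds them. It assumes the layout described in this section (log URI, then job flow ID, then steps/<step number>/), and the function name and the example log URI are hypothetical.

```python
# The four step logs named in this section.
STEP_LOG_NAMES = ("controller", "syslog", "stdout", "stderr")


def step_log_locations(log_uri, job_flow_id, step_number):
    """Return a dict mapping each step log name to its expected Amazon S3 location.

    Assumed layout: <log URI>/<JobFlowID>/steps/<step number>/<log name>.
    """
    base = log_uri.rstrip("/")  # tolerate a trailing slash on the log URI
    prefix = f"{base}/{job_flow_id}/steps/{step_number}"
    return {name: f"{prefix}/{name}" for name in STEP_LOG_NAMES}


if __name__ == "__main__":
    # "s3n://myawsbucket/logs" is a placeholder log URI, not a real location.
    for name, location in step_log_locations(
        "s3n://myawsbucket/logs", "j-36U2JMAE73054", 1
    ).items():
        print(name, location)
```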
To debug a job flow using step log files
Use SSH to connect to the master node as the Hadoop user, using the PEM file from the master node key pair, and find the log files associated with the failed step.
$ ssh -i mykey.pem hadoop@ec2-67-202-20-49.compute-1.amazonaws.com
Use cat to view the log files.
The following example looks into the syslog files. You can use the same procedure with any of the other three logs, controller, stdout, and stderr.
$ cat /mnt/var/log/hadoop/steps/1/syslog
2009-03-25 18:43:27,145 WARN org.apache.hadoop.mapred.JobClient (main): Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-03-25 18:43:28,828 ERROR org.apache.hadoop.streaming.StreamJob (main): Error Launching job : unknown host: examples
$ exit
This error from Hadoop indicates that it was trying to look for a host
called examples. If we look back at our request we see that the output path was
set to hdfs://examples/output. This is actually incorrect
because we want Hadoop to access the local HDFS system with the path
/examples/output. We instead need to specify
hdfs:///examples/output.
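The difference between the two forms comes from generic URI syntax: the text between // and the next / is read as the host. The following sketch uses Python's standard urllib.parse module to show the split. It illustrates the URI rule only and is not Hadoop's own parsing code; the describe helper is a hypothetical name.

```python
from urllib.parse import urlparse


def describe(uri):
    """Return the (host, path) pair that generic URI parsing assigns to a URI."""
    parts = urlparse(uri)
    return parts.netloc, parts.path


# Two slashes: "examples" lands in the host position.
print(describe("hdfs://examples/output"))   # ('examples', '/output')

# Three slashes: the host is empty and the whole path is kept.
print(describe("hdfs:///examples/output"))  # ('', '/examples/output')
```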
Specify the output of the streaming job on the command line and submit another step to the job flow.
$ ./elastic-mapreduce --jobflow j-36U2JMAE73054 --stream --output hdfs:///examples/output
List the job flows to see whether the step completes.
$ ./elastic-mapreduce --list -n 5
j-36U2JMAE73054 WAITING ec2-67-202-20-49.compute-1.amazonaws.com Example job flow
FAILED Example Streaming Step
COMPLETED Example Streaming Step
This time the job succeeded. We can run the job again but this time send the output to a bucket in Amazon S3.
Create a bucket in Amazon S3.
Bucket names in Amazon S3 must be unique, so choose a unique name for your bucket.
The following example uses s3cmd. For more information
about creating buckets, see the Amazon Elastic MapReduce Getting Started
Guide.
$ s3cmd mb s3://myawsbucket
Bucket s3://myawsbucket/ created
s3cmd requires you to specify Amazon S3 paths using
the prefix s3://. Amazon EMR requires the prefix
s3n:// for files stored in Amazon S3.
Add a step to the job flow to send output to this bucket.
$ ./elastic-mapreduce -j j-36U2JMAE73054 --stream --output s3n://myawsbucket/output/1
Added steps to j-36U2JMAE73054
The protocol of the output URL is s3n. This tells
Hadoop to use the Amazon S3 Native File System for the output location. The
host part of the URL is the bucket and this is followed by the path.
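Because s3cmd and Amazon EMR expect different prefixes for the same location, a small helper can split a URL into bucket and path and switch the prefix. This is a sketch in Python using the standard urllib.parse module; split_s3_url and with_scheme are hypothetical names, and the bucket shown is a placeholder.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// or s3n:// URL into (bucket, path within the bucket)."""
    parts = urlparse(url)
    if parts.scheme not in ("s3", "s3n"):
        raise ValueError(f"expected an s3:// or s3n:// URL, got {url!r}")
    # The host part of the URL is the bucket; the rest is the path.
    return parts.netloc, parts.path.lstrip("/")


def with_scheme(url, scheme):
    """Rewrite the URL with another prefix, for example s3 to s3n."""
    bucket, path = split_s3_url(url)
    return f"{scheme}://{bucket}/{path}"


if __name__ == "__main__":
    print(split_s3_url("s3n://myawsbucket/output/1"))
    print(with_scheme("s3://myawsbucket/output/1", "s3n"))
```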
Terminate the job flow.
$ ./elastic-mapreduce -j j-36U2JMAE73054 --terminate
Confirm that the job flow is shutting down.
$ ./elastic-mapreduce --list -n 5
There are other options that you can specify when creating and adding steps to job
flows. Use the --help option to find out what they are.
This section describes the methods available for viewing job flow logs.
Using the command line interface (CLI), you can run job flows that execute multiple steps. This is useful for developing multi-step streaming jobs and for debugging job flows. Using the CLI, you can construct a development job flow that continues to run until you terminate it. This helps when a step fails, because you can add another step to your active job flow rather than restart the job flow.
Check the status of the job flow to see whether it has started or whether the cluster nodes are still starting up.
After the job flow transitions into either WAITING or
RUNNING, you can log on to the master node. You can get the
DNS name of the master node from the detail pane in the Amazon EMR console or by listing active job flows on
the command line:
$ ./elastic-mapreduce --list --active
With the DNS name of the master node, you can use SSH to connect to the master node of the Hadoop cluster as the Hadoop user with your Amazon EC2 key pair. In the following command, substitute the PEM file from your own key pair and the public DNS name of the master node:
PROMPT> ssh -i mykey.pem hadoop@ec2-01-001-001-1.compute-1.amazonaws.com
If you receive an error then you might not have set the permissions on the PEM file as described, the PEM file might be specified incorrectly, or you might not have copied the DNS name correctly.
After you log on to the master node, you can inspect the log files. If you
specified a log URI, log files are automatically saved to your Amazon S3 bucket.
There is a delay of 5 minutes between the time the log files complete their writes
and when they are saved to your Amazon S3 bucket, so it is often quicker to view
the logs directly on the cluster than to wait for the saved files to
appear in your bucket. To see the step logs on the cluster node, list the step log directory:
$ ls /mnt/var/log/hadoop/steps/1
This directory contains log files for the first step. The second step is in /mnt/var/log/hadoop/steps/2 and so on. The log files are:
controller: the log file of the process that attempts to execute your step
syslog: a log output by Hadoop that describes the execution of your Hadoop job by the job flow step
stderr: the stderr channel of Hadoop's attempt to execute your job flow
stdout: the stdout channel of Hadoop's attempt to execute your job flow
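If you inspect these files often, a short script run on the master node can gather whichever of the four logs exist for a given step. The following Python sketch assumes the directory layout described above (/mnt/var/log/hadoop/steps/<step number>); read_step_logs is a hypothetical helper, and base_dir is a parameter so that the function can also be pointed at a copy of the logs elsewhere.

```python
from pathlib import Path

# The four step logs listed above.
STEP_LOG_NAMES = ("controller", "syslog", "stderr", "stdout")


def read_step_logs(step_number, base_dir="/mnt/var/log/hadoop/steps"):
    """Return a dict of log name to file contents for one step.

    A log that does not exist yet maps to None, since the files may not
    appear until the step has run for some time, finished, or failed.
    """
    step_dir = Path(base_dir) / str(step_number)
    logs = {}
    for name in STEP_LOG_NAMES:
        path = step_dir / name
        logs[name] = path.read_text(errors="replace") if path.is_file() else None
    return logs
```

For example, read_step_logs(2)["stderr"] would hold the standard error output of the second step, or None if that file has not been written yet.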
These files do not appear until the step runs for some time, finishes, or fails. You can also access the Hadoop UI when you are logged onto the master node using SSH and logging in as the Hadoop user by entering the following from the Hadoop command line:
$ lynx http://localhost:9100/
Instead of viewing logs on the master node, you can download the logs from a bucket on Amazon S3. You can download the data in a bucket using the Amazon S3 Organizer plug-in for Firefox. To download the plug-in, go to https://addons.mozilla.org/en-US/firefox/addon/3247.
To download log files from Amazon S3
Locate the value you supplied for LogUri in the
Amazon EMR console, CLI, or RunJobFlow request.
The LogUri is a path to a bucket on Amazon S3 of
the form s3n://[bucketName]/[path].
Download the logs in the bucket using the Amazon S3 Organizer plug-in with
Firefox, or the Amazon S3 GET Bucket operation.
Amazon S3 downloads the log files in the bucket.
You can set up an SSH tunnel between your host and the master node where you can look on the file system for log files or at the job flow statistics published by the Hadoop web server. The master node in the cluster contains summary information of all of the work done by the slave nodes. You can, however, explore the working and error logs on each slave node in an effort to resolve problems occurring in the execution of the job flow.
Elastic MapReduce starts your instances in two security groups: one for the master node and another for the core nodes and task nodes. The master security group opens a port for communication with the service. It also opens the SSH port to allow you to connect via SSH as the Hadoop user directly to the cluster nodes, using the proper credentials. The core and task nodes start in a separate security group that only allows interaction with the master node. Because these security groups are associated with your account, you can reconfigure them using the standard Amazon EC2 interfaces.
You need the DNS name of the master node to log in and inspect the log files. This section explains how to identify the DNS name of the master node.
To determine the DNS of a master node
Use the --list option as follows:
For Linux/UNIX, enter:
$ ./elastic-mapreduce --list --jobflow JobFlowID
For Microsoft Windows, enter:
$ ruby elastic-mapreduce --list --jobflow JobFlowID
In the response, the third column lists the DNS name of the master node if that node is currently running.
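If you capture the output of the --list command in a script, you can read the third column programmatically. The following Python sketch assumes whitespace-separated columns, as in the sample listing earlier on this page. The master_dns function is a hypothetical helper, and the check for a dot in the third field is an assumed heuristic for telling a DNS name apart from other text when the node is not running.

```python
def master_dns(list_output, job_flow_id):
    """Return the master node DNS name for a job flow, or None if not shown.

    list_output is the text printed by the --list command.
    """
    for line in list_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == job_flow_id:
            candidate = fields[2]
            # Assumed heuristic: a DNS name contains at least one dot.
            return candidate if "." in candidate else None
    return None


if __name__ == "__main__":
    sample = (
        "j-36U2JMAE73054 WAITING ec2-67-202-20-49.compute-1.amazonaws.com Example job flow\n"
        "   FAILED Example Streaming Step\n"
        "   COMPLETED Example Streaming Step\n"
    )
    print(master_dns(sample, "j-36U2JMAE73054"))
```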
If you do not know the job flow ID, use the --list
parameter with the --active parameter to list all active job
flows.
To determine the DNS without a job flow ID
Use the --list option with the --active option:
$ ./elastic-mapreduce --list --active
In the response, find your job flow and read the DNS name of the master node from the third column.
Bootstrap action log files can help you identify and diagnose the results of your bootstrap actions. These log files are on the master node of your Hadoop cluster.
To view bootstrap action log files
Connect to the master node.
For more details, refer to How to Monitor Hadoop on a Master Node
From the command prompt, change to the bootstrap action log directory.
$ cd /mnt/var/log/bootstrap-actions/
Log files for each bootstrap action are located in a subdirectory. The
subdirectory name is based on the order of the bootstrap actions. For example,
1 for the first action, 2 for the second action, and
so forth.
Note: The bootstrap action logs are also saved to your
Change to the folder for the bootstrap action log file you want to view. For example, to access the first bootstrap action log file enter the following:
$ cd 1
View the contents of the log file.
$ cat stderr
The contents of the log file should look similar to the following:
--2010-02-20 01:26:24-- http://.elasticmapreduce.s3.amazonaws.com/samples/bootstrap-actions/file.tar.gz
Resolving elasticmapreduce.s3.amazonaws.com... 72.21.211.147
Connecting to elasticmapreduce.s3.amazonaws.com|72.21.211.147|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
x-amz-id-2: W4ggqTgUmMWylfuQksXgBi5hxrgmiHp8LYrZH184CXpUW+s1/jfOvKmhoG/NIFJz
x-amz-request-id: D673608F6C6114D2
Date: Sat, 20 Feb 2010 01:26:25 GMT
x-amz-meta-s3fox-filesize: 153
x-amz-meta-s3fox-modifiedtime: 1256233644776
Last-Modified: Thu, 22 Oct 2009 17:47:44 GMT
ETag: "47a007dae0ff192c166764259246388c"
Content-Type: application/gzip
Content-Length: 153
Connection: Keep-Alive
Server: AmazonS3
Length: 153 [application/gzip]
Saving to: `file.tar.gz'
0K 100% 24.3M=0s
2010-02-20 01:26:24 (24.3 MB/s) - `file.tar.gz' saved [153/153]