Use the Amazon EMR console to access the log files at the different step and Hadoop job execution levels. You can use these logs to debug your applications.
Before you can use the debugging functionality in the console, you must enable debugging when you create a job flow. For more information, see How to Enable Logging and Debugging.
The following procedure shows you how to debug a job flow using the Amazon EMR console.
To debug a failed job flow
In the Amazon EMR console, click the check box next to the failed job flow you want to debug and click Debug.

| Note |
|---|
| By default, the list is sorted alphabetically by the Name column. To sort the results by another column, click the column title once for ascending order or twice for descending order. |
The Steps pane displays the steps in the selected job flow.
Each row provides links pointing to Hadoop logs generated as part of each step. If the links are labeled (log not uploaded yet), click Refresh List.
Click one of the following links in the Log Files column in
the row marked FAILED:
controller—Contains files generated by Amazon Elastic MapReduce (Amazon EMR) that arise from errors encountered while trying to run your step
If your step fails while loading, you can find the stack trace in this log. Errors loading or accessing your application, including missing mapper file errors, are often described here.
stderr—Contains your step's standard error messages
Application loading errors are often described here. Sometimes contains a stack trace.
stdout—Contains status generated by your mapper and reducer executables
Application loading errors are often described here. Sometimes contains application error messages.
syslog—Contains logs from non-Amazon software, such as Apache and Hadoop
Streaming errors are often described here.
If you can't resolve the problem by looking at these log files, click View All Tasks for All Jobs.
This action skips over the Jobs pane, which does not provide links to log files.
The Tasks pane displays the Hadoop tasks in the jobs.

Time elapsed during a task is a good indication of trouble: the longer the elapsed time, the more likely it is that something went wrong.
To find the longest-running tasks quickly, click the Elapsed Time column title to sort the results by elapsed time.
On the Tasks pane, click View Attempts for the task that failed.
The Task Attempts pane displays the task attempts in the selected task.
On the Task Attempts pane, click one or more of the links in the Log Files column for the task attempt that failed:
stderr—Contains task attempt error messages
stdout—Contains task attempt output logs
syslog—Contains logs generated by Hadoop.
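The elapsed-time heuristic from the Tasks pane can also be applied offline, for example to task details you have copied out of the console into a script. This is a minimal sketch: the `TaskRecord` type, its fields, and the sample task IDs are hypothetical and do not represent an Amazon EMR data format.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """Hypothetical record for one row of the Tasks pane."""
    task_id: str
    status: str
    elapsed_seconds: float


def longest_running(tasks, limit=3):
    """Return up to `limit` tasks, ordered from longest to shortest elapsed time."""
    return sorted(tasks, key=lambda task: task.elapsed_seconds, reverse=True)[:limit]


if __name__ == "__main__":
    tasks = [
        TaskRecord("task_0001_m_000000", "SUCCEEDED", 42.0),
        TaskRecord("task_0001_m_000001", "FAILED", 610.5),
        TaskRecord("task_0001_r_000000", "SUCCEEDED", 95.0),
    ]
    # Print the two tasks most worth investigating first.
    for task in longest_running(tasks, limit=2):
        print(task.task_id, task.status, task.elapsed_seconds)
```

Sorting descending mirrors clicking the Elapsed Time column title twice; the slowest tasks are the first candidates for View Attempts.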
This section describes how you can troubleshoot your job flows using the log files produced by Hadoop and Amazon EMR.
Amazon EMR allows you to create a job flow containing no steps. The effect is to create a Hadoop cluster and then stop processing. You can add steps later with an AddJobFlowSteps request. As soon as you issue that request, Amazon EMR resumes the job flow, and you can see whether the step completed successfully.
To develop and debug a job flow starting without steps
In a RunJobFlow request, set KeepJobFlowAliveWhenNoSteps to true and ActionOnFailure to CANCEL_AND_WAIT.
CANCEL_AND_WAIT stops job flow execution but does not terminate the Hadoop cluster. The default value, TERMINATE_JOB_FLOW, stops the job flow and terminates the cluster. CANCEL_AND_WAIT lets you revise your JAR files or add steps and retry the job flow without incurring the expense of downloading the data from Amazon S3 to Amazon EC2 again.
Send the RunJobFlow request.
To inspect the Hadoop system directly, use SSH to connect to the master node as the hadoop user:
ssh -i [keyfile] hadoop@[EC2_master_node_DNS]
In an AddJobFlowSteps request, set ActionOnFailure to CANCEL_AND_WAIT.
Send the AddJobFlowSteps request.
Inspect the log files using a tool like Amazon S3 Organizer to see if there were errors.
Using this procedure, you can work on a step to make sure it completes successfully before adding the next step. For more information about adding steps, go to Add Steps to a Job Flow.
When you are ready for production, set KeepJobFlowAliveWhenNoSteps to false and ActionOnFailure to TERMINATE_JOB_FLOW.
With these settings, Amazon EMR terminates the Hadoop cluster automatically after running the job flow.
| Note |
|---|
| When you use the console to run a job flow, the value of |
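If you call the API from Python, the development and production settings from the procedure above can be kept in one helper so you only flip a single flag before going to production. This is a minimal sketch, assuming you pass the resulting dictionary to the `run_job_flow` method of a boto3 EMR client (`boto3.client("emr").run_job_flow(**request)`); the helper name, bucket URI, and instance settings are hypothetical placeholders, and a real request may need further parameters. The sketch only builds the request, so it runs without AWS credentials.

```python
def build_run_job_flow_request(name, log_uri, steps=(), development=True):
    """Build keyword arguments for a RunJobFlow request (hypothetical helper).

    development=True keeps the cluster alive when it has no steps and makes a
    failed step stop job flow execution without terminating the cluster
    (CANCEL_AND_WAIT). development=False is the production setting: the
    cluster terminates when it runs out of steps or a step fails
    (TERMINATE_JOB_FLOW).
    """
    action_on_failure = "CANCEL_AND_WAIT" if development else "TERMINATE_JOB_FLOW"
    return {
        "Name": name,
        "LogUri": log_uri,  # S3 location where Amazon EMR uploads the log files
        "Instances": {
            # Placeholder cluster shape; replace with your own settings.
            "MasterInstanceType": "m1.small",
            "SlaveInstanceType": "m1.small",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": development,
        },
        # Copy each step dictionary so the caller's objects are not modified.
        "Steps": [dict(step, ActionOnFailure=action_on_failure) for step in steps],
    }


if __name__ == "__main__":
    dev_request = build_run_job_flow_request("debug-run", "s3://example-bucket/logs/")
    print(dev_request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

Starting with an empty step list gives you the idle cluster described above; you then submit steps one at a time and check the logs after each.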
You might want to debug a job flow with steps.
To develop and debug a job flow with steps
In a RunJobFlow request, set ActionOnFailure to CANCEL_AND_WAIT.
This value stops job flow execution but does not terminate the Hadoop cluster. The default value, TERMINATE_JOB_FLOW, stops the job flow and terminates the cluster. CANCEL_AND_WAIT lets you revise your JAR files or add steps and retry the job flow without incurring the expense of downloading the data from Amazon S3 to Amazon EC2 again.
Send the RunJobFlow request.
Inspect the log files using a tool like Amazon S3 Organizer to see if there were errors.
Fix the step that caused the error and resubmit it in an AddJobFlowSteps request, with ActionOnFailure set to CANCEL_AND_WAIT.
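Resubmitting a corrected step from Python might look like the following minimal sketch. It assumes you pass the dictionary to the `add_job_flow_steps` method of a boto3 EMR client (`boto3.client("emr").add_job_flow_steps(**request)`); the helper name, job flow ID, JAR location, and arguments are hypothetical placeholders.

```python
def build_add_steps_request(job_flow_id, step_name, jar, args=()):
    """Build keyword arguments for an AddJobFlowSteps request (hypothetical helper).

    The step uses CANCEL_AND_WAIT, so if it fails again the cluster keeps
    running and you can inspect the logs and retry.
    """
    return {
        "JobFlowId": job_flow_id,
        "Steps": [
            {
                "Name": step_name,
                "ActionOnFailure": "CANCEL_AND_WAIT",
                "HadoopJarStep": {"Jar": jar, "Args": list(args)},
            }
        ],
    }


if __name__ == "__main__":
    request = build_add_steps_request(
        "j-EXAMPLE12345",                          # hypothetical job flow ID
        "word count (fixed)",
        "s3://example-bucket/jars/wordcount.jar",  # hypothetical JAR location
        ["s3://example-bucket/input/", "s3://example-bucket/output/"],
    )
    print(request["Steps"][0]["ActionOnFailure"])
```

Keeping ActionOnFailure at CANCEL_AND_WAIT for every retry means a second failure leaves the cluster, and its data, in place for the next attempt.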
If your JAR file successfully started or you created a streaming job, the next
place to look for failures is in the task attempts. The Map and Reduce functions you
wrote execute in the context of a task. Tasks can execute multiple times as "task
attempts" because of failures or speculative execution. Amazon EMR uploads
task attempt logs into task-attempts/.
If one of the tasks failed, you can look at the task logs to determine what happened. These files are also available on the nodes under /mnt/var/log/hadoop/userlogs/, but because you would have to look through the log files on each node in the cluster, debugging this way is difficult.
Task-attempt log files are similar in format to the step log files.
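Once you have copied the task-attempt logs from Amazon S3 to your computer (or while you are logged in to a node), a short script can pull out suspicious lines so you don't have to open every file. This is a minimal sketch that assumes the logs are uncompressed text files named `stderr` and `syslog` inside per-attempt directories; the marker strings and function names are hypothetical and should be tuned to your application.

```python
import os

# Hypothetical markers; adjust them to match your application's messages.
ERROR_MARKERS = ("Exception", "Error", "FATAL")


def error_lines(lines):
    """Yield (line_number, text) for each line containing an error marker."""
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in ERROR_MARKERS):
            yield number, line.rstrip("\n")


def scan_task_attempt_logs(log_root, file_names=("stderr", "syslog")):
    """Walk a local copy of the task-attempt logs and yield
    (path, line_number, text) for every suspicious line."""
    for dirpath, dirnames, filenames in os.walk(log_root):
        dirnames.sort()  # visit attempt directories in a stable order
        for name in sorted(filenames):
            if name not in file_names:
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, text in error_lines(handle):
                    yield path, number, text
```

For example, `for path, number, text in scan_task_attempt_logs("task-attempts"): print(f"{path}:{number}: {text}")` lists every flagged line with its file and line number. A simple substring match keeps the sketch short; a stricter pattern would reduce false positives.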
In rare cases, Hadoop itself might fail. To see if that is the case, you must look at the Hadoop daemon logs.
To view the daemon log files
Look under /mnt/var/log/hadoop/ on each node or under
daemons/<instance id>/ on Amazon S3.
| Note |
|---|
| Not all cluster nodes run all daemons. |
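Because each node writes its own daemon logs, it can help to list the location for every instance before you start reading. This minimal sketch assumes the logs sit under `<log URI>/<job flow ID>/daemons/<instance id>/` in Amazon S3; the job-flow-ID level of that path, the helper name, and the sample IDs are assumptions to check against your own log bucket.

```python
# Daemon logs on each node, as described above.
LOCAL_DAEMON_LOG_DIR = "/mnt/var/log/hadoop/"


def daemon_log_prefixes(log_uri, job_flow_id, instance_ids):
    """Map each instance ID to the S3 prefix assumed to hold its daemon logs.

    Assumed layout: <log_uri>/<job_flow_id>/daemons/<instance_id>/
    """
    base = log_uri.rstrip("/")
    return {
        instance_id: f"{base}/{job_flow_id}/daemons/{instance_id}/"
        for instance_id in instance_ids
    }


if __name__ == "__main__":
    prefixes = daemon_log_prefixes(
        "s3://example-bucket/logs/", "j-EXAMPLE12345", ["i-11111111", "i-22222222"]
    )
    for instance_id, prefix in prefixes.items():
        print(instance_id, prefix)
```

Since not every node runs every daemon, expect some prefixes to contain logs for fewer daemons than others.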
When developing your application, we recommend that you enable both types of debugging, step level and Hadoop job level, and run your application on a small but representative subset of your data to make sure it works. To enable step-level debugging, select Yes for Enable Debugging and enter an Amazon S3 bucket URI in the Amazon S3 Log Path field.
When a node fails to come up, Amazon EMR stops attempting to contact the node and puts the associated instance group into a failed state. After some time, the failed node causes the instance group to change to an ARRESTED state.
A node could fail to come up if:
Hadoop or the cluster is somehow broken and does not accept a new node into the cluster
A bootstrap action fails on the new node
The node is not functioning correctly and fails to check in with Hadoop
If an instance group is in the ARRESTED state and the job flow is in a WAITING state, you can add a job flow step to reset the desired number of slave nodes. Adding the step resumes processing of the job flow and puts the instance group back into a RUNNING state.
For details on how to reset a job flow in an arrested state, refer to Arrested State.
The following sections describe common errors for each job flow type.
The following table describes common errors for custom JAR job flows.
| Error | Where to Look |
|---|---|
| General | You can usually find the cause of a custom JAR error in the syslog file. Link to it from the Steps pane. If you can't determine the problem there, check the Hadoop task attempt error message, which you link to from the Task Attempts pane. |
| JAR throws exception before creating a job | If the main program of your custom JAR throws an exception while creating the Hadoop job, the best place to look is the syslog file. Link to it from the Steps pane. |
| JAR throws an error inside a map task | If your custom JAR and mapper throw an exception while processing input data, the best place to look is the syslog file. Link to it on the Task Attempts pane. |
The following table describes common errors for Hive or Pig job flows.
| Error | Where to Look |
|---|---|
| General | You can usually find the cause of a Hive or Pig error in the syslog file, which you link to from the Steps pane. If you can't determine the problem there, check the Hadoop task attempt error message. Link to it on the Task Attempts pane. |
| Syntax or semantic error in the Hive script | If a step fails, look at the stdout file (which you link to from the Steps pane) of the step that ran the Hive script. If the error is not there, look in the syslog file. Link to it on the Task Attempts pane. |
| Job fails when running interactively | If you are running Hive interactively on the master node and the job flow failed, check the syslog file. Link to it on the Task Attempts pane for the task in the interactive step that failed. |
The following table describes common errors for streaming job flows.
| Error | Where to Look |
|---|---|
| General | You can usually find the cause of a streaming error in a syslog file. Link to it on the Steps pane. |
| Data sent to the mapper in the wrong format | You can find the error message in the syslog file of a failed task attempt. Link to it on the Task Attempts pane. |
| Misconfigured time limit | Your mapper or reducer script does not produce output within the configured time limit (600 seconds, by default). Find the error in the syslog file of the failed task attempt. You can change the time limit by passing an extra argument: -jobconf mapred.task.timeout=800000. The value is the number of milliseconds before Amazon EMR terminates a task that does not read input, write output, or update its status string. |
| Exit with error | Your mapper or reducer script exits with an error. Find the error in the stderr file of the failed task attempt. |
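The mapred.task.timeout value is in milliseconds, which is easy to get wrong when you think of the limit in seconds (the 600-second default is 600000). This minimal sketch converts a limit in seconds into the extra streaming arguments; the function name is hypothetical, and you would append the result to your streaming step's argument list, for example the Args list of its HadoopJarStep.

```python
DEFAULT_TIMEOUT_SECONDS = 600  # default time limit stated in the table


def task_timeout_args(seconds):
    """Return extra streaming arguments that set mapred.task.timeout.

    Hadoop reads the property in milliseconds, so 800 seconds becomes 800000.
    A value of 0 would disable the timeout in Hadoop, so this sketch rejects
    zero and negative values to avoid turning the check off by accident.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return ["-jobconf", f"mapred.task.timeout={int(seconds * 1000)}"]


if __name__ == "__main__":
    print(task_timeout_args(800))
```

Raising the limit only helps when the script is legitimately slow; if it never reads input or reports status, look for a hang in the stderr or syslog file first.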