The following topics describe how to initialize logging and debugging on your job flow. You can use the Amazon EMR console, the CLI, or the API to enable logging and debugging.
This section describes how to configure logging and debugging from the Amazon EMR console.
To initialize logging and debugging for a job flow
Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.
Click the Create New Job Flow button to launch a new job flow. Then follow the instructions in the wizard. For more information about launching a job flow, see Creating a Job Flow.
On the ADVANCED OPTIONS pane, enter a value for Amazon S3 Log Path that indicates where you want Amazon EMR to copy the log files.

For Enable Debugging, click Yes. Amazon EMR creates an index of the log files in SimpleDB.
When you enable debugging, the Debug button in the Amazon EMR console displays debugging information. The display links to the log files after Amazon EMR uploads them to your bucket on Amazon S3. Uploads take a few minutes to complete after a step finishes, so links shown as pending in the console mean that the log files are not yet uploaded.
Amazon EMR periodically updates the status of Hadoop jobs, tasks, and task attempts. You can use Refresh List in the debugging panes to get the most up-to-date status of these items.
You can enable logging from the CLI when you create a job flow by setting two options:
--log-uri pathToLogFilesOnAmazonS3
This provides a storage location for log files. You must create an Amazon S3 bucket and either specify a log-uri in the command that creates the job flow or set the parameter in the credentials.json file.
--enable-debugging
This turns on debugging, which uses SimpleDB to store and access an index of the logs. Specify the log-uri for step-level logging.
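The two options above can be assembled programmatically before invoking the CLI. The following is a minimal sketch, not an official wrapper: the function name `build_create_command` is hypothetical, and the executable name `elastic-mapreduce` and the `--create`, `--alive`, and `--name` options are assumptions about the Amazon EMR command line client that do not appear in this section. Rejecting `--enable-debugging` without a log URI is also an assumption, based on the option description above.

```python
import shlex


def build_create_command(log_uri, enable_debugging=True, name="Development Job Flow"):
    """Return an argument list for creating a job flow with logging enabled.

    The executable name and the --create/--alive/--name options are assumed;
    only --log-uri and --enable-debugging come from the text above.
    """
    # Assumption: debugging indexes the uploaded logs, so it needs a log URI.
    if enable_debugging and not log_uri:
        raise ValueError("--enable-debugging requires a --log-uri value")

    args = ["elastic-mapreduce", "--create", "--alive", "--name", name]
    if log_uri:
        args += ["--log-uri", log_uri]
    if enable_debugging:
        args.append("--enable-debugging")
    return args


command = build_create_command("s3://YourBucket/logs/")
# Print a shell-quoted form that can be reviewed before it is run.
print(shlex.join(command))
```

Returning a list instead of a single string keeps the job flow name and the Amazon S3 path as separate arguments, so no manual quoting is needed if the list is later passed to `subprocess.run`.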
To use the debugging functionality with the API, you must enable debugging when creating a job flow. It is not possible to enable logging after a job flow is created.
While the job flow is running you can access the Hadoop web interface.
To enable Hadoop step-level debugging, you must add a job flow step such as the following:
Action=RunJobFlow& Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW& Instances.Ec2KeyName=MyKeyName& Steps.member.1.Name=Setup%20Hadoop%20Debugging& LogUri=s3%3A%2F%2FYourBucket%2Flogs%2F& Signature=calculated value& Instances.SlaveInstanceType=m1.small& Name=Development%20Job%20Flow%20%20%28requires%20manual%20termination%29& AWSAccessKeyId=calculated value& Instances.MasterInstanceType=m1.small& Instances.InstanceCount=1& Timestamp=2010-05-26T11%3A25%3A40-07%3A00& SignatureVersion=2& Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fstate-pusher%2F0.1%2Ffetch& SignatureMethod=HmacSHA1& Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar& Instances.KeepJobFlowAliveWhenNoSteps=true
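The percent-encoded parameters in the request above are easier to maintain when they are built from plain values. The following sketch, which uses only the Python standard library, reproduces the non-signing parameters of that request. The function names are hypothetical, and the signing parameters (AWSAccessKeyId, Timestamp, SignatureVersion, SignatureMethod, and Signature) are deliberately omitted because their calculation is outside the scope of this section.

```python
from urllib.parse import quote, urlencode

# Locations copied from the example request above.
SCRIPT_RUNNER_JAR = "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"
STATE_PUSHER_SCRIPT = "s3://us-east-1.elasticmapreduce/libs/state-pusher/0.1/fetch"


def debugging_request_params(log_uri, key_name="MyKeyName"):
    """Return the unsigned RunJobFlow parameters for a debugging-enabled job flow."""
    return {
        "Action": "RunJobFlow",
        "Name": "Development Job Flow  (requires manual termination)",
        "LogUri": log_uri,
        "Instances.Ec2KeyName": key_name,
        "Instances.MasterInstanceType": "m1.small",
        "Instances.SlaveInstanceType": "m1.small",
        "Instances.InstanceCount": "1",
        "Instances.KeepJobFlowAliveWhenNoSteps": "true",
        # The debugging step: script-runner.jar runs the state-pusher script.
        "Steps.member.1.Name": "Setup Hadoop Debugging",
        "Steps.member.1.ActionOnFailure": "TERMINATE_JOB_FLOW",
        "Steps.member.1.HadoopJarStep.Jar": SCRIPT_RUNNER_JAR,
        "Steps.member.1.HadoopJarStep.Args.member.1": STATE_PUSHER_SCRIPT,
    }


def encode_query(params):
    """Percent-encode parameters, writing spaces as %20 and slashes as %2F."""
    return urlencode(sorted(params.items()), quote_via=quote)


query = encode_query(debugging_request_params("s3://YourBucket/logs/"))
# One parameter per line for readability.
print(query.replace("&", "&\n"))
```

Passing `quote_via=quote` makes `urlencode` emit `%20` for spaces instead of `+`, which matches the encoding shown in the example request.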
If you specified a LogUri when the job flow was created, you can download the log files from your job flow after they are copied to a bucket on Amazon S3.
To enable this option, you must specify a LogUri parameter, for example:
https://elasticmapreduce.amazonaws.com/?Action=RunJobFlow& Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW& Instances.Ec2KeyName=MyKeyName& Steps.member.1.Name=Setup%20Hive& LogUri=s3%3A%2F%2FYourBucket%2Flogs%2F& Signature=calculated value& Instances.SlaveInstanceType=instanceType& Name=Development%20Job%20Flow%20%20%28requires%20manual%20termination%29& AWSAccessKeyId=calculated value& Instances.MasterInstanceType=instanceType& Instances.InstanceCount=COUNT& Timestamp=2010-05-26T11%3A29%3A20-07%3A00& SignatureVersion=VERSION& Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2Fhive-script& SignatureMethod=HmacSHA1& Steps.member.1.HadoopJarStep.Args.member.2=--base-path& Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2F& Steps.member.1.HadoopJarStep.Args.member.4=--install-hive& Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar& Instances.KeepJobFlowAliveWhenNoSteps=BOOL
You can enable Hadoop debugging and use the Amazon EMR console to inspect the progress of your job flow and access log files from your job flow located on Amazon S3.
To enable Hadoop debugging and use the Amazon EMR console, you must run a special step at the beginning of your job flow in addition to specifying a LogUri, for example:
Action=RunJobFlow& Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW& Instances.Ec2KeyName=MyKeyName& Steps.member.1.Name=Setup%20Pig& LogUri=s3%3A%2F%2FYourBucket%2Flogs%2F& Signature=calculated value& Instances.SlaveInstanceType=m1.small& Name=Development%20Job%20Flow%20%20%28requires%20manual%20termination%29& AWSAccessKeyId=calculated value& Instances.MasterInstanceType=m1.small& Instances.InstanceCount=1& Timestamp=2010-05-26T11%3A31%3A31-07%3A00& SignatureVersion=2& Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2Fpig-script& Steps.member.1.HadoopJarStep.Args.member.2=--base-path& Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2F& Steps.member.1.HadoopJarStep.Args.member.4=--install-pig& SignatureMethod=HmacSHA1& Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar& Instances.KeepJobFlowAliveWhenNoSteps=true
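Each request above numbers its steps and step arguments by hand (`Steps.member.1`, `Args.member.1`, and so on). The following sketch shows one way to generate those indexed names from an ordered list of steps, so that the debugging step always comes first. The helper name `flatten_steps` and the dictionary keys it reads are hypothetical. Placing the debugging step as member 1 and the Pig setup as member 2 is an illustration of "a special step at the beginning of your job flow", not a request taken from this section.

```python
SCRIPT_RUNNER_JAR = "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"


def flatten_steps(steps):
    """Map an ordered list of step dictionaries to indexed query parameter names."""
    params = {}
    for i, step in enumerate(steps, start=1):
        prefix = f"Steps.member.{i}"
        params[f"{prefix}.Name"] = step["name"]
        params[f"{prefix}.ActionOnFailure"] = step.get(
            "action_on_failure", "TERMINATE_JOB_FLOW"
        )
        params[f"{prefix}.HadoopJarStep.Jar"] = step["jar"]
        # Arguments are numbered from 1 within each step.
        for j, arg in enumerate(step.get("args", []), start=1):
            params[f"{prefix}.HadoopJarStep.Args.member.{j}"] = arg
    return params


steps = [
    {   # Debugging step first, as the text above requires.
        "name": "Setup Hadoop Debugging",
        "jar": SCRIPT_RUNNER_JAR,
        "args": ["s3://us-east-1.elasticmapreduce/libs/state-pusher/0.1/fetch"],
    },
    {   # Pig setup step, with the arguments from the example request.
        "name": "Setup Pig",
        "jar": SCRIPT_RUNNER_JAR,
        "args": [
            "s3://us-east-1.elasticmapreduce/libs/pig/pig-script",
            "--base-path",
            "s3://us-east-1.elasticmapreduce/libs/pig/",
            "--install-pig",
        ],
    },
]

step_params = flatten_steps(steps)
for key in sorted(step_params):
    print(f"{key}={step_params[key]}")
```

The values are left unencoded here; they still need to be percent-encoded and merged with the remaining request parameters before the request is signed and sent.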