Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

How to Enable Logging and Debugging

The following topics describe how to initialize logging and debugging on your job flow. You can use the Amazon EMR console, the CLI, or the API to enable logging and debugging.

Enable Logging and Debugging Using the Amazon EMR Console

This section describes how to configure logging and debugging from the Amazon EMR console.

To initialize logging and debugging for a job flow

  1. Sign in to the AWS Management Console and open the Amazon Elastic MapReduce console at https://console.aws.amazon.com/elasticmapreduce/.

  2. Click the Create New Job Flow button to launch a new job flow. Then follow the instructions in the wizard. For more information about launching a job flow, see Creating a Job Flow.

  3. On the ADVANCED OPTIONS pane, enter a value for Amazon S3 Log Path that indicates where you want Amazon EMR to copy the log files.

  4. For Enable Debugging, click Yes. Amazon EMR creates an index of the log files in SimpleDB.

When you enable debugging, the Debug button in the Amazon EMR console displays debugging information. The display links to the log files once Amazon EMR has uploaded them to your bucket on Amazon S3. The uploads take a few minutes to finish after a step completes, so if the links still show as pending in the console, the log files are not yet uploaded.

Amazon EMR periodically updates the status of Hadoop jobs, tasks, and task attempts. You can use Refresh List in the debugging panes to get the most up-to-date status of these items.

Enable Logging and Debugging Using the CLI

You can enable logging when you create a job flow by setting two options:

  • --log-uri pathToLogFilesOnAmazonS3

    This provides a storage location for log files. You must create an Amazon S3 bucket, then either specify --log-uri in the command that creates the job flow or set the parameter in the credentials.json file.

  • --enable-debugging

    This turns on debugging, which uses SimpleDB to store and access an index of the logs. Also specify --log-uri to get step-level logging.
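
The two options above can be appended to whatever command you already use to create a job flow. The following is a minimal sketch, not part of the CLI itself: the helper name `add_logging_options` is hypothetical, and the base command `elastic-mapreduce --create` in the usage lines is an assumption about how the CLI is invoked in your environment. Only `--log-uri` and `--enable-debugging` come from the text above.

```python
import shlex


def add_logging_options(base_command, log_uri, enable_debugging=True):
    """Return a new argument list with the logging options appended.

    base_command: list of strings for the job flow creation command
                  (assumed; supply whatever your installation uses).
    log_uri:      Amazon S3 location for the log files, e.g. s3://YourBucket/logs/
    """
    if not log_uri:
        raise ValueError("a log URI is required to store log files")
    args = list(base_command)          # copy so the caller's list is untouched
    args += ["--log-uri", log_uri]     # storage location for log files
    if enable_debugging:
        args.append("--enable-debugging")  # index the logs in SimpleDB
    return args


# Hypothetical usage: the base command here is an assumption.
command = add_logging_options(["elastic-mapreduce", "--create"],
                              "s3://YourBucket/logs/")
print(shlex.join(command))
```

Building the arguments as a list and joining them with `shlex.join` (Python 3.8 or later) avoids quoting mistakes if the bucket path ever contains characters the shell treats specially.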

Enable Logging and Debugging Using the API

To use the debugging functionality with the API, you must enable debugging when creating a job flow. It is not possible to enable logging after a job flow is created.

  • While the job flow is running, you can access the Hadoop web interface.

    To enable Hadoop step-level debugging, you must add a job flow step such as the following:

    Action=RunJobFlow&
    Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&
    Instances.Ec2KeyName=MyKeyName&
    Steps.member.1.Name=Setup%20Hadoop%20Debugging&
    LogUri=s3%3A%2F%2FYourBucket%2Flogs%2F&
    Signature=calculated value&
    Instances.SlaveInstanceType=m1.small&
    Name=Development%20Job%20Flow%20%20%28requires%20manual%20termination%29&
    AWSAccessKeyId=calculated value&
    Instances.MasterInstanceType=m1.small&
    Instances.InstanceCount=1&
    Timestamp=2010-05-26T11%3A25%3A40-07%3A00&
    SignatureVersion=2&
    Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fstate-pusher%2F0.1%2Ffetch&
    SignatureMethod=HmacSHA1&
    Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&
    Instances.KeepJobFlowAliveWhenNoSteps=true

  • If you specified a LogUri when the job flow was created, you can download the log files from your job flow after they are copied to a bucket on Amazon S3.

    To enable this option, you must specify a LogUri parameter, for example:

    https://elasticmapreduce.amazonaws.com/?Action=RunJobFlow&
    Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&
    Instances.Ec2KeyName=MyKeyName&
    Steps.member.1.Name=Setup%20Hive&
    LogUri=s3%3A%2F%2FYourBucket%2Flogs%2F&
    Signature=calculated value&
    Instances.SlaveInstanceType=instanceType&
    Name=Development%20Job%20Flow%20%20%28requires%20manual%20termination%29&
    AWSAccessKeyId=calculated value&
    Instances.MasterInstanceType=instanceType&
    Instances.InstanceCount=COUNT&
    Timestamp=2010-05-26T11%3A29%3A20-07%3A00&
    SignatureVersion=VERSION&
    Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2Fhive-script&
    SignatureMethod=HmacSHA1&
    Steps.member.1.HadoopJarStep.Args.member.2=--base-path&
    Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fhive%2F&
    Steps.member.1.HadoopJarStep.Args.member.4=--install-hive&
    Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&
    Instances.KeepJobFlowAliveWhenNoSteps=BOOL

  • You can enable Hadoop debugging and use the Amazon EMR console to inspect the progress of your job flow and access log files from your job flow located on Amazon S3.

    To enable Hadoop debugging and use the Amazon EMR console, you must run a special step at the beginning of your job flow in addition to specifying a LogUri, for example:

    Action=RunJobFlow&
    Steps.member.1.ActionOnFailure=TERMINATE_JOB_FLOW&
    Instances.Ec2KeyName=MyKeyName&
    Steps.member.1.Name=Setup%20Pig&
    LogUri=s3%3A%2F%2FYourBucket%2Flogs%2F&
    Signature=calculated value&
    Instances.SlaveInstanceType=m1.small&
    Name=Development%20Job%20Flow%20%20%28requires%20manual%20termination%29&
    AWSAccessKeyId=calculated value&
    Instances.MasterInstanceType=m1.small&
    Instances.InstanceCount=1&
    Timestamp=2010-05-26T11%3A31%3A31-07%3A00&
    SignatureVersion=2&
    Steps.member.1.HadoopJarStep.Args.member.1=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2Fpig-script&
    Steps.member.1.HadoopJarStep.Args.member.2=--base-path&
    Steps.member.1.HadoopJarStep.Args.member.3=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fpig%2F&
    Steps.member.1.HadoopJarStep.Args.member.4=--install-pig&
    SignatureMethod=HmacSHA1&
    Steps.member.1.HadoopJarStep.Jar=s3%3A%2F%2Fus-east-1.elasticmapreduce%2Flibs%2Fscript-runner%2Fscript-runner.jar&
    Instances.KeepJobFlowAliveWhenNoSteps=true
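
The percent-encoded requests in this section are easier to get right when generated from plain values. The following is a minimal sketch that assembles the unsigned query string for the Hadoop debugging step shown earlier in this section. The function name `build_debugging_query` is hypothetical, and the sketch deliberately leaves out `AWSAccessKeyId`, `Timestamp`, `SignatureVersion`, `SignatureMethod`, and `Signature`, which you must still calculate and add before sending the request.

```python
from urllib.parse import quote, urlencode

# Amazon S3 locations copied from the debugging example in this section.
SCRIPT_RUNNER_JAR = "s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar"
STATE_PUSHER_FETCH = "s3://us-east-1.elasticmapreduce/libs/state-pusher/0.1/fetch"


def build_debugging_query(log_uri, key_name, instance_type="m1.small",
                          instance_count=1):
    """Return the unsigned, URL-encoded RunJobFlow parameters.

    The result enables logging (LogUri) and adds the debugging step as the
    first step of the job flow. Signing parameters are NOT included.
    """
    params = {
        "Action": "RunJobFlow",
        "Name": "Development Job Flow  (requires manual termination)",
        "LogUri": log_uri,
        "Instances.Ec2KeyName": key_name,
        "Instances.MasterInstanceType": instance_type,
        "Instances.SlaveInstanceType": instance_type,
        "Instances.InstanceCount": str(instance_count),
        "Instances.KeepJobFlowAliveWhenNoSteps": "true",
        "Steps.member.1.Name": "Setup Hadoop Debugging",
        "Steps.member.1.ActionOnFailure": "TERMINATE_JOB_FLOW",
        "Steps.member.1.HadoopJarStep.Jar": SCRIPT_RUNNER_JAR,
        "Steps.member.1.HadoopJarStep.Args.member.1": STATE_PUSHER_FETCH,
    }
    # quote (rather than the default quote_plus) encodes spaces as %20 and,
    # with urlencode's default empty safe set, encodes "/" and ":" as well,
    # matching the encoding used in the examples above.
    return urlencode(sorted(params.items()), quote_via=quote)


query = build_debugging_query("s3://YourBucket/logs/", "MyKeyName")
print(query.replace("&", "&\n"))
```

Sorting the parameters is a convenience for reproducible output in this sketch; add the signing parameters according to the signature version you use.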