Developer Guide (API Version 2009-03-31)

How to Monitor Hadoop on a Master Node

To log on to the master node, use one of the following commands:

For Linux/UNIX enter:

$ ./elastic-mapreduce --jobflow JobFlowID --ssh

For Microsoft Windows enter:

$ ruby elastic-mapreduce --jobflow JobFlowID --ssh

Alternatively, you can use SSH port forwarding to set up a secure link between your computer and the master node of the Hadoop cluster that is processing your job flow. To connect to the master node over SSH as the hadoop user, the job flow status must be either WAITING or RUNNING. To keep the job flow in the WAITING state after its steps complete successfully, use the --alive parameter when you create the job flow. For more information, see Creating a Job Flow.
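
As a rough illustration of the port forwarding mentioned above, the following sketch prints an ssh command that forwards a local port to the same port on the master node, so that you can review the command before running it. The key path, the DNS name, and port 9100 are placeholder assumptions for illustration, not values from this guide; substitute your own key file, your master node's public DNS name, and the port of the service you want to reach.

```shell
# Assumptions: KEYFILE, MASTER_DNS, and the port number are placeholders.
KEYFILE="$HOME/ec2-keys/myKeyPairName.pem"
MASTER_DNS="ec2-67-202-49-73.compute-1.amazonaws.com"

# Print an ssh command that forwards local port $1 to port $1 on the master node.
# -N: do not run a remote command; -L: local port forwarding.
tunnel_cmd() {
  port="$1"
  echo "ssh -i $KEYFILE -N -L $port:localhost:$port hadoop@$MASTER_DNS"
}

# Show the command for the (assumed) port 9100.
tunnel_cmd 9100
```

While the printed command is running, connections to localhost on the chosen port are carried over SSH to the master node. The job flow must be in the WAITING or RUNNING state for the connection to succeed.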

To view the Amazon EMR logs on the EC2 master node

  1. Open a terminal and use an ssh command of the following form to connect from your host to the EC2 master node as the hadoop user.

    ssh -i keyfile.pem hadoop@EC2MasterNodeDNS

    Substitute the PEM file from your own key pair and the public DNS name of the master node. The following example uses the key file myKeyPairName.pem and the master node ec2-67-202-49-73.compute-1.amazonaws.com.

    ssh -i ~/ec2-keys/myKeyPairName.pem hadoop@ec2-67-202-49-73.compute-1.amazonaws.com

    For keyfile, use the PEM file of the key pair whose name you set as one of the following:

    • SSH Key Name in the console

    • Ec2KeyName in the CLI

    • key in a RunJobFlow request

    The key name provides a handle to the master node and lets you log in to it as the hadoop user without a password. You cannot SSH into the master node if you did not set a value for SSH Key Name or Ec2KeyName.

    For EC2MasterNodeDNS, use the value shown in the Amazon EMR console or returned by DescribeJobFlows. You always log in as the hadoop user.

    Note

    As an alternative to the ssh command, you can use an SSH client such as PuTTY. For more information about how to install PuTTY and use it to connect to an EC2 instance, such as the master node, see Appendix D: Connecting to a Linux/UNIX Instance from Windows using PuTTY in the Amazon Elastic Compute Cloud User Guide.

    If the ssh command returns an error, the permissions on the PEM file might not be set correctly, the path to the PEM file might be wrong, or the DNS name might have been copied incorrectly.
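
One way to rule out the permissions problem described above is sketched below. OpenSSH ignores a private key file that other users can read, so the sketch restricts the file to owner read-only before retrying. The function name restrict_key and the key path are hypothetical and used only for illustration.

```shell
# Hypothetical key path; substitute the PEM file from your own key pair.
KEYFILE="${KEYFILE:-$HOME/ec2-keys/myKeyPairName.pem}"

# Make the key file readable by its owner only (mode 400).
restrict_key() {
  chmod 400 "$1"
}

# Typical use (not executed here):
#   restrict_key "$KEYFILE"
#   ssh -v -i "$KEYFILE" hadoop@ec2-67-202-49-73.compute-1.amazonaws.com
```

The -v option makes ssh print diagnostic messages, which can help you tell whether the failure comes from the key file or from the host name.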

  2. Navigate to /mnt/var/log/hadoop/steps/1 to see the logs on the master node for the first step. The log files for the second step are in /mnt/var/log/hadoop/steps/2, and so on. The log files are:

    • controller: The log file of the process that attempts to execute your step

    • syslog: The log file generated by Hadoop that describes the execution of your Hadoop job by the job flow step

    • stderr: The stderr log file generated by Hadoop when it attempts to execute your job flow

    • stdout: The stdout log file generated by Hadoop when it attempts to execute your job flow

    These log files do not appear until the step has run for some time, finished, or failed. The logs contain counter and status information.

    Note

    If you specified a log URI where Amazon Elastic MapReduce (Amazon EMR) uploads log files to Amazon S3, you can also inspect the log files on Amazon S3. There is, however, a five-minute delay between when the log files stop being written and when they are saved in a bucket on Amazon S3. It is therefore generally faster to look at the log files on the master node, especially if the step failed quickly.
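
The per-step log layout described in step 2 can be inspected with a small helper like the following, run on the master node. This is a minimal sketch: the function names step_log_dir and show_step_logs are hypothetical, and the 20-line tail length is an arbitrary choice.

```shell
# Log root taken from step 2 above; step numbers start at 1.
LOG_ROOT="/mnt/var/log/hadoop/steps"

# Print the directory that holds the logs for step number $1.
step_log_dir() {
  echo "$LOG_ROOT/$1"
}

# Print the last 20 lines of each log file that exists so far for step $1.
show_step_logs() {
  dir="$(step_log_dir "$1")"
  for name in controller syslog stderr stdout; do
    if [ -f "$dir/$name" ]; then
      echo "== $name =="
      tail -n 20 "$dir/$name"
    else
      echo "== $name (not written yet) =="
    fi
  done
}
```

For example, show_step_logs 1 prints the tail of each log for the first step. Because the files do not appear until the step has run for some time, the helper reports missing files instead of failing.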