Amazon Elastic MapReduce
Developer Guide (API Version 2009-11-30)

How to Use the Hadoop User Interface

The Hadoop software publishes job flow status to an internally running web server on the master node of the Hadoop cluster. You can view the job flow status by accessing this web server directly using a tool such as FoxyProxy. The web UIs are, by default, located as follows:

  • http://[master_dns_name]:9100/ - web UI for MapReduce job tracker(s)

  • http://[master_dns_name]:9101/ - web UI for HDFS name node(s)

To relocate these UIs, edit conf/hadoop-default.xml.

To view the Hadoop Distributed File System UI

  • Go to http://MasterDNSName:9101/.

To view the job flow status using the Hadoop UI

  1. To access the Hadoop Job Tracker UI running on the master node, go to http://MasterDNSName:9100/.

    The Hadoop UI opens. You can get the master node's DNS name (MasterDNSName) from the Amazon EMR console.

    In this example, the Cluster Summary shows that there were two slave nodes in the cluster and that each performed four tasks. The Completed Jobs section shows that the map and reduce job flows are 100% complete.

  2. Click a job flow ID.

    Hadoop displays information about the selected job flow.

    This display shows a variety of file system and job flow counters. It also shows that zero tasks failed but two tasks were killed.

  3. Choose one of the following actions:

    To...                                         Do this...
    Find out more about the killed tasks          Click on an entry in the Failed/Killed Task Attempts column.
    Get more information about the mapper tasks   Click map. Hadoop displays all of the tasks completed and their status. All of the mapper tasks completed successfully.
    Display task counters                         Click on an entry in the Counters column. Hadoop displays the task counter information.
    Get information about tasks                   Click a task. Hadoop displays task information.

  4. On the All Task Attempts pane, choose one of the following actions:

    To...                                              Do this...
    Get information about the node that ran the task   Click an entry in the Machine column. Hadoop displays host information.
    See the task logs                                  Click an entry in the Task Logs column. Hadoop displays the logs.

How to Install FoxyProxy

Hadoop provides a user interface that makes job flow information available. This user interface runs automatically on a web server located on your application's master node. To access it, start a SOCKS server on your local computer by creating an SSH tunnel, logging in as the hadoop user, between your computer and the master node of the Hadoop cluster that is processing your job flow. This kind of tunnel is also known as dynamic port forwarding. After the SOCKS server is started, use FoxyProxy to access the Hadoop user interface.

To set up an SSH SOCKS server

  • Start an SSH SOCKS server on a host that is inside your firewall and outside the Hadoop cluster, using the following command:

    ssh -i keyfile -ND port_number hadoop@MasterDNSName

    The following example uses the key pair file myKeyPairName, local port 8157, and the master node ec2-67-202-49-73.compute-1.amazonaws.com.

    ssh -i ~/ec2-keys/myKeyPairName -ND 8157 hadoop@ec2-67-202-49-73.compute-1.amazonaws.com
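
The tunnel command above can be wrapped in a small shell function so that the key file, port, and host are passed as arguments. This is only a sketch: the function name start_socks_tunnel and the DRY_RUN variable are hypothetical and are not part of Amazon EMR or OpenSSH. Only the ssh options -i, -N, and -D come from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the ssh command shown above.
# Usage: start_socks_tunnel KEYFILE PORT MASTER_DNS_NAME
# Set DRY_RUN=1 to print the command instead of running it.
start_socks_tunnel() {
    if [ "$#" -ne 3 ]; then
        echo "usage: start_socks_tunnel KEYFILE PORT MASTER_DNS_NAME" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh -i $1 -N -D $2 hadoop@$3"
        return 0
    fi
    # -i: private key file; -N: do not run a remote command;
    # -D: listen on the local port as a SOCKS proxy.
    ssh -i "$1" -N -D "$2" "hadoop@$3"
}
```

For example, running start_socks_tunnel keyfile 8157 MasterDNSName with DRY_RUN=1 set prints the equivalent ssh command without opening a connection. Without DRY_RUN, the function stays in the foreground until the tunnel is closed, just like the plain ssh command.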

FoxyProxy is a set of proxy management tools for Firefox. This Firefox extension switches an Internet connection across one or more proxy servers based on URL patterns. The following procedure explains how to install FoxyProxy so that you can access the Hadoop UI.

[Note]Note

This guide shows screen shots of FoxyProxy version 2.8.11.

To install FoxyProxy

  1. Download and install the standard version of FoxyProxy from http://foxyproxy.mozdev.org/downloads.html.

  2. Restart Firefox after installing FoxyProxy.

To configure FoxyProxy to connect to a SOCKS server

  1. On the Firefox Tools menu, click FoxyProxy, and then select Options.

    FoxyProxy displays the FoxyProxy Options window.

  2. On the Proxies tab, click Add New Proxy.

    The FoxyProxy - Proxy Settings dialog box opens.

  3. On the General tab, enter a proxy name and verify that Perform remote DNS lookups on hostnames loading through this proxy is selected.

  4. On the Proxy Details tab, do the following:

    1. Select the Manual Proxy Configuration option, and then enter the host name and port number of the host on which you ran the ssh command in the SSH SOCKS server procedure above.

      In this case, we are running the proxy on our desktop so we enter localhost and port 8157.

    2. Select the SOCKS proxy? check box.

    3. Select SOCKS v5.

      In the following steps, add URL patterns for *ec2*.amazonaws.com* and *ec2.internal*.

  5. On the URL Patterns tab, select Add New Pattern.

    The FoxyProxy - Add/Edit Pattern dialog box opens.

  6. In the FoxyProxy - Add/Edit Pattern dialog box, do the following:

    1. Select the Enabled check box.

    2. Enter a name in the Pattern Name box.

    3. Enter the following URL pattern in the URL pattern box: *ec2*.amazonaws.com*

    4. Select the Wildcards option.

    5. Select OK to close the FoxyProxy - Add/Edit Pattern pane.

  7. On the URL Patterns tab, select Add New Pattern.

  8. In the FoxyProxy - Add/Edit Pattern dialog box, do the following:

    1. Select the Enabled check box.

    2. Enter a name in the Pattern Name box.

    3. Enter the following URL pattern in the URL pattern box: *ec2.internal*

      When you browse to a URL that matches this pattern, the associated proxy is used to load that URL. This proxy establishes the secure port forwarding connection between your host and the master node.

    4. Select the Wildcards option.

    5. Select OK to close the FoxyProxy - Add/Edit Pattern pane.

    6. Select OK to close the FoxyProxy - Proxy Settings pane.

  9. On the FoxyProxy Options pane, select the down arrow on the Mode drop-down menu, select Use proxies based on their predefined patterns and priorities, and then select Close.

    You are prompted to restart Firefox to activate your changes.

When you enter a URL that matches the pattern *ec2*.amazonaws.com* into the Firefox browser, Firefox uses the proxy to connect to the master node of the cluster.
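
To see which addresses the two wildcard patterns cover, the matching can be approximated with shell glob patterns, where * also matches any run of characters. This is a sketch for illustration only: the function name matches_proxy_pattern is hypothetical, and FoxyProxy's own matching may differ in details such as case sensitivity.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the URL matches one
# of the two wildcard patterns configured in the procedure above.
# Usage: matches_proxy_pattern URL
matches_proxy_pattern() {
    case "$1" in
        *ec2*.amazonaws.com*) return 0 ;;   # first URL pattern
        *ec2.internal*)       return 0 ;;   # second URL pattern
        *)                    return 1 ;;   # no pattern matches
    esac
}
```

Under these assumptions, matches_proxy_pattern http://ec2-67-202-49-73.compute-1.amazonaws.com:9100/ succeeds, as does a made-up internal address such as http://ip-10-0-0-1.ec2.internal:9103/, while http://www.example.com/ does not match and would be loaded without the proxy.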

The web user interfaces for Hadoop are located at the URIs in the following table.

Name of Interface        URI
MapReduce job tracker    http://master_dns_name:9100/
HDFS name node           http://master_dns_name:9101/
MapReduce task tracker   http://master_dns_name:9103/
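
Once the SSH SOCKS server is running, these URIs can also be checked from the command line. The sketch below assumes that curl is installed and that the SOCKS server listens on localhost port 8157, as in the earlier example. The function names hadoop_ui_url and check_hadoop_ui are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the URI of a Hadoop web UI on the master node.
# Usage: hadoop_ui_url jobtracker|namenode|tasktracker MASTER_DNS_NAME
hadoop_ui_url() {
    case "$1" in
        jobtracker)  port=9100 ;;
        namenode)    port=9101 ;;
        tasktracker) port=9103 ;;
        *) echo "unknown interface: $1" >&2; return 1 ;;
    esac
    echo "http://$2:$port/"
}

# Hypothetical helper: fetch a UI page through the SOCKS proxy and print
# the HTTP status code. --socks5-hostname makes curl resolve the host name
# through the proxy, similar to the remote DNS lookup setting in FoxyProxy.
# Usage: check_hadoop_ui INTERFACE MASTER_DNS_NAME [SOCKS_HOST:PORT]
check_hadoop_ui() {
    url=$(hadoop_ui_url "$1" "$2") || return 1
    curl --silent --output /dev/null --max-time 10 \
         --socks5-hostname "${3:-localhost:8157}" \
         --write-out '%{http_code}\n' "$url"
}
```

For example, check_hadoop_ui jobtracker ec2-67-202-49-73.compute-1.amazonaws.com requests the job tracker page through the tunnel and prints the status code that the web server returns.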