Amazon Elastic Compute Cloud
User Guide (API Version 2011-12-15)

Using Public Data Sets

This section describes how to use Amazon EC2 public data sets.

Public Data Set Concepts

Amazon EC2 provides a repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. Amazon stores the data sets at no charge to the community and, as with all AWS services, you pay only for the compute and storage you use for your own applications.

Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon EC2 instances and start computing on the data within minutes. You can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, you can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.

For more information, go to the Public Data Sets Page.

Available Public Data Sets

Public data sets are currently available in the following categories:

  • Biology—Includes Human Genome Project, GenBank, and other content.

  • Chemistry—Includes multiple versions of PubChem and other content.

  • Economics—Includes census data, labor statistics, transportation statistics, and other content.

  • Encyclopedic—Includes Wikipedia content from multiple sources and other content.

Finding Public Data Sets

Before you can use a public data set, you must locate it and note its snapshot ID.

To find a public data set

  1. Go to the Public Data Sets Page.

  2. Locate a public data set and note its snapshot ID for your operating platform (for example, Windows or Linux/UNIX).

Launching an Instance

Launch an instance as you normally do. For more information, see Launching and Using Instances.

Launching a Public Data Set Volume

To use a public data set, you create an Amazon EBS volume from the data set's snapshot by specifying its snapshot ID.

AWS Management Console

To create an Amazon EBS volume

  1. Log in to the AWS Management Console and click the Amazon EC2 tab.

  2. Click Volumes in the Navigation pane.

    The console displays a list of current volumes.

  3. Click Create Volume.

    The Create Volume dialog box appears.

  4. Configure the following settings and click Create.

    • Size of the volume (in GiB) (optional)

    • Availability Zone in which to create the volume (this must be the same Availability Zone as the instance to which you plan to attach the volume)

    • The ID of the public data set snapshot

    Amazon EC2 begins creating the volume.

Command Line Tools

To create an Amazon EBS volume

  1. Enter the following command.

    PROMPT>  ec2-create-volume --snapshot public-data-set-snapshot-id --zone availability-zone

    Amazon EBS returns information about the volume similar to the following example.

    VOLUME vol-4d826724 85 us-east-1a creating 2008-02-14T00:00:00+0000

  2. To check whether the volume is ready, use the following command.

    PROMPT>  ec2-describe-volumes vol-4d826724

    Amazon EBS returns information about the volume similar to the following example.

    VOLUME  vol-4d826724 85 us-east-1a  available 2008-07-29T08:49:25+0000 
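
If you script this check, the status can be read from the VOLUME line of the command output. The following Python sketch is one way to do that. It assumes the output resembles the example lines above; the function names volume_status and is_ready are hypothetical and are not part of the command line tools. Because the output of your tools version might contain additional columns, the sketch searches for a known volume state instead of relying on a fixed column position.

```python
# Volume states that Amazon EBS reports for a volume.
KNOWN_STATES = {"creating", "available", "in-use", "deleting", "deleted", "error"}


def volume_status(line):
    """Return (volume_id, status) from one VOLUME output line.

    Assumes the line starts with the literal word VOLUME followed by the
    volume ID, as in the example output above.
    """
    fields = line.split()
    if len(fields) < 2 or fields[0] != "VOLUME":
        raise ValueError("not a VOLUME line: %r" % line)
    # Search the remaining fields for a known state rather than
    # assuming the status always sits in the same column.
    for field in fields[2:]:
        if field in KNOWN_STATES:
            return fields[1], field
    raise ValueError("no known volume state in line: %r" % line)


def is_ready(line):
    """Return True when the line reports the volume as available."""
    return volume_status(line)[1] == "available"


if __name__ == "__main__":
    sample = "VOLUME  vol-4d826724 85 us-east-1a  available 2008-07-29T08:49:25+0000"
    print(volume_status(sample), is_ready(sample))
```

A script could run ec2-describe-volumes repeatedly and pass each VOLUME line to is_ready until it returns True.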

API

To create an Amazon EBS volume

  • Construct the following Query request.

    https://ec2.amazonaws.com/
    ?Action=CreateVolume
    &AvailabilityZone=zone
    &SnapshotId=public-data-set-snapshot-id
    &...auth parameters...

    Following is an example response.

    <CreateVolumeResponse xmlns="http://ec2.amazonaws.com/doc/2011-12-15/">
      <volumeId>vol-4d826724</volumeId>
      <size>85</size>
      <status>creating</status>
      <createTime>2008-05-07T11:51:50.000Z</createTime>
      <availabilityZone>us-east-1a</availabilityZone>
      <snapshotId>snap-59d33330</snapshotId>
    </CreateVolumeResponse>
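
The request and response above can be handled programmatically. The following Python sketch, which uses only the standard library, assembles the unsigned part of the Query request and extracts the fields from a response shaped like the example. The function names create_volume_url and parse_create_volume_response are hypothetical. The authentication parameters are deliberately left out, so the URL is incomplete and Amazon EC2 would reject it until those parameters are added.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EC2_ENDPOINT = "https://ec2.amazonaws.com/"


def create_volume_url(zone, snapshot_id):
    """Build the CreateVolume Query URL WITHOUT authentication parameters.

    The result is incomplete: the auth parameters elided in the request
    above must still be appended before the request can succeed.
    """
    params = {
        "Action": "CreateVolume",
        "AvailabilityZone": zone,
        "SnapshotId": snapshot_id,
    }
    return EC2_ENDPOINT + "?" + urlencode(params)


def parse_create_volume_response(xml_text):
    """Return the child elements of a CreateVolumeResponse as a dict.

    Assumes a flat response like the example above. The XML namespace
    prefix is stripped from each tag name.
    """
    root = ET.fromstring(xml_text)
    return {child.tag.split("}", 1)[-1]: child.text for child in root}


if __name__ == "__main__":
    print(create_volume_url("us-east-1a", "snap-59d33330"))
    sample = (
        '<CreateVolumeResponse xmlns="http://ec2.amazonaws.com/doc/2011-12-15/">'
        "<volumeId>vol-4d826724</volumeId>"
        "<status>creating</status>"
        "</CreateVolumeResponse>"
    )
    print(parse_create_volume_response(sample))
```

The volumeId value from the parsed response is what you would pass to later requests that check on or attach the volume.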

Mounting the Public Data Set Volume

Mount the volume as you normally do. For more information, see Making an Amazon EBS Volume Available for Use.