Crawl

Returns metadata about a specific document from the most recent Alexa web crawl. The metadata includes the HTTP response code, page size, checksums, and the URLs of links, images, frames, and more. The crawl metadata is based on Alexa's current snapshot of the web and is updated after each Web-wide crawl cycle finishes; a full cycle takes approximately two months.

Note that this action does not return any traffic data. Use the UrlInfo action to retrieve traffic data.

The Crawl Action takes the following parameters. Required parameters must be provided for the request to succeed.

Name           Required  Description
Action         Yes       Set to Crawl to retrieve document metadata.
ResponseGroup  Yes       The only valid value is MetaData.
Url            Yes       Any valid URL, host, or domain about which you would like to receive information.
Version        No        The current API version, 2005-07-11. Passing it ensures that requests continue to succeed even if the API changes in future versions.
Start          No        1-based index of the first result to return. If this value exceeds the total number of available results, an empty document is returned.
Count          No        Number of results to return, beginning from the Start index (maximum 20).
Purify         No        Canonicalize the URL before requesting its data (true | false). The default is true.
ResponseCodes  No        Return metadata only for entries whose HTTP response code is in this comma-separated list (for example, 200,302).
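
As a rough illustration of how these parameters fit together, the Python sketch below assembles the action-specific part of a Crawl query string and splits a larger result set into Start/Count pages. It is a minimal sketch under stated assumptions: the function names crawl_query and page_starts are hypothetical, Count is assumed to need a value of at least 1 (only the maximum of 20 is documented), and request authentication (access key, timestamp, signature) is omitted entirely because this section does not describe it.

```python
from urllib.parse import urlencode

MAX_COUNT = 20  # documented maximum for the Count parameter


def crawl_query(url, start=None, count=None, purify=None, response_codes=None,
                version="2005-07-11"):
    """Build the action-specific query string for a Crawl request.

    Hypothetical helper: authentication parameters are NOT included.
    """
    params = {
        "Action": "Crawl",
        "ResponseGroup": "MetaData",  # the only valid value
        "Url": url,
        "Version": version,           # optional, but sent by default here
    }
    if start is not None:
        if start < 1:
            raise ValueError("Start is 1-based and must be >= 1")
        params["Start"] = str(start)
    if count is not None:
        # Assumption: Count must be at least 1; only the maximum is documented.
        if not 1 <= count <= MAX_COUNT:
            raise ValueError(f"Count must be between 1 and {MAX_COUNT}")
        params["Count"] = str(count)
    if purify is not None:
        params["Purify"] = "true" if purify else "false"
    if response_codes:
        # Comma-separated list, e.g. [200, 302] -> "200,302" before encoding
        params["ResponseCodes"] = ",".join(str(c) for c in response_codes)
    # urlencode percent-encodes values, so the comma becomes %2C
    return urlencode(params)


def page_starts(total, page_size=MAX_COUNT):
    """Yield (Start, Count) pairs that cover `total` results page by page."""
    for start in range(1, total + 1, page_size):
        yield start, min(page_size, total - start + 1)
```

Each (Start, Count) pair from page_starts can be passed to crawl_query for successive requests; as the Start description notes, a Start beyond the last available result simply yields an empty document.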

The following example shows the response to a Query-style Crawl request:

<aws:CrawlResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
   <aws:RequestId>608de633-e4a0-422e-ab7a-517209bc0df2</aws:RequestId>
</aws:OperationRequest>
<aws:CrawlResult>
<aws:Alexa>
  
  <aws:CrawlData>
    <aws:MetaData>
      <aws:ResultNumber>1</aws:ResultNumber>
      
      <aws:RequestInfo>
        <aws:OriginalRequest>http://alexa.com:80/</aws:OriginalRequest>
        <aws:IPAddress>64.213.200.100</aws:IPAddress>
        <aws:RequestDate>20070502195602</aws:RequestDate>
        <aws:ContentType>text/html</aws:ContentType>
        <aws:ResponseCode>200</aws:ResponseCode>
        <aws:Length>58319</aws:Length>
        <aws:Language>en.utf-8 0.907 2829</aws:Language>
      </aws:RequestInfo>
      
      <aws:Checksums>
        <aws:AppearanceChecksum>db16a79395ad7a0774faf065aee9a794</aws:AppearanceChecksum>
        <aws:ContentChecksum>60e305f16781a67a585647efc158d193</aws:ContentChecksum>
      </aws:Checksums>
      
      <aws:OtherUrls>
        <aws:OtherUrl source="href">www.alexa.com/favicon.ico</aws:OtherUrl>
        <aws:OtherUrl source="src">purl.org/atom/ns</aws:OtherUrl>
      </aws:OtherUrls>
      
      <aws:Images>
        <aws:Image>client.alexa.com/common/images/alexa.gif</aws:Image>
        <aws:Image>client.alexa.com/common/images/button_search_arrow.gif</aws:Image>
      </aws:Images>
      
      <aws:Links>
        <aws:Link>
          <aws:LocationURI>www.alexa.com/</aws:LocationURI>
        </aws:Link>
        <aws:Link>
          <aws:Name>Traffic Rankings</aws:Name>
          <aws:LocationURI>alexa.com/site/ds/top_500?qterm=</aws:LocationURI>
        </aws:Link>
      </aws:Links>
      
    </aws:MetaData>
  </aws:CrawlData>
  
</aws:Alexa>
</aws:CrawlResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:CrawlResponse>
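
To pull fields out of a response like the one above, a client can walk the XML with Python's standard xml.etree.ElementTree module. One detail in the sample is easy to miss: the aws prefix is declared three times with two different URIs, and ResponseStatus rebinds it to the alexa.amazonaws.com URI, so StatusCode lives in a different namespace from the crawl data. The sketch below is hypothetical (parse_crawl_response and the returned field names are illustrative), assumes the namespace URIs exactly as they appear in the sample, and extracts only a few of the available fields.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the sample response. The aws prefix is rebound,
# so each element is matched by its resolved URI rather than its prefix.
NS = {
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
}


def parse_crawl_response(xml_text):
    """Return (status_code, list of per-document metadata dicts)."""
    root = ET.fromstring(xml_text)
    # ResponseStatus/StatusCode use the alexa namespace in the sample
    status = root.findtext(".//alexa:ResponseStatus/alexa:StatusCode",
                           namespaces=NS)
    results = []
    for md in root.iterfind(".//awis:CrawlData/awis:MetaData", NS):
        info = md.find("awis:RequestInfo", NS)
        results.append({
            "url": info.findtext("awis:OriginalRequest", namespaces=NS),
            # Assumes ResponseCode and Length are always present and numeric
            "response_code": int(info.findtext("awis:ResponseCode", namespaces=NS)),
            "length": int(info.findtext("awis:Length", namespaces=NS)),
            "content_checksum": md.findtext(
                "awis:Checksums/awis:ContentChecksum", namespaces=NS),
            "images": [e.text for e in md.iterfind("awis:Images/awis:Image", NS)],
            # A Link may omit Name, in which case findtext returns None
            "links": [
                (link.findtext("awis:Name", namespaces=NS),
                 link.findtext("awis:LocationURI", namespaces=NS))
                for link in md.iterfind("awis:Links/awis:Link", NS)
            ],
        })
    return status, results
```

Matching on namespace URIs rather than prefixes keeps the parser correct even though the response reuses the same prefix for different namespaces.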