Million Search Results

The Million Search Results endpoint gives you access to up to 10,000,000 results that match a query. When your search process has completed, the results are sorted, gzipped, and uploaded to the Amazon S3 service as a single file.

To test the service without doing any programming, you can use this test page. To use the service through the web services API, follow these steps:

  1. Submit a query using the StartSearch action. A request ID is returned.

  2. Poll once a minute (passing the request ID into the GetStatus action) to check the status of your search process.

  3. When your search process has completed, GetStatus returns a 'Completed' status and the DownloadUrl of the text file containing your query results.

  4. Download your results.
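
The steps above can be sketched as a small polling loop. This is only an illustration under stated assumptions: `start_search` and `get_status` are hypothetical wrappers that you would write around the StartSearch and GetStatus actions, and the dictionary keys `Status` and `DownloadUrl` are an assumed way of exposing the GetStatus response, not the documented response format. Statuses other than 'Completed' are not handled specially because they are not described here.

```python
import time
from typing import Callable, Dict


def wait_for_results(
    start_search: Callable[[str], str],
    get_status: Callable[[str], Dict[str, str]],
    query: str,
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a query, poll until it completes, and return the DownloadUrl.

    start_search and get_status are hypothetical wrappers around the
    StartSearch and GetStatus actions; their signatures and the shape of
    the status dictionary are assumptions made for this sketch.
    """
    request_id = start_search(query)        # step 1: submit the query
    for _ in range(max_polls):
        status = get_status(request_id)     # step 2: check the status
        if status.get("Status") == "Completed":
            return status["DownloadUrl"]    # step 3: results are ready
        sleep(poll_seconds)                 # wait before polling again
    raise TimeoutError(
        f"request {request_id} did not complete after {max_polls} polls"
    )
```

The default interval of 60 seconds follows the "poll once a minute" guidance in step 2. The `max_polls` limit is an addition of this sketch so that the loop cannot run forever. Step 4 is then an ordinary HTTP download of the returned URL.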

The results file is a gzipped, tab-delimited text file with UTF-8 encoding. Lines starting with a hash mark (#) are comments.

   # Results for Query: text:(cat|dog|mouse) lang:en
   http://aanews.com:80/?eid=396537	 Dog Eats Cat and Mouse	us-ascii	7622     42_0_20070513035325_crawl23.arc.gz   5320   
   http://bcdsports.com:80/	         Cat and Mouse Games	us-ascii	45464    42_0_20070507173349_crawl23.arc.gz   4523
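
A downloaded results file in the format described above can be read with the Python standard library. The following is a minimal sketch, not an official client: the function name `read_result_rows` is made up for this example, and trimming the whitespace around each field is an assumption based on the padding visible in the sample.

```python
import gzip
from typing import Iterator, List


def read_result_rows(path: str) -> Iterator[List[str]]:
    """Yield each data line of a gzipped results file as a list of fields."""
    # "rt" decompresses and decodes in one step; the file is UTF-8.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Skip blank lines and comment lines that start with '#'.
            if not line or line.startswith("#"):
                continue
            # Fields are tab-delimited; trimming the padding is an assumption.
            yield [field.strip() for field in line.split("\t")]
```

Because the function is a generator, it reads one line at a time, which matters for a file that may hold millions of rows.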

The following table describes the sequence of attributes in the delimited file.

Column  Document Attribute
------  ------------------
1       Url
2       Title
3       Character set
4       Size in bytes
5       Internal document identifier for use when post-processing results using the StartGrep action or from Amazon EC2
6       Offset for use when post-processing results using the StartGrep action or from Amazon EC2
7-11    Captured text, if any, returned from the StartGrep action
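
The column order can be turned into a named record once a line has been split on tabs. This is a sketch under assumptions: `ResultRow` and `parse_result_row` are names invented for this example, and treating the size and offset as integers is inferred from the sample values rather than stated by the service.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class ResultRow:
    url: str                 # column 1
    title: str               # column 2
    charset: str             # column 3
    size_bytes: int          # column 4 (integer type is an assumption)
    document_id: str         # column 5, internal document identifier
    offset: int              # column 6 (integer type is an assumption)
    captured_text: List[str] = field(default_factory=list)  # columns 7-11


def parse_result_row(fields: Sequence[str]) -> ResultRow:
    """Map the split fields of one data line onto named attributes."""
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    values = [value.strip() for value in fields]
    return ResultRow(
        url=values[0],
        title=values[1],
        charset=values[2],
        size_bytes=int(values[3]),
        document_id=values[4],
        offset=int(values[5]),
        # Captured text is present only after a StartGrep action.
        captured_text=list(values[6:11]),
    )
```

For example, `parse_result_row(line.split("\t"))` applied to a data line from the file gives a record whose `document_id` and `offset` can be passed on to later post-processing.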