The Million Search Results endpoint gives you access to up to 10,000,000 results that match a query. When your search process has completed, the results are sorted, gzipped and uploaded to the Amazon S3 service as a single file. This service:
- Returns results in a downloadable, compressed text file.
- Returns results asynchronously. You must make separate GetStatus requests to check whether the search is complete and to get the download Url.
- Returns only the document Url, title, character set, and size in bytes. Context snippets are not returned.
- Deduplicates the results by document Url.
- Returns results ordered by relevance.
To test the service without doing any programming, you can use this test page. To use the service through the web services API, follow these steps:
1. Submit a query using the StartSearch action. A request ID is returned.
2. Poll once a minute, passing the request ID to the GetStatus action, to check the status of your search process.
3. When your search process has completed, GetStatus returns a 'Completed' status and the DownloadUrl of the text file containing your query results.
4. Download your results.
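The polling step above can be sketched in Python as follows. This is a minimal sketch, not the service's client library: `get_status` is a hypothetical wrapper that you would write around the GetStatus action, and the dictionary keys `Status` and `DownloadUrl` are assumed names for the fields it returns. Any status other than 'Completed' is treated as "still running", which is also an assumption.

```python
import time
from typing import Callable


def wait_for_download_url(
    get_status: Callable[[str], dict],
    request_id: str,
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the search completes, then return the download URL.

    get_status is a hypothetical callable wrapping the GetStatus action.
    It is assumed to return a dict such as
    {"Status": "Completed", "DownloadUrl": "..."}.
    """
    for attempt in range(max_polls):
        status = get_status(request_id)
        if status.get("Status") == "Completed":
            return status["DownloadUrl"]
        # Not finished yet: wait before asking again, except after the last try.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(
        f"search {request_id!r} did not complete after {max_polls} polls"
    )
```

The default of 60 seconds matches the "poll once a minute" guidance. The `sleep` parameter exists only so the loop can be exercised without real waiting.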
The results file is a gzipped, tab-delimited text file with UTF-8 encoding. Lines starting with a hash mark (#) are comments.
# Results for Query: text:(cat|dog|mouse) lang:en
http://aanews.com:80/?eid=396537	Dog Eats Cat and Mouse	us-ascii	7622	42_0_20070513035325_crawl23.arc.gz	5320
http://bcdsports.com:80/	Cat and Mouse Games	us-ascii	45464	42_0_20070507173349_crawl23.arc.gz	4523
The following table describes the sequence of attributes in the delimited file.
| Column | Document Attribute |
|---|---|
| 1 | Url |
| 2 | Title |
| 3 | Character set |
| 4 | Size in bytes |
| 5 | Internal document identifier for use when post-processing results using the StartGrep action or from Amazon EC2 |
| 6 | Offset for use when post-processing results using the StartGrep action or from Amazon EC2 |
| 7-11 | Captured text, if any, returned from the StartGrep action |
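As an illustration of the file layout described above, the following Python sketch reads a downloaded results file, skips comment lines, and maps each row to the columns in the table. The names `ResultRow` and `read_results` are hypothetical, and the sketch assumes the file has already been saved locally and that columns 4 and 6 always hold integers.

```python
import gzip
from typing import Iterator, NamedTuple, Tuple


class ResultRow(NamedTuple):
    url: str                   # column 1
    title: str                 # column 2
    charset: str               # column 3
    size_bytes: int            # column 4
    document_id: str           # column 5
    offset: int                # column 6
    captured: Tuple[str, ...]  # columns 7-11, empty unless StartGrep was used


def read_results(path: str) -> Iterator[ResultRow]:
    """Yield one ResultRow per data line of a gzipped, tab-delimited file."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Blank lines and lines starting with '#' carry no result data.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 6:
                raise ValueError(f"expected at least 6 columns, got {len(fields)}")
            yield ResultRow(
                url=fields[0],
                title=fields[1],
                charset=fields[2],
                size_bytes=int(fields[3]),
                document_id=fields[4],
                offset=int(fields[5]),
                captured=tuple(fields[6:11]),
            )
```

Because the function is a generator, it reads the file one line at a time, which matters when a file holds millions of rows. A typical use would be `for row in read_results("results.gz"): print(row.url)`, where `results.gz` is a placeholder file name.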