Search Fields

When a document is first added to the Alexa search engine, about fifty document attributes are indexed in search fields. When a query is submitted, the search engine retrieves pertinent documents by matching the query against those search fields. By default, the engine examines the anchor text pointing to the document, the document title, its URL, its DMOZ category, and the document text. In general, the more search fields that match, the higher the document scores and the earlier it appears in the results list. All other things being equal, the more popular a page is, the earlier it appears in the list.

You may also narrow a query by explicitly specifying individual search fields in the Query parameter. The search fields listed in the table below let you limit a search by document attributes such as URL, document size, language, category, and many more. To search an individual field, prefix the search terms with the field name and a colon, as shown in the examples.
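The field-name-and-colon convention can be sketched in a few lines of Python. This is a minimal illustration and not part of the Alexa API: the helper names `field_term` and `build_query` are hypothetical, and joining keywords and field terms with spaces is an assumption. The term shapes it produces are modeled on the examples that appear on this page, such as site:(yahoo.com|msn.com|digg.com) and traffic:(-top1000 top5000).

```python
def field_term(field, include=(), exclude=(), any_of=False):
    """Return a 'field:(...)' term for the Query parameter (hypothetical helper).

    include -- values the field must match
    exclude -- values to exclude; each is prefixed with '-'
    any_of  -- join the included values with '|' (match any of them)
    """
    if not include and not exclude:
        raise ValueError("at least one value is required")
    if any_of and exclude:
        # Mixing '|' alternatives with exclusions is not shown in the
        # documentation, so this sketch refuses to guess at the syntax.
        raise ValueError("any_of cannot be combined with exclude")
    if any_of:
        body = "|".join(include)
    else:
        body = " ".join([f"-{value}" for value in exclude] + list(include))
    # Field names are case-insensitive, so lowercasing is only cosmetic.
    return f"{field.lower()}:({body})"


def build_query(keywords, *terms):
    """Join free-text keywords and field terms with spaces (an assumption)."""
    return " ".join([keywords, *terms]).strip()


if __name__ == "__main__":
    sites = field_term("site", include=["yahoo.com", "msn.com", "digg.com"], any_of=True)
    no_adult = field_term("porn", exclude=["yes"])
    print(build_query("jazz festival", sites, no_adult))
```

Building terms through one helper keeps parentheses and minus signs consistent, which matters once a query combines several fields.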

The table below contains the complete list of search fields. The most commonly used search fields are text, site, lang, type, magic, title, porn, and pagetype. The field names are case-insensitive.

Type | Field Name | Field Type | Description
Document | Anchor | phrase | Inbound anchor text, that is, the anchor text of links pointing to this document
Document | Charset | boolean | Character set
    (big5, big5-hkscs, cp874, cp949, euc-jp, euc-kr, euc-tw, gb18030, gb2312, gbk, iso-2022, iso-2022-cn, iso-2022-cn-ext, iso-2022-jp, iso-2022-jp-2, iso-2022-kr, iso-8459-1, iso-8859-1, ..., iso-8859-14, koi8, koi8-r, koi8-u, s-ascii, u25ufreei, unknown, us-ascii, utf-16be, utf-16le, utf-32be, utf-32le, utf-8, viscii, vuiso-8859-1, wdexows-31j, windows-1250, ..., windows-1257, windows-1)
Document | ClassTag | boolean | Value of class= attributes
Document | Code | boolean | HTTP response code returned by the server at crawl time
Document | Date | boolean | Date the document was crawled (2007, 200710, 20071025)
Document | HasText | boolean | Does the document have text? (yes, no, none)
    "no" if the document is not one of the text types, "none" if it is of a text type but has no text, "yes" otherwise.
Document | Lang | boolean | Two-character language code (af, am, an, ar, arc, az, be, bg, bn, bo, br, bs, bug, ca, chr, co, cop, cs, csb, cv, cy, da, de, dv, el, en, eo, es, et, eu, fa, fi, fo, fr, fy, ga, gd, gsw, gu, he, hi, hr, ht, hu, hy, id, ii, io, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lb, li, lo, lt, lv, mk, ml, mn, mr, my, nap, nds, ne, nl, nn, no, oc, or, os, pa, pl, ps, pt, rm, ro, ru, sc, scn, si, sk, sl, sq, sr, sv, sw, ta, te, th, tk, tl, tr, uk, unknown, ur, uz, vi, wa, x, yi, zh)
Document | LinkText | phrase | Outbound anchor text
Document | Magic | boolean | MIME type determined by analyzing the document content
    Note: As of November 2007, about 93% of the pages in the search index were html.
    (aiff, application, audio, bitmap, bmp, compress, css, dvi, elc, flash, frame, gif, greymap, gzip, html, image, javascript, jpeg, message, midi, mpeg, msword, news, octet, pdf, pixmap, plain, png, portable, postscript, prs, quicktime, rfc822, rtf, sc, shockwave, sid, stream, tar, text, tiff, unknown, video, x, xbm, xhtml, xml)
Document | PageType | boolean | One of (robots, redirect, homepage, irrelevant)
    Homepage means that the page is hosted on a personal site. Irrelevant means that the page had a non-200 response code, had no text, or was a redirect or robots.txt page.
    Note: By default, pagetype:(-irrelevant) is passed in along with the query in a Search action in order to exclude error pages (404s, 500s), pages without text, and other pages that are irrelevant to a general web text search. Passing a pagetype:() term in your Query parameter overrides the default behavior.
    When using the "Million Search Results" StartSearch action, you should pass in pagetype:(-irrelevant) if you are only looking for text pages.
Document | Porn | boolean | Document contains adult content (yes, no, maybe)
    For strict adult content filtering use porn:no. For moderate adult content filtering use porn:(-yes).
Document | Region | boolean | Two-character sub-language (bn, cn, hk, tw, unknown)
Document | RelTag | boolean | Value of rel= attributes. See http://microformats.org/wiki/reltag
Document | SizeAtLeast | boolean | Minimum document size in bytes (0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1k, 2k, 4k, 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1m, 2m, 4m, 8m, 16m)
Document | Text | phrase | Text of document, excluding markup
Document | Title | phrase | Document title
Document | Type | boolean | MIME type from header (text/plain, image/jpeg, jpeg, ...)
Website | Dmoz | phrase | Open Directory Project categories (arts, business, computers, games, health, home, "kids and teens", news, recreation, reference, regional, science, shopping, society, sports, world, ... and about 150,000 more terms)
    Search within a specific category: dmoz:("science/chemistry/catalysis/associations")
    Search within several categories: dmoz:("science/biology"|"chemistry/catalysis")
Website | Traffic | boolean | Alexa traffic rank (top5, top10, top50, top100, top500, top1000, top5000, top10000, top50000, top100000, top500000, top1000000, top5000000, top10000000)
    To get sites ranked from 1001 to 5000 you would use: traffic:(-top1000 top5000)
URL | Site | boolean | Site (my.careers.yahoo.com, careers.yahoo.com, yahoo.com, com)
    Search only on specific sites: site:(yahoo.com|msn.com|digg.com)
    Search only .org sites: site:(org)
    Exclude .co.uk and .org sites: site:(-co.uk -org)
URL | Cache | boolean | Normalized URL ("aol.com/test.cgi?foo=bar" from www.aol.com/test.cgi?foo=bar)
    You can use this field to see if a specific document is in the search index.
URL | Url | phrase | URL ("aol.com/login?loc=us" from my.name.aol.com/login?loc=us)
URL | SubSite | boolean | Sub-site ("name" from my.name.aol.com)
URL | SitePrefix | phrase | Site prefix ("my" from my.name.aol.com)
URL | SLD | boolean | Second level domain ("amazon" from www.amazon.co.uk/path?)
URL | Suffix | boolean | URL suffix ("doc" from aol.com/test.cgi?foo=bar.doc)
URL | CSuffix | boolean | Pre-query suffix ("cgi" from aol.com/test.cgi?foo=bar.doc)
URL | Host | phrase | Host ("my.name.aol.com" from my.name.aol.com/login?loc=us)
Redirecting to | Redirect | boolean | Normalized URL ("aol.com/test.cgi?foo=bar" from www.aol.com/test.cgi?foo=bar)
Redirecting to | RSite | boolean | Site (members.aol.com, yahoo.com, ...)
Redirecting to | RUrl | phrase | URL ("aol.com/login?loc=us" from my.name.aol.com/login?loc=us)
Redirecting to | RSubSite | boolean | Sub-site ("name" from my.name.aol.com)
Redirecting to | RSitePrefix | phrase | Site prefix ("my" from my.name.aol.com)
Redirecting to | RSLD | boolean | Second level domain ("amazon" from www.amazon.co.uk/path?)
Redirecting to | RSuffix | boolean | URL suffix ("doc" from aol.com/test.cgi?foo=bar.doc)
Redirecting to | RCSuffix | boolean | Pre-query suffix ("cgi" from aol.com/test.cgi?foo=bar.doc)
Redirecting to | RHost | phrase | Host ("my.name.aol.com" from my.name.aol.com/login?loc=us)
Linking to | Link | boolean | Normalized URL ("aol.com/test.cgi?foo=bar" from www.aol.com/test.cgi?foo=bar)
Linking to | LSite | boolean | Linking to Site (members.aol.com, yahoo.com, ...)
Linking to | LUrl | phrase | URL ("aol.com/login?loc=us" from my.name.aol.com/login?loc=us)
Linking to | LSubSite | boolean | Sub-site ("name" from my.name.aol.com)
Linking to | LSitePrefix | phrase | Site prefix ("my" from my.name.aol.com)
Linking to | LSLD | boolean | Second level domain ("amazon" from www.amazon.co.uk/path?)
Linking to | LSuffix | boolean | URL suffix ("doc" from aol.com/test.cgi?foo=bar.doc)
Linking to | LCSuffix | boolean | Pre-query suffix ("cgi" from aol.com/test.cgi?foo=bar.doc)
Linking to | LHost | phrase | Host ("my.name.aol.com" from my.name.aol.com/login?loc=us)
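
Several of the fields above take values that follow a simple pattern. The Python sketch below reproduces three of them from the examples in the table: the Site values for a host, the Date values for a crawl day, and a Traffic term for a rank band. The function names are hypothetical, and the rules are inferred from the listed examples only (for instance, that Site values are every dot-separated suffix of the host), so treat this as an illustration and not as the indexer's actual logic.

```python
import datetime

# Rank thresholds behind the top5 ... top10000000 values of the Traffic field.
TRAFFIC_BUCKETS = [5, 10, 50, 100, 500, 1000, 5000, 10000, 50000,
                   100000, 500000, 1000000, 5000000, 10000000]


def site_values(host):
    """List every dot-separated suffix of a host, longest first.

    Assumption inferred from the Site example:
    my.careers.yahoo.com, careers.yahoo.com, yahoo.com, com.
    """
    labels = host.lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


def date_values(day):
    """Return year, year+month and full-date strings for a crawl date."""
    return [day.strftime("%Y"), day.strftime("%Y%m"), day.strftime("%Y%m%d")]


def traffic_band(above, up_to):
    """Build a Traffic term for ranks from above+1 through up_to."""
    if above not in TRAFFIC_BUCKETS or up_to not in TRAFFIC_BUCKETS:
        raise ValueError("both bounds must be listed traffic thresholds")
    if above >= up_to:
        raise ValueError("'above' must be smaller than 'up_to'")
    return f"traffic:(-top{above} top{up_to})"


if __name__ == "__main__":
    print(site_values("my.careers.yahoo.com"))
    print(date_values(datetime.date(2007, 10, 25)))
    print(traffic_band(1000, 5000))
```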

The fields in the table below refer to information about the web server that served the page.

Type | Field Name | Field Type | Description
Crawl | IP1 | boolean | First octet of server IP address (207)
Crawl | IP2 | boolean | First two octets of server IP address (207.171)
Crawl | IP3 | boolean | First three octets of server IP address (207.171.166)
Crawl | IP4 | boolean | Server IP address (207.171.166.102)
Geography | Country | boolean | Two-character country code from server IP address (us, de, ...)
Geography | State | boolean | Two-character state code from server IP address (IL, NY, CA, ...)
Geography | City | boolean | City from server IP address
Geography | ZipCode | boolean | Zip code from server IP address
Geography | DmaCode | boolean | Designated Market Area from server IP address (U.S. only)
Geography | AreaCode | boolean | Area code from server IP address (U.S. only)
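
The IP1 through IP4 values are successive prefixes of the server address. As a small Python sketch, the hypothetical helper `ip_field_values` below derives all four from a dotted IPv4 address, using the standard `ipaddress` module only to validate the input. Writing a term as name:value in the usage lines is an assumption that mirrors the porn:no form shown earlier.

```python
import ipaddress


def ip_field_values(address):
    """Map IP1..IP4 to the first one to four octets of an IPv4 address."""
    # IPv4Address raises ValueError for anything that is not a valid address.
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return {f"IP{n}": ".".join(octets[:n]) for n in range(1, 5)}


if __name__ == "__main__":
    for name, value in ip_field_values("207.171.166.102").items():
        # Assumed term form, for example ip3:207.171.166
        print(f"{name.lower()}:{value}")
```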