Regular Expressions

With the StartGrep action you can use regular expressions to select or extract data from documents that matched your Million Search Results query.

Before using the web service, you may want to try out your regular expression using this test page.

Select documents that contain an uppercase SCRIPT tag:

SCRIPT

Perform case-insensitive filtering, selecting documents that contain the HTML tag name in any mix of cases (script, SCRIPT, ScRiPt, etc.):

(?i)script

Select documents that contain a latitude or longitude in the form 46 37.73 N:

\d{1,3} \d{1,2}\.\d{1,3} [NSEW]

Select documents that contain a line starting with the word "When":

(?m)^When
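
Selection patterns like the ones above can be tried locally before they are submitted. The following is a minimal sketch that uses Python's re module, which is an assumption made for illustration: the service's own regular expression engine may differ in details. The helper name selects and the sample documents are hypothetical.

```python
import re

def selects(pattern: str, document: str) -> bool:
    """Return True if the pattern matches anywhere in the document."""
    return re.search(pattern, document) is not None

# Hypothetical sample documents, for illustration only.
upper = "<SCRIPT src='a.js'></SCRIPT>"
mixed = "<ScRiPt src='a.js'></ScRiPt>"
lines = "Intro line\nWhen the page loads, run the code."

print(selects(r"SCRIPT", upper))      # exact, case-sensitive match
print(selects(r"SCRIPT", mixed))      # no match: the case differs
print(selects(r"(?i)script", mixed))  # (?i) ignores case
print(selects(r"^When", lines))       # without (?m), ^ matches only at the start of the text
print(selects(r"(?m)^When", lines))   # (?m) lets ^ match at the start of any line
```

The last two calls show why the (?m) flag matters: the word "When" starts the second line of the sample, not the document itself.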

You can also use regular expressions to extract text from documents. To capture text, put parentheses around the portion of the pattern whose match should be captured. Only the first five matches are written to the output file. If the pattern has multiple capture groups, only the first one is written out.
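
The capture rules described above (first capture group only, at most five matches) can be imitated locally. This is a sketch under stated assumptions, not the service's implementation: it uses Python's re module, and the names extract and MAX_MATCHES and the sample text are hypothetical.

```python
import re
from itertools import islice

MAX_MATCHES = 5  # limit taken from the description above

def extract(pattern: str, document: str, limit: int = MAX_MATCHES) -> list:
    """Return the first capture group of up to `limit` matches.

    The pattern must contain at least one capture group.
    """
    matches = islice(re.finditer(pattern, document), limit)
    return [m.group(1) for m in matches]

# Hypothetical input with six key=value pairs.
text = "a=1 b=2 c=3 d=4 e=5 f=6"

# The pattern has two capture groups: only the first (the key) is kept,
# and the sixth match is dropped by the limit.
print(extract(r"(\w)=(\d)", text))
```

If the value rather than the key is wanted, move the parentheses so that the value is the first (or only) capture group.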

Extract URLs beginning with http:// and surrounded by quotes:

(?i)[\"'](http://.+?)[\"']

Extract links from the document:

(?i)href=[\"'](.+?)[\"']

Extract meta tags from the document:

(?i)<meta.+?content=[\"'](.+?)[\"']

Extract png image links:

(?i)<img.+?src=[\"'](.+?\.png)[\"']

Extract links to mp3 files:

(?i)[\"'](http://.+?\.mp3)[\"']

Extract only lines of text that start with "The" and end with "bananas." (including the period):

(?m)(^The.+bananas\.$)

Extract latitude or longitude coordinates in the form 46 37.73 N:

(\d{1,3} \d{1,2}\.\d{1,3} [NSEW])
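
To see what a few of the extraction patterns above capture, they can be run against a small sample document. This sketch again assumes Python's re module as a stand-in for the service's engine, and the sample HTML is made up for illustration.

```python
import re

# Hypothetical sample document, for illustration only.
html = """<html><head>
<META name="description" content="Fruit prices">
</head><body>
<a HREF="http://example.com/a">A</a> <a href='/b'>B</a>
The monkeys ate all the bananas.
The end.
</body></html>"""

# With a single capture group, re.findall returns the captured text only.
links = re.findall(r"""(?i)href=[\"'](.+?)[\"']""", html)
meta = re.findall(r"""(?i)<meta.+?content=[\"'](.+?)[\"']""", html)
lines = re.findall(r"(?m)(^The.+bananas\.$)", html)

print(links)  # the two link targets
print(meta)   # the content attribute of the meta tag
print(lines)  # only the line that ends with "bananas."
```

Note that the surrounding quotes and attribute names are part of the match but not of the captured text, because they sit outside the parentheses.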