Formatted Content: XHTML

When you create a HIT or a Qualification test, you can include various kinds of content to be displayed to the Worker on the Amazon Mechanical Turk web site, such as text (titles, paragraphs, lists), media (pictures, audio, video) and browser applets (Java or Flash).

You can also include blocks of formatted content. Formatted content lets you include XHTML tags directly in your instructions and your questions for detailed control over the appearance and layout of your data.

You include a block of formatted content by specifying a FormattedContent element in the appropriate place in your QuestionForm data structure. You can specify any number of FormattedContent elements, and you can mix them with other kinds of content.

The following example uses other content types (Title, Text) along with FormattedContent to include a table in a HIT:

<Text>
  This HIT asks you some questions about a game of Tic-Tac-Toe
  currently in progress.  Your answers will help decide the next move.
  Squares with "-" are available.
</Text>
<Title>The Current Board</Title>
<Text>
  The following table shows the board as it currently stands.
</Text>
<FormattedContent><![CDATA[
<table border="1">
  <tr>
    <td></td>
    <td align="center">1</td>
    <td align="center">2</td>
    <td align="center">3</td>
  </tr>
  <tr>
    <td align="right">A</td>
    <td align="center"><b>X</b></td>
    <td align="center">-</td>
    <td align="center"><b>O</b></td>
  </tr>
  <tr>
    <td align="right">B</td>
    <td align="center">-</td>
    <td align="center"><b>O</b></td>
    <td align="center">-</td>
  </tr>
  <tr>
    <td align="right">C</td>
    <td align="center">-</td>
    <td align="center">-</td>
    <td align="center"><b>X</b></td>
  </tr>
  <tr>
    <td align="center" colspan="4">It is <b>X</b>'s turn.</td>
  </tr>
</table>
]]></FormattedContent>

For more information about describing the contents of a HIT or Qualification test, see the QuestionForm data structure.

Using Formatted Content

As you can see in the example above, formatted content is specified in an XML CDATA block, inside a FormattedContent element. The CDATA block contains the text and XHTML markup to display in the Worker's browser.
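
As a minimal sketch of this wrapping step, the following Python helper places an XHTML fragment inside a FormattedContent element with a CDATA block. The function name wrap_formatted_content is hypothetical and not part of any Mechanical Turk library. The check for "]]>" is an added precaution, because that sequence would end the CDATA block early.

```python
def wrap_formatted_content(xhtml: str) -> str:
    """Wrap an XHTML fragment in a FormattedContent element with a CDATA block.

    Hypothetical helper for illustration; not part of the Mechanical Turk API.
    """
    # "]]>" would terminate the CDATA section early, so reject it up front.
    if "]]>" in xhtml:
        raise ValueError('fragment must not contain "]]>"')
    return "<FormattedContent><![CDATA[\n" + xhtml + "\n]]></FormattedContent>"


block = wrap_formatted_content("<p>Squares with <b>-</b> are available.</p>")
print(block)
```

The resulting string can be placed in QuestionForm data wherever a FormattedContent element is allowed.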

Only a subset of the XHTML standard is supported. For a complete list of supported XHTML elements and attributes, see the table below. In particular, JavaScript, element IDs, class and style attributes, and <div> and <span> elements are not allowed.

XML comments (<!-- ... -->) are not allowed in formatted content blocks.

Every XHTML tag in the CDATA block must be closed before the end of the block. For example, if you start an XHTML paragraph with a <p> tag, you must end it with a </p> tag within the same FormattedContent block.

Note

The tag closure requirement means you cannot open an XHTML tag in one FormattedContent block and close it in another. There is no way to "wrap" other kinds of question form content in XHTML. FormattedContent blocks must be self-contained.

XHTML tags must be nested properly. When tags are used inside other tags, the inner-most tags must be closed before outer tags are closed. For example, to specify that some text should appear in bold italics, you would use the <b> and <i> tags as follows:

<b><i>This text appears bold italic.</i></b>

But the following would not be valid, because the closing </b> tag appears before the closing </i> tag:

<b><i>These tags don't nest properly!</b></i>
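
One way to catch unclosed or badly nested tags before submitting a HIT is to parse the fragment locally with an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module. The function name is_well_formed and the wrapper element name "root" are assumptions made for illustration. This is only a well-formedness check, not the schema validation that Mechanical Turk performs.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml: str) -> bool:
    """Return True if the fragment parses as XML inside a single root element."""
    try:
        # The wrapper name "root" is arbitrary; it only supplies the single
        # root element that an XML parser requires.
        ET.fromstring("<root>" + xhtml + "</root>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<b><i>This text appears bold italic.</i></b>"))   # True
print(is_well_formed("<b><i>These tags don't nest properly!</b></i>"))  # False
print(is_well_formed("<p>This paragraph is never closed."))             # False
```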

Finally, formatted content must meet other requirements to validate against the XHTML schema. For instance, tag names and attribute names must be all lowercase letters, and attribute values must be surrounded by quotes.
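
The lowercase requirement can also be checked locally. The following sketch, again using Python's xml.etree.ElementTree, lists any tag or attribute names that are not all lowercase. The function name non_lowercase_names is hypothetical. Attribute values that are not quoted are not well-formed XML, so for those the parser raises ParseError before this check runs.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(xhtml: str) -> list:
    """List tag and attribute names in the fragment that are not all lowercase."""
    # The wrapper element "root" is arbitrary and is itself lowercase.
    root = ET.fromstring("<root>" + xhtml + "</root>")
    offenders = []
    for element in root.iter():
        if element.tag != element.tag.lower():
            offenders.append(element.tag)
        for name in element.attrib:
            if name != name.lower():
                offenders.append(name)
    return offenders


print(non_lowercase_names('<td align="center">1</td>'))  # []
print(non_lowercase_names('<TD Align="center">1</TD>'))  # ['TD', 'Align']
```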

For details on how Amazon Mechanical Turk validates XHTML formatted content blocks, see "How XHTML Formatted Content Is Validated," below.

Supported XHTML Tags

FormattedContent supports a limited subset of the XHTML 1.0 ("transitional") standard. The complete list of supported tags and attributes appears in the table below. Notable differences from the standard include:

  • JavaScript is not allowed

    The <script> tag is not supported, and anchors (<a>) and images (<img>) cannot use javascript: targets in URLs.

  • CSS is not allowed

    The <style> tag is not supported, and the class and style attributes are not supported. The id attribute is also not supported.

  • XML comments (<!-- ... -->) are not supported

  • URLs in anchor targets, image locations, and iframe locations are limited to the following: http://, https://, ftp://, news://, nntp://, mailto://, gopher://, telnet://
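
The URL restriction in the list above can be sketched as a scheme check using Python's standard urllib.parse module. The name has_allowed_scheme is hypothetical. The sketch treats a URL with no scheme (a relative URL) as not allowed; that is an assumption about how the restriction applies, not something this guide states.

```python
from urllib.parse import urlsplit

# Schemes taken from the list above.
ALLOWED_SCHEMES = {
    "http", "https", "ftp", "news", "nntp", "mailto", "gopher", "telnet",
}


def has_allowed_scheme(url: str) -> bool:
    """Return True if the URL starts with one of the allowed schemes."""
    # Assumption: a URL without a scheme (empty string) is rejected.
    return urlsplit(url.strip()).scheme.lower() in ALLOWED_SCHEMES


print(has_allowed_scheme("http://www.slashdot.org"))  # True
print(has_allowed_scheme("javascript:alert(1)"))      # False
```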

Other things to note with regard to supported tags and attributes:

  • In addition to the attributes listed, the title attribute is supported for all tags, and the dir and lang attributes are supported for all tags except <br>

  • The alt attribute is required for <area> and <img> tags

  • <iframe> tags cannot be empty

    They must contain simple text and cannot contain tags.

    • The following example is correct:

      <iframe src="http://www.slashdot.org">Your browser does not support IFRAMEs. Please return this HIT.</iframe>

    • The following examples are not correct:

      <iframe src="http://www.slashdot.org"/>
      <iframe src="http://www.slashdot.org"></iframe>
      <iframe src="http://www.slashdot.org"> </iframe>
      <iframe src="http://www.slashdot.org">This frame links <a href="http://www.slashdot.org/">here</a></iframe>

  • <img> tags also require a src attribute

  • <map> tags require a name attribute
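
The required-attribute and iframe rules in the list above can be sketched as a local check with Python's xml.etree.ElementTree. The function name structural_problems and the wording of its messages are hypothetical. The sketch covers only the rules in this list and assumes the fragment is already well-formed.

```python
import xml.etree.ElementTree as ET

# Required attributes, taken from the list above.
REQUIRED_ATTRIBUTES = {
    "area": ["alt"],
    "img": ["alt", "src"],
    "map": ["name"],
}


def structural_problems(xhtml: str) -> list:
    """Report missing required attributes and invalid iframe contents."""
    root = ET.fromstring("<root>" + xhtml + "</root>")
    problems = []
    for element in root.iter():
        for name in REQUIRED_ATTRIBUTES.get(element.tag, []):
            if name not in element.attrib:
                problems.append(f"<{element.tag}> is missing the {name} attribute")
        if element.tag == "iframe":
            if len(element) > 0:
                # len() counts child elements, i.e. nested tags.
                problems.append("<iframe> must not contain tags")
            elif not (element.text or "").strip():
                problems.append("<iframe> must contain text")
    return problems


print(structural_problems('<iframe src="http://www.slashdot.org"> </iframe>'))
print(structural_problems('<img src="http://www.example.com/board.png"/>'))
```

For the two calls shown, the first reports the whitespace-only iframe and the second reports the missing alt attribute.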

The following table lists the supported tags and attributes:

Tag         Attributes
a           accesskey charset coords href hreflang name rel rev shape tabindex target type
area        alt coords href nohref shape target
b
big
blockquote  cite
br
center
cite
code
col         align char charoff span valign width
colgroup    align char charoff span valign width
dd
del         cite datetime
dl
em
font        color face size
h1          align
h2          align
h3          align
h4          align
h5          align
h6          align
hr          align noshade size width
i
iframe      align frameborder height longdesc marginheight marginwidth name scrolling src width
img         align alt border height hspace ismap longdesc src usemap vspace width
ins         cite datetime
li          type value
map         name
ol          compact start type
p           align
pre         width
q           cite
small
strong
sub
sup
table       align bgcolor border cellpadding cellspacing frame rules summary width
tbody       align char charoff valign
td          abbr align axis bgcolor char charoff colspan headers height nowrap rowspan scope valign width
tfoot       align char charoff valign
th          abbr align axis bgcolor char charoff colspan headers height nowrap rowspan scope valign width
thead       align char charoff valign
tr          align bgcolor char charoff valign
u
ul          compact type
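
A table like this lends itself to an allow-list check. The sketch below, in Python with xml.etree.ElementTree, copies only a few rows of the table (enough for the Tic-Tac-Toe example) and reports any tag or attribute outside that excerpt. The names SUPPORTED_EXCERPT, allowed_attributes, and unsupported_markup are hypothetical. A real check would need every row of the table, and passing this check does not guarantee that Mechanical Turk accepts the content.

```python
import xml.etree.ElementTree as ET

# Excerpt of the table above; tags missing here may still be supported.
SUPPORTED_EXCERPT = {
    "b": set(),
    "br": set(),
    "i": set(),
    "p": {"align"},
    "table": {"align", "bgcolor", "border", "cellpadding", "cellspacing",
              "frame", "rules", "summary", "width"},
    "td": {"abbr", "align", "axis", "bgcolor", "char", "charoff", "colspan",
           "headers", "height", "nowrap", "rowspan", "scope", "valign", "width"},
    "tr": {"align", "bgcolor", "char", "charoff", "valign"},
}


def allowed_attributes(tag: str) -> set:
    """Listed attributes plus title (all tags) and dir, lang (all except br)."""
    extra = {"title"} if tag == "br" else {"title", "dir", "lang"}
    return SUPPORTED_EXCERPT[tag] | extra


def unsupported_markup(xhtml: str) -> list:
    """Report tags and attributes that are not in the excerpt."""
    root = ET.fromstring("<root>" + xhtml + "</root>")
    problems = []
    for element in root.iter():
        if element is root:
            continue  # skip the arbitrary wrapper element
        if element.tag not in SUPPORTED_EXCERPT:
            problems.append(f"<{element.tag}> is not in the list")
            continue
        for name in element.attrib:
            if name not in allowed_attributes(element.tag):
                problems.append(f"<{element.tag}> does not list {name}")
    return problems


print(unsupported_markup('<table border="1"><tr><td align="center"><b>X</b></td></tr></table>'))
print(unsupported_markup('<div class="board">X</div>'))
```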

How XHTML Formatted Content Is Validated

When you create a HIT or a Qualification test whose content uses FormattedContent, Amazon Mechanical Turk attempts to validate the formatted content blocks against a schema. If the formatted content does not validate against the schema, the operation call fails and returns an error.

To validate the formatted content, Mechanical Turk takes the contents of the FormattedContent element (the text and markup inside the CDATA), then constructs an XML document with an appropriate XML header, <FormattedContent> as the root element, and the text and markup as the element's contents (without the CDATA). This document is then validated against a schema.

For example, consider the following FormattedContent block:

  ...
  <FormattedContent><![CDATA[
    I absolutely <i>love</i> chocolate ice cream!
  ]]></FormattedContent>
  ...

To validate this block, Mechanical Turk produces the following XML document:

<?xml version="1.0"?>
<FormattedContent xmlns="http://www.w3.org/1999/xhtml">
  I absolutely <i>love</i> chocolate ice cream!
</FormattedContent>

The schema used for validation is called FormattedContentXHTMLSubset.xsd. For information on how to download this schema, see WSDL and Schema Locations.

You do not need to specify the namespace of the XHTML tags in your formatted content. The XHTML namespace is assumed automatically during validation.
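
The document construction described in this section can be approximated locally. The following Python sketch builds the same kind of XML document from the contents of a CDATA block and parses it with xml.etree.ElementTree. The function name build_validation_document is hypothetical, and the sketch only confirms that the document is well-formed. It does not validate against FormattedContentXHTMLSubset.xsd, which would require a schema-aware XML library.

```python
import xml.etree.ElementTree as ET

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def build_validation_document(cdata_contents: str) -> str:
    """Build an XML document with FormattedContent as the root element."""
    return (
        '<?xml version="1.0"?>\n'
        f'<FormattedContent xmlns="{XHTML_NAMESPACE}">'
        f"{cdata_contents}"
        "</FormattedContent>"
    )


document = build_validation_document(
    "\n  I absolutely <i>love</i> chocolate ice cream!\n"
)
root = ET.fromstring(document)

# The default namespace on the root applies to every tag inside it, so the
# fragment itself does not need to declare a namespace.
print(root.tag)                        # {http://www.w3.org/1999/xhtml}FormattedContent
print([child.tag for child in root])   # ['{http://www.w3.org/1999/xhtml}i']
```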