MarkUp Validation Service

Quick Start

Type (or copy and paste) the URL of the page you want to validate into the text field on the form and press the "Validate this page" button.

If you have a local file you want to validate, choose the "File Upload" link from the navigation menu. Select the button labeled "Browse..." (the exact label depends on your browser) and choose the file you want to upload in the usual manner for your OS.

Introduction

The W3C MarkUp Validation Service is a web gateway to a well-known SGML parser called SP. SP will take your HTML and compare it to a set of objective syntax rules called a "DTD", a Document Type Definition. This way you can be sure your HTML is really valid and not just that it conforms to some random programmer's idea of "nice" HTML. Note that valid HTML does not guarantee that your pages will work OK in all browsers. Most browsers are severely broken and you may need to find alternative ways of achieving your goal.

When you send a URL to the W3C MarkUp Validation Service, it will fetch that URL and feed it to the SGML parser. If you upload a file it'll get fed directly into the SGML parser. We then take the output from the SGML parser, format it nicely as HTML, and send it back to your web browser. The W3C MarkUp Validation Service isn't generating any of the error messages; they are all generated by the underlying SGML parser, which is checking your HTML against the actual standard for the version of HTML you are using.

The Options

In addition to the text field where you enter a URL -- or the file selection field if you are uploading files -- there are a few checkboxes that alter the behaviour of the validator. The options are:

Show source input (ss)
Displays the HTML source of the document you validated and links error messages directly to lines in this output. Makes it easy to see what's wrong.
Show an outline of this document (outline)
Generates an outline of your document from the H1 - H6 elements. For a properly formed document, this will be a nicely nested tree structure. The visualization of your document's structure makes it easier to see where you've skipped a heading level.
Show parse tree (sp)
Shows you exactly how the SGML Parser read your document. Probably best used only by advanced users as it deals with low-level SGML constructs.
Exclude attributes from the parse tree (noatt)
Suppresses attributes from the parse tree to make it more readable.
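
The idea behind the outline option can be illustrated with a short Python sketch. This is not the Validator's own code; it is an independent approximation built on the standard library's html.parser, and the names OutlineParser and outline are hypothetical. It collects the H1 - H6 elements, indents each by its level, and flags any heading that jumps more than one level deeper than the one before it.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collects (level, text) pairs for every H1-H6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading being read, if any
        self._text = []      # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return indented outline lines, flagging skipped heading levels."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    lines = []
    previous = 0
    for level, text in parser.headings:
        note = "  <-- skipped a level" if level > previous + 1 else ""
        lines.append("  " * (level - 1) + f"H{level} {text}{note}")
        previous = level
    return lines


if __name__ == "__main__":
    sample = "<h1>Title</h1><h2>Intro</h2><h4>Detail</h4>"
    print("\n".join(outline(sample)))
```

In the sample, the H4 follows an H2 directly, so it is the line that gets flagged.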

Calling/Linking to the Validator

You can link directly to the Validator home page, or you can call the Validator CGI program. The home page is <http://validator.w3.org/> at the moment (and for the foreseeable future) and the CGI program can be reached at <http://validator.w3.org/check>.

If you call the CGI program with extra path info matching "/referer" (i.e. <http://validator.w3.org/check/referer>) it will fetch the referring document and validate that. This means that if you embed a link to that URL in your pages, following that link will show you the validation results for that page.

You can also link to the validation results for a specific page. You do this by giving "check" an "uri" parameter pointing at the page you want to validate. For example <http://validator.w3.org/check?uri=http://www.example.com/> will validate the www.example.com home page.

The short name of each option is given in parentheses after its long name in the section "The Options" above. To add options to your links directly, append them to the URL, separated by semicolons. For example <http://validator.w3.org/check?uri=http://www.example.com/;ss=1;outline=1;sp=1> will validate the example.com home page with "Show Source", "Outline" and "Show Parse Tree" on, but "Exclude Attributes" off.

You may also see these separated by ampersands, but this usage is deprecated and support may be removed at some time in the future.
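
If you generate such links from a script, the construction can be sketched in Python as follows. The function name build_check_url is hypothetical. The sketch percent-encodes the page address so that a semicolon inside it cannot be mistaken for an option separator; this assumes the CGI program decodes the "uri" parameter in the usual way, which is not stated in this document. The examples in this section pass the address unencoded.

```python
from urllib.parse import quote

CHECK_URL = "http://validator.w3.org/check"

# Short option names from the section "The Options".
KNOWN_OPTIONS = ("ss", "outline", "sp", "noatt")


def build_check_url(page_uri, options=()):
    """Build a link to the validation results for page_uri.

    Each name in options is switched on with "=1"; parameters are
    joined with semicolons, the separator this section recommends.
    """
    for name in options:
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: {name}")
    # Assumption: the service accepts a percent-encoded uri value.
    parts = ["uri=" + quote(page_uri, safe="")]
    parts += [f"{name}=1" for name in options]
    return CHECK_URL + "?" + ";".join(parts)


if __name__ == "__main__":
    print(build_check_url("http://www.example.com/", ("ss", "outline", "sp")))
```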

Interpreting the results

In spite of our efforts, interpreting the MarkUp Validator's error messages isn't quite what you'd call easy. The error messages are generated in the context of a full SGML environment, which demands a somewhat higher level of technical detail than your average HTML document. We have set up a page listing errors and their explanations, which should help you find out what meaning lies behind the cryptic messages and fix your markup.

We're working on ways to make the error messages more friendly, but for now, if the errors explanation page doesn't work for you, feel free to email the (publicly archived) www-validator@w3.org mailing list if you need help interpreting the results. This will have the added benefit of letting us know which error messages are causing the most trouble so we can fix those first. Please be as specific as possible and include the exact error message and, preferably, a URL we can validate to see for ourselves.

Output Options

In addition to the HTML output intended for human consumption in a browser, the Validator has some experimental features to generate machine-parseable output in a few different forms. To enable these output options, append ";output=<option>" to the URI of the validation results (an interface for these options will be provided when they exit the beta stage).

These options are experimental! The API and output format are subject to change without notice and may well be removed or disabled at any time. They are provided now to garner public feedback to determine how best to support this functionality in the future. One particularly likely option being considered is removing these features altogether in favor of a full-blown SOAP interface. You have been warned!
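
As a minimal Python sketch of the rule above, the helper below appends ";output=<option>" to a results URI. The function name with_output is hypothetical, and the accepted values are simply the option names listed in this section.

```python
# Option names taken from the list in this section.
OUTPUT_FORMATS = ("earl", "n3", "xml")


def with_output(results_uri, fmt):
    """Append the experimental output option to a validation results URI."""
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {fmt}")
    return f"{results_uri};output={fmt}"


if __name__ == "__main__":
    print(with_output("http://validator.w3.org/check?uri=http://www.example.com/", "xml"))
```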

EARL/RDF (earl)
Produces output in the EARL RDF syntax.
Notation3 (n3)
Produces output in the Notation3 RDF syntax.
XML (xml)

Produces output in a homegrown XML format (yes, we know...).

The DTD for this format is as follows:

	      
<!DOCTYPE result [
  <!ELEMENT result (meta, warnings?, messages?)>
  <!ATTLIST result
    version CDATA #FIXED '0.9'
  >

  <!ELEMENT meta (uri, modified, server, size, encoding, doctype)>
  <!ELEMENT uri      (#PCDATA)>
  <!ELEMENT modified (#PCDATA)>
  <!ELEMENT server   (#PCDATA)>
  <!ELEMENT size     (#PCDATA)>
  <!ELEMENT encoding (#PCDATA)>
  <!ELEMENT doctype  (#PCDATA)>

  <!ELEMENT warnings (warning)+>
  <!ELEMENT warning  (#PCDATA)>

  <!ELEMENT messages (msg)*>
  <!ELEMENT msg      (#PCDATA)>
  <!ATTLIST msg
    line   CDATA #IMPLIED
    col    CDATA #IMPLIED
    offset CDATA #IMPLIED
  >
]>
              
            

Each element except the containers (result, meta, warnings, messages) and the free-form text fields (warning, msg) will take a single value of a specific type.

The base document element is result. The only elements allowed to be directly contained at the first level are meta, warnings, and messages. The warnings and messages elements may be omitted if empty, and no first-level element may appear more than once.

The meta element

The meta element contains various metadata about the validated document. It contains further elements describing each value.

uri
The URI of the document validated.
modified
The Last-Modified header field of the document as free-form text.
server
The Server header field of the document as free-form text.
size
The size in bytes of the document.
encoding
The Character Encoding used for Validation.
doctype
A text string describing the DOCTYPE used for Validation.

Currently, the type of these fields is free-form text, but it is intended that a future revision will switch to less opaque data types so these values can be reliably machine-parsed.

The warnings element

The warnings element can contain only one kind of sub-element: the warning element. Multiple warning elements may appear, and each one contains free-form text corresponding to a warning of the type found in the "Warnings" section of the HTML output (e.g. "DOCTYPE override in effect!").

The messages element

The messages element can contain only one kind of sub-element: the msg element. Multiple msg elements may appear, and each contains free-form text representing one detected error. The msg element has three attributes: line, col, and offset. These contain numbers giving the line and column on which the error was detected, and the offset in characters from the beginning of the document (as opposed to col, which can be said to be the offset from the beginning of the line).
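
To show how a client might consume this format, here is a Python sketch that reads a result document with the standard library's xml.etree.ElementTree. The sample document and the function name parse_result are hypothetical; the sample was written by hand to match the DTD above and is not real Validator output. Because the format is experimental, treat this as an illustration only.

```python
import xml.etree.ElementTree as ET

# Hand-written sample that follows the DTD above (not real service output).
SAMPLE = """\
<result version="0.9">
  <meta>
    <uri>http://www.example.com/</uri>
    <modified>Sat, 30 Nov 2002 16:45:33 GMT</modified>
    <server>Apache</server>
    <size>1234</size>
    <encoding>iso-8859-1</encoding>
    <doctype>HTML 4.01 Transitional</doctype>
  </meta>
  <warnings>
    <warning>DOCTYPE override in effect!</warning>
  </warnings>
  <messages>
    <msg line="12" col="7" offset="345">hypothetical error message text</msg>
  </messages>
</result>
"""


def _to_int(value):
    # line, col and offset are #IMPLIED, so any of them may be absent.
    return int(value) if value is not None else None


def parse_result(xml_text):
    """Return (meta, warnings, messages) from a result document."""
    root = ET.fromstring(xml_text)
    # meta fields are free-form text, so they are kept as strings.
    meta = {child.tag: (child.text or "").strip() for child in root.find("meta")}
    # warnings and messages are optional; findall returns [] when absent.
    warnings = [(w.text or "").strip() for w in root.findall("warnings/warning")]
    messages = [
        {
            "line": _to_int(m.get("line")),
            "col": _to_int(m.get("col")),
            "offset": _to_int(m.get("offset")),
            "text": (m.text or "").strip(),
        }
        for m in root.findall("messages/msg")
    ]
    return meta, warnings, messages


if __name__ == "__main__":
    meta, warnings, messages = parse_result(SAMPLE)
    print(meta["uri"], len(warnings), len(messages))
```

The meta values are left as strings on purpose, since this document says their type is currently free-form text.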

Comma Tools / Site Tools

This site uses "comma tools", as does W3C and other sites. This means you can append a string (starting with a comma, hence the name) to the URI (address) of any page on the site and trigger a few administrative or technical tools for this page.

These tools are still under test, and reportedly do not work yet when appended to a validation result page.

What it does                                    Tool used             ,shortcut
A plain text version of the page.               HTML2Text             ,text
Validate the MarkUp.                            W3C Markup Validator  ,validate
Check links (anchors).                          W3C Link Checker      ,checklink or ,checklinks
Check links (recursively).                      W3C Link Checker      ,rchecklink or ,rchecklinks
A version of the page with linearized tables.   Tablin                ,tablin
CVS history for the page or resource.           CVSWeb                ,cvs or ,cvslog
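
Building a comma-tool address is plain string concatenation, as the following Python sketch shows. The function name comma_tool_uri and the example page address are hypothetical; the shortcut names come from the table above.

```python
# Shortcut names from the table above, without the leading comma.
COMMA_TOOLS = (
    "text", "validate", "checklink", "checklinks",
    "rchecklink", "rchecklinks", "tablin", "cvs", "cvslog",
)


def comma_tool_uri(page_uri, tool):
    """Append a comma-tool shortcut to the address of a page on the site."""
    if tool not in COMMA_TOOLS:
        raise ValueError(f"unknown comma tool: {tool}")
    return f"{page_uri},{tool}"


if __name__ == "__main__":
    # Hypothetical page address, used only for illustration.
    print(comma_tool_uri("http://validator.w3.org/docs/", "checklink"))
```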

Installing a local Validator

You can download the Validator to run on your own system, but it's not recommended for average users as the process is rather complex and involves obscure incantations on the command line. :-)

If you feel you're up to the task, you can find the information you need in our Developer Manual.

The W3C Validator Team
$Date: 2002/11/30 16:45:33 $