HTML 5 - An SEO Perspective
The HTML 5 W3C Working Draft (http://www.w3.org/TR/html5/) was released on Tuesday, January 22, 2008.
Abstract
"This specification defines the 5th major revision of the core language of the World Wide Web, HTML. In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability."
From an SEO perspective, you'll want to familiarize yourself with the upcoming buzzwords in HTML 5. There are seven distinct kinds of content that you'll be working with, and they are listed below.
From the W3C Working Draft for HTML 5...
3.3.3. Kinds of Content
Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following categories are used in this specification:
3.3.3.1. Metadata Content
Metadata Content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.
3.3.3.2. Prose Content
Most elements that are used in the body of documents and applications are categorized as Prose Content.
3.3.3.3. Sectioning Content
Sectioning Content is content that defines the scope of headers, footers, and contact information.
3.3.3.4. Heading Content
Heading Content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).
3.3.3.5. Phrasing Content
Phrasing Content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.
3.3.3.6. Embedded Content
Embedded Content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.
3.3.3.7. Interactive Content
Interactive Content is content that is specifically intended for user interaction.
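To make the seven categories a little more concrete, here is a minimal sketch of a page with each element annotated by the category it most plainly falls under. The page content, file names, and URLs are made up for illustration, and the category labels reflect my reading of the definitions quoted above, so treat them as a guide rather than the final word of the specification.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Metadata Content: describes the document rather than appearing in it -->
  <meta charset="utf-8">
  <title>Hypothetical Widget Shop</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <!-- Sectioning Content: article defines the scope of the heading inside it -->
  <article>
    <!-- Heading Content: h1 is the header of this section -->
    <h1>Blue Widgets</h1>
    <!-- Prose Content: p is a typical element used in the body of a document -->
    <p>
      <!-- Phrasing Content: the text itself plus intra-paragraph markup such as em -->
      Our widgets are <em>hand made</em>.
      <!-- Interactive Content: a link is specifically intended for user interaction -->
      <a href="/widgets/blue">See the full range</a>.
    </p>
    <!-- Embedded Content: img imports another resource into the document -->
    <img src="blue-widget.png" alt="A blue widget">
  </article>
</body>
</html>
```

Keep in mind that, as the draft says, an element can fall into zero or more of these categories, so a single element may carry more than one of the labels above.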
What is Prose Content?
The majority of what you will do as an SEO/Content Developer will be classified under the Prose Content category. From the HTML 5 Draft...
"Most elements that are used in the body of documents and applications are categorized as Prose Content."
In this instance, prose means "ordinary speech or writing without metrical structure." Prose writing has greater irregularity and variety of rhythm, is closer to the patterns of everyday speech, and does not treat a line as a formal unit. Basically, it comes down to the "semantics" and the "structure" of your HTML elements.
From the W3C Working Draft...
3.3.1. Semantics
"Elements, attributes, and attribute values in HTML are defined (by this specification) to have certain meanings (semantics). For example, the ol element represents an ordered list, and the lang attribute represents the language of the content.
Authors must only use elements, attributes, and attribute values for their appropriate semantic purposes."
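The two examples named in the draft, the ol element and the lang attribute, can be sketched as follows. The list text is invented for illustration. The second fragment is a presentational look-alike, included only to show what using markup without regard to its meaning looks like.

```html
<!-- Semantic: ol states that these items are an ordered list,
     and lang states that the content is in English -->
<ol lang="en">
  <li>Research your keywords.</li>
  <li>Write the page copy.</li>
  <li>Mark it up with the appropriate elements.</li>
</ol>

<!-- Not semantic: it looks similar on screen, but nothing in the markup
     says that this is a list or that the order matters -->
<p>
  1. Research your keywords.<br>
  2. Write the page copy.<br>
  3. Mark it up with the appropriate elements.
</p>
```

A browser will render both in much the same way. Only the first tells a user agent, in the markup itself, what the content is.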
What's an SEO To Do?
What exactly does this mean to you as an SEO, Website Designer, HTML Developer, Content Writer, etc.? It means that you need to become intimate with the elements and attributes you have available to you from a semantics perspective. No longer is "tag soup" going to be acceptable. If your document does not convey meaning correctly (the document semantics), what does that leave you with?
Not only do you need to use the proper elements and attributes from the proposed recommendations and specifications, you also need to follow a specific ordering of those elements based on the semantic structure of the document.
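One way to read "ordering" here, and this is my assumption rather than anything the draft spells out in these words, is that headings and sections should nest in the same order as the ideas they contain. A minimal sketch, with made-up page content:

```html
<body>
  <h1>Hypothetical Widget Shop</h1>

  <!-- Each section opens with the heading that names it -->
  <section>
    <h2>Blue Widgets</h2>
    <p>Our best-selling range.</p>

    <!-- A subtopic sits inside its parent section, under a lower-level heading -->
    <section>
      <h3>Care Instructions</h3>
      <p>Wipe clean with a dry cloth.</p>
    </section>
  </section>

  <section>
    <h2>Red Widgets</h2>
    <p>New this season.</p>
  </section>
</body>
```

Read top to bottom, the headings alone (Hypothetical Widget Shop, Blue Widgets, Care Instructions, Red Widgets) should give a sensible outline of the page. If they do not, the structure probably is not conveying the meaning you intended.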