Schema.org – another take on the semantic web

On June 2, 2011, Google, Microsoft, and Yahoo! announced Schema.org, which is intended as a standard for marking up data on web pages. This brings these three companies one small step closer to the Semantic Web.

Ideally, many people use the same format to mark up data. The more people publish their data in the same format, the more people will be able to read that data using the same tools. It's an easy formula:

  • more data = more applications = more benefit

Google, Microsoft, and Yahoo! announced their shared support for Schema.org and are calling on all webmasters to adopt the new standard. If you follow their call, you should change your website and write your data in their format.

To give an example (from their website): Instead of writing “Avatar was directed by James Cameron”, you would write:
<div itemscope itemtype="http://schema.org/Movie">
<h1>Avatar</h1>
<span>Director: James Cameron (born August 16, 1954)</span>
<span>Science fiction</span>
<a href="http://www.youtube.com/watch?v=cRdxXPV9GNQ">Trailer</a>
</div>

For the end-user, this looks the same as any other website: Avatar, Director: James Cameron (born August 16, 1954) Science fiction Trailer

For a machine, such as a search engine, the data becomes readable (once the individual properties are tagged as well) and could be imported into a database such as Wikipedia or Freebase. Or your personal movie-booking application could show you a trailer next to the ticket. Great! Benefits!
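To make "readable for a machine" a bit more concrete, here is a minimal sketch in Python. It assumes the movie markup has also been given itemprop attributes (name, director, birthDate, genre, trailer), which the shortened snippet above does not contain. The class name MicrodataParser is hypothetical, and the reader is deliberately simplified: it expects well-nested markup in which every element has a closing tag, and it ignores most of the value rules of the microdata specification.

```python
from html.parser import HTMLParser

# Assumed markup: the Movie example with itemprop attributes added.
MOVIE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    Director: <span itemprop="name">James Cameron</span>
    (born <span itemprop="birthDate">August 16, 1954</span>)
  </div>
  <span itemprop="genre">Science fiction</span>
  <a href="http://www.youtube.com/watch?v=cRdxXPV9GNQ" itemprop="trailer">Trailer</a>
</div>
"""


class MicrodataParser(HTMLParser):
    """Simplified microdata reader (hypothetical helper, not a full implementation)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the page
        self.stack = []   # one entry per currently open element
        self.scopes = []  # currently open items, innermost last

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        owner = self.scopes[-1] if self.scopes else None
        entry = {"prop": None, "text": [], "item": None, "owner": None}
        if "itemscope" in attrs:
            # A new item starts here; attach it to its parent item if it is a property.
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and owner is not None:
                owner["properties"][prop] = item
            else:
                self.items.append(item)
            self.scopes.append(item)
            entry["item"] = item
        elif prop and owner is not None:
            if tag == "a" and "href" in attrs:
                # For links, the property value is the URL, not the link text.
                owner["properties"][prop] = attrs["href"]
            else:
                # Otherwise the value is the element's text, collected until it closes.
                entry["prop"] = prop
                entry["owner"] = owner
        self.stack.append(entry)

    def handle_data(self, data):
        # Add text to every open element that is collecting a property value.
        for entry in self.stack:
            if entry["prop"]:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        entry = self.stack.pop()
        if entry["item"] is not None:
            self.scopes.pop()  # the item's element is closed
        elif entry["prop"]:
            value = "".join(entry["text"]).strip()
            entry["owner"]["properties"][entry["prop"]] = value


parser = MicrodataParser()
parser.feed(MOVIE_HTML)
movie = parser.items[0]
print(movie["type"])                                          # http://schema.org/Movie
print(movie["properties"]["director"]["properties"]["name"])  # James Cameron
print(movie["properties"]["trailer"])
```

The result is a plain nested dictionary, which is roughly the kind of structure a search engine or a movie-booking application could store or query. Note that the visible text of the page does not change. Only the attributes tell the machine which piece of text means what.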

The message is: webmasters out there, adopt! If you follow the blog posts by Google, Yahoo!, and Microsoft, you can start adopting right away!
But hang on, haven't some of us just installed Drupal 7 and used its RDFa generator to do the same? Didn't we mark up our websites with Facebook's Open Graph protocol a year ago? And what about the GoodRelations markup that Best Buy used, and that Google used to bring prices into the search results? This was a good investment, and it showed that the general idea works: standards for metadata. We are heading in the right direction. Now the three big companies have taken it a step further:

The FAQ on Schema.org says:
Q: I have already added markup in some other format (i.e. microformats, RDFa, data-vocabulary.org, etc). Do I need to change anything on my site?
If you have already done markup and it is already being used by Google, Microsoft, or Yahoo!, the markup format will continue to be supported. Changing to the new markup format could be helpful over time because you will be switching to a standard that is accepted across all three companies, but you don’t have to do it.

They also say: We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes.

It's a chicken-and-egg problem:

  • You haven't been adopting RDFa as much as we would have liked,
  • So we don't tell you to adopt RDFa,
  • But we tell you to adopt microdata, which is nearly the same but different

Maybe giving the chicken a bit more time to hatch its old egg, and supporting it while it did, could have helped. But maybe the new egg is better and the chicken should really walk over and hatch that one. Still, it's a new egg that needs to be hatched, and it remains a chicken-and-egg problem.

In theory, RDFa could have been used as the basis for Schema.org, but it wasn't. Instead, Schema.org builds on microdata, a standard developed as part of HTML5, and people will now need to jump on the Schema.org bandwagon. It would have been less risky to reuse RDFa as the markup and publish only a schema on Schema.org. But it seems the three big companies have placed their bets on microdata. This signal can be interpreted in two ways. If we look back, we could infer from the RSS/RDF/RDFa/Microformats/Microdata story that it simply continues: we are facing an era of frequently changing standards established by rival companies, with continuous investment needed from webmasters to keep up. If we look into the future, microdata (and its compatibility with RDF) could be the stable solution for the coming years.