schema.org: Not Too Impressive

Last Friday Google, Yahoo and Bing announced the launch of schema.org, which promotes annotation of web pages to make them more useful for search engines. This is definitely a hot topic as the current web of documents reached its limits a few years ago. Every search engine user knows what I am talking about.

I have been active in the semantic web world for a while now, and many people in this community were not very pleased with the decisions taken by the big three. But what is my problem with schema.org? There are quite a few, and most of them have been addressed well in other blog posts. Let me recap:

  • In terms of complexity, RDFa is on par with Microdata; Manu Sporny demonstrates this by example in his blog post. The argument that RDFa is more complex than Microdata is pure nonsense.
  • RDFa is a serialization of RDF within XML/(X)HTML trees. In case you do not know RDF, Mike Bergman calls it the universal data solvent, which describes it pretty well. RDF provides much more than Microdata and is far more powerful. There is simply no excuse for not using RDFa in the first place.
  • There are lots of great examples out there of how you can use RDFa; one of the best known is probably GoodRelations. In the schema.org FAQ they state that their work is “inspired by earlier work like Microformats, FOAF, GoodRelations, OpenCyc, etc.”.
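
To make the complexity argument from the first point a bit more tangible, here is a minimal sketch that puts the same statement about a person into Microdata and into RDFa and pulls the property/value pairs out of both. The markup snippets, the person and the PropertyCollector helper are made up for illustration, and the RDFa snippet assumes the vocab attribute from RDFa 1.1. This is a toy extractor, not a conformant Microdata or RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical markup: the same two facts, once per syntax.
MICRODATA = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Alice Example</span>'
    '<span itemprop="jobTitle">Engineer</span>'
    '</div>'
)

RDFA = (
    '<div vocab="http://schema.org/" typeof="Person">'
    '<span property="name">Alice Example</span>'
    '<span property="jobTitle">Engineer</span>'
    '</div>'
)


class PropertyCollector(HTMLParser):
    """Collects (property, text) pairs for elements carrying prop_attr."""

    def __init__(self, prop_attr):
        super().__init__()
        self.prop_attr = prop_attr
        self.current = None  # property name of the element being read
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; value is None for
        # boolean attributes such as itemscope.
        self.current = dict(attrs).get(self.prop_attr)

    def handle_data(self, data):
        if self.current is not None:
            self.pairs.append((self.current, data))

    def handle_endtag(self, tag):
        self.current = None


def extract(markup, prop_attr):
    """Return the property/value pairs found in a markup snippet."""
    parser = PropertyCollector(prop_attr)
    parser.feed(markup)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    print(extract(MICRODATA, "itemprop"))
    print(extract(RDFA, "property"))
```

Both snippets use the same number of elements and attributes, and the toy extractor gets the same pairs out of each. Only the attribute names differ.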

The last point, about vocabularies such as GoodRelations, needs some more explanation. In RDF, a shared vocabulary is described in a so-called ontology, which is most often expressed in RDF Schema or OWL. Such an ontology defines the terms used (called predicates), the data type of each predicate, and its relationships to other predicates. Both RDF Schema and OWL are themselves expressed in RDF, which makes it possible to describe not just the data but also the shared vocabulary in the same format. This is big, really big!
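
A rough sketch of what "vocabulary and data in the same format" means in practice: both are just sets of subject/predicate/object triples, so the same lookup code can answer questions about either. The ex: terms, the camera and the two helper functions are invented for this illustration; the prefixed names stand in for full URIs, and plain Python tuples stand in for a real RDF library.

```python
# Well-known RDF / RDF Schema terms, written as prefixed names.
RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
RDF_PROPERTY = "rdf:Property"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# The vocabulary (hypothetical ex: terms), expressed as triples.
vocabulary = {
    ("ex:Product", RDF_TYPE, RDFS_CLASS),
    ("ex:price", RDF_TYPE, RDF_PROPERTY),
    ("ex:price", RDFS_DOMAIN, "ex:Product"),
    ("ex:price", RDFS_RANGE, "xsd:decimal"),
}

# The data, expressed as triples in exactly the same shape.
data = {
    ("ex:camera42", RDF_TYPE, "ex:Product"),
    ("ex:camera42", "ex:price", "199.00"),
}

# Because the shape is identical, one graph can hold both.
graph = vocabulary | data


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def describe_predicate(graph, predicate):
    """Look up what the vocabulary says about a predicate."""
    return {
        "domain": objects(graph, predicate, RDFS_DOMAIN),
        "range": objects(graph, predicate, RDFS_RANGE),
    }


if __name__ == "__main__":
    print(objects(graph, "ex:camera42", "ex:price"))  # a data question
    print(describe_predicate(graph, "ex:price"))      # a vocabulary question
```

The point of the sketch is that objects() does not care whether it is asked about the camera or about the definition of ex:price. A machine that can read the data can read the vocabulary too.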

Another important aspect is that data modeled in RDF does not have to be either correct or complete. If you ask 10 people to model the world, you will get 10 different results. If you express those 10 models in RDF, you can map matching things between the different models, even if they are not always exactly the same thing. RDF can handle this uncertainty, which is one of my favorite things about it.
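
Here is a small sketch of how such a mapping between two independently built models could look. The a: and b: terms, the person and the merge helper are made up; owl:sameAs and owl:equivalentProperty are real OWL terms, but the code only rewrites names naively and does no actual OWL reasoning.

```python
# Two hypothetical models of the same person, built independently.
model_a = {
    ("a:alice", "a:fullName", "Alice Example"),
}
model_b = {
    ("b:person7", "b:name", "Alice Example"),
    ("b:person7", "b:city", "Bern"),
}

# The mapping between the models is itself just more triples.
mappings = {
    ("a:alice", "owl:sameAs", "b:person7"),
    ("a:fullName", "owl:equivalentProperty", "b:name"),
}

# Predicates this sketch treats as "these two names mean the same thing".
ALIAS_PREDICATES = {"owl:sameAs", "owl:equivalentProperty"}


def build_aliases(mappings):
    """Map each mapped term to the term it should be rewritten to."""
    aliases = {}
    for s, p, o in mappings:
        if p in ALIAS_PREDICATES:
            aliases[o] = s
    return aliases


def canonical(term, aliases):
    """Follow aliases until a term without an alias is reached."""
    seen = set()
    while term in aliases and term not in seen:  # guard against cycles
        seen.add(term)
        term = aliases[term]
    return term


def merge(models, mappings):
    """Merge several models, rewriting mapped terms to one name."""
    aliases = build_aliases(mappings)
    merged = set()
    for model in models:
        for s, p, o in model:
            merged.add((
                canonical(s, aliases),
                canonical(p, aliases),
                canonical(o, aliases),
            ))
    return merged


if __name__ == "__main__":
    for triple in sorted(merge([model_a, model_b], mappings)):
        print(triple)
```

Terms without a mapping, like b:city here, simply stay as they are. Neither model has to be complete or agree with the other up front; the overlap is stated afterwards, as data.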

This is an important lesson learned from failures in the 1990s, when companies like Taligent tried to model the whole world in one single library. RDF instead promotes the concept of domain experts. If you are strong in a specific domain, you should create the vocabulary for it, not some experts at Google/Yahoo/Bing who try to figure out how they can squeeze the whole universe into 300 or so tags. Maybe your domain vocabulary is not fully compatible with my domain vocabulary, but that is just how the world works, and RDF can handle that by design.

So besides the technical decision not to use RDFa, this is for me the biggest failure of schema.org. They came up with a vocabulary that fits their own needs. This vocabulary is not described in RDF, which makes it far less useful for machines, and makes it very hard to extend or to interlink with more powerful vocabularies like GoodRelations, FOAF etc., which have been out there for a long time. Tim Berners-Lee suggested a 5-star deployment scheme for Linked Open Data; according to that, I would probably give schema.org a 4 right now, but there is still lots of room for improvement.

Fortunately, some people have already addressed the RDF part of it: with the help of some well-known people in the Semantic Web world, Michael Hausenblas created a “real” schema out of it, expressed in RDF. The results can be found at schema.rdfs.org. That is the way the well-paid engineers of the three big companies should have done it in the first place. Now we can link it to DBpedia and other resources, extend it for our specific domains, and use it in RDFa or whatever RDF serialization we choose.

I am not sure where schema.org is heading. RDFa co-creator Manu Sporny is pessimistic about the current state, while others like Mike Bergman are very optimistic and think it is one of the most important steps in the semantic web world so far. I think the RDF Schema version of the vocabulary is a first step in the right direction, but I am afraid that the decision in favor of Microdata will seriously harm the adoption of RDFa as a standard. This should be changed as soon as possible! Let us see what Manu Sporny and others will present in the next few days or weeks. By the way, there is also a session about it at this year's SemTech.

So what is schema.org currently? A step back in terms of the technology used, plus a vocabulary that is not in line with what the semantic web world has been working toward for several years now. Not too impressive.