Fly Me to the Moon

It’s interesting to see how people talk about Linked Data & RDF these days. Most of the time the discussions talk about one specific feature of the technology stack which either rocks or sucks, depending on which side the author stands.

Let’s start with what are for me the best two pages about RDF I’ve read since I started working with the technology five years ago: Irene Polikoff in my opinion summarizes perfectly what RDF is about:

The ability to combine multiple data sources into a whole that is greater than the sum of its parts can let the business glean new insights. So how can IT combine multiple data sources with different structures while retaining the flexibility to add new ones into the mix as you go along? How do you query the combined data? How do you look for patterns across subsets of data that came from different sources?

The article gives a very good idea of when you need what parts of the RDF stack to tackle these kind of questions. The reason why I started reading into RDF & Linked Data is because I think RDF can solve these kind of questions in a time and money efficient way up to the scale of global companies and governments. And this is the scale I’m really interested in.

And this brings us to the other end of what we need to become mainstream with a technology: The average (web) developer. It’s still painfully hard to use the highly flexible data model you get with RDF to create user interfaces. I know this because me and some colleagues work on this for some time now and it’s also the domain where we see a lot of (often negative) postings about Linked Data and RDF. Some examples:

What they have in common is that they only look at the Semantic Web stack from their particular, limited perspective. The things they criticize are mostly correct, in its own small world. What they fail to see is that the Semantic Web does not try to solve a problem that is easy but one that is pretty hard: Find a way to split up the web of documents in a web of data and make sure that machines can help us interpreting it and make our life easier. I wasn’t aware of the real complexity of this before I started working with the RDF stack.

Now there are several options to handle this:

  • Ignore everything else than what you are trying to solve: JSON-LD is great, it probably does make things easier for a lot of developers. Manu states that he never had the need for a quadstore and SPARQL in +7 years of working with the technology stack. Good for him but then we obviously don’t solve the same kind of problems. This is not a problem at all but it’s important to keep in mind when we compare technologies.
  • Reinvent the wheel: Jens Ohlig first rants about Semantic Web and then explains for 30 minutes why Wikidata is so much work: unique identifiers, relationship between data, ontologies, provenance, multiple languages etc. I understand that Wikidata decided against using RDF and go for what they know best, which is probably PHP & MySQL. But it doesn’t help your point if you show me that in the end you solve exactly the same kind of problems RDF defined in W3C standards. You just build yet another data silo.
  • Not invented here. The Nepomuk project was funded by an EU FP7 research grant and I guess that none of the guys which originally worked on the RDF code are still there. The new guys probably mainly know key/value stores and didn’t understand RDF or graphs. The normal reaction in this case is to throw things away and start from scratch, instead of learning something which looks unfamiliar at first.
  • Accept that the world is complicated and continue working on the missing parts of the stack.

Manu Sporny:

TL;DR: The desire for better Web APIs is what motivated the creation of JSON-LD, not the Semantic Web. If you want to make the Semantic Web a reality, stop making the case for it and spend your time doing something more useful, like actually making machines smarter or helping people publish data in a way that’s useful to them.

I fully agree Manu but again, there are more problems out there than the ones JSON-LD tries to address. I think Brian Sletten summarized this best in a recent posting at semanticweb.com:

Fundamentally, however, I think the problem comes down to the fact that the Semantic Web technology stack gets a lot of criticism for not hiding the fact that Reality is Hard. The kind of Big Enterprise software sales that get attention promise to hide the details, protect you from complexity, to sweep everything under the rug.

[lots of more good stuff]

What is the alternative? If we abandon these ideas, what do we turn to? The answer after even the briefest consideration is that there is nothing else on the table. No other technology purports to attempt to solve the wide variety of problems that RDF, RDFS, OWL, SPARQL, RDFa, JSON-LD, etc. do.

I couldn’t agree more. You can be big enough that you do all this work on your own. If you are Google or Facebook that might even make sense. For everyone else, go with the standards. Even Google recommends you this.

I’m glad that Manu Sporny accepted to keep JSON-LD RDF compatible, as they solved a lot of interesting problems around JSON-LD like graph normalization and data signing. Maybe we need more people like him which “stop making the case for it and spend [their] time doing something useful”. But at the same time we need the guys who want to bring us to the moon. I’m glad Tim Berners-Lee decided to do so more than 20 years ago when he wrote his ‘Vague, but exciting’ proposal.