When software engineers come into contact with the world of semantic data for the first time, it may be difficult for them to grasp all its benefits and consequences at once. When I first heard of RDF, I thought the concept might impose more risks than benefits. So how would RDF fit into the architecture of my solutions? Or better: why did I feel at first that it might not fit at all? In case you have or have had similar feelings, I would like to share my thoughts on this with you.
I think that back then I felt quite uncertain, because I would have to let go of all the familiar ways of accessing structured data. At first I even considered RDF-based data to be stored without any real structure, but of course that is not true. Later, I felt that modeling data with literally any vocabulary would cause lots of problems and prevent applications from dealing with data coming from anybody else, but of course that is not true either.
After spending some time with RDF and working with it in practice, I increasingly understood that these points are not a problem at all. Instead, RDF shows its beauty and flexibility by letting me simply do things differently.
First of all, RDF data is not poorly structured. Instead, it is structured by semantics, and by semantics only!
In fact, the vocabularies used with RDF provide a highly flexible and at the same time very precise and reliable way to describe and structure data in a machine-readable form. Every term of a vocabulary is identified by a unique IRI, so the meaning of the described data can always be determined exactly. And with RDF you put the knowledge about the structure of the data into the data itself, while with other concepts you mostly have to put it into (self-)developed software. This is why it is said that RDF relates to other storage concepts as knowledge engineering relates to software engineering.
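To make the idea of structure living in the data more concrete, here is a minimal sketch. It assumes triples are represented as plain Python tuples of strings (no RDF library), uses the real RDF and FOAF namespace IRIs, and invents hypothetical example resources under http://example.org/. The function `schema_of` is likewise hypothetical: it derives which properties are used for which class purely from the data, with no schema hard-coded in the program.

```python
# Minimal sketch: triples as plain (subject, predicate, object) tuples of strings.
# No RDF library is assumed; the example.org resources are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

triples = {
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
}


def schema_of(store):
    """Derive, from the data alone, which properties each class uses."""
    # First pass: collect the declared classes of every subject.
    types = {}
    for s, p, o in store:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # Second pass: attribute every other property to the subject's classes.
    schema = {}
    for s, p, o in store:
        if p == RDF_TYPE:
            continue
        for cls in types.get(s, ()):
            schema.setdefault(cls, set()).add(p)
    return schema


for cls, props in schema_of(triples).items():
    print(cls, sorted(props))
```

Because every term is a full IRI, `http://xmlns.com/foaf/0.1/name` means the same thing wherever it appears, which is what makes such generic, schema-agnostic code possible.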
Structuring data by semantics rather than in conventional ways, e.g. with tables and hierarchies, has another important advantage: it scales in complexity. RDF data can grow more complex without breaking any existing logic, because new facts are simply added as further statements, and without requiring extra effort to keep the more complex data model performing well. Tables and hierarchies, on the other hand, do not scale well in complexity at all: each new kind of relationship typically calls for a change to the schema.
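The following sketch illustrates the claim that added complexity does not break existing logic. It uses the same assumptions as before (triples as Python tuples, hypothetical example.org resources) plus a hypothetical query function `people_names`. The store is extended with a new class and new relationships taken from the schema.org vocabulary, and the existing query returns the same result without any migration step.

```python
# Minimal sketch: triples as tuples, hypothetical example.org IRIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "http://schema.org/"

store = {
    ("http://example.org/alice", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/alice", FOAF + "name", "Alice"),
    ("http://example.org/bob", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/bob", FOAF + "name", "Bob"),
}


def people_names(graph):
    """Existing query: names of all foaf:Person resources, sorted."""
    people = {s for s, p, o in graph if p == RDF_TYPE and o == FOAF + "Person"}
    return sorted(o for s, p, o in graph if s in people and p == FOAF + "name")


before = people_names(store)

# The model grows: a new class and new relationships, with no schema migration.
store |= {
    ("http://example.org/acme", RDF_TYPE, SCHEMA + "Organization"),
    ("http://example.org/alice", SCHEMA + "worksFor", "http://example.org/acme"),
    ("http://example.org/acme", SCHEMA + "name", "ACME"),
}

after = people_names(store)
print(before, after)  # → ['Alice', 'Bob'] ['Alice', 'Bob']
```

In a relational design, the `worksFor` relationship would typically require a new column or join table before any row could use it; here it is just three more statements.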
Second, one does not need a restricted set of vocabularies to keep data usable for oneself or anybody else. Instead, RDF encourages the use of any existing vocabulary in order to increase the reusability of data. In fact, I can model a given set of data with any number of different vocabularies at the same time, without much overhead or redundancy (try doing that with SQL-based data sources…).
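Here is a minimal sketch of describing one resource with several vocabularies at once, again assuming triples as Python tuples and hypothetical example.org IRIs. The same post is typed as both `foaf:Document` and `schema:BlogPosting` and described with Dublin Core terms; its author is described with FOAF. The helpers `instances_of` and `values` are illustrative only and not part of any library.

```python
# Minimal sketch: one graph mixing FOAF, schema.org and Dublin Core terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
SCHEMA = "http://schema.org/"

doc = "http://example.org/post/rdf-architecture"  # hypothetical IRI
author = "http://example.org/people/alice"        # hypothetical IRI

graph = {
    # The same resource is typed in two vocabularies: one extra triple, no copy.
    (doc, RDF_TYPE, FOAF + "Document"),
    (doc, RDF_TYPE, SCHEMA + "BlogPosting"),
    # Dublin Core terms describe the post's metadata.
    (doc, DCT + "title", "How RDF fits into an architecture"),
    (doc, DCT + "creator", author),
    # FOAF describes the author.
    (author, RDF_TYPE, FOAF + "Person"),
    (author, FOAF + "name", "Alice"),
}


def instances_of(g, cls):
    """All subjects declared to be of class `cls`."""
    return {s for s, p, o in g if p == RDF_TYPE and o == cls}


def values(g, subject, prop):
    """All objects of `prop` for the given subject."""
    return {o for s, p, o in g if s == subject and p == prop}


# A FOAF-only client and a schema.org-only client find the same post.
print(instances_of(graph, FOAF + "Document") == instances_of(graph, SCHEMA + "BlogPosting"))  # → True
# A Dublin Core-only client reads the title.
print(values(graph, doc, DCT + "title"))
```

Each consumer only needs to understand the vocabulary it cares about; the terms it does not know are simply ignored.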
If you restricted your solutions to a fixed set of vocabularies, you would gain no extra benefit. Instead, you would limit your services to those RDF-based data sources that use the very same vocabularies as you do. And in the future, that would turn out to be a much greater restriction than you can imagine today.
However, even if you use existing vocabularies as much as possible, you may still need to interlink your data with data from a source that describes the same things using different vocabularies. Then you simply need a way to translate between the different vocabularies. Because vocabularies are themselves described semantically, this can be done in a generic manner. Ideally, this translation would take place in the back end, that is, in your triple store engine, making this step completely transparent to the services using the store. At the time of this writing, however, no database product seems to be able to do that. As an alternative, such translation could be encapsulated in a framework through which your services would access RDF data sources, including your own. netlabs.org is currently working on a technique that allows interlinking between vocabularies with a flexible way of defining such a translation, itself based on RDF data, of course.
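The following is a minimal sketch of such a generic, mapping-driven translation layer. It is not the netlabs.org technique mentioned above, whose details are not described here. As an assumption made for this sketch, the mapping is expressed with the OWL terms `owl:equivalentClass` and `owl:equivalentProperty`, so the mapping is itself RDF data. The function `translate` and the foreign data are hypothetical; the function makes a single rewriting pass and does not follow chains of equivalences.

```python
# Minimal sketch: vocabulary translation driven by OWL equivalence statements.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"
SCHEMA = "http://schema.org/"

# Data from a foreign source, described with schema.org terms (hypothetical IRIs).
foreign = {
    ("http://other.example/p/42", RDF_TYPE, SCHEMA + "Person"),
    ("http://other.example/p/42", SCHEMA + "name", "Bob"),
}

# The mapping is itself RDF: equivalences between the two vocabularies.
mapping = {
    (SCHEMA + "Person", OWL + "equivalentClass", FOAF + "Person"),
    (SCHEMA + "name", OWL + "equivalentProperty", FOAF + "name"),
}


def translate(data, mapping):
    """Return `data` plus the counterparts of its triples under the mapping.

    Equivalence is symmetric, so each statement is used in both directions.
    Only one rewriting pass is made; chains (A=B, B=C) are not followed.
    """
    eq_prop, eq_class = {}, {}
    for a, rel, b in mapping:
        if rel == OWL + "equivalentProperty":
            table = eq_prop
        elif rel == OWL + "equivalentClass":
            table = eq_class
        else:
            continue  # other mapping statements are ignored in this sketch
        table.setdefault(a, set()).add(b)
        table.setdefault(b, set()).add(a)

    result = set(data)
    for s, p, o in data:
        # Rename the predicate via equivalent properties.
        for p2 in eq_prop.get(p, ()):
            result.add((s, p2, o))
        # Re-type the subject via equivalent classes.
        if p == RDF_TYPE:
            for o2 in eq_class.get(o, ()):
                result.add((s, RDF_TYPE, o2))
    return result


def foaf_names(graph):
    """A consumer that only understands FOAF."""
    people = {s for s, p, o in graph if p == RDF_TYPE and o == FOAF + "Person"}
    return sorted(o for s, p, o in graph if s in people and p == FOAF + "name")


print(foaf_names(foreign))                        # → []
print(foaf_names(translate(foreign, mapping)))    # → ['Bob']
```

Placing `translate` inside the access framework, rather than in each service, keeps the services unaware of which vocabulary a given source happens to use.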