Data quality counts

When linking to other data sources, knowing the origin and the quality level of the external data is without doubt important. After all, a central point of using semantic web technology is to draw on other data sources to enrich our own data, and thereby to enhance our own solutions.

People who are new to RDF, and who have never or only rarely had to interlink with foreign data sources, may now think that this problem only arises with RDF. On the contrary! Data retrieved from any other system may be right or wrong, and may be complete or incomplete. The RDF concept even states explicitly that RDF-based data is by definition not guaranteed to be correct or complete, at least unless you are the one who ensures it. But the problem only grows the more external data sources you access, and once you start using semantic web technology to do so, you will simply run into it more and more often.

This raises questions such as: how certain are you of the source a given set of data originates from? How much do you know about the provider of that data source? How accurate is the data model, and how far can you trust the quality of data maintenance? Obviously, the answers to these questions have a direct impact on how much you can benefit from interlinking with the given data. If the quality of external data is not ensured, the quality of your solution will suffer in turn.

So if you link to external data yourself, you will have to pay close attention to the quality of the data sources you use, in order to avoid any negative impact on the quality of your own products and services.
