August 2011 – netlabs on netlabs

Maintaining data is a challenge, and several things can make it a real pain. Many of these problems arise from the patterns we use to store our data, because those patterns strongly affect how easily we can retrieve the data later, whether for viewing, for cleaning up obsolete entries, or for other tasks.

Before the computer age, people stored data by simply writing it down. Lists and tables were invented to make retrieval of stored data much easier; just think of birth and wedding registries. Besides that, hierarchies were used, e.g. in natural science, to display relations between dissimilar and similar things at the same time. Both methods implied a limit: lists either had to be short enough to remain usable, or there had to be a suitable scheme for splitting them into smaller parts. Hierarchies could not grow too big either, otherwise it would have been impossible to display them on a single sheet of paper, or at least on a small number of sheets.

Surprisingly, even after the invention of computers, the concepts of tables and hierarchies still dominate the way we store and retrieve data. Tables are used in the majority of database management systems as well as in spreadsheets, and hierarchies are used in file systems and within applications. And although we might expect a computer to have no problem storing large amounts of data, the limits of the pre-computer age somehow still apply. They show up wherever people need to access data, or need to design tables or hierarchies to suit a specific use case. Many problems arise from that, but usually the storage patterns are either not recognized as the underlying cause, or the problems are accepted as unavoidable.

Data held in a table, whether in a spreadsheet or a database, still must not grow too big or complex, otherwise it can no longer be stored in one table of reasonable size and complexity. In that case people can no longer use spreadsheets or other simple table views, but need use-case-specific applications for accessing and visualizing the data. A more important drawback, which applies to tables of all sizes, is that they are not well suited for data exchange wherever the meaning or the formatting of data items can be misinterpreted.

Wherever users create hierarchies, e.g. in file systems or within applications, they face the problem that the hierarchy strongly depends on the logic put into it by one person or a group of people. With increasing complexity it gets more and more difficult, if not impossible, to extend the hierarchy without breaking this logic.

Semantic web technology is a true game changer in that regard. Besides other advantages, it comes with the ability to link a thing with any number of other things, instead of interlinking documents as the World Wide Web does, or linking one set of things with another set of things as tables do. Because it uses the most fine-grained relation possible, relations do not have to fit into any other logical system that would have to be designed for a given use case. And semantic data does not have to be stored in a hierarchy, so there is no risk of building today a boundary that gets in the way tomorrow. As a result, the storage pattern is not specific to any use case, and can be scaled to any size and complexity.
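To make the idea of linking single things more concrete, here is a minimal sketch in Python. It is not a real semantic web library; it only assumes that every fact is stored as one (subject, predicate, object) triple, and all names in it (`add`, `alice`, `hadForLunch` and so on) are made up for illustration.

```python
# Hypothetical example data: every fact is one (subject, predicate, object) triple.
triples = set()

def add(subject, predicate, obj):
    """Store a single relation between two things."""
    triples.add((subject, predicate, obj))

add("alice", "knows", "bob")
add("alice", "livesIn", "Cologne")
add("bob", "bornIn", "Bonn")

# A new kind of relation needs no new column and no new folder level:
# it is simply one more triple next to the existing ones.
add("alice", "hadForLunch", "pizza")
```

Nothing had to be redesigned to record the lunch; in a table, the same fact would have needed a new column or a new table.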

Of course, retrieval of data stored like this is not bound to a use case either, as no knowledge about tables or hierarchies is required. Instead, data is retrieved by querying relations between things, which is far more intuitive. Interestingly, this storage and retrieval pattern matches exactly how we memorize things, namely simply by association between things! Or do you open a table or a directory structure in your mind to remember what you had for lunch yesterday?
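Querying by relation can be sketched in a few lines of Python as well. This is only an illustration under simple assumptions: facts are (subject, predicate, object) triples, a query is a pattern in which `None` means "anything", and the data and the `match` helper are hypothetical. A real system would typically use a query language such as SPARQL instead.

```python
# Hypothetical example data as (subject, predicate, object) triples.
triples = {
    ("me", "ate", "pizza"),
    ("pizza", "eatenOn", "yesterday"),
    ("pizza", "eatenAt", "lunch"),
    ("me", "ate", "salad"),
    ("salad", "eatenOn", "today"),
}

def match(s=None, p=None, o=None):
    """Return all triples that fit the pattern; None matches anything."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]

# Follow the associations: what did I eat, and which of it was eaten yesterday?
eaten = {o for (_, _, o) in match(s="me", p="ate")}
yesterday = [food for food in eaten if match(s=food, p="eatenOn", o="yesterday")]
```

The query never mentions a table name or a path; it only follows relations from one thing to the next.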

This does not mean, however, that tables and hierarchies are no longer required. Data needs to be displayed in order to create, view and modify it in the front end, and therefore we still need the tables and hierarchies we are used to; you can regard any form as a hierarchical way of displaying data. And whatever is required for that is already part of the data, because another, most important feature of semantic web technology is that the description of the data is part of the data itself.

For that purpose, however, applications need to apply the concept of tables and hierarchies only to a small part of the available data, so there is no scaling problem. At the same time, the storage of data scales logically like never before, unhindered by schemes that are only required for visualizing data.
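As a rough illustration of a table that exists only for display, the following Python sketch projects a handful of relations into rows. The data and the `table_view` helper are hypothetical, and the sketch assumes at most one value per cell.

```python
# Hypothetical example data as (subject, predicate, object) triples.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "city", "Cologne"),
    ("bob", "name", "Bob"),
    ("bob", "city", "Bonn"),
    ("bob", "knows", "alice"),
    ("alice", "hadForLunch", "pizza"),
]

def table_view(subjects, columns):
    """Build table rows for the given subjects, using only the chosen predicates."""
    rows = []
    for subject in subjects:
        row = {"id": subject}
        for column in columns:
            values = [o for (s, p, o) in triples if s == subject and p == column]
            # Assumption: one value per cell; take the first match or leave it empty.
            row[column] = values[0] if values else None
        rows.append(row)
    return rows

# The view needs only two predicates; all other relations stay out of the table.
rows = table_view(["alice", "bob"], ["name", "city"])
```

The table is built on demand from a small slice of the data, so its size is set by what the front end wants to show, not by how much is stored.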


Why tables and hierarchies don’t scale