Semantic web

Written by David Tebbutt in February 2005

Looking forward to early retirement? If the techno whiz kids in and around the semantic web have their way, you won't have long to wait. I mean, what is the point of specialising in metadata and taxonomies and all that good stuff when the perishing computers and, to a lesser extent, their users are going to be doing all the hard work for you in future?

It was bound to happen. There aren't enough human beings in the world to keep tabs (or should I say tags?) on the information explosion on the visible web and the deep web, which is reputedly five hundred times bigger.

We're seeing greater sophistication of brute-force search engines and clustering systems from one direction, and increasingly clever automatic classification and coding systems from another.

The whole lot will converge, in theory at least, in the semantic web. This version of the web can be read and understood by machines. We're already seeing glimmerings of it with RSS feeds, which enable an aggregator or, indeed, any other program to figure out what's in a web document. In future, the descriptive elements will be much richer, embracing ontologies so that meanings and relationships can be understood electronically.
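To see what "machine readable" means in practice, here is a minimal sketch of how any program, not just an aggregator, can pull the title and description out of each item in an RSS 2.0 feed. It uses only Python's standard library, and the feed address is a stand-in, not a real source.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Hypothetical feed address, used purely for illustration.
    FEED_URL = "https://example.com/news/rss.xml"

    def read_feed(url):
        """Fetch an RSS 2.0 feed and return (title, description) pairs.

        Because the feed describes its own contents in a predictable,
        machine-readable structure, a program can work out what a web
        document contains without a human having to read it."""
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        items = []
        for item in tree.getroot().iter("item"):
            title = item.findtext("title", default="")
            description = item.findtext("description", default="")
            items.append((title, description))
        return items

    for title, description in read_feed(FEED_URL):
        print(title, "-", description[:60])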

In order to work properly, these information retrieval, blending and delivery systems will need to know about the user, so that the material served up is likely to be the most relevant for their needs. The dream is for the systems to arbitrate between different sources of information and draw conclusions on the user's behalf. "I know you asked for this and that, but I figured out that you'd probably be interested in this as well. And, by the way, I threw some other stuff out because it lacked credibility." A sort of Amazon book recommendation on steroids.
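In miniature, that arbitration might look like the toy sketch below. Everything in it, the interest profile, the documents, the credibility scores and the cut-off, is invented for illustration; it simply ranks documents by overlap with the user's interests and quietly discards anything it doesn't trust.

    # Toy recommender: every profile, document and score here is made up.
    user_profile = {"metadata", "taxonomy", "semantic web", "ontology"}

    documents = [
        {"title": "Building ontologies", "tags": {"ontology", "semantic web"}, "credibility": 0.9},
        {"title": "Celebrity gossip", "tags": {"gossip"}, "credibility": 0.8},
        {"title": "Miracle metadata cure", "tags": {"metadata", "taxonomy"}, "credibility": 0.2},
    ]

    CREDIBILITY_CUTOFF = 0.5

    def recommend(profile, docs):
        """Rank documents by overlap with the user's interests,
        discarding anything whose credibility is too low."""
        scored = []
        for doc in docs:
            if doc["credibility"] < CREDIBILITY_CUTOFF:
                continue  # "I threw some other stuff out because it lacked credibility."
            relevance = len(profile & doc["tags"])  # how many interests it touches
            if relevance > 0:
                scored.append((relevance, doc["title"]))
        return [title for _, title in sorted(scored, reverse=True)]

    print(recommend(user_profile, documents))  # ['Building ontologies']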

This all sounds somewhat idealistic. How much personal information will people willingly provide to the system? How much automatic profiling will they stand for? What if the focus of their work changes? What if the system cannot resolve conflicts in the metadata or ontologies? Where does this leave the concept of quality information? None of these issues is easily addressed, but the researchers are optimistic. Aren't they always? "If we could just have a little bit more funding, we should be able to crack this."

One group in particular, called SEKT (Semantically Enabled Knowledge Technologies), is very optimistic. It's getting two thirds of its funding (€7.5M) from you and me via the EU's Sixth Framework Programme. Its approach is to try to make computers work more like humans. If it can crack that problem, then you'd better start checking your pension fund.

The SEKT project is due to complete at the end of 2006. It has set out to blend three disciplines: Human Language Technology, Knowledge Discovery, and Ontology and Metadata Technology. The end result should be computer systems that understand what they're reading and build ontologies and metadata on the fly. The aim is for these systems to be embeddable in everyday software, both for the encoding and the retrieval of information. The users themselves will be asked to resolve ontological conflicts.
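To make "building ontologies and metadata on the fly" a little more concrete, here is a toy sketch, mine rather than anything SEKT has published: it spots terms from a tiny hand-made vocabulary in a sentence and records them as simple subject-predicate-object statements of the kind an ontology-aware system could store and reason over. A real system would use statistical language technology, not string matching.

    # A deliberately tiny illustration of automatic metadata extraction.
    # The vocabulary and the single "is_a" relation are invented for the example.
    VOCABULARY = {
        "rss": "Technology",
        "ontology": "Concept",
        "aggregator": "Software",
        "sekt": "Project",
    }

    def extract_triples(text):
        """Return (subject, predicate, object) statements found in the text."""
        words = [w.strip(".,").lower() for w in text.split()]
        triples = []
        for word in words:
            if word in VOCABULARY:
                # Classify the term: the kind of statement an ontology records.
                triples.append((word, "is_a", VOCABULARY[word]))
        return triples

    sentence = "An aggregator reads RSS and the SEKT project builds an ontology."
    for triple in extract_triples(sentence):
        print(triple)
    # ('aggregator', 'is_a', 'Software')
    # ('rss', 'is_a', 'Technology')
    # ('sekt', 'is_a', 'Project')
    # ('ontology', 'is_a', 'Concept')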

Trust will be an important component of the systems, with digital certificates playing their part in guaranteeing the authenticity and integrity of the material being processed.
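The kind of check that trust layer might make is sketched below, using the third-party Python "cryptography" package. The function, the inputs and the choice of an RSA key with SHA-256 are all assumptions for illustration; a real system would also validate the certificate chain and check for revocation.

    # Sketch of the integrity and authenticity check a trust layer might make.
    from cryptography import x509
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    def is_authentic(document: bytes, signature: bytes, cert_pem: bytes) -> bool:
        """True if the signature over the document verifies against the
        public key in the publisher's X.509 certificate."""
        certificate = x509.load_pem_x509_certificate(cert_pem)
        public_key = certificate.public_key()
        try:
            # Assumes an RSA key with PKCS#1 v1.5 padding and SHA-256.
            public_key.verify(signature, document, padding.PKCS1v15(), hashes.SHA256())
            return True
        except InvalidSignature:
            return False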

If successful, these systems will deliver the best combination of publicly available information in the correct context to meet the users' immediate needs. In doing so, they would appear to make many information professionals as relevant as the man with the red flag who walked in front of early motor cars.