
The Semantic Web adds logic to Web services

Promises improved database interoperability

By Katherine C. Adams

Digital information is unwieldy. Despite the marketing rhetoric of search engines, online navigation guides and assorted enterprise software companies, the Web does not usually function as a content or knowledge management platform. It is difficult to find, sort and catalog digital data. For most consumers, the Web is a barely adequate information retrieval tool or a medium for e-commerce. It cannot help users answer complex questions or perform many day-to-day tasks. In short, the Web is a poor KM tool. The limited functionality of Web services and products is rooted in the inability of computers to understand semantics. On the current Web, computers don’t process data based on what it means: HTML specifies how information should appear but ignores the meaning or significance of that data.

Tim Berners-Lee and his colleagues at the World Wide Web Consortium (W3C, w3c.org/2001/sw) are addressing the limitations of the current Web. They call the next stage of Web development the Semantic Web. While still in the initial stages of development, the project entails adding a layer of Web infrastructure on top of HTML pages. The key idea behind the Semantic Web is augmenting HTML Web documents with metadata and rules of logic. The resulting infrastructure helps computers understand Web data in the same way that humans do. Adding logic to the current HTML Web allows computers to make decisions, form inferences and respond to questions.

Giving digital information more semantic definition lets people use the Web in new ways. The Semantic Web involves computers talking to each other for the purpose of solving problems for users. It will perform small, daily tasks for consumers. Many imagine a natural-language interface that permits users to communicate naturally with computers.

For example, a user could type, “I like the black-and-red belt by Emilio Pucci that appeared in his Spring 2002 ready-to-wear line. Where can I find the belt online and in Detroit, Michigan?” An intelligent agent would whisk out into the Semantic Web and retrieve the information. That would include links to e-commerce sites where the belt was for sale and Detroit store addresses.

A closer look

The Semantic Web is important because the ramifications of the new infrastructure extend far beyond facilitating e-commerce. The effective use of machine-readable knowledge would make the Web a part of daily life. Below is a list of questions the Semantic Web could answer:

  • What is the cheapest hotel in Vienna within two miles of the State Opera House?

  • Which Mexican restaurants within the city of Los Angeles specialize in margaritas and have an award-winning chef on staff?

  • I have a scrape on my elbow that won’t heal. What could be causing this, and what are the treatment options for each possible diagnosis?

When a user asks a question, an intelligent agent accesses the new Semantic Web infrastructure to retrieve information. In answering questions such as those listed above, computers make comparative decisions and respond to questions by accessing structured hierarchies of metadata and their associated rules of logic. Computers will determine the meaning of Web data by following hyperlinks to definitions of vocabulary terms and rules for reasoning about key words. The answer is then delivered back to the user, or the requested task is then carried out.
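As a rough illustration of the kind of comparative decision described above, the sketch below answers the Vienna hotel question over a handful of structured records. The hotel names, prices and distances are invented for the example, and the `Hotel` fields are assumptions about what such metadata might contain; a real agent would gather this data from the Web rather than from a hard-coded list.

```python
from dataclasses import dataclass

@dataclass
class Hotel:
    name: str
    city: str
    price_per_night: float    # assumed to be in one common currency
    miles_to_landmark: float  # distance to the landmark named in the question

# Invented records standing in for metadata an agent might collect.
HOTELS = [
    Hotel("Hotel A", "Vienna", 180.0, 0.4),
    Hotel("Hotel B", "Vienna", 95.0, 1.6),
    Hotel("Hotel C", "Vienna", 70.0, 3.2),    # cheaper, but too far away
    Hotel("Hotel D", "Salzburg", 60.0, 0.5),  # cheaper, but in the wrong city
]

def cheapest_hotel(hotels, city, max_miles):
    """Return the cheapest hotel in `city` within `max_miles`, or None."""
    candidates = [
        h for h in hotels
        if h.city == city and h.miles_to_landmark <= max_miles
    ]
    return min(candidates, key=lambda h: h.price_per_night, default=None)

best = cheapest_hotel(HOTELS, "Vienna", 2.0)
print(best.name if best else "no match")
```

The comparison itself is trivial once price, location and distance are explicit fields. The hard part, and the part the Semantic Web is meant to address, is getting Web data into that explicit form in the first place.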

The Internet is a worldwide system of networks, but database interoperability--the exchange of information between data repositories--remains problematic. Thanks to protocols such as Z39.50, technical interoperability is a less significant problem. Yet in terms of meaning or semantic content, information on the Internet remains fragmented. The Semantic Web organizes information from disparate sources into a context that people and businesses can readily use. It holds so much promise because it helps users identify and reconcile relevant content found in two or more separate sources.

The Semantic Web is composed of four technologies

Many of the core standards and technologies that make Web content more accessible to machines are in place: XML, RDF, ontologies and intelligent agents.

  • XML: Extensible Markup Language permits users to create custom-built tags. While XML serves a variety of functions, for the Semantic Web its most important job is adding semantic information to digital documents. XML addresses some of the inherent weaknesses in HTML. While HTML specifies document format, XML uses customized tags to define the meaning of information. For example, XML allows a Web publisher to label parts of an e-book as “Acknowledgements,” “Footnotes,” “Introduction” or “Summary.”

By contrast, HTML only offers tags that indicate “Italic Font,” “Paragraph,” “Emphasis” or “Strong.” The advantage of XML is that a software program could read that text as an electronic book and perform specialized operations such as extracting bibliographic information.

  • RDF: Resource Description Framework is a data model. It offers a consistent framework for metadata and can be written using XML tags. RDF provides a structure that, in functional terms, expresses the meaning of Web documents in a way computers can understand. RDF technology results in rich descriptions of digital information. An RDF description can include all kinds of metadata, such as the authors of the document, the date of its creation, the name of the sponsoring organization, the intended audience and subject headings.

  • Ontologies: Ontologies sit on top of the RDF framework and are a critical part of making the Semantic Web “intelligent.” Ontologies allow computers to communicate with each other by providing (1) a common set of terms--vocabularies--and (2) rules that govern how those terms work together and what they mean. Ontologies define terms and then lay out the relationships among those terms.

In the Semantic Web, computers understand the meaning of Web data by following links from Web pages to topic-specific ontologies. The meaning of vocabulary terms or XML tags used on a particular Web document would be defined by hypertext links from that page to a topic-specific ontology. For example, ontologies offer cross references so a computer understands that “blouse” and “dress shirt” are the same concept. The infrastructure and semantics provided by ontologies make it easier for databases to talk to each other.

While the W3C has sponsored the development of XML and RDF technologies, building ontologies to cover every topic addressed on the Web is an enormous challenge. The Semantic Web calls for ontologies that cover everything from factory automation to post-structural philosophy. The Dublin Core Metadata Initiative (dublincore.org) has been working since 1995 to build vocabularies that could overcome that potential bottleneck.

  • Intelligent agents: These are software programs that gather, sort and process information found on the Web without direct human supervision. According to Berners-Lee, the real power of the Semantic Web will be realized when people create programs that collect Web content from diverse sources, exchange information with other programs and deliver answers or perform tasks for users. Intelligent agents are the visible workhorses of the Semantic Web.
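To make the division of labor among the four technologies concrete, here is a toy sketch in Python. It is not a W3C implementation. The shop catalogs, the tag names (`catalog`, `item`, `kind`, `price`), the subject identifiers and the one-line "ontology" are all invented for illustration, and the subject-property-value tuples only imitate the shape of RDF statements.

```python
import xml.etree.ElementTree as ET

# XML: two hypothetical shops describe their goods with custom tags,
# but each shop uses its own vocabulary for the kind of garment.
SHOP_A = """<catalog>
  <item><kind>blouse</kind><price>40</price></item>
  <item><kind>belt</kind><price>25</price></item>
</catalog>"""

SHOP_B = """<catalog>
  <item><kind>dress shirt</kind><price>35</price></item>
</catalog>"""

# Ontology (toy version): maps each vocabulary term to a shared concept,
# so "blouse" and "dress shirt" are treated as the same thing.
ONTOLOGY = {"blouse": "Shirt", "dress shirt": "Shirt", "belt": "Belt"}

def to_triples(source, xml_text):
    """RDF-style step: turn each XML item into (subject, property, value)."""
    triples = []
    for i, item in enumerate(ET.fromstring(xml_text).iter("item")):
        subject = f"{source}#item{i}"  # made-up identifier for the item
        triples.append((subject, "kind", item.findtext("kind")))
        triples.append((subject, "price", float(item.findtext("price"))))
    return triples

def agent_find(concept, sources):
    """Agent step: collect statements from every source, keep the items
    whose kind maps to `concept`, and return (subject, price) pairs
    sorted from cheapest to most expensive."""
    triples = [
        t for name, xml_text in sources.items()
        for t in to_triples(name, xml_text)
    ]
    matches = {
        s for s, p, v in triples
        if p == "kind" and ONTOLOGY.get(v) == concept
    }
    prices = [(s, v) for s, p, v in triples if p == "price" and s in matches]
    return sorted(prices, key=lambda pair: pair[1])

results = agent_find("Shirt", {"shop-a": SHOP_A, "shop-b": SHOP_B})
print(results)
```

Neither shop uses the word “Shirt,” yet the agent can compare the two listings because the ontology ties both vocabularies to a shared concept. In a real deployment that mapping would live in a published, linked ontology rather than a dictionary inside the program.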

Business models and VC interest in the Semantic Web

How businesses build products and services around the proposed Semantic Web is an important question facing VC firms and entrepreneurs. A number of start-up software companies claim to sell Semantic Web-compliant technologies, yet the viability of this new vision in economic and business terms is uncertain.

Avron Barr and Shirley Tessler, principals of Aldo Ventures, a firm that specializes in studying the software industry, argue that “the main business problem is of the chicken and the egg variety: The value of machine-readable knowledge is very low in the beginning, but gets much higher (rapid return on investment) when more participants put their knowledge into explicit, machine-readable form. It’s hard to identify low-hanging fruit, products that offer ROI on their own.” Aldo Ventures is sponsoring a study of the Semantic Web called “Knowledge for Machines,” which can be accessed at aldo.com/kfm.

In short, business leaders are thinking through ways to commodify the Semantic Web. In addition to working out the possible markets, business models and services for the Semantic Web, entrepreneurs are struggling with a variety of nuts-and-bolts issues: Who will decide on the content of the ontologies and vocabularies? How will they be maintained? Who will shoulder the cost?

For more information on business aspects of the Semantic Web, visit The Business Special Interest Group of the Semantic Web Community Portal.

Katherine C. Adams is an information architect and freelance writer, e-mail katadamsus@yahoo.com.
