Taxonomies: Foundational to knowledge management
Aristotle is credited with the concept of taxonomy, Swiss biologist A.P. de Candolle with coining the word, and Swedish biologist Carl Linnaeus with being the father of the discipline. The word “taxonomy” is derived from the Greek “taxis,” meaning “order” or “arrangement,” and “nomos,” meaning “law.” Taxonomy could be described as the first formal KM system.
Introduced in the 1700s, the Linnean system, the first uniform system for classifying plants and animals, is still in use today, with modifications that take into account scientific advances. Taxonomy is also fundamental to managing digital content and is essential to organizing, searching, browsing, and retrieving information. Hierarchical taxonomies are used in ecommerce, for example, to allow customers to navigate among product options to the one that meets their needs. When products are tagged with metadata, precise searches are possible.
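To make the ecommerce case concrete, here is a minimal sketch of hierarchical browsing combined with metadata filtering. The catalog, the category paths, and the function names (`browse`, `child_categories`, `search`) are all hypothetical illustrations, not drawn from any particular product.

```python
# Hypothetical catalog: each product has a taxonomy path and metadata tags.
PRODUCTS = [
    {"name": "Trail Runner 2", "path": ("Footwear", "Running", "Trail"),
     "meta": {"color": "blue", "size": 42}},
    {"name": "Road Flyer", "path": ("Footwear", "Running", "Road"),
     "meta": {"color": "red", "size": 42}},
    {"name": "Canvas Tote", "path": ("Bags", "Totes"),
     "meta": {"color": "blue"}},
]


def browse(products, prefix):
    """Return products filed at or below the given taxonomy node."""
    prefix = tuple(prefix)
    return [p for p in products if p["path"][:len(prefix)] == prefix]


def child_categories(products, prefix):
    """List the subcategories a shopper can navigate to from this node."""
    prefix = tuple(prefix)
    return sorted({
        p["path"][len(prefix)]
        for p in browse(products, prefix)
        if len(p["path"]) > len(prefix)
    })


def search(products, prefix=(), **required):
    """Narrow a branch of the taxonomy by exact metadata matches."""
    return [
        p for p in browse(products, prefix)
        if all(p["meta"].get(key) == value for key, value in required.items())
    ]


if __name__ == "__main__":
    print(child_categories(PRODUCTS, ("Footwear", "Running")))
    print([p["name"] for p in search(PRODUCTS, ("Footwear",), color="blue")])
```

The hierarchy handles navigation ("Footwear", then "Running"), while the metadata narrows the results within a branch. That is the division of labor the paragraph above describes.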
However, the value of taxonomy as a tool extends beyond search and retrieval. “Taxonomies and associated metadata allow organizations to do many types of analyses on the use of their information,” said Marjorie Hlava, founder and chief science officer at Access Innovations, Inc. “They can quickly and easily find out what categories in a scientific journal are searched the most, for example, or how search frequency in the categories has changed over time.”
Analysis across many journals to find out which topics are searched most frequently in certain disciplines is equally feasible. “We conducted one analysis of institutions throughout the world to see which countries supported the most institutions,” Hlava continued. “This was done by extracting the institutions as entities and tabulating them across countries.” The analytics were possible because of the taxonomy and metadata.
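The tabulation step Hlava describes can be sketched in a few lines once the institutions have been extracted as entities. The records below are invented placeholders, and `institutions_per_country` is a hypothetical helper. The sketch assumes each tagged record already carries an institution name and a country.

```python
from collections import Counter

# Hypothetical tagged records; real ones would come from indexed content.
RECORDS = [
    {"institution": "University A", "country": "Sweden"},
    {"institution": "University A", "country": "Sweden"},  # repeat mention
    {"institution": "Institute B", "country": "Sweden"},
    {"institution": "College C", "country": "Switzerland"},
]


def institutions_per_country(records):
    """Count distinct institutions per country, ignoring repeat mentions."""
    unique = {(r["institution"], r["country"]) for r in records}
    return Counter(country for _, country in unique)


if __name__ == "__main__":
    for country, count in institutions_per_country(RECORDS).most_common():
        print(country, count)
```

Deduplicating before counting matters here: without it, a frequently mentioned institution would inflate its country's total.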
Access Innovations was founded in 1978 to build databases for clients in government and the private sector. Realizing the extent to which certain aspects of database development were repetitive, Hlava developed a product for automated indexing called Machine Aided Indexer (MAI). This is a complex product with a heavy layer of natural language processing and many automated language processing algorithms.
A second product, Thesaurus Master, assists in building structured vocabularies, including equivalent and associative terms, and manages the terms and their disambiguation. These two products were eventually combined into MAIstro. Other modules in the suite include Recommender, which uses search hits and tagging to match the user’s interests to content with similar indexing, resulting in a significantly improved search experience.
Disambiguation is a behind-the-scenes function that overcomes the limitations of keyword search, which does not offer distinctions between different meanings of a word. Making the term specific by putting the intended meaning in parentheses is not effective, notes Hlava. “That is not how people search,” she commented. “In addition, computers are not good at handling punctuation.” The solution is to use semantic technology, restate the concept, and then tag the content showing the particular meaning based on its context.
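As a rough illustration of tagging content by context, the toy sketch below picks a sense for an ambiguous word by counting nearby cue words. It is a deliberately simple stand-in for the semantic technology mentioned above, not a description of how any commercial indexer works. The `SENSES` table, its cue words, and the tag labels are all assumptions made for the example.

```python
import re

# Hypothetical sense inventory: ambiguous word -> restated concept -> cue words.
SENSES = {
    "mercury": {
        "planet Mercury": {"orbit", "planet", "solar", "probe"},
        "element mercury": {"toxic", "thermometer", "metal", "exposure"},
    },
}


def tag_senses(text):
    """Return one restated-concept tag per ambiguous word found in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    tags = []
    for word, senses in SENSES.items():
        if word not in tokens:
            continue
        # Score each sense by how many of its cue words appear in the text.
        label, score = max(
            ((label, len(cues & tokens)) for label, cues in senses.items()),
            key=lambda pair: pair[1],
        )
        if score > 0:  # leave the content untagged when context gives no signal
            tags.append(label)
    return tags


if __name__ == "__main__":
    print(tag_senses("The probe entered orbit around Mercury, the innermost planet."))
    print(tag_senses("Mercury exposure is toxic."))
```

The searcher never has to type a qualifier: the tag is attached to the content behind the scenes, and the plain query term can then be matched against the intended sense.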
One of the underlying features of a taxonomy is a vocabulary control list that sets up preferred search terms and their equivalencies. Access Innovations has extensive experience in developing vocabulary control lists and the associated standards. “The average large company has five search software products on the shelf,” Hlava observed. “They do not work well because they are not using a controlled vocabulary to leverage search. If taxonomy term priority search is used, there is a 40%–60% improvement in search success.”
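A vocabulary control list of this kind can be pictured as a lookup from every equivalent term to a single preferred term, which is then used to query the index. The sketch below is a minimal illustration under that assumption. The vocabulary, the document IDs, and the helper names (`normalize`, `lookup_documents`) are hypothetical.

```python
# Hypothetical control list: preferred term -> equivalent (non-preferred) terms.
PREFERRED = {
    "Automobiles": ["cars", "autos", "motor cars"],
    "Physicians": ["doctors", "medical doctors"],
}

# Build one case-insensitive lookup covering preferred terms and equivalents.
LOOKUP = {}
for preferred, equivalents in PREFERRED.items():
    LOOKUP[preferred.lower()] = preferred
    for term in equivalents:
        LOOKUP[term.lower()] = preferred

# Hypothetical index: content is tagged only with preferred terms.
INDEX = {
    "Automobiles": ["doc-1", "doc-7"],
    "Physicians": ["doc-3"],
}


def normalize(query):
    """Map a user's query to its preferred term, or None if it is unknown."""
    return LOOKUP.get(" ".join(query.lower().split()))


def lookup_documents(query):
    """Retrieve documents tagged with the preferred term for the query."""
    term = normalize(query)
    return INDEX.get(term, []) if term else []


if __name__ == "__main__":
    print(normalize("Cars"))
    print(lookup_documents("motor cars"))
```

Because content is tagged only with preferred terms, a search for "cars", "autos", or "Automobiles" reaches the same documents, which is how a controlled vocabulary improves on plain keyword matching.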