Taxonomy 101: The Basics and Getting Started with Taxonomies
Why Create a Taxonomy?
The information is there. The challenge is to help executives, analysts, sales managers, support staff, and customers find and use the right information efficiently and effectively. Many enterprises extract value from the business information they accumulate by organizing the data logically and consistently into categories and subcategories, creating a taxonomy. When information is structured and indexed in a taxonomy, users can find what they need by working down to more specific categories, up to a more inclusive topic, or sideways to related topics. This article discusses taxonomies: what they are, how they are used, and the enterprise software used to manage them.
Taxonomies for the Business Enterprise
Enterprises collect enormous quantities of information that can significantly improve all aspects of a business, from forecasting and decision making to sales and customer service.
These benefits can be realized, however, only if people are able to find and make sense of the right information when it is needed. Both customers and employees can better access and use needed information when it is organized into categories and subcategories within a taxonomy. A taxonomy can also improve search results by showing the levels immediately above, below, and adjacent to the search term in the hierarchy, providing both a meaningful context and ideas for further exploration. Some taxonomies also list synonyms or preferred terms and automatically expand searches to include equivalent terms.
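As a rough illustration of synonym expansion, the following Python sketch maps each preferred term to its equivalent terms and widens a search accordingly. The term lists and function names are hypothetical, chosen only for this example; real taxonomy products handle this in their own ways.

```python
# Hypothetical synonym lists: each preferred term maps to its equivalent terms.
SYNONYMS = {
    "automobile": {"car", "auto", "motorcar"},
    "physician": {"doctor", "medical doctor"},
}


def preferred_term(term: str) -> str:
    """Return the preferred term for `term`, or `term` itself if none is listed."""
    term = term.lower()
    for preferred, variants in SYNONYMS.items():
        if term == preferred or term in variants:
            return preferred
    return term


def expand_query(term: str) -> set[str]:
    """Expand a search term to the preferred term plus all of its synonyms."""
    preferred = preferred_term(term)
    return {preferred} | SYNONYMS.get(preferred, set())
```

A search for "car" would then also match content indexed under "automobile", "auto", or "motorcar", while a term with no listed synonyms is searched as entered.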
Multiple independent taxonomies, or facets, can be overlaid to provide different views into the same data. For example, a database of music could have separate facets organized by genre, year created, ethnicity, and record label. Facets also allow information to be labeled and organized differently for various groups, such as customers, sales staff, support staff, and scientists.
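The music example can be sketched in a few lines of Python. The catalog entries, field names, and helper functions below are invented for illustration; the point is only that each facet is an independent way of slicing the same records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    title: str
    genre: str
    year: int
    label: str


# Hypothetical catalog entries
CATALOG = [
    Recording("Track A", "jazz", 1959, "Label X"),
    Recording("Track B", "jazz", 1972, "Label Y"),
    Recording("Track C", "rock", 1972, "Label X"),
]


def filter_by_facets(items, **facets):
    """Keep items whose attributes match every requested facet value."""
    return [
        item for item in items
        if all(getattr(item, name) == value for name, value in facets.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    counts = {}
    for item in items:
        value = getattr(item, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts
```

Calling filter_by_facets(CATALOG, genre="jazz", year=1972) narrows the catalog along two facets at once, and facet_counts(CATALOG, "label") gives the kind of per-value tally often shown beside a faceted search.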
Several tasks are involved in maintaining a taxonomy of business information. A business must first determine a suitable structure for the data it has or will accumulate, and then assign each piece of content a place in the structure. To help users make the best use of this information, the taxonomy must then be integrated with other business systems. Continued maintenance will also be necessary, both updating the taxonomy to keep it relevant, and classifying new information as it is added.
Many businesses do not catalog their data consistently, either because the process is too time-consuming or because more immediately urgent tasks intervene. And even when a business does address this issue, it can be difficult to ensure that all the data is included and categorized consistently across business units, in a way that will be logical and understandable to its intended users.
Taxonomy management software can be used to reduce the time, labor, and potential inconsistencies involved in creating, implementing, and maintaining a taxonomy. With such software, a business can import, convert, merge, and modify existing taxonomies, and also automatically generate taxonomies to custom-fit its data. Taxonomy software can analyze a text and automatically assign it to a place in the taxonomy, with the option for users to manually override or modify the resulting classification. Taxonomy software can also integrate with or send output to content management, portal, and other enterprise management systems. It can even streamline workflow within a business by enabling automatic routing and responding for documents, emails, and customer interactions, based on their content or other characteristics. Taxonomy management software today is increasing in power and complexity.
Taxonomies Make Information Accessible
Just as a library would be of little use if it failed to organize and catalog its books, so accumulated business information provides little value to an enterprise unless it is organized into a logical, consistent framework for retrieval or analysis. Poor information management also reduces productivity. Several independent studies summarized by KMWorld magazine (www.kmworld.com) found that:
- Knowledge workers spend 15 to 35 percent of their time searching for information
- 40 percent cannot find the information they need on their corporate intranets
- 15 percent of their time is spent duplicating information that exists but cannot be found
A taxonomy is a good way to make accumulated information accessible and usable. When information is organized and indexed in a taxonomy, users can find what they need by starting with a general topic at a high level, and then drilling down through subcategories to find more specialized information. They can also use the taxonomy to explore by moving from a specific topic up to a more inclusive topic, or sideways to related topics, even if they are not sure what they are looking for.
In addition, a taxonomy makes searching for information easier and more effective because search results can show the levels immediately above, below, and adjacent to the search term in the hierarchy, providing context as well as ideas for further exploration. Some taxonomies also assign each category a numerical or alphanumeric code that is shorter and easier to work with than the full category name. The Dewey Decimal Classification (numeric) and the U.S. Library of Congress Classification (letters and numbers) are examples of such coded taxonomies.
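To make the idea of moving up, down, and sideways concrete, here is a minimal Python sketch of a taxonomy stored as a tree. The class, the function, and the sample categories are assumptions made for this example, not a description of any particular product.

```python
class Node:
    """One category in a taxonomy tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def siblings(self):
        """Categories that share this node's parent (the adjacent level)."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]

    def path(self):
        """Names from the root down to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


def context(node):
    """Summarize the levels above, below, and adjacent to a node."""
    return {
        "broader": node.parent.name if node.parent else None,
        "narrower": [c.name for c in node.children],
        "adjacent": [s.name for s in node.siblings()],
    }


# Hypothetical sample hierarchy
root = Node("Products")
hardware = Node("Hardware", root)
software = Node("Software", root)
laptops = Node("Laptops", hardware)
printers = Node("Printers", hardware)
```

A search that lands on "Hardware" could display context(hardware) next to the results, so the user sees the broader topic (Products), the narrower ones (Laptops, Printers), and the neighboring category (Software).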
Getting Started with a Business Taxonomy
Several tasks are involved in creating, applying, and maintaining a taxonomy of business information. A business must first create the taxonomy framework by determining a suitable structure for the data it has accumulated or will accumulate. This structure should capture the relationships inherent in the body of information in an intuitive way, as well as reflecting how the information fits into the overall structure of the business. The taxonomy framework will then need to be updated regularly to remain relevant and useful as new information is incorporated and as changes occur in terminology, technology, and markets.
Once the categories have been determined, a business must populate the taxonomy by assigning each piece of content a place where it belongs. After an initial large-scale classification or reclassification of existing content, there will be an ongoing need to classify each new piece of information. Classification can be immensely time-consuming, not only because of the ever-increasing volumes of information to be catalogued, but also because of difficult and possibly controversial decisions concerning where to place items that could go in more than one category.
Once a taxonomy is created and populated, it must be integrated into the business, to improve users' ability to find and make sense of the information they need. This often includes helping them find the right information when they are not sure what they are looking for, and perhaps are not aware of what information is available. Another benefit of a well-designed taxonomy is to help users be confident, when a search fails, that the looked-for information is really not there, so they can look elsewhere instead of continuing a fruitless search.
Enterprise Taxonomy Management Software
To reduce the time expenditure and improve the consistency of their information management and classification processes, many businesses use taxonomy management software, which can help with creating, implementing, and maintaining taxonomies. While vendors label their products by different names - including business semantics modeling, knowledge organization system, controlled vocabulary, thesaurus, ontology, and metadata model - they have enough similarity to all be categorized as enterprise taxonomy management software. Table 1 lists some vendors of this software genre, along with their products.
Table 1. Enterprise Taxonomy Management Software Vendors
Vendor | Product
Concept Searching | SharePoint conceptual metadata generation, auto-classification, and taxonomy management
CuadraSTAR (Lucidea) | STAR/Thesaurus - Thesaurus construction
Data Harmony | Thesaurus Master - Taxonomy and thesaurus construction and management
Mondeca | ITM (Intelligent Topic Manager) - Taxonomy and ontology creation and management
PoolParty | Thesaurus Server, Extractor, Semantic Search, Power Tagging - Text mining, data integration, thesaurus and taxonomy management
Smartlogic | Semaphore Enterprise Semantic Platform - Classification, text mining, and ontology management
Soutron | Thesaurus construction
Synaptica | Synaptica Enterprise - Taxonomy management
Wordmap | Wordmap - Taxonomy management
Importing Existing Taxonomies
Most taxonomy management software allows users to import, convert, and modify existing taxonomies. These could be from databases or classifications that a company already maintains, or published third-party taxonomies, which can be found for many business, medical, scientific, engineering, and public policy topic areas. Often the vendor of the taxonomy management software includes or makes available a selection of predefined taxonomies, which can then be synchronized to create a single enterprise-wide taxonomy. Once the data have been imported and converted, responsibility for subsets of the taxonomy can be distributed to relevant subject matter experts within the company for further customization.
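The merging step can be pictured with a small Python sketch. It assumes, purely for illustration, that each taxonomy is a nested dictionary of category names and that categories with the same name at the same level are the same category. Real imports usually also need to map differently named terms onto one another, which this sketch does not attempt.

```python
def merge_taxonomies(base: dict, incoming: dict) -> dict:
    """Recursively merge two nested-dict taxonomies into a new one.

    Categories with the same name at the same level are treated as one
    category; neither input is modified.
    """
    # Start from a deep copy of the base taxonomy.
    merged = {name: merge_taxonomies(children, {}) for name, children in base.items()}
    # Fold in each incoming category, merging subtrees where names coincide.
    for name, children in incoming.items():
        merged[name] = merge_taxonomies(merged.get(name, {}), children)
    return merged


# Hypothetical in-house and third-party taxonomies
in_house = {"Products": {"Hardware": {}, "Software": {}}}
third_party = {"Products": {"Hardware": {"Printers": {}}, "Services": {}}}

enterprise = merge_taxonomies(in_house, third_party)
```

Here the merged result keeps "Software" from the in-house taxonomy, adds "Services" from the third-party one, and places "Printers" under the shared "Hardware" category.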
Automatic Taxonomy Generation
Some software automatically generates a taxonomy by using natural language processing and statistical clustering to analyze the topics and subtopics found in a company's documents, without human analysis. For example, Wordmap offers text mining and automated categorization tools. Taxonomy software can also review existing categorized content and suggest new categories to be added. The suggested taxonomy may then be manually adjusted, but beginning with automatic generation can provide a big head start on the process.
Automatic Classification
Once a taxonomy has been created, taxonomy software can use natural language processing, semantic analysis, and statistical pattern matching to analyze each body of text, and then assign it to a place in the taxonomy by attaching a metadata tag. An option is always provided to override or modify the resulting classification, and problematic documents are automatically set aside for manual classification.
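The overall flow of classifying, tagging, and setting aside doubtful cases can be sketched in Python. Note that this example substitutes simple keyword matching for the natural language processing and statistical techniques that commercial products use; the categories, keywords, threshold, and function names are all hypothetical.

```python
import re

# Hypothetical categories, each described by a few indicative keywords.
CATEGORY_KEYWORDS = {
    "Billing": {"invoice", "payment", "refund", "charge"},
    "Technical Support": {"error", "crash", "install", "login"},
}


def classify(text, min_hits=2):
    """Return (category, hits); category is None when the match is weak or tied."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_hits), (_, runner_up_hits) = ranked[0], ranked[1]
    if best_hits < min_hits or best_hits == runner_up_hits:
        return None, best_hits  # set aside for manual classification
    return best, best_hits


def tag_document(doc, manual_override=None):
    """Attach a category as metadata; a manual override always wins."""
    category, _ = classify(doc["text"])
    doc["metadata"] = {
        "category": manual_override or category,
        "needs_review": manual_override is None and category is None,
    }
    return doc
```

A document with too few keyword hits, or with a tie between categories, is flagged with needs_review so that a person can classify it, and passing manual_override replaces whatever the automatic step decided.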
Integration and Application of Taxonomy Software
Taxonomy software can be either a standalone system or a module of a complete information storage and retrieval system. Most standalone systems can integrate with or send output to content management, portal, and other enterprise management systems. Storing the taxonomy shell, or definition, separately from the content allows various applications and groups to share it.
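One way to picture a taxonomy shell stored apart from the content is shown in the Python sketch below. The category IDs, field names, and JSON layout are assumptions for this example only; the idea is that content records carry nothing but a category ID, and any application that loads the shared definition can resolve it.

```python
import json

# Taxonomy definition ("shell"): category IDs, labels, and parent links only.
taxonomy_shell = {
    "C1": {"label": "Products", "parent": None},
    "C2": {"label": "Hardware", "parent": "C1"},
    "C3": {"label": "Printers", "parent": "C2"},
}

# Content lives elsewhere and refers to categories by ID.
documents = [
    {"id": "doc-1", "title": "Printer setup guide", "category": "C3"},
]


def breadcrumb(shell, category_id):
    """Resolve a category ID to its full path of labels, root first."""
    labels = []
    while category_id is not None:
        entry = shell[category_id]
        labels.append(entry["label"])
        category_id = entry["parent"]
    return " > ".join(reversed(labels))


# Any application can load the same serialized shell and resolve IDs.
shared = json.loads(json.dumps(taxonomy_shell))
```

Because the documents store only IDs, a category can be renamed in the shell without touching the content that is filed under it.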
Maintenance of the Taxonomy and Database
Taxonomies require regular maintenance in order to remain relevant and up-to-date. New content must be incorporated into existing or new categories. To keep the taxonomy current in terminology and associations, subject matter experts within the company can be assigned responsibility and security authorization to maintain the sections where they have expertise. To help maintain internal consistency, the software can automatically update the other half of a reciprocal relationship when a change, addition, or deletion occurs. Reports and graphical representations of hierarchies and maps (with drag and drop interfaces) can also aid taxonomy maintenance. For example, reports can show broken links, linkless nodes, and who made what changes.
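The reciprocal-update idea and a simple maintenance report can be sketched in Python as follows. The class and method names are hypothetical, and the sketch covers only broader/narrower links and a report of linkless terms.

```python
class Thesaurus:
    """Stores broader/narrower links and keeps both directions in sync."""

    def __init__(self):
        self.broader = {}   # term -> set of broader terms
        self.narrower = {}  # term -> set of narrower terms

    def add_term(self, term):
        self.broader.setdefault(term, set())
        self.narrower.setdefault(term, set())

    def link(self, narrow, broad):
        """Record narrow -> broad and the reciprocal broad -> narrow."""
        self.add_term(narrow)
        self.add_term(broad)
        self.broader[narrow].add(broad)
        self.narrower[broad].add(narrow)

    def remove_term(self, term):
        """Delete a term and every reciprocal reference to it."""
        for broad in self.broader.pop(term, set()):
            self.narrower[broad].discard(term)
        for narrow in self.narrower.pop(term, set()):
            self.broader[narrow].discard(term)

    def linkless_terms(self):
        """Report terms that have no broader or narrower links."""
        return sorted(
            t for t in self.broader
            if not self.broader[t] and not self.narrower[t]
        )
```

Because link and remove_term always update both directions together, a one-sided (broken) link cannot arise through these methods, and linkless_terms gives editors a list of orphaned terms to review.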
The Business Case for a Taxonomy
New technology is constantly being developed to help users manage and make sense of their vast and increasing information resources. Taxonomy management software in particular is becoming more powerful as it gains the ability to represent the complex, specific relationships of a thesaurus or ontology.
There is no question that a business must keep its information organized and accessible; however, implementing a software-based taxonomy management system may not be appropriate in every case. A taxonomy is, after all, really nothing more than a filing system. Some businesses may function well enough with their current filing methods and would not benefit enough from a taxonomy management system to justify the cost of implementation. Nevertheless, given the accelerated pace at which information currently accumulates, most companies would benefit from implementing at least a basic electronic taxonomy.
Implementing a taxonomy can improve the navigation of a company's website and help customers find product information faster and more reliably, leading to increased sales and better customer relations. A Forrester Research report found that "poorly architected retailing sites" sell only half as much as better sites. It is also very important to help users find information on the first try: in one study of users whose searches failed, 47 percent gave up after just one search, and only 23 percent tried three or more times. Another study of e-commerce sites showed that users find desired information only 34 percent of the time with a simple search, but 54 percent of the time using a taxonomy.
Internally, a taxonomy can improve productivity and customer service by helping employees find information faster and more reliably. By establishing a common terminology and structure, a taxonomy can also improve communication among various employee groups, as well as with customers. Finally, the taxonomy can itself become a valuable resource representing the company's accumulated knowledge.
Nevertheless, the efficiency gains from a taxonomy management system can in some situations be offset by its labor-intensive setup. If a business does decide to implement a taxonomy management system, it should not underestimate the effort that will be involved in creating and maintaining a taxonomy. A system that offers labor-saving options such as importing existing taxonomies, and automatic taxonomy creation and document classification, may be worth paying for. And even if only a simple taxonomy is currently needed, it would be wise to choose a system that has the flexibility to grow with the business, and that will be compatible over time with the trend toward richer representations such as ontologies and topic maps.
Web Links
Concept Searching: http://conceptsearching.com/
CuadraSTAR (Lucidea): http://cuadra.com/
Data Harmony: http://www.dataharmony.com/
ISO: http://www.isotopicmaps.org/
KMWorld magazine: http://www.kmworld.com/
Mondeca: http://www.mondeca.com/
OASIS: http://www.oasis-open.org/
PoolParty: http://poolparty.biz/
Smartlogic: http://www.smartlogic.com/
Soutron: http://soutron.com/
Synaptica: http://www.synaptica.com/
Wordmap: http://www.wordmap.com/
About the Author
Betsy Walli completed a Ph.D. in linguistics at the Massachusetts Institute of Technology, as well as a master's degree in counseling at California State University, Fullerton. Dr. Walli is an independent writer and editor with experience in academic, technical, and marketing writing.
This article is based on a comprehensive report published by Faulkner Information Services, a division of Information Today, Inc., that provides a wide range of reports in the IT, telecommunications, and security fields. For more information, visit www.faulkner.com and www.infotoday.com.
Copyright 2014, Faulkner Information Services. All Rights Reserved.