Three waves of information portals for KM
How will portals support access to structured and unstructured data? Corporate users need to access relevant business data and information, whether structured or unstructured, alphanumeric or text. The problem is that structured and unstructured data have been managed separately with little or no thought to common access. Moving all of the data into a single database is not always practical, nor does it in and of itself solve the problem at the access level.
Providing users access to multiple applications under the cover of an information portal also does not solve the problem. The key is providing the infrastructure to support unified access. Advances in content management, along with extensions to SQL-based queries, signal a new era of unified access to heterogeneous data. This will be the third wave of information portals in support of knowledge management.
The worlds of structured data and unstructured data today are like parallel universes:
• Separate databases are deployed to house each type of data.
• Separate product ecosystems have grown up around the separate databases to access and manipulate the data.
As a result, organizations have needed to implement different sets of technology to manage their structured and unstructured data assets, and separate applications have been built to leverage each database or information repository. It has been rare for an application to access multiple types of data, largely because of the complex access logic the application would have to manage.
Bringing data of different types together in the same database was the principal goal of the unified database, pioneered by Michael Stonebraker with the Illustra technology later acquired by Informix. Packaging methods along with each data type (or data blade in the Illustra/Informix terminology) reduced the complexity for the database administrator and the application developer.
But applications still had to access each datatype separately by making separate calls to each datatype-specific method. Also, the unified database approach assumed that data from specialized databases would be moved to a single database, which wasn't always practical given the size of specialized databases. The unified database represents unified data management, but not unified data access.
Knowledge management extends traditional business intelligence in the following ways:
• Data: integrated access to structured and unstructured data
• People: tracking and analyzing how people use information
• Process: delivering information to those who need it when they need it, building intelligence into a business process
Portals are positioned to become the means for supporting the information access and delivery required for knowledge management. However, though integrated access to all relevant data is needed, it's not being accomplished by portals today.
Consider the following query: Find all customers who, over the past month, had purchases over $1,000 and sent three angry e-mails on subject Y.
This query would be a challenge to today's portal-based searching and business intelligence tools. It's ironic that modern marketing systems can trigger personalized, outbound e-mail messages to make offers to customers, based on an analysis of their purchase patterns. But marketing systems are not able to take account of the signals communicated by those same customers via the inbound e-mails that come back.
The evolution of portals will lead to integrated access to heterogeneous types of data. IDC identifies the following three waves:
• Wave 1 (1998 to 2000): user-interface-led integration
• Wave 2 (2000 to 2002): separate but equal access to structured and unstructured data
• Wave 3 (2002 to ... ): unified structured and unstructured data access
Wave 1: User-interface-led integration
Corporate portals have taken the idea of consumer portals like Yahoo and Excite and adapted it for corporate intranets. These portals partition the "real estate" of the user's screen, running multiple applications side by side. The burden is placed on the user to sort out any semantic inconsistencies between the meaning of information displayed in one part of the screen (via one application) and that displayed in another part of the screen (via another application).
From the perspective of data access:
• Unstructured data: Portals enable users to search through corporate documents, primarily via full-text searching. The documents are formatted as HTML pages for display to the user.
• Structured data: Business intelligence query/reporting tools provide the capability to build reports from structured data sources. The reports are formatted as static HTML pages and viewed through a browser-based portal.
The "$1,000 customers with angry e-mail" query: Only a portion of the sample query can be handled. A sales report is browsed by a user to identify those customers who purchased over $1,000 last month. But there is no ability to identify the angry e-mails, nor to link these to the $1,000 customers.
In Wave 1, there is equal access to structured and unstructured data. But the access is static, limited to viewing HTML pages.
Wave 2: Separate but equal access to structured and unstructured data
Corporate portals begin to embed more advanced features, deepening the level of access and providing better information sharing. Examples are Viador's bundling of Infoseek's search engine with its reporting capabilities, and Hummingbird's bundling of its Andyne business intelligence technology with its PC DOCS technology.
From the perspective of data access:
• Unstructured data: Advances in taxonomy and classification engines make content management a hot market and enhance searching capabilities.
• Structured data: Rather than static HTML documents, users can drill into reports for more detail in the style of multi-dimensional analysis.
The "$1,000 customers with angry e-mail" query: Users can form a query to find the $1,000 customers. Taxonomy-building software gets us closer to extracting concepts from within documents, so that customers who sent e-mail on subject Y can be found even if the topic does not appear in the subject line of the e-mail. Identifying emotions (e.g., the angry e-mail) may not be far off.
This is more powerful access to structured data and, especially, to text. But the access is handled separately. Information portals may be able to route a user's question to the appropriate query or searching engine. It is left to the user to put together the results.
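As a rough illustration of this "separate but equal" access, the sketch below routes each part of the sample question to a different engine and leaves the merge to the caller. All names (PURCHASES, EMAILS, query_structured, search_text, route) and the sample data are hypothetical, and the naive keyword match merely stands in for a real search or classification engine.

```python
# Hypothetical in-memory stand-ins for two separately managed repositories.
PURCHASES = [  # structured data: (customer_id, purchase amount in dollars)
    ("C1", 1500.0),
    ("C2", 300.0),
    ("C3", 2200.0),
]
EMAILS = [  # unstructured data: (customer_id, message text)
    ("C1", "I am still waiting on my refund for subject Y"),
    ("C2", "Question about subject Y pricing"),
    ("C4", "Thanks for the quick delivery"),
]


def query_structured(min_amount):
    """Query engine: customers whose purchases total more than min_amount."""
    totals = {}
    for customer_id, amount in PURCHASES:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return {cid for cid, total in totals.items() if total > min_amount}


def search_text(keyword):
    """Search engine: customers whose e-mail text mentions the keyword."""
    return {cid for cid, text in EMAILS if keyword.lower() in text.lower()}


def route(kind, argument):
    """Portal router: hand the request to one engine; results are not merged."""
    engines = {"structured": query_structured, "text": search_text}
    return engines[kind](argument)


big_spenders = route("structured", 1000)
wrote_about_y = route("text", "subject Y")
# The portal's job ends here; combining the two result sets is the user's task.
combined_by_user = big_spenders & wrote_about_y
```

The last line is the step a Wave 2 portal does not perform: the user has to line up the two result sets by customer to answer the original question.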
Wave 3: Unified structured/unstructured data access
The advances in text mining, concept extraction and content management in Wave 2 pave the way for Wave 3. Essentially, Wave 2 builds the infrastructure for unified data access to be leveraged in Wave 3.
Where we are headed is toward a convergence of unstructured and structured data access. One sign of this future convergence is the Brio-Autonomy relationship, which promises to bring Autonomy's search and classification engine (for unstructured data) to expand the scope of Brio's business intelligence software. The results should be reflected in future versions of the Brio portal.
Another sign is IBM's ongoing Project Garlic, aimed at providing a federated search engine that could integrate (based on unified metadata) the results of structured and unstructured queries or searches. The results of this effort should emerge in various stages within IBM's DB2 database, DataJoiner and portal infrastructure software.
If content management engines can classify the concepts within a document, these data items can be stored outside the documents as accessible fields, most likely tagged in XML. These data attributes or fields can then be joined with existing customer records to form an expanded logical, heterogeneous record, ready to be accessed via enhanced information portals.
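A minimal sketch of that idea follows. It assumes a content management engine has already written its classifications as XML. The element and attribute names (email, customer, topic, tone), the customer records, and the function names are invented for illustration and do not reflect any particular product.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical output of a classification engine: one element per e-mail,
# with the extracted concepts stored as attributes outside the message body.
EMAIL_METADATA = """
<emails>
  <email customer="C1" topic="Y" tone="angry"/>
  <email customer="C1" topic="Y" tone="angry"/>
  <email customer="C1" topic="Y" tone="angry"/>
  <email customer="C2" topic="Y" tone="neutral"/>
  <email customer="C3" topic="Z" tone="angry"/>
</emails>
"""

# Hypothetical structured customer records (purchases over the past month).
CUSTOMERS = {
    "C1": {"name": "Acme", "purchases_last_month": 1800.0},
    "C2": {"name": "Globex", "purchases_last_month": 2500.0},
    "C3": {"name": "Initech", "purchases_last_month": 400.0},
}


def build_expanded_records(xml_text, customers):
    """Join the extracted e-mail attributes onto each customer record."""
    counts = Counter()
    for email in ET.fromstring(xml_text).iter("email"):
        key = (email.get("customer"), email.get("topic"), email.get("tone"))
        counts[key] += 1
    expanded = {}
    for customer_id, record in customers.items():
        email_counts = {
            (topic, tone): n
            for (cid, topic, tone), n in counts.items()
            if cid == customer_id
        }
        expanded[customer_id] = {**record, "email_counts": email_counts}
    return expanded


def angry_big_spenders(expanded, topic, min_purchases=1000.0, min_emails=3):
    """Customers over the purchase threshold with enough angry e-mails on a topic."""
    return sorted(
        cid
        for cid, rec in expanded.items()
        if rec["purchases_last_month"] > min_purchases
        and rec["email_counts"].get((topic, "angry"), 0) >= min_emails
    )


records = build_expanded_records(EMAIL_METADATA, CUSTOMERS)
matches = angry_big_spenders(records, topic="Y")
```

With this sample data, matches contains only customer C1, the one expanded record that satisfies both the purchase condition and the e-mail condition.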
Unified data access means that the sample query (the $1,000 customers with angry e-mail) can be handled. Moreover, unified data access should enable the aggregation and measurement of trends over time that are the regular province of multi-dimensional analysis and data mining. The attributes extracted from documents are candidate dimensions for examining trends in customer behavior, supplier performance, employee turnover, and the like. For example: what are the best predictors of customer churn--changes in buying patterns, changes in e-mail topics, or some combination?
But Wave 3 does not stop there. Additional data can be gleaned by the portal to track how people are using information, another area not exploited by business intelligence today. This will enable identification of experts for better collaboration, as well as smart information push to those who need the information when they need it--a fundamental goal of knowledge management. Hence, in addition to expanded access to information, Wave 3 portals will need to incorporate better support for people and process.
Conclusions
The separate worlds of structured and unstructured data access are coming together. There is a prospect for gaining more intelligence about customers, suppliers, and employees than could be gained by access to only one type of data. This also implies a shakeout in the vendor ecosystems that have grown up around each type of data. The result will be portals that provide wider, deeper, and more collaborative business intelligence.
Dr. Henry Morris directs research on knowledge management, data warehousing, and analytic applications software at IDC, a worldwide industry research firm with headquarters in Framingham, Massachusetts. He defined the analytic applications concept and writes about the convergence of business intelligence and knowledge management. His e-mail address is hmorris@idc.com.