Getting data in the hands of knowledge workers
Adaptive data marts and customer data integration strategies emerge
By Kim Ann Zimmermann
Hundreds or even thousands of knowledge workers continually combing a data warehouse for even the most basic information can take a real toll on a company's computing power. As new groups of workers develop knowledge about data and want to access more information and manipulate that data into a number of configurations, it becomes a nearly impossible task for the data warehouse to churn out all of the necessary results. That's why more and more companies are seeking flexible approaches to structuring their data warehouses so that users can get to the information they need without clogging the pipelines.
One way that database managers are dealing with the need to provide more information to users without creating storage and data access nightmares is through the use of adaptive data marts. Those marts allow users to generate reports and manipulate data to their heart's content without modifying the physical database.
However, there are times when information must be loaded into the data warehouse, and that task is becoming easier as a trend grows toward convergence of extract, transform, load (ETL) tools and enterprise application integration (EAI) systems. The union of those two technologies provides the ability to extract data from EAI software and load it directly into the data warehouse.
Customer data integration is also a hot button among data warehouse managers and users as companies identify the need to achieve a single view of their customers across product lines and channels, as well as geographies and business lines. To achieve true customer data integration, customer records have to be brought together from repositories across the enterprise.
All of those demands on the data warehouse come at a time when more and more information, not just word processing files and spreadsheets, is being made part of the enterprise information repository. XML files, video clips, audio clips and large graphic files are creeping into the data warehouse and eating up precious space.
Beyond the repository
"Historically, data warehouses have been repositories for customer information and transaction information," says Jay Desai, co-founder and practice area leader at Knightsbridge Solutions, a systems integrator specializing in data warehousing. "Enterprises are facing an increase in unstructured and semi-structured data including images, audio, video, XML."
While a growing number of RDBMS vendors, including Microsoft and Oracle, are adding support for different data types, many ETL vendors are not, according to Desai. That will hamper the ability to integrate different data types into current enterprise data architectures, he says.
As data warehouses must incorporate various data types, they also must get closer to real time to analyze and react to information. But it is not efficient to have several repositories going at once—one for historical data and the other for more current information. To address that issue, Teradata's Teradata Warehouse 7.0, for example, consolidates all decision support databases into one central, enterprise data warehouse, bringing the data marts, operational data stores (ODS) and analytic servers under one roof.
"We're really going to see things get closer to real time in the next generation of data warehouses," Desai says. He points out that companies in the retail business, for example, must have the ability to react quickly to prevent fraud and respond to changing business climates. If someone makes a purchase with a stolen credit card and returns the item to another store the same day, there would be no way of detecting the fraud if the point-of-sale data was uploaded to the data warehouse only once a day.
Real-time demands can place a hefty burden on the data warehouse, which is why compression tools are becoming a more common part of the data warehouse structure.
"The real-time data has to be dynamically compressed and secured on the fly," says Bob Zurek, VP of advanced technology for Ascential Software. "As we continue in a challenging economy, everyone is trying to do more with less. Companies want to extract as much data as they can so that they know who is their best customer and who are their worst customers, and they can reward the good customers and fire the ones who aren't profitable. By applying compression techniques, you're making more efficient use of the existing data warehouse structure."
Among the biggest storage hogs in data warehouses are XML files, which can't be easily extracted, Zurek says. However, that issue should be dealt with in the next generation of database management tools, he adds.
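As a rough illustration of what compressing data "on the fly" can look like, the sketch below uses the zlib module from Python's standard library to compress records incrementally as they arrive, rather than after a full batch has been written. It is not a description of Ascential's tools; the record layout and the function name compress_stream are assumptions made for the example, and the security half of Zurek's remark is not addressed here.

```python
import zlib


def compress_stream(records):
    """Compress an iterable of byte records incrementally, yielding compressed chunks."""
    compressor = zlib.compressobj()
    for record in records:
        chunk = compressor.compress(record)
        if chunk:  # the compressor may buffer input and return nothing yet
            yield chunk
    # Emit whatever is still buffered once the stream ends.
    yield compressor.flush()


# Hypothetical, highly repetitive transaction records.
records = [b"store=A,card=1111,amount=499.00\n"] * 1000

compressed = b"".join(compress_stream(records))
original_size = sum(len(r) for r in records)
restored = zlib.decompress(compressed)
```

Repetitive transactional data of this kind tends to compress well, which is the sense in which compression makes more efficient use of existing storage.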
Don't rock the structure
The need to keep data warehouses as streamlined as possible is also giving rise to a strategy that Sagent calls adaptive data marts. "People want to build a data warehouse and not have to change it for a while," says Dave Henry, VP of product marketing for Sagent. The problem with that strategy arises when the needs of users change.
"We're seeing an emergence of a new group of users knowledgeable about data, and they want to work with the data at a granular level," he says. "They want to take advantage of the transformation and calculation capabilities historically used in an ETL environment."
Adaptive data marts allow users to perform calculations on the fly without having to make changes to the database. "Say you have a university with a large endowment," Henry says. "The financial analyst for the university wants to analyze how the investments are performing. The financial analyst doesn't want to modify the database necessarily. The benchmarks he's created may eventually become part of the data warehouse, but with the adaptive data mart strategy, the database doesn't need to be modified unless these calculations are something that need to be an ongoing part of the structure."
Data marts for specific functions, such as marketing, have evolved over the past few years because of the financial and technological constraints forcing the need to squeeze as much as possible out of existing data warehouses. The data marts are being consolidated, however, to improve information integrity and reduce support issues.
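Henry's university endowment example can be approximated in a few lines. The sketch below is an analogy only, not Sagent's product: it uses Python's built-in sqlite3 module and a temporary view to show calculations layered over stored data without changing the stored tables. The holdings table, its columns, the figures and the BENCHMARK_RETURN value are all hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE holdings (fund TEXT, start_value REAL, end_value REAL)")
conn.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?)",
    [("equity", 100.0, 112.0), ("bonds", 200.0, 206.0)],
)

BENCHMARK_RETURN = 0.05  # hypothetical benchmark chosen by the analyst

# A TEMP view lives only for this connection and is not stored in the main
# database, so the analyst's calculation leaves the base schema untouched.
conn.execute(
    """
    CREATE TEMP VIEW fund_performance AS
    SELECT fund,
           (end_value - start_value) / start_value AS return_rate
    FROM holdings
    """
)

rows = conn.execute(
    "SELECT fund, return_rate, return_rate > ? AS beats_benchmark "
    "FROM fund_performance ORDER BY fund",
    (BENCHMARK_RETURN,),
).fetchall()

# Objects recorded in the main database schema: still only the original table.
main_objects = [r[0] for r in conn.execute("SELECT name FROM sqlite_master")]
```

If the benchmark comparison later proves to be of lasting value, it could be promoted into the permanent structure, which matches the path Henry outlines.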
Data integrity issues
While data marts can meet the needs of specific groups in an organization, "there has to be a single version of the truth," says Neil Patil, VP of product marketing for Brio. One of the biggest problems Patil sees is that data is being dumped into the data warehouse without being cleansed or checked for integrity, which has a trickle-down impact throughout the organization as the data is distributed. Data integrity is becoming even more critical as companies integrate information from outside the organization into their data warehouses, for example, a consumer goods company feeding point-of-sale data from retailers into its data warehouse.
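To make the idea of cleansing before loading concrete, here is a minimal sketch. The rules it applies (normalize the product code, reject missing codes and negative quantities, reject duplicates) are illustrative assumptions rather than anything prescribed by Patil or Brio, and the field names and the cleanse function are hypothetical.

```python
def cleanse(records):
    """Split incoming records into loadable rows and rejects, under illustrative rules."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        # Normalize the product code so that " ab-100 " and "AB-100" match.
        sku = (rec.get("sku") or "").strip().upper()
        units = rec.get("units")
        # Reject rows with a missing code or an implausible quantity.
        if not sku or not isinstance(units, int) or units < 0:
            rejected.append(rec)
            continue
        # Reject duplicates of a row that has already been accepted.
        key = (rec.get("store"), sku, rec.get("date"))
        if key in seen:
            rejected.append(rec)
            continue
        seen.add(key)
        clean.append({**rec, "sku": sku})
    return clean, rejected


# Hypothetical point-of-sale feed from a retailer.
records = [
    {"store": "S1", "sku": " ab-100 ", "date": "2003-05-01", "units": 3},
    {"store": "S1", "sku": "AB-100", "date": "2003-05-01", "units": 3},   # duplicate
    {"store": "S2", "sku": "", "date": "2003-05-01", "units": 1},         # missing code
    {"store": "S2", "sku": "CD-200", "date": "2003-05-01", "units": -4},  # bad quantity
]

clean, rejected = cleanse(records)
```

Keeping the rejected rows, rather than silently dropping them, gives the organization a way to trace integrity problems back to the source that supplied them.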
"Unfortunately, we're seeing companies dump data into warehouses without cleansing it because they are still getting some efficiencies from having a central repository," says Patil. "Data integrity is an important topic as these data warehouses continue to expand within an organization and are used for all kinds of decision making at various levels."
Kim Ann Zimmermann is a free-lance writer, 732-636-3612, e-mail kimzim2764@yahoo.com