Data quality goes mainstream
A plethora of data quality options
Once an organization has recognized the value of its data and committed to improving its quality, a variety of solutions can be implemented. Some data quality tools are integrated with applications such as product information management systems. Others are standalone products dedicated solely to data quality. Some are focused on a particular vertical such as pharmaceuticals or HR. Still others provide a suite of data management solutions, with data quality as one part. Data preparation software, for example, may include data quality as a component.
“Verification can be as basic as validating a date or ZIP code format, or determining whether a Social Security number is correctly structured for the United States versus Canada,” said Jitesh Ghai, senior vice president and general manager, data governance and privacy, Informatica. “Our software can also look up an address to see whether it exists, and can also do more sophisticated things such as using machine learning to develop a confidence score on whether an individual in one data source is the same one as referenced in another.”
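As a rough illustration of the kinds of checks Ghai describes, and not Informatica’s actual implementation, the Python sketch below validates a date, a U.S. ZIP code, and a Social Security number format, and computes a naive confidence score for whether two records refer to the same individual. The field names, weights, and string-similarity measure are assumptions made for the example; a production matcher would rely on trained models rather than simple character similarity.

```python
import re
from datetime import datetime
from difflib import SequenceMatcher

def valid_date(value, fmt="%Y-%m-%d"):
    """Check that a string parses as a date in the expected format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def valid_us_zip(value):
    """US ZIP: five digits, optionally followed by -NNNN (ZIP+4)."""
    return re.fullmatch(r"\d{5}(-\d{4})?", value) is not None

def valid_us_ssn(value):
    """US SSN structure: NNN-NN-NNNN (format check only, not issuance rules)."""
    return re.fullmatch(r"\d{3}-\d{2}-\d{4}", value) is not None

def match_confidence(record_a, record_b, weights=None):
    """Crude confidence that two records describe the same person:
    a weighted average of per-field string similarity."""
    weights = weights or {"name": 0.5, "address": 0.3, "dob": 0.2}
    score = 0.0
    for field, weight in weights.items():
        a, b = record_a.get(field, ""), record_b.get(field, "")
        score += weight * SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return score

print(valid_us_zip("19107"))       # True
print(valid_us_ssn("123-45-678"))  # False: wrong structure
print(match_confidence(
    {"name": "Jon Smith",  "address": "12 Oak St",     "dob": "1980-02-01"},
    {"name": "John Smith", "address": "12 Oak Street", "dob": "1980-02-01"},
))
```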
Informatica has a long history in data integration, and is now focused on enterprise cloud and hybrid data environments. The company also expanded into master data management (MDM) in recognition of the fact that data is often fragmented among multiple systems. “To have a comprehensive view of the customer,” continued Ghai, “business users must be assured that the data is complete, consistent, and timely.”
Enabling visibility
As an example of how data quality ripples through an organization, Toyota North America delved deeply into KPIs that were used in different departments. “Each one was calculating KPIs based on a different set of underlying data,” Ghai commented. “Data scientists were working to understand customer preferences so that the company could design the cars that customers wanted. Data quality was basic to this process.”
Another gap was that the sales department did not have access to marketing or manufacturing data. Informatica helped Toyota build a data lake to provide visibility across departments. “For example, manufacturing was unaware of the impact of the subprime housing crisis on sales,” noted Ghai. “Therefore, they did not adjust their production volume to account for the drop in sales of new cars. Data completeness and timeliness were vital in this context.”
Identifying data quality as an issue
Companies that seek help with a business issue do not always recognize at the outset that data quality is an underlying problem. “When we are approached to provide an analytics solution,” said Jake Freivald, VP of product marketing at Information Builders, “they may not be aware that they have a data quality issue. If it turns out they do, they need to address that first.”
Information Builders’ platform emphasizes three primary capabilities: integration, integrity, and intelligence, with the goal of producing actionable analytics. Its Omni-Gen software provides real-time data standardization, cleansing, and remediation so that data quality issues can be detected and resolved, producing the so-called golden records on which valid analytics depend.
Information Builders has developed a vertical solution for the healthcare industry, Omni-HealthData, which is being used at St. Luke’s University Hospital in the mid-Atlantic region as well as at other health networks in the U.S. and Canada. The St. Luke’s implementation began with a vision of strategic business priorities and then focused on the analytics that would be required to achieve them. The organization determined what data would be needed to ensure quality care and a positive patient experience, and to meet the challenges of value-based reimbursement and cost containment.
Combining automation and human intervention
Data from 31 different systems was cleansed and standardized to reconcile patient demographic information. Duplicate records were eliminated and errors were corrected. The resulting data allows accurate reporting and analysis, and data quality is continuously improved through St. Luke’s data governance program.
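The St. Luke’s work ran through Omni-Gen rather than hand-written code, so the following is only a minimal Python sketch of the general pattern described: standardizing demographic fields so that equivalent values compare equal, then collapsing duplicates on a simple match key. The field names, match key, and sample records are invented for illustration.

```python
import re
from collections import defaultdict

def standardize(record):
    """Normalize a demographic record so equivalent values compare equal."""
    return {
        "first": record.get("first", "").strip().upper(),
        "last": record.get("last", "").strip().upper(),
        "dob": record.get("dob", "").strip(),                     # assume ISO dates upstream
        "zip": re.sub(r"[^0-9]", "", record.get("zip", ""))[:5],  # keep 5-digit ZIP
        "source": record.get("source", ""),
    }

def deduplicate(records):
    """Group records on a simple match key (last name + DOB + ZIP) and keep
    one consolidated record per group, noting which source systems fed it."""
    groups = defaultdict(list)
    for rec in map(standardize, records):
        key = (rec["last"], rec["dob"], rec["zip"])
        groups[key].append(rec)
    golden = []
    for members in groups.values():
        merged = dict(members[0])
        merged["source"] = sorted({m["source"] for m in members})
        golden.append(merged)
    return golden

feeds = [
    {"first": "Ana", "last": "Diaz",  "dob": "1975-06-04", "zip": "18015",      "source": "EHR"},
    {"first": "ANA", "last": "diaz ", "dob": "1975-06-04", "zip": "18015-2213", "source": "Billing"},
]
print(deduplicate(feeds))  # one consolidated record, sources ['Billing', 'EHR']
```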
A data quality solution should combine automation and human intervention: scaling up cannot be done manually, but there are judgment calls that require human knowledge and insight. “We leverage human and AI input,” said Freivald, “to enable business users to make recommendations about data quality rules, and then also surface likely rules to automate intelligent recommendations about what rules should be in place. Machine learning is about trying to surface data quality rules so they can be used as part of the larger enterprise model.”
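Freivald does not spell out the underlying algorithms, so the sketch below stands in with a simple frequency heuristic to illustrate the idea of surfacing candidate rules: it reduces a column’s values to coarse character shapes and, when one shape dominates, proposes a format rule for a data steward to accept or reject. The shape encoding and the 90% support threshold are assumptions for the example.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a value to a coarse shape: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def suggest_rule(column_values, threshold=0.9):
    """Suggest a format rule for a column when one shape dominates.
    The suggestion is surfaced for a human reviewer, not applied automatically."""
    shapes = Counter(shape(v) for v in column_values if v)
    if not shapes:
        return None
    top_shape, count = shapes.most_common(1)[0]
    support = count / sum(shapes.values())
    if support >= threshold:
        return {"rule": f"values should match shape '{top_shape}'",
                "support": round(support, 3)}
    return None

zips = ["18015", "19107", "08540", "1910", "60614",
        "94110", "02139", "10001", "30303", "73301"]
print(suggest_rule(zips))  # proposes the dominant 5-digit shape; '1910' would then fail the rule
```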
“It is hard to get a budget for data quality,” explained Freivald, “but when the data quality is directly and explicitly tied to a business strategy, people see its importance.” In some cases, Freivald advised, it’s a good idea to start small and get a high-visibility success, and then build out. “Data quality is not an engaging topic on its own,” he added, “but when it is put in a meaningful business context, it is easier to get people on board.”
As organizations continue to recognize the value of data as an asset, the market for data quality solutions will grow. Gartner reported that the market for data quality software tools reached $1.61 billion in 2017 (the most recent year for which the company has data), an increase of 11.6% over 2016. Given the proliferation of customer data, medical data, Internet of Things streams, and other sources, this market growth is likely to accelerate in coming years as businesses support new initiatives for data quality.