The centerpiece of data governance: Making information quality pay off
Information quality is central to effective data governance. It is both the core foundation and the optimal output of the people, protocols, and processes necessary to maintain IT systems. Its contribution to data-driven processes is inestimable: Without quality data, those processes are useless for achieving any business objective.
The most widely used information quality metrics include the following (a brief illustration of how several of them can be checked appears after the list):
♦ Completeness: Quality information has all fields filled out and no missing elements.
♦ Timeliness: This dimension of information quality ensures users are accessing the most current and temporally relevant data for their deployments.
♦ Uniqueness: Credible information doesn’t contain costly duplicate records that can obscure its significance to enterprise use.
♦ Accuracy: This metric focuses on the overarching correctness of information, which should be free of errors.
♦ Consistency: Quality information is represented the same way wherever it appears. For example, dates might always appear in a six-digit format ordered month, day, year.
♦ Validity: Validity refers to data’s conformance to specified requirements and formats, which helps decrease its potential to become obsolete over time.
♦ Lineage: Lineage provides details of data’s journey through the enterprise from initial ingestion or creation. Often found in the form of metadata, it includes aspects of transformation and previous uses of datasets that illustrate “how the data has evolved; how it has changed over time,” explained Ravi Shankar, senior vice president and CMO, Denodo.
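To make these dimensions concrete, the short Python sketch below shows one rough way several of them can be scored for a small dataset. It uses pandas and purely hypothetical column names such as customer_id and signup_date; it is an illustrative sketch, not any particular vendor’s implementation.

```python
import pandas as pd

# Hypothetical customer records; column names and values are illustrative only.
df = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "country":     ["U.S.A.", "United States", "U.S.A.", None],
    "signup_date": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-02-10", "2024-03-01"]),
})

# Completeness: share of non-missing values across all fields.
completeness = df.notna().mean().mean()

# Uniqueness: share of rows that are not duplicates of an earlier customer_id.
uniqueness = 1 - df.duplicated(subset="customer_id").mean()

# Consistency/validity: share of country values matching the agreed format.
validity = df["country"].isin(["U.S.A."]).mean()

# Timeliness: share of records created within the last 365 days.
timeliness = (pd.Timestamp.today() - df["signup_date"] < pd.Timedelta(days=365)).mean()

print(f"completeness={completeness:.2f}, uniqueness={uniqueness:.2f}, "
      f"validity={validity:.2f}, timeliness={timeliness:.2f}")
```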
According to Jean-Michel Franco, senior director of data governance, Talend, these technical characteristics “represent the features of data that can be assessed against to measure the quality of data,” which is typically judged by its risk reduction in areas of regulatory compliance, analytics, and data science.
Nonetheless, there is a decidedly business-oriented aspect of information quality that is less discernible, yet far more lucrative to the enterprise. By focusing on what Jitesh Ghai, senior vice president and general manager of data quality, security, and governance, Informatica, termed “the business logic” of information quality, organizations can drastically increase mission-critical functionality, conversions, and, most significantly, profit margins.
Business logic
From a business perspective, the technical facets of information quality identified earlier are only as valuable as their relevance to common business objectives such as sales and marketing. It is therefore incumbent upon organizations to “convert these broad, different technical definitions into a common business definition,” Shankar observed. The technical aspect of information quality might pertain to data’s adherence to the way a specific business (one with multiple subsidiaries, such as Citi, for example) is represented in IT systems. In this use case, one is assessing how the information is presented in columns and tables. But when viewed through the business logic lens, it becomes necessary to “extract data away from where it is,” Ghai noted. “You’re just looking at a business entity, like a contract, and you’re determining the validity of that by applying data quality business rules.”
Those rules, and their effects, can be as basic as ensuring a contract’s start date falls before its end date. In other instances, they can be as profound as stipulating standards for data completeness that significantly improve sales revenue. Ghai cited a use case in which CVS Pharmacy used “data quality business logic to help them better and more accurately price their generic prescription drugs. Some were underpriced, others were overpriced. Data quality helped them get to a more optimal price elasticity point, which resulted in a $300 million increase in revenues.”
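A business rule of the kind Ghai describes can be written against the business entity itself rather than a particular table. The minimal Python sketch below, built around a hypothetical Contract record, flags contracts whose start date does not precede their end date; it illustrates the pattern only and is not the actual logic used by CVS or Informatica.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Contract:
    # Hypothetical business entity; fields are illustrative only.
    contract_id: str
    start_date: date
    end_date: date

def violates_date_rule(contract: Contract) -> bool:
    """Business rule: a contract's start date must precede its end date."""
    return contract.start_date >= contract.end_date

contracts = [
    Contract("C-001", date(2024, 1, 1), date(2024, 12, 31)),
    Contract("C-002", date(2024, 6, 1), date(2024, 3, 1)),   # violates the rule
]

for c in contracts:
    if violates_date_rule(c):
        print(f"{c.contract_id}: start date is not before end date")
```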
Data profiling
Such astronomical monetization gains attributed to the business logic of information quality stem from its technical dimensions. The CVS use case “came down to the completeness in calculation of the data they were relying on to price their product,” Ghai explained. Although there are numerous means of effecting data quality, nearly all of them begin with data profiling, which produces statistical information about data. “You need to know the health of your data, the quality of your data, across the dimensions, and that’s what profiling is doing with the discovery,” Ghai commented. “Once you know that, it’s like a report card: you know what to remediate.” Data profiling is integral to the data discovery process that underpins information quality measures, which also include the following steps (a brief end-to-end sketch follows the list):
♦ Defining business rules: Business users should have input into the specific standards applied to the different categories of information quality, such as acceptable metrics for timeliness or the specific format used for representing locations. Shankar cited a use case in which “some people spell out United States, but you want it to be U.S.A. So you can actually define the rules very specifically to standardize the data to a certain level of quality.”
♦ Applying data cleansing transformations: Once these definitions are created in line with business objectives, organizations must apply them via data cleansing to the areas of the data that need remediation. Data cleansing is the process of detecting data that falls below a certain quality level and then removing or correcting it, Franco specified. “Correcting is about applying the rules for transforming the data.”
♦ Measuring and monitoring present and future results: Information quality’s business value is inherently numeric, as is this facet of data governance itself. Accordingly, the final step is to “measure and monitor to ensure over time your quality only goes up; it doesn’t degrade,” Ghai explained. “And if it degrades, then you can be alerted and continue this process.”
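As a rough illustration of how these steps fit together, the Python sketch below profiles a tiny, hypothetical set of country values, applies a standardization rule of the kind Shankar describes (mapping “United States” to “U.S.A.”), and then recomputes a quality score against an assumed baseline to mimic monitoring. The data, rule, and threshold are all invented for the example.

```python
import pandas as pd

# Hypothetical records with inconsistent country spellings and a missing value.
df = pd.DataFrame({"country": ["United States", "U.S.A.", "US", None]})

# 1. Profile: a simple "report card" of missing and non-standard values.
profile = {
    "missing": int(df["country"].isna().sum()),
    "non_standard": int((~df["country"].isin(["U.S.A."]) & df["country"].notna()).sum()),
}
print("profile:", profile)

# 2. Define and apply a cleansing rule: standardize spellings to U.S.A.
standardize = {"United States": "U.S.A.", "US": "U.S.A."}
df["country"] = df["country"].replace(standardize)

# 3. Measure and monitor: recompute the quality score and alert if it degrades.
quality_score = df["country"].isin(["U.S.A."]).mean()
PREVIOUS_SCORE = 0.75  # assumed baseline from the last monitoring run
if quality_score < PREVIOUS_SCORE:
    print("ALERT: data quality degraded")
else:
    print(f"quality score: {quality_score:.2f}")
```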