The centerpiece of data governance: Making information quality pay off
The unstructured divide
The most daunting aspect of the data deluge Shankar referenced is that the majority of it involves unstructured data or, at best, semi-structured data. “These days, the data explosion is on the unstructured data side, and many of the [traditional] tools don’t really work well from the unstructured perspective,” Shankar said. “When it comes to unstructured data, meaning XML structures and Word documents, they’re not really that great.”
By uniting the underlying semantics of data, regardless of variation in structure, formatting, or data models, in what Shankar termed “a unified semantic layer,” organizations can reinforce information quality in a number of ways, including the following:
♦ Data catalogs: Virtualization options are primed for working with unstructured data because they provide an abstraction layer that unifies the semantics of diverse data types while enabling organizations to input business rules for quality standards. Central to this functionality is a data catalog that offers a uniform representation of disparate data, illustrates the relationships among datasets, and captures their lineage. “All the unifying functions expressed in terms of relationships show up in a much more graphical way,” Shankar said. “The data catalog is able to depict that.” (A minimal sketch of such a catalog entry appears after this list.)
♦ Semantic graphs and common data models: Semantic graphs are increasingly sought for their ability to align semi-structured and unstructured data alongside structured data. They also support a range of functions for mapping this data to a common data model, a process that includes numerous measures for implementing data quality transformations. According to Lore IO CEO Digvijay Lamba, mapping standardizes data across column names and value sets, presenting the first opportunity to implement information quality. Accompanying the mapping process is the notion of mastering data via joins and merging, which also rectifies differences in how data is represented. Data cleansing can be applied in transformation libraries that leverage business rules and cognitive computing to detect, for example, “what form addresses come in and what transformations are to be applied,” said Lamba. (A simplified mapping-and-cleansing sketch appears after this list.)
♦ SHACL and data validation: Shapes Constraint Language (SHACL) is a highly practical means of validating data in semantic graphs that adhere to the universal standards of the Resource Description Framework (RDF). In these settings, SHACL is “the standard for rules and data representation,” said Irene Polikoff, CEO of TopQuadrant. Organizations can express their data quality rules as SHACL shapes, which then serve as the basis for data validation. Subsequently, Polikoff said, organizations can leverage these information quality rules, with SHACL explicitly defined as part of the knowledge graph, thus validating unstructured data as readily as they do structured data. (A minimal validation sketch appears after this list.)
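The catalog sketch below is a minimal illustration, not any vendor's product: it uses the open source rdflib library with the DCAT and PROV vocabularies to record a hypothetical dataset's title, its lineage back to a semi-structured source, and a plain-language quality rule, all as RDF that a catalog could render graphically. The namespace, dataset names, and rule text are assumptions made for the example.

```python
# Minimal, illustrative catalog entry: one curated dataset, its lineage,
# and a quality-rule annotation, expressed as RDF with rdflib.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, PROV, RDF, RDFS

EX = Namespace("http://example.org/catalog/")  # hypothetical namespace

g = Graph()
g.bind("dcat", DCAT)
g.bind("prov", PROV)

raw = EX.crm_export            # semi-structured source (e.g., an XML extract)
curated = EX.customer_master   # curated dataset exposed through the catalog

g.add((curated, RDF.type, DCAT.Dataset))
g.add((curated, DCTERMS.title, Literal("Customer master")))
g.add((curated, PROV.wasDerivedFrom, raw))   # lineage relationship
g.add((curated, RDFS.comment,
       Literal("Quality rule: every record must carry a postal address")))

print(g.serialize(format="turtle"))
```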
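The next sketch loosely illustrates the mapping, mastering, and cleansing steps Lamba described, without representing Lore IO's actual implementation: it uses pandas to rename source-specific columns onto a common data model, join two extracts into a mastered record, and apply a simple rule-based address transformation. The column names, sample rows, and cleansing rule are invented for illustration.

```python
# Illustrative mapping to a common data model, mastering via a join,
# and rule-based address cleansing with pandas.
import pandas as pd

crm = pd.DataFrame({"cust_id": [1, 2], "addr": ["12 Main St.", "9 Elm Street"]})
billing = pd.DataFrame({"customer": [1, 2], "postal": ["12 MAIN ST", "9 ELM ST"]})

# Map source-specific column names onto the common model.
crm = crm.rename(columns={"cust_id": "customer_id", "addr": "address"})
billing = billing.rename(columns={"customer": "customer_id", "postal": "address_billing"})

# Master the records: one row per customer across both sources.
master = crm.merge(billing, on="customer_id", how="outer")

# Rule-based cleansing: detect the form addresses arrive in and standardize it.
def standardize_address(value: str) -> str:
    value = value.strip().title().rstrip(".")
    return value.replace("Street", "St")

master["address"] = master["address"].map(standardize_address)
print(master)
```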
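Finally, the validation sketch below shows the general SHACL pattern Polikoff described, using the open source pySHACL library rather than any particular product: a shape encodes a data quality rule (every customer must carry an email address), and the engine reports whether an RDF graph conforms. The vocabulary and sample data are hypothetical.

```python
# Express a data quality rule as a SHACL shape and validate an RDF graph.
from rdflib import Graph
from pyshacl import validate

shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:CustomerShape a sh:NodeShape ;
    sh:targetClass ex:Customer ;
    sh:property [
        sh:path ex:email ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
    ] .
""", format="turtle")

data = Graph().parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Customer ; ex:email "alice@example.org" .
ex:bob   a ex:Customer .   # missing email, so it violates the shape
""", format="turtle")

conforms, _, report_text = validate(data, shacl_graph=shapes)
print(conforms)      # False: ex:bob has no ex:email
print(report_text)   # human-readable validation report
```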
Monetizing data quality
Poor information quality wastes monetary resources, squanders data’s underlying value to the enterprise, and significantly increases risk in the form of regulatory noncompliance or litigation. However, effective information quality substantially improves monetization opportunities for sales, marketing, product development, and more.
Homing in on the business logic of this data governance staple, by perfecting its technical implementations, is instrumental to capitalizing on the information derived from prudent data management. Moreover, it transforms the undertaking from one based on risk mitigation to one optimized for profitability.