The importance of information governance and privacy
Such metadata provides the context through which data are understood and applied for various governance purposes, including data privacy and regulatory compliance. According to Vogt, top governance platforms “orchestrate policies for access control using the metadata model that people already have. Or, using other capabilities to discover and classify data, or plug into lineage systems—things like Snowflake has just come out with—so policies can follow data based on metadata.”
Data catalogs assemble and curate metadata about ontologies, taxonomies, data provenance, and glossaries. Data catalogs are a “dictionary [that] allows you to say, ‘These are 16 different variations of how a phone number might be entered,’” said Rajiv Dholakia, Privacera SVP of products. Data catalog benefits include:
♦ Ease of access: Catalogs should provide a central locus from which to access data according to governance requirements. “It’s like going to a grocery store, or Amazon.com, and letting me click on something and have next-hour delivery of the data that I want,” Estala said.
♦ Data classifications: Tags and classifications of data assets are stored in catalogs as metadata. Tagging data is critical for preserving data privacy. It solves the initial problem in which “the C-suite doesn’t know where its sensitive data is stored,” Ghangor-Cloud COO Bhanu Panda said.
♦ Data context: The contextualization intrinsic to adaptive data governance is reinforced by catalogs’ metadata descriptions. “Data’s context should drive usage context,” Vogt said. “You might need to start having metadata around what was the context or pretense under which data was collected. That should all drive policy and access control decisions.”
♦ Distributed data architectures: Data catalogs are enablers of contemporary paradigms involving a data mesh or data fabric. The former is gaining credence in governance circles by allowing “people to create data products in internal business units,” Dholakia specified. Those products are then accessible to a broader array of users and purposes, while still incorporating “some level of centralization to enable these enterprise views,” Hawker added. Thus, governance occurs both locally and centrally, adapting to the different contexts in which data are employed by a growing user base.
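Dholakia’s “dictionary” of field variations can be sketched in a few lines. The patterns and function names below are hypothetical illustrations, not any vendor’s implementation; the idea is simply that a catalog can hold many known entry formats for one logical field type and classify values against them.

```python
import re

# Illustrative catalog "dictionary": several of the many ways a phone
# number might be entered, each captured as a pattern (hypothetical set).
PHONE_VARIANTS = [
    r"^\(\d{3}\) \d{3}-\d{4}$",   # (555) 123-4567
    r"^\d{3}-\d{3}-\d{4}$",       # 555-123-4567
    r"^\d{3}\.\d{3}\.\d{4}$",     # 555.123.4567
    r"^\d{10}$",                  # 5551234567
    r"^\+1\d{10}$",               # +15551234567
]

def classify_as_phone(value: str) -> bool:
    """Return True if the value matches any known phone-number variation."""
    return any(re.match(pattern, value.strip()) for pattern in PHONE_VARIANTS)

print(classify_as_phone("(555) 123-4567"))  # True
print(classify_as_phone("not a number"))    # False
```

In practice a catalog would store hundreds of such entries per field type, but the mechanism, matching raw values against curated variations, is the same.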
Data discovery often initiates access control measures for regulatory compliance and data privacy. It is equally valuable for locating relevant data for specific analytics needs, such as customer segmentation that supports targeted sales opportunities. This facet of data governance applies equally to data protection, business intelligence, and data science.
Sensitive data discovery precedes classification (based on metadata) and obfuscation at an attribute level. Consequently, the surrounding dataset is still useful for organizations to “share this data with parts of the company that need to make sense of this data for analytics, marketing, and whatever other commercial purposes have been envisioned,” Dholakia said. Competitive data discovery characteristics include:
♦ Object identification and classification: There are point-and-click mechanisms that let users create data objects (like credit card numbers) for regulations, for example. “You can construct any object of reference that you’re actually looking for,” Panda explained. “It doesn’t have to be just purely PCI or PII or PHI.” According to Ghangor-Cloud CEO and CTO Tarique Mustafa, “The algorithm can identify and classify objects made known to the system via a priori knowledge engineering, or it can discover new objects existent in the system on its own.”
♦ Regular expression matching: This technique is another symbolic approach to data discovery, as opposed to statistical ones like machine learning. “You could have a specific compliance-oriented collection set of dictionaries that say, ‘This is what constitutes compliance within this compliance regime,’” Dholakia said. That information becomes the basis for discovering, classifying, and obfuscating data.
♦ Statistical techniques: Statistical data discovery methods include data profiling and machine learning to find germane data.
♦ Mapping: Once sensitive data are located and classified, mapping techniques can prevent unauthorized access to them. “You can map that entire chorology, or encrypt it with pervasive encryption, or completely map it to an entire domain altogether,” Mustafa said.
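The discovery-then-obfuscation sequence described above can be sketched minimally with regular expressions. This is an illustrative example under simplified assumptions, not Privacera’s or Ghangor-Cloud’s implementation; the pattern set, the `discover` and `obfuscate` names, and the masking rule are all hypothetical.

```python
import re

# Hypothetical compliance "dictionary": regex patterns for a few
# sensitive object types (cf. PCI/PII discovery).
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def discover(record: dict) -> dict:
    """Tag each attribute with the sensitive classes found in it."""
    return {
        field: [label for label, rx in PATTERNS.items() if rx.search(str(value))]
        for field, value in record.items()
    }

def obfuscate(record: dict, tags: dict) -> dict:
    """Mask only the tagged attributes, leaving the rest of the record usable."""
    masked = dict(record)
    for field, labels in tags.items():
        for label in labels:
            masked[field] = PATTERNS[label].sub("****", str(masked[field]))
    return masked

row = {"name": "Pat", "ssn": "123-45-6789", "note": "call re: order"}
tags = discover(row)         # only the 'ssn' attribute is tagged
print(obfuscate(row, tags))  # 'ssn' is masked; 'name' and 'note' are untouched
```

Because masking happens at the attribute level, the surrounding record remains available for the analytics and marketing uses Dholakia describes.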
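The statistical techniques mentioned above can be illustrated with a basic data-profiling sketch. The approach and names here are hypothetical, assuming a simple heuristic: collapse each value to a coarse character pattern and measure uniqueness, so columns with uniform identifier-like shapes can be flagged for review.

```python
import re
from collections import Counter

def shape(value: str) -> str:
    """Reduce a value to a coarse pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile(column: list) -> dict:
    """Profile a column: uniqueness plus its dominant character pattern."""
    shapes = Counter(shape(str(v)) for v in column)
    top_shape, top_count = shapes.most_common(1)[0]
    return {
        "distinct_ratio": len(set(column)) / len(column),
        "dominant_pattern": top_shape,
        "pattern_consistency": top_count / len(column),
    }

ids = ["123-45-6789", "987-65-4321", "555-12-3456"]
print(profile(ids))  # all values distinct, all share the '999-99-9999' shape
```

A column that is both highly distinct and highly pattern-consistent is a candidate identifier; production tools layer machine learning on top of profiles like these.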