Avoiding Legal Pitfalls Through Savvy Data Governance
Visual approaches that rely on dashboards, graphics, and data mapping, in addition to reporting tools, help contextualize information about data access. With that context, IT teams and data governance personnel can conduct audits to gauge adherence to rules and even devise ways to improve controls or security procedures. By centralizing governance to account for the increasingly distributed nature of the contemporary data landscape, vendors in this space provide a crucial advantage to organizations across verticals. “There is still a need for incorporating manually authored rules in the context of data governance,” Kamien said. “Plus, you get transparency and accountability with them, and you need to be able to audit.”
Governing Language Models
The adoption of language models for retrieval-augmented generation (RAG) and other generative AI applications hinges on the ability to govern the information these models use, whether in training or at retrieval time. Governance solutions can enable this in several ways. In some instances, self-governing language models monitor the traffic flowing into and out of them. According to Kamien, some vendors allow organizations to “simply upload a plain text English document describing your policies and they will turn that into enforcement rules that their system will check.”
Other solutions integrate with frameworks for accessing language models and apply controls to users’ prompts and the models’ outputs in real time, denying requests or hiding sensitive information. Vogt described an architecture in which governance solutions are embedded in the indices of vector databases used for RAG applications. “If it’s emails, we can classify the email dataset for the RAG as these chunks being sensitive or regulatory concerned,” Vogt explained. “Then, based on people’s attributes, whether or not they have access to public data, non-public data, regulatory data, or something else, we’ve categorized the row in the index and the response would get filtered based on the policy.”
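The filtering Vogt describes can be illustrated with a minimal sketch. It assumes each indexed chunk carries a sensitivity label and each user carries a set of labels they may see. The names `Chunk`, `User`, `filter_chunks`, and the three label constants are hypothetical and do not come from any vendor's product.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, loosely based on the categories in the quote.
PUBLIC = "public"
NON_PUBLIC = "non_public"
REGULATORY = "regulatory"


@dataclass(frozen=True)
class Chunk:
    text: str
    label: str  # classification assigned when the chunk was indexed


@dataclass(frozen=True)
class User:
    name: str
    allowed_labels: frozenset  # labels this user's attributes permit


def filter_chunks(chunks, user):
    """Keep only the retrieved chunks whose label the user is allowed to see."""
    return [c for c in chunks if c.label in user.allowed_labels]


# Stand-in for rows returned by a vector index lookup (assumed, not a real query).
retrieved = [
    Chunk("Office closes early on Friday.", PUBLIC),
    Chunk("Q3 revenue draft, not yet announced.", NON_PUBLIC),
    Chunk("Customer complaint under regulator review.", REGULATORY),
]

analyst = User("analyst", frozenset({PUBLIC, NON_PUBLIC}))
visitor = User("visitor", frozenset({PUBLIC}))

analyst_view = filter_chunks(retrieved, analyst)
visitor_view = filter_chunks(retrieved, visitor)
```

In this sketch the policy check runs after retrieval and before anything is passed to the model, so a chunk the user cannot see never reaches the prompt. A production system might instead push the same condition into the index query itself.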
At the Forefront
Data governance will likely remain at the forefront of enterprise initiatives and practices for accessing, modifying, and sharing knowledge within and between organizations. Its capabilities are designed to let organizations do so safely, compliantly, and, most of all, legally, so they can maximize the gains from the enterprise knowledge they collect and curate.