-->


How regulatory pressure is reshaping big data as we know it


Automated management

Managing big data for classifications and compliance frequently involves automated, as opposed to autonomous, systems. Moving from manual to automated processes necessitates “validating” them, or seeing if they need to be validated, explained Robert Gratz, Hyland technical trainer 3. Monitoring and governing automated systems (and their models) increase data quality and accuracy. Common automation methods and validation approaches include the following:

Unsupervised learning: According to Viswanathan, certain algorithms for this type of machine learning look for patterns. Such algorithms are instrumental in automating classifications for regulatory compliance and augmenting natural language processing.

Model governance: Although model governance is fairly broad and incorporates numerous approaches to ensuring quality model outputs, one of the most granular is to implement testing in models themselves. Sondra Orozco, Looker senior product manager, described this as an approach that brings “unit tests into your model, to make sure that every change you make to your model continues to produce correct results.”

Rules-based systems: The addition of rules and rules-based systems controls some machine learning automation and descends from the multi-step reasoning, knowledge-based side of AI “that tries to assimilate human behavior by looking at the internal steps: the steps people take when they solve complicated tasks,” Aasman explained. Rules can assist with supervised learning for categorization purposes, and are excellent for compliance because “some things have to be absolutes; they cannot be probabilistic,” Viswanathan said.

Human-in-the-loop: The entire notion of automation validation and monitoring of automated processes revolves around the human-in-the-loop tenet, in which people supervise various cognitive computing processes for quality assurance. “Anytime there’s a space where accountability would come into play, that’s absolutely the place where a human would have to come in,” Moore posited.
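To make these approaches concrete, here is a minimal Python sketch of how an absolute rule, a probabilistic model, and a human reviewer might fit together in one classification step, with unit-test-style checks guarding the behavior. Everything named in it is an assumption for illustration rather than anything the people quoted above described: the Social Security number pattern, the 0.80 confidence cutoff, the `classify` function, and the `stub_model` stand-in for a trained classifier are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical absolute rule: text containing something shaped like a
# U.S. Social Security number is always treated as regulated.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumed cutoff: below this confidence, a person makes the call.
REVIEW_THRESHOLD = 0.80


@dataclass
class Decision:
    label: str
    source: str  # "rule", "model", or "human_review"


def classify(text: str, model: Callable[[str], Tuple[str, float]]) -> Decision:
    # 1. Rules are absolutes, so they are checked before the model is consulted.
    if SSN_PATTERN.search(text):
        return Decision("regulated", "rule")
    # 2. Otherwise ask the model for a label and a confidence score.
    label, confidence = model(text)
    if confidence >= REVIEW_THRESHOLD:
        return Decision(label, "model")
    # 3. Low confidence: queue the item for a human instead of guessing.
    return Decision("pending", "human_review")


def stub_model(text: str) -> Tuple[str, float]:
    """Stand-in for a trained classifier; returns (label, confidence)."""
    if "invoice" in text.lower():
        return ("financial", 0.95)
    return ("general", 0.55)


def run_checks() -> None:
    """Checks to rerun after every change to the rules or the model."""
    assert classify("SSN 123-45-6789 on file", stub_model) == Decision("regulated", "rule")
    assert classify("Invoice for March", stub_model) == Decision("financial", "model")
    assert classify("Lunch on Friday?", stub_model) == Decision("pending", "human_review")


if __name__ == "__main__":
    run_checks()
```

The ordering is the design choice worth noting: because the rule runs first, no model score can override it, and anything the model is unsure about lands with a person who can be held accountable for the outcome.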

Enterprise risk

Aside from regulations, big data risks include a seemingly interminable array of factors related to cybersecurity, e-discovery, data loss, underwriting, corruption, inept users, failure, chain of custody, and more. According to Chepurnov, that risk is connected to the technology as well as to people who may “assume that IT and the security department are taking care of information security,” and therefore shirk their responsibility for enterprise risk mitigation. The most arduous aspect of managing risk for big data is understanding (and documenting workflows that identify) “where does this data go, and how is it controlled and what vendors might be touching it,” ventured Ryan Gurney, Looker chief security officer. Breach and cybersecurity risk can be substantially decreased via the layered approach of cloud-based paradigms (widely deployed for backups, too), especially when supplemented with encryption, tokenization, masking, anonymization, or pseudonymization.
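Two of the safeguards named above, masking and pseudonymization, are simple enough to sketch with Python's standard library. This is an illustrative sketch under stated assumptions, not a vetted security control: the function names, the keyed-hash (HMAC-SHA256) approach to pseudonymization, and the sample key are all hypothetical choices, and a real deployment would keep the key in a managed secret store.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (assumed approach: HMAC-SHA256).

    The same value and key always yield the same pseudonym, so records can
    still be joined, but the original cannot be read back from the output.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4, fill: str = "*") -> str:
    """Hide all but the last `visible` characters of a value."""
    if visible <= 0 or len(value) <= visible:
        # Nothing can safely be shown, so hide the whole value.
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


if __name__ == "__main__":
    demo_key = b"example-key-do-not-use"  # hypothetical key for illustration only
    print(pseudonymize("jane.doe@example.com", demo_key))
    print(mask("4111111111111111"))
```

Masking suits values that people still need to recognize on screen, such as the tail of an account number, while a keyed pseudonym suits analytics, where records must stay linkable without exposing who they describe.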

Toeing the line

The big data landscape is being inexorably molded by regulations, risk management, privacy, and a need for greater transparency, which are prevalent partially due to statistical AI zeal and what some consider an inordinate valuation of correlation. “Correlation is not causation,” Aasman cautioned. “There has to be a new trend to figure out why something happened, which is explainability.” As big data influences society more than ever, postmodern restrictions raise poignant questions. There are benefits because people have a lot of data and can make better decisions and bring better services to people, but there are also lines that should not be crossed, Bien reflected.

Escalating regulations and legislation are delineating them now.
