IG Nightmare: Macro Trends, Outdated Technology or Both?

Corporations are abuzz over information governance (IG) because it conceptually represents painful macro trends that are hitting them:

  • Explosive growth of all types of data (a.k.a. big data, the other hot buzz concept); and
  • Intensifying focus on governance.

The data challenges are volume, velocity, variety and complexity (I believe Gartner coined this good definition of big data). For governance, it's the unrelenting pressure to ensure that the right policies are in place to reduce exposure to legal and/or regulatory actions and cost.

Each trend by itself generates difficult issues. Combined, they're a one-two punch, forcing every company to figure out how to prepare itself before a lawsuit hits.

The key underlying issue is proper, repeatable categorization of all information in the company so the appropriate application can act at every point in the information lifecycle, including governance. Five or six years ago, many companies thought the taxonomy classification techniques they used for records and content management, search systems, "smart" storage, etc., could also address information compliance. If the technology invested in a decade ago could do the job and scale, there would be no nightmare today, or at least, less of one. But what really happened?

The taxonomy solutions did not scale, are hard to maintain, and must be re-created for each language the company uses. Perhaps even more importantly, they can't understand concepts ("this is like that") as humans do. Taxonomy solutions find matches for keywords and rules, but they can't find the conceptual "more like this" similarities that are so crucial to effective IG.
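The keyword limitation is easy to see in a small sketch. The taxonomy, category names and sample sentences below are hypothetical, and real taxonomy products use far richer rules, but the failure mode is the same: a document that discusses the concept without using a listed term goes untagged.

```python
import re

# Hypothetical taxonomy: each category is a hand-maintained keyword list.
TAXONOMY = {
    "legal_hold": {"lawsuit", "litigation", "subpoena"},
    "junk": {"vacation", "holiday", "football"},
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def classify_by_keywords(text, taxonomy=TAXONOMY):
    """Return every category whose keyword list shares a word with the text."""
    words = tokenize(text)
    return sorted(cat for cat, keywords in taxonomy.items() if words & keywords)

explicit = "Outside counsel says the lawsuit will be filed next week."
implicit = "Outside counsel expects the plaintiff to file a complaint next week."

print(classify_by_keywords(explicit))  # ['legal_hold']
print(classify_by_keywords(implicit))  # [] -- same concept, no listed keyword
```

Adding "plaintiff" and "complaint" to the list fixes this one case, which is exactly the maintenance burden described above.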

Look to Risk Exposure for the Solution

The most regulated segment of IG—and the one with the harshest penalties for missteps—is e-discovery. It's a model case study of how some companies are effectively minimizing their overall governance risks.

In e-discovery, a consistent, defensible process is half the battle. Even so, legal teams are rightfully paranoid (as you would want them to be) about inadvertently missing key documents as volumes become extreme. Completely manual review is too costly and time consuming. Traditional keyword search mitigates the challenge, but the completeness of discovery depends on the quality of the keyword index. This leaves nagging doubts about the potential exposure from omitted relevant documents, especially since legal teams may not know all the related terms to search for.

The answer came from another "paranoid" group acutely concerned with overlooking key pieces of information: the intelligence community. They also faced extreme volumes of data growing at great velocity. The fundamental issue was the same: Can we have a system that does consistent conceptual categorization, rather than relying on pre-defined word lists that may not include the terms the bad guys are using today?

The answer was text analytics using advanced conceptual search and classification technology. For years now, the intelligence community has been using this to help catch the bad guys. A few insightful companies have added the same technology to their review platforms, and today, text analytics is the fastest growing trend in e-discovery.

What Do You Need to Consider?

Will this approach work inside the variety of applications tasked with categorizing the company's information and then acting on the classification? The answer depends on:

  • How easy is it to create and maintain the categories? Do we need an expert to define rules, dictionaries, associated word lists, etc., or can we just create a folder of example documents for the software to learn from?
  • Can the solution also categorize documents that are conceptually similar to those already in the categories without having the specific terms defined?
  • Can one solution be used for multiple languages and with the same precision so we can use and enforce consistent policies in every country?
  • Is the information used to define the categories secure? Who can access the words, rules and dictionaries that may contain codenames and other terms or concepts we do not wish exposed?
  • Can the categorizing information be moved securely around the enterprise so a consistent categorization method is used?
  • Is the underlying technology proven to be defensible in other governance and/or regulatory processes?
  • Can it work with the big data volume, velocity, variety and complexity hitting us now and going forward?

Each application in the IG process will have different requirements and actions, but these basic questions need to be addressed.
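To make the first question concrete, here is a minimal sketch of categories defined by example documents instead of hand-written rules. It assumes a plain bag-of-words model with cosine similarity; the `folders` dictionary, the stopword list and all function names are hypothetical stand-ins for real directories of examples and for whatever model a product actually uses.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "of", "for", "this", "is", "by", "to", "a"}

def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count), minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def train(example_folders):
    """Build one summed vector per category from its example documents."""
    centroids = {}
    for category, docs in example_folders.items():
        total = Counter()
        for doc in docs:
            total.update(vectorize(doc))
        centroids[category] = total
    return centroids

def categorize(text, centroids):
    """Return (best_category, score) for the category most similar to the text."""
    vec = vectorize(text)
    scores = {cat: cosine(vec, c) for cat, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical "folders" of example documents.
folders = {
    "contracts": [
        "This agreement sets out the terms of the supplier contract.",
        "The contract renewal terms require signature by both parties.",
    ],
    "social": [
        "Who is watching the football game this weekend?",
        "Lunch plans for the team outing this weekend.",
    ],
}

centroids = train(folders)
category, score = categorize("Please sign the renewal agreement for the supplier.", centroids)
print(category)  # contracts
```

Maintaining a category here means adding or removing example documents, not editing rules. Note that this simple version still depends on shared words, so on its own it would not satisfy the second question about conceptual similarity.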

How to Weed Out (Only) the Junk

Consider an example: the popular idea of moving the existing archive (tape or other media) to the cloud or more cost-effective storage. Many companies are holding off doing this until they have a solid plan, including not moving junk—information that you no longer see a need for. If you reduce archived data by 20%, that's 20% less storage, less to govern, less subject to potential e-discovery review.

The problem is the time and effort needed to create accurate, detailed taxonomies to define "junk." We all know good examples: sports discussion emails, holiday/vacation emails, older marketing materials. But creating rules and word lists for all the different types could be exhausting. Consider an email from an employee to their manager stating that they are not taking the holiday because they are working on project X, with diagrams attached for review. You most likely would want such emails tagged as both "corporate IP" and junk, and have a person decide which it is. A taxonomy solution could do this, but only if you've foreseen all the right keywords and rules.
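The "tag as both and let a person decide" step can be sketched as a small routing function. The scores, the 0.5 threshold and the action names are all hypothetical; the scores would come from whatever classifier is in use.

```python
def route(scores, threshold=0.5, discard_label="junk"):
    """Decide what to do with a document given per-category confidence scores.

    Returns (tags, action), where action is "discard", "retain" or "human_review".
    """
    tags = sorted(cat for cat, s in scores.items() if s >= threshold)
    if discard_label in tags and len(tags) > 1:
        action = "human_review"  # junk plus something valuable: a person decides
    elif tags == [discard_label]:
        action = "discard"
    else:
        action = "retain"
    return tags, action

# Hypothetical scores for the "not taking the holiday, diagrams attached" email.
holiday_email = {"junk": 0.8, "corporate IP": 0.7, "marketing": 0.1}
print(route(holiday_email))  # (['corporate IP', 'junk'], 'human_review')
```

Only documents tagged as junk and nothing else are discarded automatically, which is what keeps the weeding limited to the junk.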

In contrast, content analytics technology uses mathematical algorithms to learn the conceptual meanings and correlations among words in a document collection or stream. So, it can accurately identify target documents, even those that taxonomy solutions would miss because they don't include the exact keywords or match the rules. And because the technology is mathematical, it's language-agnostic and works across multinational business without requiring multiple versions.
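The article does not name a specific algorithm, so the following is only an illustrative sketch of the general idea, using simple word co-occurrence counts as a stand-in for the mathematics a real product would use. The tiny corpus, the stopword list and all function names are assumptions. Each word is represented by the words it appears alongside in the collection, so two texts can score as similar without sharing a single keyword.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import permutations

STOPWORDS = {"the", "a", "in", "after", "was", "before", "on", "at"}

def content_words(text):
    """Lowercase, tokenize and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def learn_term_vectors(corpus):
    """Map each word to counts of the other words it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        for w, v in permutations(set(content_words(doc)), 2):
            vectors[w][v] += 1
    return vectors

def concept_vector(text, vectors):
    """Sum the learned vectors of the known words in the text."""
    total = Counter()
    for w in content_words(text):
        if w in vectors:
            total.update(vectors[w])
    return total

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical document collection to learn from.
corpus = [
    "the plaintiff filed a lawsuit in federal court",
    "the court dismissed the lawsuit after the complaint was withdrawn",
    "the plaintiff amended the complaint before trial",
    "the team won the football game on sunday",
    "football fans watched the game at the stadium",
]
vectors = learn_term_vectors(corpus)

target = concept_vector("lawsuit", vectors)
related = cosine(target, concept_vector("plaintiff complaint", vectors))
unrelated = cosine(target, concept_vector("football stadium", vectors))
print(related > unrelated)  # True, although "lawsuit" appears in neither text
```

Nothing in the sketch depends on the language of the text beyond splitting it into tokens; the `[a-z]+` tokenizer and English stopword list are simplifications that a multilingual deployment would replace.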

Analytics technology already is at the core of applications in a number of segments, including e-discovery, social media monitoring, intelligence community research and IG. It delivers scalable, language-agnostic, consistent, secure and highly precise conceptual categorization. This enables individual applications to consistently act (even to flag something for human review) in accordance with specific regulatory and compliance policies so the company can stay out of court and save money.

Will this eliminate your IG nightmare? Not by itself. But you will sleep better knowing that a major aspect of IG is covered better than ever.  
