The Worst—and Best—in E-Discovery
Whether You Celebrate or Mourn is Up To You
In some ways, we were way ahead of the curve.
We’ve been talking on these pages about the challenges of locating, retrieving, indexing and presenting electronic information in the event of litigation (also known as "e-discovery") for several years. That was well before the 2006 amendments to the Federal Rules of Civil Procedure made it virtually mandatory to develop a plan for producing ALL types of electronically stored information: emails, Word documents, spreadsheets. Many interviews, articles and entire KMWorld White Papers have been devoted to the subject.
But in many other ways, we’ve failed to adequately examine the issues that make e-discovery immensely complicated and utterly unique as an information management challenge.
I don’t feel too bad, though, because most of the world has been as clueless about e-discovery as I have.
One of my all-star go-to guys in the field of e-discovery is Dr. Johannes Scholtes, president and CEO of ZyLAB. Jan, as he’s known to his friends (and I count myself in that fortunate group), has spent countless hours and stupendous amounts of frequent flier miles traveling the world to address companies’ preparedness—or lack of it—for dealing with the demands on information systems in light of litigation and regulatory compliance.
And Jan is the first to tell you…"lack of it" describes most companies’ grasp of their exposure to risk in the event of a major lawsuit. Many of the juiciest headline-grabbing business scandals can be attributed to the players’ inability to produce documentation—especially the electronic variety—to the satisfaction of judges, juries and opposing counsel.
I asked all my interview subjects this month to focus not so much on "best practices" in preparing for an e-discovery motion, but on "worst practices." What are the lessons to be learned from those who have gone before, and tripped over the many "gotchas" along the way?
Jan can boil it all down to one overriding mistake: Misunderstanding (and misusing) the technology for producing all the materials relevant to discovery requirements.
"People use the completely wrong search technology for discovery search. And if opposing counsel finds out about it, they will fry you in court," he states simply. "There’s a difference between familiar Web and portal search tools, and legal discovery search," Jan explains. "And the difference is between finding ALL the documents, or finding just the most relevant documents." I bolded that because it can’t be overstated how important that seemingly simple statement is.
The familiar Web search tools we know and love are crappy at discovery. (That’s me saying that, but everyone I spoke to agrees on that point.) It’s not because they are bad search tools; it’s just that e-discovery demands a different approach. A typical search, for example, is designed to provide the "top most relevant" responses to your query. It is optimized to look for the most popular, the most linked-to, the most visited, etc. And to streamline performance, public-type search tools decide where that top end is cut off. The cutoff might be 10,000 results, but that is still only the top end of the "most relevant" list. The tool stops looking after that.
E-discovery is a different animal altogether. In an e-discovery effort, your search tool isn’t supposed to find the "best" matches; it has to find ALL the matches… that’s ALL, as in every single one.
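For the technically inclined, the difference can be sketched in a few lines of Python. This is only a toy illustration, not how any particular product works: the scoring function, the document IDs and the cutoff `k` are all made up for the example.

```python
from typing import Dict, List


def score(text: str, term: str) -> int:
    """Toy relevance score (an assumption for this sketch): count of the term."""
    return text.lower().split().count(term.lower())


def top_k_search(docs: Dict[str, str], term: str, k: int) -> List[str]:
    """Web-style search: rank the hits and keep only the k best ones."""
    hits = [(score(text, term), doc_id) for doc_id, text in docs.items()]
    hits = [h for h in hits if h[0] > 0]
    hits.sort(key=lambda h: (-h[0], h[1]))  # best score first, ties by ID
    return [doc_id for _, doc_id in hits[:k]]


def exhaustive_search(docs: Dict[str, str], term: str) -> List[str]:
    """Discovery-style search: return every match, in a stable order."""
    return sorted(doc_id for doc_id, text in docs.items() if score(text, term) > 0)


# Hypothetical documents, invented for illustration.
docs = {
    "memo-1": "merger merger merger timeline",
    "memo-2": "merger budget merger",
    "email-7": "lunch on friday, also the merger",
    "email-9": "quarterly numbers",
}

print(top_k_search(docs, "merger", k=2))   # the "tall" end of the curve only
print(exhaustive_search(docs, "merger"))   # includes the long-tail email-7
```

The ranked search never surfaces `email-7`, because it mentions the term only once and falls below the cutoff. The exhaustive search returns it along with everything else.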
Not to overuse a concept that has had plenty of exposure in these pages, but it really is the "long tail" phenomenon at work. Web search, and even enterprise search engines and appliances, are optimized to find the "tall" end of the curve…those documents that are determined to be "probably" the best answer to your question.
But there is great risk in relying on "probably" in a legal discovery. Because it is the "long tail" of the curve, the documents trailing out to the right into near-infinity, that can bite you in the butt. A single email that an enterprise search query fails to find can be a determining factor in a legal battle.
And not necessarily because it’s got some "smoking gun" incriminating information in it. IF there is such an email in your possession, and IF you don’t locate it for the discovery process, and IF opposing counsel knows about it…you lose credibility, and your technology can be deemed unsatisfactory.
"If you can’t find all the relevant documents, or your search engine only produces pre-indexed documents, or uses a popularity-ranking algorithm, or cannot produce consistent results when used in different moments in time, opposing counsel and the court will not have much confidence in your discovery production," says Jan.
And you will lose. Because the courts (and opposing counsel) don’t really care how you have organized your content. They want it all, and if you can’t produce it (or, worse, produce different results every time you re-search or re-index), you’re pretty much screwed. The courts these days have little patience for the "I can’t find everything because it’s really hard" defense. In fact, that’s in large part what the new FRCP rules regarding electronically stored information were designed to eliminate. You DON’T get an easy way out. And the courts know it. And your opposing counsel knows it, too.
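One simple way to check whether a tool gives consistent results over time is to fingerprint the result set of each run and compare. The sketch below is a minimal illustration under my own assumptions: the function name and the document IDs are hypothetical, and a real audit would also need to record the query, the index state and the date.

```python
import hashlib
from typing import Iterable


def result_fingerprint(doc_ids: Iterable[str]) -> str:
    """Hash the sorted, de-duplicated document IDs returned by one search run."""
    digest = hashlib.sha256()
    for doc_id in sorted(set(doc_ids)):
        digest.update(doc_id.encode("utf-8"))
        digest.update(b"\n")  # separator so IDs cannot run together
    return digest.hexdigest()


# Hypothetical result sets from three runs of the same query.
first_run = ["email-7", "memo-1", "memo-2"]
second_run = ["memo-2", "email-7", "memo-1"]  # same documents, different order
third_run = ["memo-1", "memo-2"]              # one document went missing

print(result_fingerprint(first_run) == result_fingerprint(second_run))
print(result_fingerprint(first_run) == result_fingerprint(third_run))
```

Matching fingerprints mean the two runs produced the same set of documents regardless of ranking order. A mismatch is the kind of inconsistency Jan is warning about.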
Under the FRCP rules, the opposing lawyers have to get together ahead of time (it’s called "meet and confer," in the parlance) and agree on which documents are going to be disclosed to each other. During this phase, the lawyers will negotiate and eventually agree on which documents are relevant and which aren’t. It gets nitty-gritty; the lawyers (after much negotiation, I’m sure) agree on a set of words that will be part of the document search. This is a "negotiated Boolean query." (I told you it was nitty-gritty.)
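To make the idea of a negotiated Boolean query concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the nested-tuple query format, the search terms and the sample emails are invented, and real discovery tools have their own query syntax.

```python
import re
from typing import Set, Tuple, Union

Query = Union[str, Tuple]  # a bare term, or (operator, operand, ...)


def tokens(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: Query, words: Set[str]) -> bool:
    """Evaluate a nested AND/OR/NOT query against one document's words."""
    if isinstance(query, str):
        return query.lower() in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical terms the two sides might agree on.
negotiated = ("AND", ("OR", "merger", "acquisition"), ("NOT", "newsletter"))

# Hypothetical documents, invented for illustration.
emails = {
    "email-1": "Re: the Acquisition timeline",
    "email-2": "Weekly newsletter: merger rumors",
    "email-3": "Lunch on Friday?",
}

responsive = sorted(
    doc_id for doc_id, text in emails.items() if matches(negotiated, tokens(text))
)
print(responsive)
```

Note that the query either matches a document or it doesn’t. There is no ranking and no cutoff, which is exactly why the agreed wording matters so much.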
Now, this is where the search tool itself comes into sharp relief. If the search tool looks only at documents it has already indexed, and skips documents that are not indexed or are password-protected, you might be exposing yourself to big trouble.
For many search tools, it’s not clear what exactly they can find. They may index the contents of a .ZIP file…but they may not. They may work with encrypted files…or maybe not. They may recognize .PDFs that are bitmapped…or maybe not. They may index Unicode characters…or maybe not.
I don’t know about you, but that’s too many "maybes" for me to be comfortable with.
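One way to turn a "maybe" into a known quantity is to audit what a tool can and cannot open. As a small illustration for ZIP archives only, the Python sketch below uses the standard `zipfile` module to list every entry and flag the encrypted ones. The function name, the status labels and the sample archive are my own inventions for the example.

```python
import io
import zipfile
from typing import List, Tuple


def audit_zip(data: bytes) -> List[Tuple[str, str]]:
    """Return (member name, status) for every file entry in a ZIP archive."""
    report = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                report.append((info.filename, "encrypted: needs manual review"))
            else:
                report.append((info.filename, "readable"))
    return report


# Build a small, unencrypted sample archive in memory (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("contracts/draft.txt", "merger terms")
    archive.writestr("notes.txt", "call opposing counsel")

sample_report = audit_zip(buffer.getvalue())
for name, status in sample_report:
    print(name, "->", status)
```

The point is not the ZIP format itself. It is that every file the tool could not read ends up on a list that a person can review, instead of silently dropping out of the production.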
To make matters worse, if you DO return a set of documents that is not considered adequate, you (first) get a fine and then (second) have to do it all over again anyway. And after weeks of delivering set after set of "wrong" responses, you are very likely to be looking at an uncomfortable settlement. "This is what happened in the UBS Warburg case," explains Jan. "They kept filing tapes upon tapes, but couldn’t make the deadlines to produce all the relevant information."