E-discovery powers up legal processes
“Out of a million documents, only a few hundred might be used in depositions,” says Mark Noel, managing director of professional services at Catalyst, which provides e-discovery software and services. “A subset of 50 might then be used for motion practice and only 10 at trial. It is a very steep funnel, and it’s important to end up with the right 10 documents.”
Catalyst Insight is a secure cloud-based platform where clients can search, review, mark and produce documents. It can be augmented with Insight Predict, a predictive ranking TAR 2.0 solution that uses continuous active learning (CAL) to speed review by learning from the judgments that human reviewers make. Rather than presenting documents in linear order, the solution ranks the most relevant documents at the top of the list.
The company’s TAR 2.0 software is specially designed for e-discovery. “Some of the early TAR products were repurposed machine learning that dated back 20 years,” Noel says. “They can work in situations where the target documents are a large proportion of the total, but if you are looking for the one percent that are ‘hot docs,’ then they are not as effective.” With TAR 2.0, attorneys and legal professionals who are subject matter experts do the initial coding for relevancy. Each of their judgments about a document’s relevancy is fed back to the system as “training” that helps it identify other documents that might also be relevant.
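The feedback loop described above can be illustrated with a toy example. The following is a minimal sketch of continuous active learning in general, not Catalyst's implementation: the function names, the naive log-odds scoring model, the eight-document corpus and the `reviewer` callback (standing in for an attorney's relevance call) are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(labeled):
    """Fit naive log-odds term weights from (tokens, is_relevant) pairs.

    Hypothetical stand-in for a real ranking model; uses add-one smoothing.
    """
    rel, non = Counter(), Counter()
    n_rel = n_non = 0
    for tokens, is_relevant in labeled:
        if is_relevant:
            n_rel += 1
            rel.update(set(tokens))
        else:
            n_non += 1
            non.update(set(tokens))
    vocab = set(rel) | set(non)
    return {
        t: math.log((rel[t] + 1) / (n_rel + 2))
        - math.log((non[t] + 1) / (n_non + 2))
        for t in vocab
    }


def score(tokens, weights):
    """Sum the weights of the distinct terms in a document."""
    return sum(weights.get(t, 0.0) for t in set(tokens))


def continuous_active_learning(docs, reviewer, seed_ids, batch_size=2):
    """Return the order in which documents are put in front of the reviewer.

    After every batch of judgments the model is retrained on ALL labels so
    far, and the unreviewed documents are re-ranked before the next batch.
    """
    tokens = {i: text.lower().split() for i, text in docs.items()}
    labels = {i: reviewer(i) for i in seed_ids}  # initial coding by an expert
    order = list(seed_ids)
    while len(labels) < len(docs):
        weights = train([(tokens[i], y) for i, y in labels.items()])
        unreviewed = [i for i in docs if i not in labels]
        # Highest score first; ties broken by document id for determinism.
        unreviewed.sort(key=lambda i: (-score(tokens[i], weights), i))
        for i in unreviewed[:batch_size]:
            labels[i] = reviewer(i)  # each new judgment feeds the next round
            order.append(i)
    return order


# Illustrative corpus (invented for this sketch): documents 0, 2 and 4 are relevant.
DOCS = {
    0: "merger price fixing memo",
    1: "lunch menu friday",
    2: "price fixing call notes",
    3: "holiday party schedule",
    4: "fixing price targets merger",
    5: "parking garage notice",
    6: "lunch schedule update",
    7: "gym membership offer",
}
RELEVANT = {0, 2, 4}

order = continuous_active_learning(DOCS, lambda i: i in RELEVANT, seed_ids=[0, 1])
print(order)
```

In this toy run, one relevant and one irrelevant seed document are enough for the ranking to surface the remaining two relevant documents in the very next batch, so all three are found within the first four documents reviewed. A production system would use a far richer model, but the retrain-and-rerank loop is the part that distinguishes continuous learning from a one-time training phase.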
The best of both worlds
With earlier versions of TAR, adding new documents invalidated the random sampling assumptions. “Unlike earlier products, which had a finite learning phase and then a production phase, TAR 2.0 allows new coding to be immediately incorporated into the algorithm for searching the document repository so that it is correctly tuned to the current problem domain,” explains Noel. “We want every decision made by an attorney to be put to maximum use, allowing humans to do what they do best, and then let the computer do what it does best, which is to quickly surface the relevant documents.”
One practical limitation of early versions of TAR was that they could not handle small volumes of documents, because the usual sampling percentage did not provide enough examples from which the computer could learn. “In one case, we needed to do a discovery on 16,000 documents in just a week. Once a few documents had been loaded and some decisions made on them, 96 percent of relevant documents were located by just two attorneys in four days,” Noel says.
At the other end of the spectrum was a financial institution involved in civil litigation. Even after early case assessment had reduced the number of potentially responsive documents, 2 million remained. An initial sample indicated that only one percent were responsive. With continuous active learning in TAR 2.0, the team located 98 percent of the relevant documents within just 6 percent of the total volume, because CAL’s predictive coding pushed those documents to the top of the review queue.
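To put those percentages in absolute terms, the short calculation below works through the figures quoted for the financial institution. It is a back-of-the-envelope sketch: it treats the one percent sample estimate as the true prevalence, which is an assumption, and the function name `funnel_stats` is hypothetical.

```python
def funnel_stats(total_docs, prevalence_pct, recall_pct, reviewed_pct):
    """Convert the percentages quoted in the case into document counts."""
    relevant = total_docs * prevalence_pct // 100   # estimated responsive docs
    found = relevant * recall_pct // 100            # responsive docs located
    reviewed = total_docs * reviewed_pct // 100     # docs actually reviewed
    return {
        "relevant": relevant,
        "found": found,
        "reviewed": reviewed,
        "not_reviewed": total_docs - reviewed,
        # Share of reviewed documents that turned out to be responsive.
        "precision": found / reviewed,
    }


stats = funnel_stats(2_000_000, prevalence_pct=1, recall_pct=98, reviewed_pct=6)
for name, value in stats.items():
    print(name, value)
```

Under these assumptions, roughly 20,000 of the 2 million documents were responsive, about 19,600 of them were found, and reviewers looked at about 120,000 documents rather than the full set. Around one in six reviewed documents was responsive, compared with one in a hundred in the collection as a whole.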
“In the future, more sophisticated technology will allow such actions as the reuse of attorney judgments, checking for outliers and monitoring the repository for problems in advance,” says Noel. “This kind of proactive strategy will help companies reduce their risk exposure and speed up e-discovery.”