Emerging content formats challenge e-discovery
As new types of content materialize from various applications and devices, the e-discovery process will become more difficult. According to a recent report from Osterman Research, any electronic information is potentially subject to e-discovery, including text messages, social media posts, data in collaboration tools and data from the Internet of Things (IoT). Web pages and data from wearable devices and vehicle event recorders are among the new sources of electronically stored information (ESI) that are possible targets for e-discovery.
While 80 percent of respondents in the study said they feel prepared to handle e-discovery for recent emails, only 12 percent felt they could do so for online applications such as Slack and only 7 percent for Facebook and Twitter.
A few decades ago, the issue of whether electronic data (as opposed to paper) constituted a record was still under discussion, and requests for emails were likely to be met with arguments. But around the year 2000, court cases began relying on metadata from electronic documents such as emails because that metadata could prove when a document was sent. Laws have often lagged behind rapidly changing technology, and as the volume and velocity of ESI have exploded, keeping the two in sync has become even more difficult.
The main drivers for e-discovery are still email and collaboration, according to Nishad Shevde, director of strategic operations at Exterro. While those forms of content do present some challenges, many of the content streams from those applications are now available in discovery-ready form. Even consumer-style collaboration tools such as Slack, which are available to anyone as non-enterprise applications, can be branded as corporate accounts and the resulting messages stored as data. Slack already provides an enterprise service that makes the content available for e-discovery although it is not a mainstream target at this point.
One issue with social content is determining who is the custodian. “When no one is assigned to manage an application and people are jumping in and out of an application so that they are not all in it on any given day, ownership as well as membership can be murky,” says Shevde. Another issue is how to define context. “In a formal memo, everything is very clear—the sender, the recipient and the content of the message, which is typically written in a formal style. With some of the communication tools, the intended recipient is not always clear, and how emojis and embedded pictures should be interpreted can also be ambiguous,” he adds.
Visual imagery presents challenges in both discovery and interpretation. Shevde cites a situation in which three employees were assigned a task and none of them completed it. Their supervisor sent them a video of three people in a baseball game who all let a fly ball drop to the ground. “The message was clear that they had all ‘dropped the ball,’ but in a discovery situation, how would this be found and interpreted?” he asks. As rich media proliferates, those issues may become more than hypothetical.
Exterro supports all stages of e-discovery including identifying relevant custodians, hold, collection, preservation, review and production. In recent years, the company has focused on building out connectors and now has connectors for more than 30 data sources. Those can be network share drives, laptops, desktops, cloud-based repositories and archives dedicated to aggregating communication and social content. “We see our platform as one that can provide visibility from all these sources,” says Shevde. “With our growing set of APIs, we can drive workflow optimization to carry out all stages of e-discovery on nearly any platform.”
Forensically defensible
Catalyst, a software and professional services company, handles complex cases with large volumes of electronic documents. Its Catalyst Insight software is used for e-discovery and Insight Predict for technology-assisted review (TAR). Insight Predict uses continuous active learning to support its search for relevant documents. “Studies show that 70 percent of the cost of e-discovery is in the review, so we are trying to expedite that process by using machine learning,” says Robert Berger, VP of marketing for Catalyst. Berger has also noted a rise in the amount of IoT data that is subject to discovery, which is going to put additional pressure on review costs because some of that data requires quite a bit of interpretation.
E-discovery for social media and websites is more difficult than it is for documents and email for a number of reasons. When a law firm needs social media or web information, it can print the page, but that approach has some drawbacks. “Printing a Facebook page or a page on a website may seem like a good way to enter this content into discovery, but an inexpensive software program could modify it, so its authenticity could and should be challenged,” says Steve Linn, a digital forensic analyst who works with Catalyst through its partner Forensic Pursuit.
New standards are being developed that use, for example, a real-time clock and hash values to verify that content was on a website at a particular time. “As more law firms recognize the problems inherent in some of these non-traditional formats, there will be more of a push to develop methods for authenticating them,” Linn says.
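To make the idea of pairing hash values with a clock reading concrete, here is a minimal sketch in Python. It does not implement any particular standard. The function names and record fields are hypothetical, and it uses the local system clock in place of the trusted time source a defensible scheme would need.

```python
import hashlib
from datetime import datetime, timezone


def fingerprint_capture(content: bytes, url: str, captured_at=None) -> dict:
    """Tie a SHA-256 digest of the captured bytes to a UTC capture time.

    Assumption: the local clock is used when no time is supplied; it is
    only a stand-in for a trusted, independently verifiable time source.
    """
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    return {
        "url": url,
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def matches_capture(content: bytes, record: dict) -> bool:
    """Return True only if the bytes hash to the digest stored in the record."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]
```

If a single character of the captured page is later altered, the recomputed digest no longer matches the stored one. That is the property that makes a hashed capture harder to challenge than a printout.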
A second challenge is how to get data from a web page in HTML and put it in a readable and searchable format. “Right now the solution is to use a PDF version, which can be reviewed. However, in the near future as more demand comes about for web content, more products will evolve that can display native HTML using existing tags such as title and date,” Linn explains.
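As a rough illustration of reading existing tags such as title and date directly from native HTML, the following Python sketch uses only the standard library. It is not how any e-discovery product works. The class and function names are hypothetical, and it assumes the date sits in a `<meta name="date">` tag or a `<time datetime="...">` attribute, which many pages do not provide.

```python
from html.parser import HTMLParser


class TitleDateParser(HTMLParser):
    """Collect the <title> text and the first date-like tag value found."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.date = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "date":
            # Assumed convention: <meta name="date" content="...">
            self.date = attrs.get("content")
        elif tag == "time" and self.date is None:
            # Fallback assumption: <time datetime="...">
            self.date = attrs.get("datetime")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title_and_date(html: str) -> dict:
    """Return the page title and date (None when no date tag is present)."""
    parser = TitleDateParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "date": parser.date}
```

Fields pulled out this way could be indexed alongside the page text, so a reviewer can search and sort web content much as they would emails.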
The same applies to Facebook. There are several different types of data in Facebook, such as a post, a comment and a picture. “As demand grows for making Facebook content discoverable,” Linn continues, “these data types will be directly translated into e-discovery platforms so that it appears just as it did originally.” The goal is to cut down on review time because of its impact on cost.
Finally, data from the IoT is beginning to be considered as a source of evidence, particularly in criminal cases. Law enforcement has begun to take note of the possibility that Amazon’s Alexa and Fitbit, for example, may have data that helps determine if a crime has been committed. It is not yet clear how that data will be collected, analyzed and used.