The secret of the cloud: Remote collaboration, elasticity, and the e-discovery paradigm
Metadata management
E-discovery’s premier challenge is building the corpus of documents on which to perform advanced analytics. This parallels the challenge of most data science or BI projects. For these endeavors, organizations must cull appropriate data from “the volume of data that people are creating, which is growing at a faster rate than it’s ever grown before,” Shankar commented. The data governance staple of metadata management serves two of e-discovery’s steps: forensic data collection and processing. Organizations initially collect data via “the meta set of data before they get down into the subset data,” Jack said. Once relevant data has been filtered according to metadata such as date ranges, it is usually compiled in the cloud.
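The metadata-first culling described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `DocMeta` record whose fields (`doc_id`, `custodian`, `sent`) are invented for illustration; real collection tools expose their own schemas.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocMeta:
    """Hypothetical metadata record; the field names are illustrative only."""
    doc_id: str
    custodian: str
    sent: date


def cull_by_date(records, start, end):
    """Keep only records whose 'sent' date falls within [start, end], inclusive."""
    return [r for r in records if start <= r.sent <= end]


# Made-up sample metadata; no document bodies are needed at this stage.
records = [
    DocMeta("a1", "alice", date(2001, 3, 14)),
    DocMeta("b2", "bob", date(2001, 10, 2)),
    DocMeta("c3", "alice", date(2002, 1, 20)),
]

relevant = cull_by_date(records, date(2001, 6, 1), date(2001, 12, 31))
relevant_ids = [r.doc_id for r in relevant]
```

Only the records that survive this pass would then have their full contents pulled for processing, which is the point of filtering on the "meta set" first.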
Enriching files with additional metadata is crucial to processing, which Carns [...] individual piece of data.” Typically, the more metadata there is about data assets, the more useful those assets become to downstream applications such as data exploration, search, and data visualization. Moreover, this information is vital to tagging and cataloging data for these purposes.
According to Camara, use cases for enriching files with metadata include the ability to locate similar documents among duplicates, which is helpful for identifying multiple versions of a contract or a regulatory filing. Other examples involve adding metadata about communication that denotes which parties were communicating and important developments such as spikes or unexplained absences in correspondence. Metadata offers critical value in improving the discoverability of actionable items since “the outcome of these massive lawsuits—we’ve had settlements for hundreds of millions of dollars for clients—is predicated on what you find,” Shankar noted.
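One common way to flag similar documents, such as successive versions of a contract, is to compare overlapping word sequences ("shingles") and score the overlap with Jaccard similarity. The sketch below is an illustrative assumption, not the method any vendor quoted here uses; the function names, sample texts, and the 0.5 threshold are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in the text (case-insensitive)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two texts' shingle sets: |A & B| / |A | B|."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def similar_pairs(docs, threshold=0.5):
    """List document-id pairs whose similarity meets the (assumed) threshold."""
    ids = sorted(docs)
    return [
        (x, y)
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
        if jaccard(docs[x], docs[y]) >= threshold
    ]


# Made-up sample texts: two contract versions and an unrelated memo.
docs = {
    "contract_v1": "the supplier shall deliver the goods within thirty days of the order date",
    "contract_v2": "the supplier shall deliver the goods within sixty days of the order date",
    "memo": "lunch meeting moved to friday at noon in the main conference room",
}

pairs = similar_pairs(docs)
```

Each flagged pair could then be recorded as enrichment metadata (for example, a shared "version family" tag) so that reviewers see related drafts together. Comparing every pair is fine for a sketch but would not scale to a large corpus without an indexing technique.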
Data exploration, data visualizations
Once that metadata has been added, organizations extract it along with pertinent data elements to begin the data exploration process. According to Carns, “You first have to identify the component parts in order to apply all of these visualizations, etc.” Data visualizations are critical to data exploration, enabling organizations to understand and tag data according to a particular use case, such as loading applications or running analytics in the cloud. Often, e-discovery platforms include dynamic, interactive visualizations such as timelines that enable users to see a progression of documents or events detailed within them. For example, a timeline illustrating critical events in the Enron scandal could depict the various fictitious and real entities the former energy powerhouse created, communication between its constituents, and whistleblowers’ actions.
“All of that kind of analysis is basically impossible to do if all you’re doing is running searches or looking at the documents one by one,” Camara said. “You need a visual interface that will surface those relationships to you, and then let you interact with them.” Such functionality is essential to profiting from mergers and acquisitions in which prospective companies send large volumes of documents, Shankar said. “You have to make a determination in a short amount of time whether or not to buy [...] sift through high-volume information rapidly is pretty valuable.”
Data discovery and search
Search is a data discovery tool that can be used to inform downstream analytics for e-discovery or other use cases. As with visualizations, the efficacy of search depends on the detail and accuracy of the metadata enrichment and extraction phase since “if you do a bad job of extracting information from your data, you’ll do a bad job of searching it,” Shankar noted. The majority of cloud e-discovery platforms offer natural language search and what Camara described as traditional terms-and-connectors-based Boolean search, which enables professionals to use proximity connectors and fuzzy adjustors to specify what they want.
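To make the idea of proximity connectors and fuzzy matching concrete, here is a minimal Python sketch. It is an assumption about how such a query might be evaluated, not the syntax or engine of any particular platform; the `within` helper, the sample sentence, and the 0.8 similarity cutoff are hypothetical. Fuzzy matching uses the standard library's `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fuzzy_equal(token, term, min_ratio):
    """True if the token is at least min_ratio similar to the term (1.0 = exact)."""
    return SequenceMatcher(None, token, term).ratio() >= min_ratio


def within(text, term_a, term_b, n, min_ratio=1.0):
    """True if term_a and term_b occur within n words of each other.

    Lowering min_ratio below 1.0 lets misspellings and variants match.
    """
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if fuzzy_equal(t, term_a, min_ratio)]
    pos_b = [i for i, t in enumerate(tokens) if fuzzy_equal(t, term_b, min_ratio)]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


# Made-up sentence containing a deliberate misspelling ("agreemnt").
text = "The merger agreement was signed before the auditors reviewed the agreemnt."

exact_hit = within(text, "merger", "signed", 3)
exact_miss = within(text, "auditors", "agreement", 3)
fuzzy_hit = within(text, "auditors", "agreement", 3, min_ratio=0.8)
```

In this sketch the exact query for "auditors" near "agreement" fails because the only nearby occurrence is misspelled, while the fuzzy version tolerates the typo. Production search engines typically precompute positional indexes rather than scanning text on each query.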
Semantic search is useful for classifying documents into semantically related topics. “Those topics are metadata dimensions that are available in search and analytics,” Camara said. Organizations can, for example, utilize semantic search to locate all the documents collected from a specific witness to learn which topics they discussed. Alternatively, this capability enables organizations to review individual documents, identify important topics in them, and view other documents related to the topic. These same techniques are almost universally applicable for selecting data for data science projects or business analytics.
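Once documents carry topic labels, treating those topics as metadata dimensions reduces both use cases above to simple filters. The Python sketch below assumes the topics have already been assigned by some upstream classifier, which is not shown; the record layout and the `custodian` field are hypothetical.

```python
from collections import Counter

# Made-up documents with topic labels assumed to come from an upstream classifier.
docs = [
    {"id": "d1", "custodian": "alice", "topics": {"pricing", "contracts"}},
    {"id": "d2", "custodian": "alice", "topics": {"pricing"}},
    {"id": "d3", "custodian": "bob", "topics": {"contracts", "audit"}},
]


def topics_for_custodian(docs, custodian):
    """Count how often each topic appears in one custodian's documents."""
    counts = Counter()
    for d in docs:
        if d["custodian"] == custodian:
            counts.update(d["topics"])
    return counts


def related_docs(docs, doc_id):
    """Return ids of other documents sharing at least one topic with doc_id."""
    source = next(d for d in docs if d["id"] == doc_id)
    return sorted(
        d["id"]
        for d in docs
        if d["id"] != doc_id and d["topics"] & source["topics"]
    )


alice_topics = topics_for_custodian(docs, "alice")
related_to_d3 = related_docs(docs, "d3")
```

The first function answers "which topics did this witness discuss," and the second starts from a single document and fans out to others on the same topics.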