Text Analytics for Enterprise Search
The Essential Components for High Performance Systems
At TREC 2007, many universities also gave presentations in which specific advanced search techniques offered better results; in some cases, 40% of the relevant documents were found. All of these new techniques derived additional meta-information automatically, using advanced text analytics engines, and then searched on it. Individually, none of them provided a "perfect" solution, but a combination of these techniques could result in an 80% success rate for finding all relevant documents. Clearly, the critical factor in advanced contextual searching is the meta-information derived from automatic text analytics.
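The reasoning above, that techniques which each find a modest share of the relevant documents can jointly find far more, can be illustrated with a minimal sketch. The document IDs, technique names and result sets below are hypothetical toy data chosen for illustration; they are not TREC results, and the `recall` helper is an assumption of this sketch rather than any product's API.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear in the retrieved set."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection: documents 1..10 are the relevant ones.
relevant = set(range(1, 11))

# Hypothetical result sets from three different techniques.
# Each finds a different, partly overlapping slice of the relevant set,
# plus one irrelevant document (IDs 50-52).
results = {
    "keyword": {1, 2, 3, 4, 50},
    "concept": {3, 4, 5, 6, 51},
    "entity": {1, 6, 7, 8, 52},
}

# Recall of each technique on its own.
per_technique = {name: recall(docs, relevant) for name, docs in results.items()}

# Recall when the result sets are combined (set union).
combined = recall(set().union(*results.values()), relevant)

for name, value in per_technique.items():
    print(f"{name}: {value:.0%}")
print(f"combined: {combined:.0%}")
```

In this toy case each technique alone reaches 40% recall, while the union reaches 80%, because the techniques miss different documents. The gain depends entirely on how little the result sets overlap.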
Of course, data enrichment cannot be performed manually, as was the case 20 years ago, and still provide any kind of viable solution. Rather, new techniques in statistical and linguistic analytics have given modern investigators whole new arsenals of search tools. Text analytics, text mining and autocoding techniques enable the following functions to be automated and integrated into high-performance search capabilities:
- Document property extraction;
- File property extraction;
- Concept extraction;
- Semantic expansion;
- Automatic summaries;
- Machine translation;
- Entity extraction;
- Fact extraction;
- Exact and near-duplicate detection; and
- Related document groups.
These functions are impressively accurate, and with basic computer equipment users can easily process up to 300 MB of data per hour (about 150,000 text pages). After documents are tagged and organized, advanced interactive search techniques can be used to perform activities such as:
- Automatic generation of table of contents and folder structures;
- Search folders;
- Semantic relevance ranking;
- Clustering;
- Visualization;
- Reporting;
- Auditing; and
- Automatic generation of a chain of custody.
With these new search methods in place, structured, enterprise solutions can be delivered in which text analytics tools tie together all of the appropriate functionality into the right context. Benefits derived from these applications include:
- Structure added to unstructured data, a critical component for business intelligence, legal, law enforcement, publishing and many back-office environments;
- Advanced search capabilities, including additional forms of relevance ranking such as semantic orderings and clustering;
- De-duplication (exact and near) in legal applications;
- Detection of privileged, responsive and hot documents;
- Structured productions and disclosures;
- Content-based archiving; and
- Content-based workflow and document routing in back-office and customer service applications.
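As an illustration of the exact and near-duplicate detection mentioned above, the following is a minimal sketch using a content hash for exact matches and word-shingle Jaccard similarity for near matches. This is one common textbook approach, not a description of how any particular product implements de-duplication; the function names, the shingle size and the 0.5 threshold are assumptions chosen for the example. "Exact" here means identical after case and whitespace normalization.

```python
import hashlib


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def fingerprint(text):
    """SHA-256 hash of the normalized text, used for exact-duplicate checks."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text, k=3):
    """Set of overlapping k-word sequences (shingles) in the text."""
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_pair(doc_a, doc_b, threshold=0.5):
    """Label a pair of documents as 'exact', 'near' or 'distinct'.

    The threshold is an illustrative assumption, not a recommended value.
    """
    if fingerprint(doc_a) == fingerprint(doc_b):
        return "exact"
    similarity = jaccard(shingles(doc_a), shingles(doc_b))
    return "near" if similarity >= threshold else "distinct"


original = "The contract was signed on Monday by both parties in the London office"
reformatted = "the contract  was signed on Monday by both parties in the London office"
revised = "The contract was signed on Tuesday by both parties in the London office"
unrelated = "Quarterly revenue figures are attached for review"

print(classify_pair(original, reformatted))
print(classify_pair(original, revised))
print(classify_pair(original, unrelated))
```

Comparing every pair of documents this way does not scale to large collections; production systems typically add techniques such as MinHash or locality-sensitive hashing to avoid the pairwise comparison.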
ZyLAB’s Text Analytics and Enterprise Search
A great example of an application that puts into practice all of the technologies mentioned in this paper is ZyLAB’s ZyIMAGE Analytics Server. This tool addresses the need for a comprehensive solution that provides a combination of both basic unstructured search and sophisticated structured capabilities for browsing, navigation, searching, visualization and reporting techniques for large data-collection analysis. Specifically, structure can automatically be added as a background job to unstructured collections without having to manually read or review documents, which enhances overall efficiency, optimizes performance and saves tremendous amounts of money and time.
The ZyIMAGE Analytics Server is a component of the ZyIMAGE Information Access Platform (IAP), an award-winning integrated document, content and records management solution that enables professionals across a variety of market segments to capture, investigate, structure and disclose information in an efficient and secure manner.
ZyIMAGE IAP also offers these users specific capabilities for process functionality, relevancy modeling and flexible content analytics. All of these features are supported by ZyIMAGE’s robust search capabilities and XML-based archiving framework that, together, operate as a solid foundation upon which to carry out a number of specific applications:
- E-discovery and e-disclosure;
- Corporate compliance and contract management;
- Case management and litigation support;
- Datarooms;
- Back-office records management for organizations facing legal risk, such as construction, outsourcing, customer service, medical, or HRM environments;
- Federal and local government records management; and
- Historical files.
ZyIMAGE IAP is optimized for these applications due to a unique combination of search technology, security and business-focused content-management functionality. ZyLAB can quickly deploy even the most complex installations of specialist solutions and provide all the necessary training, documentation, support and maintenance. Unique yet affordable text analytics are available that support more than 200 languages and can easily be implemented in a scalable way, an important consideration for specific applications such as e-discovery.
In short, ZyIMAGE IAP provides:
- A complete solution to capture, find, analyze, structure and distribute data;
- The ability to perform large, time-critical investigations with fewer resources;
- Support for efficient and consistent review processes;
- A realistic, nuanced approach to finding specific information: ZyIMAGE finds what other systems cannot without getting bogged down in irrelevant information;
- A framework enabling controlled information sharing, with optional redaction capabilities;
- A fine-tuned system that fully supports data protection and privacy regulations;
- Open, secure, long-term XML-based archiving of records;
- Extensive auditing and reporting options;
- Full integration capabilities with Outlook and SharePoint; and
- Efficient, knowledgeable professional service.
With ZyIMAGE IAP you can bring knowledge management in house, take the mystery out of e-discovery and bring order to your records management initiatives.
Cutting-edge search and text mining technology for paper, email and electronic files, combined with content management technology such as e-discovery and e-disclosure management, redaction, workflow, federation and compliant records management, has continually positioned ZyLAB as a leader for these types of applications. Effectively focusing on niches has enabled ZyLAB to offer its 7,500 installations worldwide a cost-effective, long-term solution for their search requirements that is fully embedded in their daily business processes and easy to deploy and maintain.
Additional information can be found at www.zylab.com.