Intelligent data capture: a trend only beginning, Part 1
Intelligent data capture (IDC), also sometimes referred to as intelligent document recognition or IDR (see related article by Harvey Spencer, page 8, KMWorld June 2006, Vol 15 #6 and in this online issue), is the ability to scan documents or electronic pages that have no fixed layout and extract data from specific fields to populate a database or business system. Data must be pumped into major financial applications like Oracle E-Business Suite or mySAP Business Suite, and the sooner the data is entered, the sooner users at a company can begin the financial management process.
Using IDC, documents may be unstructured, with varying layouts, or "semi-structured," with some fixed fields but mostly varying formats. Examples of unstructured forms are contracts, resumes and letters; semi-structured documents may be explanation of benefits (EOB) forms, insurance claim documents, invoices and the like.
The bread-and-butter application for IDC is the processing of invoices for payment in the accounts payable process. Invoices may be hundreds of pages long and have thousands of line items. It is critical to process invoices expeditiously so that proper and maximum discounts can be taken for payment within a limited timeframe. For instance, an invoice may allow a discount of 5 percent if paid within 10 days, but the full amount is due within 30 days. When companies process thousands--or even millions--of invoices, the annual savings for early payment can be millions of dollars. Why not just pay every invoice immediately and gain the discount? Because holding on to a company's cash for as long as possible improves cash flow and allows a company to invest it in short-term financial instruments, further improving profitability.
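The arithmetic behind that trade-off can be sketched in a few lines of Python. This is only an illustration: the invoice amount and the 4 percent short-term yield are assumed figures, not numbers from any company, and the function names are hypothetical. It shows why paying on the last day of the discount window, rather than immediately or at the net due date, tends to be the attractive choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentTerms:
    discount_rate: float  # e.g. 0.05 for a 5 percent discount
    discount_days: int    # last day on which the discount still applies
    net_days: int         # day on which the full amount is due


def discount_saved(amount: float, terms: PaymentTerms) -> float:
    """Money saved by paying within the discount window."""
    return amount * terms.discount_rate


def annualized_discount_return(terms: PaymentTerms) -> float:
    """Simple (non-compounded) annual rate implied by paying at the
    discount deadline instead of waiting until the net due date."""
    days_early = terms.net_days - terms.discount_days
    per_period = terms.discount_rate / (1.0 - terms.discount_rate)
    return per_period * 365.0 / days_early


def interest_from_holding(amount: float, annual_yield: float, days: int) -> float:
    """Simple interest earned by keeping the cash invested for `days`."""
    return amount * annual_yield * days / 365.0


if __name__ == "__main__":
    terms = PaymentTerms(discount_rate=0.05, discount_days=10, net_days=30)
    invoice = 100_000.00   # assumed invoice amount
    short_term_yield = 0.04  # assumed annual yield on short-term instruments

    saved = discount_saved(invoice, terms)
    held = interest_from_holding(invoice - saved, short_term_yield, terms.discount_days)
    print(f"Discount saved by paying on day 10: {saved:.2f}")
    print(f"Interest earned by holding cash until day 10: {held:.2f}")
    print(f"Implied annual return of taking the discount: "
          f"{annualized_discount_return(terms):.1%}")
```

Under these assumptions the discount is far larger than the interest earned by holding the cash for the extra 20 days, while paying before day 10 simply forgoes a little interest. Capture speed matters because it determines whether an invoice is approved in time to hit that window.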
But the biggest savings across the board from using intelligent data capture technologies come from radically reduced labor costs and compressed business process time frames, due to the near-elimination of manual data entry.
The IDC marketplace has its roots in basic scanning and document capture, pioneered by firms like DICOM Group's subsidiary Kofax, EMC's Captiva and ReadSoft. Other firms such as ABBYY Software House, Brainware (formerly SER Solutions) and Xerox partner Document Strategies are also active in the intelligent data capture marketplace, which is dynamic and robust.
Acquisitions and transformations characterize the major players competing in this market. To get from basic document capture to IDR, Kofax acquired Neuroscript and LCI in 2005 and 2006, and Captiva acquired French-based SWT in May 2005 and by year-end was itself acquired by storage giant EMC. Also last year, ReadSoft acquired a 50 percent stake in Danish firm Consit Development ApS, to give ReadSoft Oracle expertise and an entrée into Oracle's E-Business Suite financial system.
The larger firms in the IDC market space have only reached around $10 million in annual revenues, so it's not a huge market in and of itself.
"This is a technology that belongs embedded as a part of other business systems," says Brainware CEO Carl Mergele, "and we intend to continue pursuing partnerships to that end."
ReadSoft U.S. President Bob Fresneda heartily agrees: "Alliances with major business application providers are a key part of our strategy. That's why we have gained certified status for Oracle's E-Business Suite and SAP R/3 ... We can directly populate their payables applications and use Oracle and SAP native workflow routines. No one else can say that." ReadSoft claims to be the U.S. leader in payables processing, with approximately 250 customers using ReadSoft for INVOICES alone.
But the competition in the market is intense, and how it plays out is not clear at this stage. ABBYY is a research-intensive organization focused on selling its technologies to VARs and integrators, with particular strength in Europe. Brainware has been awarded patents in the United States and Europe, which it hopes to parlay into licensing opportunities with other firms, perhaps even quasi-competitors. Document Strategies is leveraging a close relationship with Xerox (both based in Rochester, N.Y.), and a new Xerox nationwide services offering will be based on the Document Strategies technology set. EMC Captiva is preaching information life cycle management, leveraging EMC's Documentum unit for enterprise content management (ECM) and EMC's core storage management capabilities. Kofax has probably the largest distribution channel network, and ReadSoft is leveraging its Oracle and SAP relationships and aggressive marketing to penetrate the installed base of those software giants.
Now for a review of some of the key players in the market:
ABBYY centers its business on document recognition, data capture and linguistic technologies. It develops and sells artificial intelligence (AI) applications, and, in particular, document recognition and natural language processing applications that help people overcome language barriers in a world that is globalizing at an ever-increasing pace. ABBYY's FlexiCapture Studio is an additional tool that extends the capabilities of ABBYY FormReader data capture applications and the FineReader Engine SDK, helping to extract data from semi-structured forms and documents.
FlexiCapture Studio lets a developer create a formalized description, called a FlexiLayout, which tells ABBYY FormReader or the FineReader Engine how to look for required fields on documents with similar data but different layouts. It's intended for developers, VARs and integrators, and provides easy-to-use, front-end access to the development know-how previously used by ABBYY's in-house developers in large-scale forms processing projects. With it, the developer can create a FlexiLayout for extracting data from both simple and complex documents on which the location of similar fields may vary greatly.
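To make the general idea of a layout description concrete, here is a minimal sketch in Python. It is not ABBYY's FlexiLayout format or API, which this article does not detail; the `FieldRule` class, the `extract_fields` function and the sample invoices are all hypothetical. The sketch assumes the page has already been converted to text by OCR and finds each field by searching for one of several possible labels, wherever on the page that label happens to appear.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical rule: a field is the value that follows any of its labels."""
    name: str
    labels: tuple[str, ...]  # alternative label texts used by different vendors
    value_pattern: str       # regular expression describing the value itself


def extract_fields(text: str, rules: list[FieldRule]) -> dict[str, str]:
    """Return the first value found for each rule, regardless of position."""
    found: dict[str, str] = {}
    for rule in rules:
        # Try longer labels first so "Invoice Number" wins over "Invoice No".
        ordered = sorted(rule.labels, key=len, reverse=True)
        label_alt = "|".join(re.escape(label) for label in ordered)
        pattern = re.compile(
            rf"(?:{label_alt})\s*[:#]?\s*({rule.value_pattern})",
            re.IGNORECASE,
        )
        match = pattern.search(text)
        if match:
            found[rule.name] = match.group(1)
    return found


# One set of rules covers invoices whose fields sit in different places.
INVOICE_RULES = [
    FieldRule("invoice_number",
              ("Invoice No", "Invoice Number", "Inv #"),
              r"[A-Z0-9][A-Z0-9-]*"),
    FieldRule("total",
              ("Total Due", "Total Amount", "Amount Due"),
              r"\$?[\d,]+\.\d{2}"),
]

if __name__ == "__main__":
    vendor_a = "ACME Corp\nInvoice No: 10234\nDate: 2006-06-01\nTotal Due: $1,250.00\n"
    vendor_b = "Total Amount  $980.50\nGlobex Ltd\nINV # A-778\n"
    print(extract_fields(vendor_a, INVOICE_RULES))
    print(extract_fields(vendor_b, INVOICE_RULES))
```

The same rules pull the invoice number and total from both sample documents even though the fields appear in a different order and under different labels. A production tool would also have to cope with OCR errors, multi-page tables and spatial relationships between fields, none of which this sketch attempts.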