Text analytics and beyond
The term “text analytics” encompasses a broad and heterogeneous group of technologies that can add metadata to unstructured content; identify components such as people, places and events; and convert information to structured form so it can be analyzed by business intelligence (BI) solutions. The technology may employ statistical, linguistic and machine learning approaches to extract meaningful information. It can be used for a wide range of business purposes, from fraud detection to sentiment analysis. Increasingly, the push is toward more sophisticated interpretation of unstructured content that goes beyond what is currently considered text analytics.
According to Forrester, more than 200 companies provide text mining or text analytics products, so the market is crowded. Those products offer a variety of approaches to extracting actionable information from unstructured content, which is generally recognized as accounting for about 80 percent of enterprise content. The software is also becoming more intelligent. Rather than focusing on keyword searches or statistical analyses alone, the products are incorporating a deeper understanding of language through greater semantic analysis and machine learning. That trend is moving text analytics well past the traditional approaches into the realm of cognitive computing.
Nasdaq
Nasdaq is best known for its technology-oriented stock exchange, but its business activities are much more extensive. The company owns or operates 32 exchanges globally. Its Global Information Services business delivers market data and other value-added information, and its Corporate Solutions group helps organizations manage the flow of information to and from their audiences. The Market Technology division sells the technology used to run Nasdaq’s own markets, along with variants of it, to companies that run other exchanges, including those in Singapore, Hong Kong and Turkey. It also sells Nasdaq SMARTS Trade Surveillance, which monitors trading activity for indications of insider trading, market manipulation and other questionable transactions.
Nasdaq was on the lookout for a product to complement SMARTS with text analytics. “We could see a lot of value in adding the ability to monitor e-mail and other text communications for behavioral indications that corresponded to alerts from SMARTS,” says Bill Nosal, VP of business development, market technology, for Nasdaq, “to create efficiency and context in investigations of these alerts and enable better prioritization. We had been exploring machine intelligence for some time as an approach we wanted to take for conducting higher-level linguistic analyses, and our customers were saying this would help them move toward more holistic surveillance.”
During the search, Nosal discovered Synthesys, a cognitive computing solution from Digital Reasoning. The combination of the software’s performance and the backgrounds of key personnel in the company convinced him that it was the right solution to pursue. “Digital Reasoning is exceptional in terms of how it processes language,” Nosal says. “We looked at other natural language processing solutions but they did not go nearly to the depth that we saw from Digital Reasoning.”
SMARTS and Synthesys work in tandem; each product keeps the other informed. “When SMARTS generates an alert that might indicate insider trading, collusion or other concern,” Nosal explains, “information from that alert is passed to Synthesys, which checks for relevant communications and can transmit that information back to the SMARTS user to create important context that can assist the investigation. Synthesys can also trigger an alert that can be sent to SMARTS.” If an analyst sees alerts from both systems for the same transaction, that transaction merits a closer look. That convergence helps analysts and compliance officers prioritize the issues they need to investigate.
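The prioritization idea can be illustrated with a small sketch. It is not the SMARTS or Synthesys interface, neither of which is described here; the record types, field names, sample data and two-day matching window are all assumptions made for illustration. The sketch simply flags a trade alert when a communications alert involving the same trader was raised close to it in time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TradeAlert:
    """Hypothetical alert raised by a trade surveillance system."""
    alert_id: str
    trader: str
    raised_at: datetime


@dataclass(frozen=True)
class CommsAlert:
    """Hypothetical alert raised by a communications analysis system."""
    alert_id: str
    participants: frozenset
    raised_at: datetime


def corroborated(trade_alerts, comms_alerts, window=timedelta(days=2)):
    """Map each trade alert ID to the IDs of communications alerts that
    involve the same trader and were raised within `window` of it.
    Trade alerts with no matching communications alert are left out."""
    matches = {}
    for t in trade_alerts:
        related = [
            c.alert_id
            for c in comms_alerts
            if t.trader in c.participants
            and abs(c.raised_at - t.raised_at) <= window
        ]
        if related:
            matches[t.alert_id] = related
    return matches


# Made-up sample data: names, dates and IDs are illustrative only.
trade_alerts = [
    TradeAlert("T1", "alice", datetime(2015, 3, 2, 10, 0)),
    TradeAlert("T2", "bob", datetime(2015, 3, 5, 15, 30)),
]
comms_alerts = [
    CommsAlert("C1", frozenset({"alice", "carol"}), datetime(2015, 3, 1, 18, 0)),
    CommsAlert("C2", frozenset({"bob", "dave"}), datetime(2015, 3, 20, 9, 0)),
]

priority = corroborated(trade_alerts, comms_alerts)
print(priority)
```

In this toy data only the first trade alert has a nearby communications alert for the same trader, so it is the one an analyst would review first. A production system would match on far richer signals than a shared name and a time window.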
In addition to use by market participants like broker-dealers and buy-side firms, the combination of a transactional analysis system with linguistic processing is helpful to regulators. “They may seek to monitor not only the traders of their regulated entities but also their own employees,” Nosal says. “Certain types of information are also more easily detected by analysis of communications. For example, in the mortgage meltdown, the dynamics of the transactions were obscured by multiple levels of derivatives, but analysis of communications could have been very revealing.”
Sophisticated analysis
Nasdaq plans to expand its use of Synthesys into other products for its clients and internal uses. “Many of the models used in analyzing communications in financial services, such as the spreading of rumors, are broadly applicable to clients in other industries,” Nosal says, “and we have opportunities across our businesses to deliver value-added products that leverage these capabilities.”
Digital Reasoning first developed its technology for use by the intelligence community. “Our technology can look at a million documents and figure out that a person mentioned in multiple documents is the same individual, even if the person is using aliases,” says Tim Estes, CEO of Digital Reasoning. The company began to get interest from the financial services industry because the technology could be redirected toward compliance.
Estes observed a need for more sophisticated analysis than was available from most text analytics and natural language processing (NLP) products. “Entity extraction NLP is interesting, but some aspects of that technology are now commodities,” Estes says. “More compelling is the ability to do subtle classification, such as determining from chats among traders whether they are colluding or whether expectations are going up or down. This is much more difficult, but more relevant to Wall Street.”
Digital Reasoning is focused on cognitive computing and deep learning, which enable the software to detect complex relationships embedded in large volumes of unstructured text. “We are creating knowledge objects across many documents, allowing computers to perform unsupervised learning,” Estes adds. “The system can discover relationships and then create a knowledge graph that can be analyzed by time, space or other dimension. Users can zoom in to a certain time range, for example, and see how relationships among entities are evolving.”
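To make the idea of zooming in on a time range concrete, here is a minimal sketch of a knowledge graph stored as timestamped edges. It does not reflect how Synthesys represents or queries its graph; the edge format, the entity names and the function relationships_in are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date

# Each edge is (entity_a, entity_b, date the relationship was observed).
# The entities and dates are made up for illustration.
edges = [
    ("trader_a", "trader_b", date(2015, 1, 5)),
    ("trader_a", "trader_b", date(2015, 1, 19)),
    ("trader_a", "broker_c", date(2015, 2, 2)),
    ("trader_b", "broker_c", date(2015, 2, 10)),
    ("trader_a", "trader_b", date(2015, 2, 11)),
]


def relationships_in(edges, start, end):
    """Count observations per entity pair with start <= date <= end.
    Pairs are unordered, so (a, b) and (b, a) count as the same pair."""
    counts = Counter()
    for a, b, seen in edges:
        if start <= seen <= end:
            counts[frozenset((a, b))] += 1
    return counts


january = relationships_in(edges, date(2015, 1, 1), date(2015, 1, 31))
february = relationships_in(edges, date(2015, 2, 1), date(2015, 2, 28))

for label, window in (("January", january), ("February", february)):
    for pair, n in window.items():
        print(label, sorted(pair), n)
```

Comparing the two windows shows which relationships appear, strengthen or fade from one period to the next, which is the kind of evolution the quote describes. Discovering the relationships in the first place is the hard part, and this sketch leaves it out entirely.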