The smart thing to do: practical applications of monetizing big data in finance
In financial services, the dangers associated with monetizing big data are nearly as great as the rewards. The promises of machine learning, data science and Hadoop are tempered by the realities of regulatory penalties, pressure on operational efficiency and thin profit margins, all of which mean that any such expenditure must quickly justify itself.
Additionally, there is often another layer of complexity best summarized by Carl Reed, who spent more than 20 years overseeing data-driven applications at Goldman Sachs and Credit Suisse. According to Reed, who currently serves as an adviser to the Enterprise Data Management Council, PwC and Cambridge Semantics, the sheer size of the customer base and infrastructure of larger organizations compounds the issues. He says, “Credit Suisse [had] 45,000 databases; Goldman Sachs [had] 90,000 databases. If you let the financial industry continue to implement technology vertically versus horizontally—because you’ve got no C-suite saying data isn’t a by-product of processes, it’s an asset that needs to be invested in and governed accordingly—you’ll end up with today 10s, tomorrow 100s, and in time 1000s of instances of Hadoop all over your organization.”
The clearest lesson from Reed’s tenure in financial services is to avoid such sprawl with a single investment in data architecture that pays for itself across a number of applications throughout the enterprise. In doing so, organizations can justify data expenditure with many use cases that are variations on the same investment.
Link analysis
The crux of Reed’s approach was semantic graph technology that linked the numerous silos across the organizations for which he worked. “One of the first problems we addressed at Credit Suisse was to harmonize its different infrastructure,” Reed says. “We had Cloudera, we had Hortonworks, we had open source, we had open source Hadoop.” Such infrastructure is readily connected on a semantic graph with standardized models and taxonomies. The graph-based approach pinpoints the relationships between nodes, giving the enterprise several use cases for what is essentially the same dataset. The most prominent is arguably an improved means of determining data lineage for regulatory compliance, which could be the biggest challenge financial entities have faced since last decade’s fiscal crisis.
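To make the lineage idea concrete, here is a minimal sketch in Python of a graph stored as subject-predicate-object triples. It is not Reed’s implementation; the dataset names, the `derivedFrom` and `storedIn` predicates, and the helper functions are all hypothetical, chosen only to show how one set of relationships can answer a lineage question.

```python
# Hypothetical triples: (subject, predicate, object). None of these names
# come from the article; they only illustrate the shape of the data.
TRIPLES = [
    ("risk_report", "derivedFrom", "positions_table"),
    ("positions_table", "derivedFrom", "trade_feed"),
    ("positions_table", "storedIn", "hadoop_cluster_a"),
    ("trade_feed", "storedIn", "hadoop_cluster_b"),
    ("client_list", "storedIn", "hadoop_cluster_a"),
]


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def lineage(triples, item):
    """Return all upstream sources of `item`, in the order they are discovered."""
    found = []
    stack = [item]
    while stack:
        current = stack.pop()
        for source in objects(triples, current, "derivedFrom"):
            if source not in found:  # guard against cycles and duplicates
                found.append(source)
                stack.append(source)
    return found


print(lineage(TRIPLES, "risk_report"))  # ['positions_table', 'trade_feed']
```

The same triples also say where each dataset lives, so a second question (which cluster holds a given table) needs no additional modeling, only a different predicate.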
“The new world of big data and the evolving world of regulatory and operational reducing margins is about having the data associated with our business at the main sector being first-class citizens,” Reed says. “Nothing’s going to change that. But the relationships between them are now first-class citizens too.”
The graph approach to managing relationships proved equally valuable for improving operations and creating business opportunities. By determining how even seemingly unrelated nodes can contribute to a certain business problem, organizations can transcend regulatory compliance and further enterprise objectives. “For the type of causal reasoning you need to do for this style of link analysis—whether you’re understanding client social circles, how a market is behaving, how a potential change in your environment is going to have positive or negative ramifications, how to triage something that’s gone wrong—it’s all about the linkage between objects,” Reed observes.
Employee entity relationship graphs
Such linkage offers additional utility for monitoring the enterprise and its employees for multiple use cases. For instance, the underlying semantic graph approach is ideal for insider trading surveillance, a task predicated on illustrating the relationships between people who may have knowledge of a trade or business development. “If I’ve got a person who’s an insider and a person who’s a trader, how do I link those two people together to understand whether there has been a chance that the wrong information’s traveled from one to another? I have to start thinking about people relationships,” Reed explains. The ensuing conceptual modeling can contain appropriate organizational structure, electronic communication, employee information (including geographic location and scheduling) and other aspects of the people modeled.
By contextualizing the information with temporal data that indicates points in time people could have exchanged information, that form of link analysis can demonstrate the likelihood of insider trading. Moreover, it’s based on the same graph framework used to demonstrate regulatory compliance—and can involve some of the same data. After creating an exhaustive model for insider trading surveillance centered on those working for a company, “all of a sudden, you’ve got an employee entity relationship graph,” Reed says. “Now you can say these are the traders that are exhibiting potential insider trader activity. These are the people who’ve got the information that, if I can show a path to those traders, something needs to be investigated.” According to Reed, the same graph-based link analysis is used by the intelligence community to determine the movement of money among terrorist organizations.
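The role of temporal data can be sketched in a few lines of Python. The example below is an illustration only, under stated assumptions: the contact records and names are invented, each record is an undirected contact at a given hour, and a person can pass information on only at a contact strictly later than the one at which they learned it. It computes the earliest hour at which information held by one person could have reached each colleague.

```python
# Hypothetical contact records: (person_a, person_b, hour_of_contact).
CONTACTS = [
    ("ana", "ben", 9),
    ("ben", "cho", 11),
    ("cho", "dev", 10),
    ("ben", "eve", 8),
]


def earliest_arrival(contacts, source, learned_at):
    """Earliest hour at which information held by `source` could reach each person.

    Assumption: information moves only along contacts that occur strictly
    after the sender learned it, so contacts are processed in time order.
    """
    arrival = {source: learned_at}
    for a, b, hour in sorted(contacts, key=lambda c: c[2]):
        for sender, receiver in ((a, b), (b, a)):
            if sender in arrival and arrival[sender] < hour:
                if hour < arrival.get(receiver, float("inf")):
                    arrival[receiver] = hour
    return arrival


# "ana" is the insider and learns the information at hour 8.
print(earliest_arrival(CONTACTS, "ana", 8))  # {'ana': 8, 'ben': 9, 'cho': 11}
```

Here "dev" never appears in the result even though a chain of contacts connects "ana" to "dev", because the contact between "cho" and "dev" happened before "cho" could have known anything. A graph without timestamps would flag that path; the temporal version does not.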
The true merit of link analysis graphs is that they are reusable for additional organizational functions. Employee entity relationship graphs are useful for more than just monitoring insider trading. “The graph I talked about for insider trading, we built that at Goldman,” Reed adds. “We actually had the security division come to us because they knew we had people relationships.” Internal enterprise knowledge graphs are extremely similar to the employee entity relationship graph described by Reed, but also include knowledge, skills and experiences alongside relationships between parties. The graphs can strengthen security, monitor insider trading and determine which employees are most appropriate for new tasks or client interactions. Their visual representation of which workers have relationships, knowledge and experience relevant to strategic objectives is influential in selecting the best candidate for a project.
Reed says, “Whenever I had a new client or strategy that I wanted to present to an existing client and I went to my salespeople, a bunch of hands went up and they all said ‘me’ because they wanted the revenue recognition. I can use the [employee relationship] graph to understand, before I go into that conference room, who’s had the most contact with the client and who’s the strongest candidate based on records of calls, conversations and interactions with the client to disambiguate the hands in the room.”
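A rough sketch of that ranking, assuming interaction records are already available as simple tuples, might look like the following. The employee and client names, the interaction types and the weights are hypothetical; in practice the scoring would depend on what the organization records and how much each kind of contact should count.

```python
from collections import Counter

# Hypothetical interaction log: (employee, client, kind_of_interaction).
INTERACTIONS = [
    ("maria", "acme", "call"),
    ("maria", "acme", "meeting"),
    ("li", "acme", "email"),
    ("li", "acme", "email"),
    ("li", "globex", "meeting"),
    ("omar", "acme", "call"),
]

# Assumed weights: a meeting counts for more than a call or an email.
WEIGHTS = {"call": 1.0, "meeting": 3.0, "email": 0.5}


def rank_candidates(interactions, client, weights):
    """Score each employee by weighted contact with `client`, highest first."""
    scores = Counter()
    for employee, contacted, kind in interactions:
        if contacted == client:
            scores[employee] += weights.get(kind, 0.0)
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank_candidates(INTERACTIONS, "acme", WEIGHTS))
# [('maria', 4.0), ('li', 1.0), ('omar', 1.0)]
```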
Operational change management
There are also several impact analysis use cases predicated on an initial investment in semantic graph technology. Impact analysis is critical to optimizing operational efficiency, particularly when change management is involved. By modeling all the different functions, infrastructure, applications and departments involved in a datacenter in a relationship graph, organizations can understand how each of those objects affects the others and streamline operations accordingly. Such a graph becomes vital to the proper implementation of change management, which should be done as painlessly and quickly as possible to improve workflows. Otherwise, companies end up in reactive situations where no one knows for certain what impact a proposed change will have. The result is delay because “no one is sure enough to say ‘no’ or ‘yes’,” Reed explains. “So you thought the decision was no and you build up more and more technical debt as things fall more and more behind in terms of patch levels and homogeneity in your dataset, to the point where you were forced to do something. When you did that something, it was macro and you needed a small army of people to negate the risk.”
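As an illustration of impact analysis on such a relationship graph, the Python sketch below walks a dependency map to list everything that could be affected by changing one component. The component names and the `DEPENDS_ON` map are hypothetical, and a real datacenter model would carry far more detail, but the traversal is the core of the idea.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "trading_app": ["pricing_service", "orders_db"],
    "pricing_service": ["market_feed"],
    "risk_report": ["orders_db"],
    "orders_db": ["storage_cluster"],
    "market_feed": [],
    "storage_cluster": [],
}


def impacted_by(depends_on, changed):
    """Return every component that directly or indirectly depends on `changed`."""
    # Invert the map so we can walk from a component to its dependents.
    dependents = {}
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, []).append(component)

    impacted = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over dependents
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


print(sorted(impacted_by(DEPENDS_ON, "storage_cluster")))
# ['orders_db', 'risk_report', 'trading_app']
```

With a list like this in hand before a change is approved, the people deciding have something concrete to say "yes" or "no" to.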