Enriching text analytics with graph databases
Graph databases have garnered attention for their value in such applications as customer profiling, fraud detection, and recommendation engines. The potential of this market is affirmed by numerous predictions for rapid growth. Gartner predicts that the market for graph analytics and graph database management systems will double every year through 2022. Other estimates are more conservative but still substantial, hovering around a growth rate of 25% per year. In general, the market size is expected to grow from less than $1 billion in 2019 to about $4 billion by 2025.
Why the spectacular growth expectations? Graph databases are ideally suited for storing information about relationships among entities, for accessing diverse types of information, and for easily incorporating new information. These capabilities are a great match for the complex nature of today’s information as well as the fast pace of market changes. They are particularly useful for enriching text to improve precision in search and analytics. Because the relationships are stored as part of the data, rather than being computed at query time through joins as they would be in a relational database, analyzing them is much easier.
A different model
Moving away from the mental model of a database as a set of rows and columns can be difficult, because the traditional relational model is so familiar. “Visualize instead, a whiteboard with a series of bubbles connected by lines,” said Alessandro Negro, chief scientist at GraphAware. “Each bubble is an entity, such as an individual, and the line connects it to another entity.”
The line or “edge” completes a subject-predicate-object group referred to as a “triple.” An example is “Charles studies medicine.” Every individual in the database who is linked to medical studies then becomes a part of a network of connections. “The storage layer saves the node and relationship, and the database represents the entire set of nodes and relationships,” Negro explained.
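As a minimal sketch of the triple idea, the facts can be written as plain (subject, predicate, object) tuples and queried for everyone connected to the same object. The extra names (Dana, statistics) and the helper `subjects_linked_to` are hypothetical, added only for illustration; a real graph database would store and index these relationships itself.

```python
# Each fact is a (subject, predicate, object) triple.
# "Charles studies medicine" comes from the text; the other rows are made up.
triples = [
    ("Charles", "studies", "medicine"),
    ("Dana", "studies", "medicine"),
    ("Dana", "studies", "statistics"),
]


def subjects_linked_to(triples, predicate, obj):
    """Return every subject connected to `obj` by `predicate`, sorted by name."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Everyone linked to medical studies forms one small network of connections.
medical_students = subjects_linked_to(triples, "studies", "medicine")
print(medical_students)
```

Here the relationship is part of the stored data, so finding the network around "medicine" is a lookup rather than a calculation.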
Since graph databases do not have a fixed schema or structure in the way that relational databases do, adding a new entity or type of relationship is much easier, similar to drawing another circle on the whiteboard. It is not necessary to add new fields to create a new schema, or write new queries to access the information.
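The whiteboard analogy can be sketched with a toy in-memory graph. The class `TinyGraph`, the relationship name `mentored_by`, and the entity "Dr. Lee" are assumptions made for this example and do not reflect any particular product's API. The point is that a new kind of relationship is added as data, with no change to a schema.

```python
from collections import defaultdict


class TinyGraph:
    """A toy graph: each node maps to a list of (relationship, target) pairs."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add(self, subject, relationship, obj):
        # Any relationship label is accepted; there is no schema to update first.
        self.edges[subject].append((relationship, obj))

    def neighbors(self, node, relationship=None):
        """Return targets of `node`, optionally filtered by relationship type."""
        return [
            target
            for rel, target in self.edges.get(node, [])
            if relationship is None or rel == relationship
        ]


g = TinyGraph()
g.add("Charles", "studies", "medicine")
# A new type of relationship, added like another line on the whiteboard.
g.add("Charles", "mentored_by", "Dr. Lee")

print(g.neighbors("Charles"))
print(g.neighbors("Charles", "mentored_by"))
```

The same `neighbors` call serves the new relationship type without modification, which mirrors the claim that existing access paths keep working as the graph grows.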
In addition, graph databases can import existing relationships that are codified in ontologies. For example, the Financial Industry Business Ontology (FIBO) defines financial business terms and relationships, and these can become part of the graph database without additional development effort. The ontology is consistent with standards such as the Resource Description Framework (RDF), a set of specifications developed by the World Wide Web Consortium (W3C) to facilitate data interchange. It can be easily stored in a property graph database such as Neo4j, one of the early entrants into the graph database market.
“The biggest advantage of a graph approach is that you can merge different searches into a single source of truth,” Negro commented. “For example, in a typical siloed situation, different systems such as CRM, supply chain, and manufacturing would not be able to exchange data. Through the use of graph databases, however, it is possible to unify the data to detect common customer complaints, find associated issues that relate to a product defect, and then propagate this knowledge back to engineering and design. All these factors can be integrated into a knowledge graph to answer types of questions that were not possible to address before.”
GraphAware develops custom graph data applications, including knowledge graphs built on Neo4j, and also offers an enterprise knowledge graph platform called Hume. Hume incorporates components such as natural language understanding, sentiment analysis, content classification, and cognitive search.
Adapting to the unpredictable
The ability of graph databases to evolve makes them well-suited for unpredictable situations such as biological research, where new hypotheses are being generated or discoveries made, and the information about molecular entities, diseases, and genetics is highly interconnected. “They are more tolerant of changes,” said Atanas Kiryakov, CEO of Ontotext, “because the database structure does not need to be revised each time new relationships are added to a graph database.”
Ontotext’s technology is used primarily for data integration and information extraction. GraphDB from Ontotext supports metadata management and integration and is also used in the development of knowledge graphs. For example, it can combine information from the Linked Open Data Cloud as well as curated commercial datasets such as Factiva and Dun & Bradstreet with known relationships among entities in the graph database. “This information can be combined into a big knowledge graph,” stated Kiryakov. “We can then use these graphs to do better text analytics. Our text analytics process takes links and names directly from GraphDB, so we can get millions of facts very quickly.”
“Graph databases are particularly good at disambiguation,” explained Kiryakov. “The graph database creates a semantic profile or fingerprint, which helps when comparing it to the semantic fingerprint of the candidate meaning. This results in more accurate disambiguation.” Disambiguation is a critical function in language understanding, since so many words have different meanings, and the intended one is dependent on context.
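One simplified way to picture fingerprint-based disambiguation is to treat each candidate meaning's fingerprint as the set of entities connected to it in the graph, then pick the candidate whose set overlaps most with the terms surrounding the ambiguous word. This is an illustrative sketch, not Ontotext's actual method. The Jaccard similarity measure, the sample entities, and the function names are all assumptions chosen for the example.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical fingerprints: entities linked to each candidate meaning in the graph.
candidates = {
    "Paris (city in France)": {"France", "Seine", "Eiffel Tower", "Louvre"},
    "Paris (city in Texas)": {"Texas", "Lamar County", "United States"},
}


def disambiguate(context_terms, candidates):
    """Return the candidate whose fingerprint best matches the context terms."""
    return max(candidates, key=lambda name: jaccard(context_terms, candidates[name]))


# Terms found near the ambiguous mention "Paris" in some text.
context = {"Louvre", "Seine", "museum"}
best = disambiguate(context, candidates)
print(best)
```

The richer the set of relationships stored for each entity, the more distinctive its fingerprint becomes, which is one reason a well-populated knowledge graph can help resolve words with several meanings.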