Technology to connect people and knowledge
Aasman described a construction use case in which different departments, such as the engineering team along with the legal, sales, and marketing units, might all have data in a knowledge graph, each with its own data models and taxonomies. An LLM's natural language understanding enables it to interpret this metadata and translate it from one department's vocabulary to another's for natural language question-answering. Users can ask questions in their own department's business terminology and definitions, so that the legal team can find out whether the engineering unit will meet its deadlines, for example. Conversely, engineering can query the knowledge graph to determine the legal consequences if a due date is missed.
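A minimal sketch of that idea is shown below. The `complete()` helper is a hypothetical stand-in for whatever LLM endpoint is actually in use, and the department taxonomies and prompt wording are invented for illustration rather than drawn from any specific product:

```python
# Illustrative only: cross-department question answering over a shared knowledge graph.
# `complete()` stands in for whatever LLM chat/completion API an organization uses.

def complete(prompt: str) -> str:
    """Hypothetical wrapper around an LLM call."""
    raise NotImplementedError

# Each department publishes its own taxonomy/data model as metadata.
DEPARTMENT_METADATA = {
    "legal": "Contract, Deadline, PenaltyClause; deadlines are tied to contract IDs.",
    "engineering": "Task, Milestone, CompletionDate; milestones reference contract IDs.",
}

def translate_question(question: str, asking_dept: str, target_dept: str) -> str:
    """Ask the LLM to restate a question posed in one department's vocabulary
    as a query over another department's data model."""
    prompt = (
        f"A {asking_dept} user asks: {question!r}\n"
        f"{asking_dept} terminology: {DEPARTMENT_METADATA[asking_dept]}\n"
        f"{target_dept} data model: {DEPARTMENT_METADATA[target_dept]}\n"
        "Write a SPARQL query against the target data model that answers the question."
    )
    return complete(prompt)

# e.g. legal asking about engineering progress:
# translate_question("Will the delivery deadline in contract C-42 be met?",
#                    "legal", "engineering")
```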
“People from different domains, that don’t know what the data in the other group looks like, can still ask questions, and the LLM will understand what you’re trying to ask and translate it into something that can be queried from the database,” Aasman specified.
Data mesh architecture
The data mesh architecture is another popular distributed paradigm that rapidly connects people to knowledge. One such implementation involves individual business units describing data products in their repository of choice and then making them accessible through a unified metadata layer with rich explanations about those repositories and their content. Coupling such a metadata store with embeddings (stored in a vector database) of the actual content within an organization's distributed sources allows for ad hoc question-answering.
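One way such a setup could be wired together is sketched below, using sentence-transformers and FAISS purely as stand-ins for whatever embedding model and vector database an organization actually runs; the data-product descriptions are invented for illustration:

```python
import numpy as np
import faiss                                     # stand-in vector index
from sentence_transformers import SentenceTransformer

# Each business unit contributes documents plus metadata describing its data product.
documents = [
    {"text": "Q3 load-test results for the payments service ...",
     "metadata": {"unit": "engineering", "product": "reliability-reports"}},
    {"text": "Master services agreement, renewal terms ...",
     "metadata": {"unit": "legal", "product": "contracts"}},
]

# Embed the actual content of each document.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([d["text"] for d in documents])

# Store the embeddings in a vector index.
index = faiss.IndexFlatL2(int(vectors.shape[1]))
index.add(np.asarray(vectors, dtype="float32"))

# Keep the metadata layer alongside the vectors so answers can later be
# filtered or attributed by business unit and data product.
metadata_store = [d["metadata"] for d in documents]
```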
Some vendors have implemented this infrastructure to "enable users to query not only a single document, but an entire repository or set of repositories," Nivala mentioned. "You can ask questions that you can answer based on the accumulated knowledge of the organization." In this approach, storing embeddings of each document's content in a vector database allows organizations to supplement the questions, or prompts, they send to language models with relevant information from their own data through retrieval-augmented generation (RAG).
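Continuing the sketch above, retrieval-augmented generation then amounts to embedding the user's question, pulling the closest documents out of the vector index, and folding them into the prompt; `complete()` again stands in for the actual LLM call:

```python
def answer_with_rag(question: str, k: int = 2) -> str:
    """Retrieve the k nearest documents and fold them into the prompt (RAG).
    k should not exceed the number of indexed documents."""
    q_vec = model.encode([question]).astype("float32")
    _, hits = index.search(q_vec, k)             # nearest-neighbour search
    context = "\n\n".join(documents[i]["text"] for i in hits[0])
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return complete(prompt)
```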
However, the carefully curated metadata in the layer underlying such a data mesh is equally valuable for producing accurate responses. Thus, the embedded vectors are "coupled with the business metadata to make answers that are more relevant from the business point of view," Nivala commented. Graph RAG, which is supported by physical data fabrics embodied in a knowledge graph, epitomizes the value of metadata filtering when searching content for relevant responses to questions. With this method, the entire knowledge graph can serve as metadata, supporting extremely nuanced responses from models to spontaneous questions. "If you only do RAG, I've found that just popping 100,000 texts into a vector store gives bad results because you miss all the metadata per element," Aasman said. With Graph RAG, language models can translate natural language questions into a query language such as SPARQL, supporting natural language question-answering for users who have no knowledge of the underlying query language.
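A hedged sketch of that last step follows, using rdflib only as an example triple store and reusing the `complete()` placeholder from the earlier sketch; the graph file, the schema sampling, and the prompt wording are assumptions, not any vendor's actual pipeline:

```python
from rdflib import Graph

graph = Graph()
graph.parse("org_knowledge_graph.ttl")           # assumed local export of the knowledge graph

def answer_with_graph_rag(question: str) -> list:
    """Let the LLM translate a natural-language question into SPARQL,
    then run that query against the knowledge graph."""
    # Crude schema hint: a small sample of triples given to the model as context.
    schema_hint = "\n".join(f"{s} {p} {o}" for s, p, o in list(graph)[:50])
    sparql = complete(
        "Translate this question into a SPARQL query.\n"
        f"Sample triples from the graph:\n{schema_hint}\n"
        f"Question: {question}\nReturn only the query."
    )
    return list(graph.query(sparql))             # results go back to the user (or the LLM)
```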