KM leverages data mesh
Anzo is a scalable knowledge graph platform from Cambridge Semantics that is used for data integration and analytics. It features a horizontally scalable graph database called AnzoGraph DB. One of the challenges for graph databases has been managing the very large number of triples and connections that emerge from large datasets and conducting analyses on them. “In 2017, we introduced massively parallel processing components that allowed horizontal scaling,” said Sam Chance, principal consultant at Cambridge Semantics. “The result was a system that was able to compute a series of benchmark queries against just over one trillion triples in 2 hours, compared to an alternate system that took over 200 hours to perform the same operations.”
AnzoGraph DB's ability to integrate across domains provides the benefits of the distributed environment of the data mesh while still allowing data from other domains to be obtained and analyzed in a federated way. Within or across domains, subsets of data can be packaged and managed in the system as bundled products and included in a data catalog or “graph mart.”
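The idea of bundling domain subsets into cataloged data products can be sketched in a few lines of Python. The names and structures below are illustrative assumptions, not Anzo's actual API; the sketch simply shows how domain-owned bundles of triples, registered in a shared catalog, can be queried together in a federated way:

```python
# Illustrative sketch: domain data packaged as named bundles of
# (subject, predicate, object) triples and registered in a catalog,
# loosely mirroring the "graph mart" idea. Not Anzo's actual API.

# Each domain owns and publishes its own bundle of triples.
sales_graph = {
    ("cust:1", "name", "Acme Corp"),
    ("cust:1", "region", "EMEA"),
}
support_graph = {
    ("cust:1", "open_tickets", 3),
}

# A simple catalog maps data-product names to their bundles.
catalog = {
    "sales/customers": sales_graph,
    "support/tickets": support_graph,
}

def federated_query(product_names, predicate):
    """Union the selected data products and match on a predicate."""
    combined = set().union(*(catalog[n] for n in product_names))
    return {(s, o) for (s, p, o) in combined if p == predicate}

# A consumer pulls data from two domains in one query.
print(federated_query(["sales/customers", "support/tickets"], "open_tickets"))
# → {("cust:1", 3)}
```

The point of the design is that each domain keeps ownership of its bundle, while the catalog gives consumers a single place to discover and combine them.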
In essence, the data product is a prepared, analytics-ready knowledge graph component. “However,” Chance pointed out, “users often have unanticipated questions, and requirements change, so the data product solution needs to answer ad hoc queries on demand and modify the data products easily.” AnzoGraph answers arbitrary queries, and RDF/OWL knowledge graphs readily adapt to change, such as adding new data sources, which allows more agility in analyses.
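Chance's point about ad hoc queries and evolving data products rests on the schema flexibility of the triple model: a new question is just a new pattern, and a new source is just more triples. A minimal sketch of that idea (a toy triple store, not AnzoGraph's implementation):

```python
# Toy triple store illustrating why a triple model handles ad hoc
# queries and new sources gracefully. Not AnzoGraph's implementation.
triples = {
    ("product:42", "type", "Widget"),
    ("product:42", "price", 19.99),
}

def query(s=None, p=None, o=None):
    """Ad hoc pattern match: None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

# An unanticipated question is just a new pattern, not a new schema:
print(query(p="price"))        # → {("product:42", "price", 19.99)}

# Adding a new data source means adding triples, even with
# predicates the original model never anticipated:
triples.add(("product:42", "carbon_footprint_kg", 2.4))
print(query(s="product:42"))   # now includes the new fact
```

Contrast this with a relational table, where the new attribute would require a schema migration before any query could see it.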
Adopting a different mindset
Looking at packaged combinations of data as a data product requires a different mindset than the traditional view of data as a raw material. “The data products should embody service-level agreements that are comparable to those that ensure the safety of a car that a manufacturer is going to sell,” continued Chance. “In other words, the data product should be reliable and accurate. But this is a tall order for a subject matter expert. Anzo abstracts the technical complexity so that subject matter experts can create data products.” The goal is to allow greater democratization in the use of data, so that citizen developers can be more proactive and empowered.
Along with the benefits of data mesh come architectural complexity and its associated costs. Because data in a distributed environment is stored and analyzed locally, each domain needs its own infrastructure. Implementing data mesh enterprise-wide would not be the best fit for most organizations. But there is considerable interest in this approach. In a study by PwC of German companies with more than 1,000 employees, 60% reported that they already use data mesh. A survey reported by Dataversity indicates that around 40% of companies plan to implement data mesh in 2023.
It is likely that most organizations using data mesh will have hybrid environments in which the departments that require greater agility will implement data mesh and others will remain centralized. The key is to select the architecture best suited to each department and organizational function.