Getting Started with Knowledge Graphs and Machine Learning: Part 1 Q&A with Sebastian Schmidt, CEO of metaphacts
So, what the research community did was define something called the F.A.I.R. data principles, which should enable this knowledge democratization. By applying these principles, data becomes findable, accessible, interoperable, and reusable; that's what the acronym F.A.I.R. stands for. And you do that across institutions, across departments, across domains, and, in many cases, really covering everyone you're interacting with. Enterprises are now also trying to adopt these F.A.I.R. data principles, and knowledge graphs have proven to be an ideal technology stack for enabling this.
JW: How do the F.A.I.R. principles map to knowledge graphs?
SS: This is a good question because the connection between F.A.I.R. and knowledge graphs might not seem immediately obvious. The F.A.I.R. data principles describe a total of 15 rules or guidelines which have been established to enable companies to make data reusable and human- and machine-actionable. Getting data to the level where a machine can actually understand what it means and interact with it, without a human providing additional context, is really the ultimate goal of truly F.A.I.R., reusable data.
The issue with the F.A.I.R. principles is that they completely lack any detail on how to implement them. They really are just guidelines on what you should be achieving. At metaphacts, we have mapped these guidelines to our technology stack and have seen that a knowledge graph is the ideal technology for managing all data assets, from the data sources to the data models, along with all the metadata required to achieve F.A.I.R. data. A knowledge graph is a graph representation of your data in which each entity or resource is stored with its own unique ID. That makes every entity uniquely identifiable, and therefore searchable and findable. At the same time, interlinking all of those entities or resources enables the integration of heterogeneous internal and external data sources. The graph representation also allows for an easy extension of the data and the underlying schema. In a knowledge graph, the semantic description of your data is stored in a semantic model alongside your actual instance data, so we are building a common understanding of what the data actually means.
This is what I was referring to when I was saying that we are making data “machine-interpretable,” and this is a key step and a unique feature of the knowledge graph, where the semantic model describes the meaning of your data. This is the knowledge that right now is most likely only available in the heads of your brightest domain experts, and that every enterprise is struggling to make available to other users internally. And then, as part of the knowledge graph, we can also include metadata like provenance and lineage information on where the data actually comes from. That helps us build trust in the data and in the knowledge that we retrieve from the knowledge graph.
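To make the answer above more concrete, here is a minimal sketch of the ideas it describes: entities with unique IDs, links between them, a semantic model stored alongside the instance data, and provenance metadata. It is an illustration under stated assumptions, not metaphacts' implementation. Plain Python tuples stand in for graph statements, and every identifier under `https://example.org/`, the class and property names, the source label, and the helper functions are hypothetical.

```python
# Illustrative sketch only: plain Python tuples stand in for graph statements
# of the form (subject, predicate, object). Every identifier below is
# hypothetical and chosen purely for this example.
EX = "https://example.org/"

# Semantic model: describes what the data means.
semantic_model = {
    (EX + "Dataset", "rdf:type", "rdfs:Class"),
    (EX + "Department", "rdf:type", "rdfs:Class"),
    (EX + "ownedBy", "rdfs:domain", EX + "Dataset"),
    (EX + "ownedBy", "rdfs:range", EX + "Department"),
}

# Instance data: each entity has its own unique ID and links to other entities.
instance_data = {
    (EX + "dataset/sales-2023", "rdf:type", EX + "Dataset"),
    (EX + "department/finance", "rdf:type", EX + "Department"),
    (EX + "dataset/sales-2023", EX + "ownedBy", EX + "department/finance"),
}

# Provenance metadata: where a given statement came from (assumed source name).
provenance = {
    (EX + "dataset/sales-2023", EX + "ownedBy", EX + "department/finance"):
        "internal-data-catalog",
}

# The model and the instance data live side by side in one graph.
graph = semantic_model | instance_data


def describe(entity_id):
    """Return every statement whose subject is the given unique ID."""
    return sorted(t for t in graph if t[0] == entity_id)


def expected_object_type(predicate):
    """Ask the semantic model what kind of entity a predicate points to."""
    for subject, pred, obj in graph:
        if subject == predicate and pred == "rdfs:range":
            return obj
    return None


def source_of(statement):
    """Look up the recorded origin of a statement, if any."""
    return provenance.get(statement)
```

In this sketch, `describe` corresponds to finding an entity by its ID, `expected_object_type` shows a program reading meaning from the semantic model rather than from a human, and `source_of` returns the provenance that supports trust in a statement. A production knowledge graph would typically use a standard such as RDF and a graph database rather than in-memory sets.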
JW: Why aren’t more organizations taking advantage of this already? What is the challenge?
SS: The challenge has been a lack of good tooling, especially tooling that enables the domain experts and business users, who, as I said, are often non-IT people, to participate in this semantic knowledge modeling and knowledge generation process. And, with most of this important domain knowledge available only to those domain experts, this has often been a broken process. So, what we need is the right tooling but also a structured approach to getting this implemented. A lot of the attempts I have seen in the past failed due to the complexity of really changing internal processes for how data is stored, documented, and shared.
What I think is the best way forward, and what we've seen a lot of customers successfully implement, is breaking this down into smaller steps and using an agile approach. We would usually start from just one or two specific data sources, a specific end-user information need, or just a limited group of people in the enterprise where this gets implemented, and then grow it from there. And our experience has shown that a four-step approach with continuous iterations of two or maybe four weeks has been most effective. And the best thing is, we can show first benefits within just a month.
For part 2 of this KMWorld Drill Down Video Interview with Sebastian Schmidt, CEO of metaphacts, in which he explains the company’s four-step process for achieving knowledge democratization, go here.