Connected Data » Building knowledge graphs in the real world

As the interest in, and hype around, Knowledge Graphs is growing, there is also a growing need for sharing experience and best practices around them. Let’s talk about definitions, best practices, hype, and reality.

What is a Knowledge Graph? How can I use a Knowledge Graph & how do i start building one? This panel is an opportunity to hear from industry experts using these technologies & approaches to discuss best practices, common pitfalls and where this space is headed next.

Transripts below

If you are more of a visual type, you can also watch the presentation

— Speakers Info —

Katariina Kari

Research Engineer, Zalando Tech-Hub

Katariina Kari (née Nyberg) is a research engineer at the Zalando Tech-Hub in Helsinki. Katariina holds a Master in Science and Master in Music and is specialised in semantic web and guiding the art business to the digital age. At Zalando she is modelling the Fashion Knowledge Graph, a common vocabulary for fashion with which Zalando improves is customer experience. Katariina also consults art institutions to embrace the digital age in their business and see its opportunities.

Panos Alexopoulos

Head of Ontology, Textkernel BV

Panos Alexopoulos has been working at the intersection of data, semantics, language and software for years, and is leading a team at Textkernel developing a large cross-lingual Knowledge Graph for HR and Recruitment. Alexopoulos holds a PhD in Knowledge Engineering and Management from National Technical University of Athens, and has published 60 papers at international conferences, journals and books.

Sebastian Hellman

dbpedia.org

Sebastian is a senior member of the “Agile Knowledge Engineering and Semantic Web” AKSW research center, focusing on semantic technology research – often in combination with other areas such as machine learning, databases, and natural language processing.

Sebastian is head of the “Knowledge Integration and Language Technologies (KILT)” Competence Center at InfAI. He also is the executive director and board member of the non-profit DBpedia Association.

Sebastian is also a contributor to various open-source projects and communities such as DBpedia, NLP2RDF, DL-Learner and OWLG, and has been involved in numerous EU research projects.

Natasa Varitimou

Information Architect, Thomson Reuters

Natasa has been working as a Linked Data architect in banking, life science, consumer goods, oil & gas and EU projects. She believes data will eventually become the strongest asset in any organization, and works with Semantic Web technologies, which she finds great in describing the meaning of data, integrating data and making it interoperable and of high quality.

Natasa combines, links, expands and builds upon vocabularies from various sources to create flexible and lightweight information easily adaptable to different use cases. She queries these models with their data directly with SPARQL, guarantees data quality based on business rules, creates new information and defines services to bring together diverse data from different applications easily and with the semantics of data directly accessible.

—Transcripts —

Hello, everybody. So we just had a fantastic talk talking about one real life Knowledge Graph being used in practice. And so this is a this is a panel about really deploying Knowledge Graph, Knowledge Graph. And we have some great speakers for you. I am actually not going to introduce them in the way we’re going to because you have a nice pamphlet in your thing to tell you the background of each of our speakers. What I’m going to do is ask each of you to present a little bit about your Knowledge Graph in practice of your company.

And so we’ll start with a panel and then moved on.

Yeah, OK, so I work for the Skarner Externalize provider of semantic software for the HLT equipment domain. So we serve the people who and the companies that need to understand people’s profiles, resumes, CVS and also vacances. So practically the supply and demand side of the labor market and we want to merge together. So in that context, the critical aspect of domain knowledge that we need is knowledge about professions and skills and qualifications and how these are related to each other.

So we need to know what professions are out there, how they’re expressed in in in text. That’s very important in different languages, if they mean the same thing or if they are actually something different and how they related to each other. So it’s a very conceptual Knowledge Graph and the technology we use is property graph representing that. And the main applications of it is for semantic parsing. So we pursue a we the entities relations and for search for expansion and semantic matching.

Natasha, hi, I’m Natasha. Very teamwork. I’m an information architect in a definitive which was previously known as Financial and Risk Business of Thomson Reuters. So we are actually a data company and we have a huge amount of data on Knowledge Graph that feeds actually many products, many services. One of these products is permeated that ETL shotput referred to earlier. So this is a Fluree product. We also have another product, which is both framework. My colleague Jeffrey Corelle will talk to you about it in a while, which is an enriched gravid, more enriched than the pyramid, which is a commercial product.

So we have a, as I said, a huge amount of context. So we maintain what we call a metadata repository. Every publisher in our company must register their data sets in that metadata repository. They should put some metadata around it. So provenance, who is the best contact for? But also they can express the semantic meaning of their datasets, the different kinds of distributions. How are we providing them, how these different syntactic formats are represented in the semantic meaning and how so people can from this metadata registry, people, the consumers can actually discover the datasets and can understand how they can use them and they can even track provenance and lineage.

So how is it how is this dataset related to other datasets as well? So this metadata registry is built on AWS its uses Neptun as a triple star. We support Biton plurality in our data. We do it with named Graff’s and named graphs are actually used also to support versioning in our ontologies. And actually I won’t call them ontologies. I will call them vocabulary’s because they are mostly utilizing Chacal because we use these schemas to actually validate also the data as they enter our meta data registry.

So we have 100 percent data quality. Thanks.

Hi, so my name is Katarina Khari and I work, I’m an anthologist, that’s Orlando, and the Knowledge Graph we have at Salento is also really a vocabulary for fashion to drive use cases like search, understanding and to build a slightly more dynamic browsing experiences for our customers. We are also using Amazon Neptun. We’re using named Graff’s actually was interesting to hear using versioning. We haven’t gotten into versioning yet, certainly to explore that, but we used named Graff’s to write implied triples, have them in explicit form so that applications that are built on top of the Knowledge Graph and the ontologies would serve more before mentally testing.

So. Hello. Yeah, I have. So I have two stories. One of the history of Pedia and one of the the current stuff we’re doing. So Deepdale is actually one of the oldest Knowledge Graph there is. So we are the first version was published in two and 12 years ago and it was kind of like Knowledge Graph before the word Knowledge Graph was even pushed by Google. So the main thing we did was we extracted the knowledge from Wikipedia and Holsted as a database and ran free publishing for everybody.

And this data was immensely useful. I think one of the most common use cases is to enrich search. Uh, for example, the relation between Beyoncé and Ivy Park is also in Wikipedia. So it gets extracted. But UPDF so you kind of like get this for for free because you can just download it. And it was so it was immensely useful and kind of like community formed around it that improved the data quality and worked on this for years.

And companies included this into into their, their database. Um, but so after we have a we have a fairly large community and it’s very diverse, like you can imagine that everybody has their own data to care about and everything. And we’re still two years ago we started this discussion like we are sitting kind of like all in the same boat and that we want to work with data and want to build these Knowledge Graph. But the collaboration between the different community members really is really difficult.

So we changed our mission a bit or refined it. So before that it was we put useful data on the web and people download it. And now we have the slogan that we provide global and unified access to Knowledge Graph so that in the end you can collaborate across organization borders. So this collaboration comes in two flavors. So one is the data curation part. So in the end, the data you have, somebody else has the same data and you actually want to curate it together because it’s more cost efficient.

So that’s cause for libraries and public research projects and everything. And the other way of collaboration in the business side is more like the supply chain management. So you want to get data from somewhere else and integrate it into your your product. And this is not this is still not working so well. So now we are changing from the content provision of the free data. We still do this, but we are building like a platform which is called the database, which which can be used to connect Knowledge Graph across organizations and reuse data and provide feedback mechanisms and more reliable supply chains.

To economists, it’s lots of different, different perspectives, interesting common set of technologies, I think one of the things we wanted to do in this panel is give you a little bit of a feel for what it takes, both technically but more importantly, organizationally, to to build these Knowledge Graph. So maybe I’ll start with you, Pontoise. How what is like the the key thing you need to do to get started with building these Knowledge Graph from an organizational perspective?

Why so define a case that is quite specific, so you need to avoid. I mean, it’s good for for selling to clients, but you need to avoid type things and try to do things you don’t want to do just because it’s trendy, but because I think Katarina mentioned how she got buy in from BP or from upper management. So for upper management, it makes sense only if it makes money. So you need to find a particular sub case that will get you a quick win and proof of concept that will show the viability.

And we buy you resources to continue. And because, as I like to say, Knowledge Graph is not one single project. It’s not. It’s first of all, it’s not just the artifact. You build the artifact that you built, all the process around it to support it, to support its its lifecycle. If you only build the artifact and you leave it like that, it will it will die in a while because all the knowledge will be obsolete.

So start with why you want to build a Knowledge Graph be very specific and then and try to find actually allies find it doesn’t matter so much the technology. I mean, it matters after how matters after the what. For me, that’s fantastic.

Anybody else have a comment. I know we heard a great talk from you on the way in and I’m in Thomson Reuters definitive. What’s the why then?

Well, the wife for the metadata registries, as I said, because we needed some somewhere to register all our data sets and the consumers be able to find them, but also to understand how can they use them. So we had a very specific case for the bolt from where we provide the data and RDF. Again, as I said, my colleague will explain more, but it’s just another format that can be automatically integrated. So it depends on the product.

Fantastic. So can we dove into. So we’ve talked about a little bit about the why is the important part of why you shouldn’t build these Knowledge Graph in the first place? I mean, is it interested in what do you see as your current challenges in your area? Knowledge Graph construction? And maybe we’ll start with you, because you know where you know, you have your 20 concepts or so and you’re building it out and you shown performance. So what what challenges do you have going forward?

The challenge I mean, currently we have now hundreds of concepts and 20 are the ones that are driving revenue at the moment there. I think there are a lot of challenges. One is that actually what would be the next use case to do? Because there are quite a lot of use cases like I would love to read DPD and do the Beyoncé in Franz from there, but then how many chances do we have in our search? I’m not getting endorsement for that use case necessarily.

So so this kind of capabilities that we could build, we do need a strong enough big enough use case to actually start investing in it and start investing like developer time to it. So that’s it’s like really organizational business driven, driven decisions when are a company that makes money for stakeholders. And so, um, but the. Yeah, I would say that maybe the other colleagues have more. There are others as well. Yeah. The organizational business reasoning part.

And Sebastian, from DPD point of view, what’s, what’s really challenging you as you move to this Data Busse view of the challenges, the cost. Right. So Sonando can carry the cost. Thomson Reuters can carry the cost. You say we make a two year project, build a Knowledge Graph course. The rewards are quite good, but if you don’t have the resources, then data quality, for example, is really, really bad. It has a bad curve because it follows the law of diminishing returns.

So you increase the you have to increase the manpower, but you cannot. The the increase in data quality or quantity doesn’t increase with the same scale. Right. So you invest more and more resources and then you add only five percent of data quality, for example. So there it really makes sense to pull across across organizations. Right, because especially in pre competitive data. So, for example, a list of singers, list of TV shows, a list of authors, list of publications.

So this is all publicly available information and you should outsource the maintenance of it like work together with other people. Because unless you have of course, if you’re a very big company like Google, you can do it in-house and create it right. But you need to we need to break down the cost and make this this commodity data cheap. And that’s why we need something like a synchronization mechanism across the organizations.

So are each of your Knowledge Graph kind of riffing off the consuming public data, you said you want to consume public data eventually, maybe in the future, but is in text. Colonel, are you consuming public data sources to help IOTA Knowledge Graph?

We do consume them, but not in a live version. So when every time we want to make an enrichment, we look for any type of resources that may contain the knowledge that is going to be the PPTA. It can be the that dimension WHISSTOCK. It can be other Web resources. And but usually one problem is the cognitive semantics. So what we mean as a skill, for example, can be seen in the media in different things. For example, can be we can take the legal domains.

That’s one thing we want or the types of medical areas, things like that. So we need to do mappings. And this doesn’t make sense to be life. It’s it’s a one off project every time. That’s how it works. And so far, we don’t have any incentive to keep alive links to the other or maybe no, we have only Tarasco because Eskow is going to be used by and the employment agencies of its country so that we do have incentive because we want to be interoperable with them.

OK, do you have any comment on that or just yeah, OK, I’m thirsty. Let’s see, I had one or two more questions. I had some very interesting questions. All of you use GitHub to manage your your ontologies and vocabularies or get Neo4j. So you use GitHub manager ontologies, DPD as well. The DPP. It’s a good practice. OK, interesting. Very interesting. We did that as well. And also here, I wanted to open it up for questions, so I have one more questions.

And if you have questions of mine, just raise your hand and we’ll we’ll run around the audience to start getting some questions. But before we do, before we get there, this is technical. Let’s talk about rules and inference. So a big and interesting thing is where do you see the role of inference? Are you using rules inside your Knowledge Graph or is it just purely an entity kind of relation, kind of style Knowledge Graph?

OK, I have been working in the industry for many years. I’m not working as a researcher or university, so the open work assumption was a barrier for us. So the latest years has come to my rescue. And we’re using SPARQL for describing how ontologies, but also we are using it as a rule engine to create information in a controlled way. So, for example, if I want to discover if I want to create new information, if I was using inferences, it would be uncontrollable.

All the information that you can get based on our axioms. And this is not something that can work in the industry. You want to create and be totally in control of the new information you will have in your store. So we actually have worked with rules, but also in my previous job in a top quadrant as Semantic Solutions architect, we were using spin rooms. Now we’re using chacal rules in in reformative. So in general, we don’t use inference anymore.

OK, interesting.

Inference rules like the ones that come from RDF and the Órale actually not supported by Amazon left you right, which is also very interesting. Maybe it’s there for a reason. I remember that back when it was Bleys, Kraft, the place Kraft developers weren’t really into RDF his domain and range at all and also talking like not seeing the benefits there. We are also not implementing them. We’re not making making use of them. I was considering them in the beginning and open world assumption was really hard for my colleagues to understand and to work with.

So now this like example of us reducing latency has been a very practical set of rules that are application specific that we maintain. And so we’re not using the kind of rules language to define those, although we could write that we are just using other other scripts really to do it.

Interesting. Any bonus?

So about the open world assumption also in case doesn’t work, we don’t want to have this kind of inferences. It’s not this kind of case. We mostly like want constraints as well as with respect to the standard inferences that all our RDF gives. Again, we don’t implemented in the sense that every application has its own peculiarities. And to give you an example, when we make a search, when we want to expand the search query, one could say that if you are looking for a term for a concept, then all its more specific concepts should be expansive, expand all its synonyms and expand.

That’s not always the case. Why? Because, for example, some of the synonyms are just too ambiguous or two that really cause a problem. Instead, instead of helping us or you really don’t want to when you are looking, for example, for someone who knows how to, they could learn as a tool kit. You don’t want something. It’s very specific. So you don’t want to generalize with someone who knows machine learning or something else like that.

So we any type of inference is incorporated into the, um, applications and products.

Interesting. Thanks a lot. Hey, and so that brings up our first question over there. So follow up with that.

It strikes me that your Knowledge Graph is a relatively mature year over the kind of the the hump of critical mass adoption. If I was to put you into an organization where they had no Knowledge Graph or what steps would you take and what tools would you be using?

With only one experience of that, once it’s Orlando and having learned a lot on the way, the first thing probably is to to really make it really specific of what the company needs most in that current time, just to make sure we’re talking about use cases all the time. And I guess we’re talking about why he was started with why, but really for that specific company, first, be very clear on the way home and then technically whatever is needed.

So so sometimes property grafts might do the trick more than RDF, Cravath or not even RDF, but something else also. So the next may make that choice, but it really starts with the way.

Yeah, and in general, you should see the Knowledge Graph not as I mean, OK, there are cases where it is, but normally you can use the Knowledge Graph not as a replacement of your current infrastructure, but as a valuable addition. So it helps to keep like the knowledge and ontologies and mappings maybe and but keep the normal infrastructure itself. You can even compile ontologies to Java, for example. So you have a real performance gain there. But it’s good to manage this semantic layer separately.

And then how you achieve this is that you pick certain use cases which are very interesting, and you build this parallel infrastructure in a prototype and show the value and that gets kind of like the attention. So so in the end, this because the GPL was kind of like the semantic access to Wikipedia and it showed that the same edification really brought benefits. Right. And that’s also one of the reasons why Vickki data is there now, because it was a it was 10 years long.

It was a good showcase. And then they finally came around to making wiki data. Right. So that’s making the prototype. Showing the value in parallel to the existing infrastructure is a low cost investment, actually.

So the answer to the question is, you need to be a detective, you need to be an investigator. I mean, just go around and talk to people. You realize that many teams will be already using some type of ontologies that they don’t call it like that. It can be a simple file with some keywords. It can be an example that contains a relation that is not named. But there’s a relation there. And you also realize that when you talk to another team, they have already also using the same knowledge, but not the other teams.

So they use their own version. So it’s an investigative process that unfortunately doesn’t end in two years already in the company. And I still get surprises about hidden knowledge, about how auditors use different terminology, what what people used to call a synonym that is not a synonym. It’s a it’s a hard work, but you need to do it.

So to summarize, I know you’re why find a showcase and be Sherlock Holmes, right. OK, so do you have other questions from the audience?

If I may add this question? I have seen I have witnessed trends in the different industries, so I have seen oil and gas industry companies using semantic web just because they want to be compliant with ISO fifty nine to six, which is a very important standard in this industry. So this is where you should start if you are in that domain. Pharma and life sciences have for a lot of years now have fully developed ontologies like Sanomat or Meche. So they’re usually pharma and life sciences companies.

What they do is they taking these huge ontologies. They slice and dice it because it’s it’s RDF and it’s very easy to do it. They gather in their Knowledge Graph their own information around it, and then they are using these parts of the ontologies for machine learning techniques. So there are other industries like consumer goods where they are using the Knowledge Graph because they want to capture compliance. So, for example, I have a product, but this product is consist consist for many materials and molecules.

I cannot ship it to one country because one molecule is permitted to one country, but not permitted to the other. All this flexibility that RDF offers can help you describe all these things. And also in banks, I have been in projects where they are using graphs for linage. So regulation compliance as well. For example, I have a value in my report where I will transmit to for A14, for example, in the US. So what did this while doing this report came from?

Do I need to hire 1000 consultants to find out where this value came from? No. If I have a linage and RDF linage and I know where this value was affected upstream, or if I change this value in an asset, what is affects downstream? So there are some prominent use cases now.

Fantastic. Thank you. Other questions I am interested in using a building a Knowledge Graph for use case of master data management. And one of the a few of the important things there is being able to keep track of data provenance as data by temporality, that kind of thing. And neither the RDF standards nor any vendors out there I am aware of sort of handle this sort of natively. And I was wondering if there are any insights into how to manage that sort of stuff.

So managing lineage of your councils, managing for provenance, there is a very nice standard, it’s the proper ontology. We use it in a definitive but I have used it in many products for lineage and reference data. You’re pretty much right. You need a data governance tool. And actually these technologies are a very well candidate for data management tool where you will need to have notification. Every time something changes, people and teams need to be notified. You need to track history so it’s better to manage the reference data sets or master entities with a nice data governance tool.

Building knowledge graphs in the real world | Panel Discussion

Connected Data World 2021 All Rights Reserved.

Connected Data is a trading name of Neural Alpha LTD.

Edinburgh House - 170 Kennington Lane
Lambeth, London - SE11 5DP

Building knowledge graphs in the real world | Panel Discussion

Connected Data World 2021 All Rights Reserved.

Connected Data is a trading name of Neural Alpha LTD.

Edinburgh House - 170 Kennington LaneLambeth, London - SE11 5DP

Edinburgh House - 170 Kennington Lane
Lambeth, London - SE11 5DP