Connecting Data, People and Ideas since 2016.
27 May 2021

"Adding more context to machine learning, using taxonomies and knowledge graphs" Masterclass

A talk by Ashleigh Faith


Director of Platform Knowledge Graph and Semantic Search, EBSCO




Slides available here


Machine learning is only as good as the data it is trained on, or the assets it is used to enrich. For categorical and named entities, we tend to use many of the same open resources for our models, but there are problems with this.


What’s more, if we use unstructured text to train our models, we have to rely on methods that tend to strip out context, like bag of words, co-occurrence, and clustering.
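
As a minimal sketch of how bag of words strips out context, the example below counts tokens for two sentences with opposite meanings. The `bag_of_words` helper and the sample sentences are illustrative assumptions, not material from the talk.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase, whitespace-separated tokens, ignoring word order."""
    return Counter(text.lower().split())


# Two sentences with opposite meanings (made-up examples).
a = bag_of_words("dog bites man")
b = bag_of_words("man bites dog")

# The two representations are identical, so a model trained on them
# cannot tell who did what to whom.
same = a == b
```

Here `same` is `True`: once word order is discarded, the context that distinguishes the two sentences is gone.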


Using knowledge modeling techniques, like taxonomies, ontologies, and knowledge graphs, you can either retain the context during your extraction and feature process, or use these models to add the context to your ML and analytics.
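
One way to picture adding context from a knowledge model is to expand each token with its broader terms from a taxonomy. The sketch below assumes a tiny, made-up taxonomy (`BROADER`) and hypothetical helper names (`ancestors`, `enrich`); it illustrates the general idea rather than the method taught in the class.

```python
# Hypothetical toy taxonomy: each term maps to its broader (parent) term.
BROADER = {
    "aspirin": "analgesic",
    "ibuprofen": "analgesic",
    "analgesic": "drug",
}


def ancestors(term: str) -> list[str]:
    """Walk up the broader-term chain, nearest ancestor first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def enrich(tokens: list[str]) -> set[str]:
    """Return the tokens plus every broader term found in the taxonomy."""
    features = set(tokens)
    for token in tokens:
        features.update(ancestors(token))
    return features


doc_a = enrich(["aspirin", "dose"])
doc_b = enrich(["ibuprofen", "dose"])

# Features the two documents have in common after enrichment.
shared = doc_a & doc_b
```

Without the taxonomy the two token lists share only "dose". After enrichment they also share "analgesic" and "drug", which gives a downstream model a signal that the documents are related.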


Knowledge graphs are mostly used for processes and trend analysis such as fraud detection, data mesh, customer 360, supply chain and cyber security, to name a few. But knowledge graphs are also rich resources for contextual ML such as search, autoclassification, disambiguation, and data catalogs.

Join me in this Masterclass to walk through the methods you can use to (1) assess open data sources for contextual modeling, and (2) harness the context of data through knowledge modeling, which can then be used in ML use cases.


Both assessment and modeling will use machine learning categorical data as examples.


Key Topics


  • Knowledge graphs in machine learning
  • Categorical and named entity data
  • Semantic search
  • Taxonomy
  • Ontology
  • Responsible AI


Target Audience


  • Modelers, data engineers, data strategists, analysts, information architects, taxonomists, ontologists
  • Marketers with a data-centric approach
  • Content creators and data practitioners who are (expected to be) involved in developing and/or enriching knowledge models




Get hands-on experience:


  • Assessing external and internal categorical and named entity data sources for machine learning applications

  • Modeling structured and unstructured categorical data as a taxonomy and then adding semantic connections, to create a contextual model to help with your machine learning use cases. Example use cases are:

    • Semantic search

    • Autoclassification and feature extraction

    • Data catalog terminology alignment

    • Digital Asset Management modeling


Session outline:


  • Presentation and Discussion:

    • Why does context matter in machine learning and how can knowledge modeling help?

    • Which knowledge model, and for what purpose?

      • Taxonomies, ontologies, or knowledge graphs? The secret might be you don’t have to choose, if you use these tips
  • Interactive Activity:

    • Adding or retaining context in ML

      • How to assess open data sources: a walkthrough of Medical Subject Headings (MeSH), Wikidata, WordNet, and Getty Vocabularies, to name a few.

      • Take the gathered data and morph it into a knowledge model, from taxonomy to knowledge graph

      • Using WebProtégé (you will need a WebProtégé account; it's free)

*Note: if you have used Protégé before and found it daunting, don’t worry. This class will show you shortcuts and tricks that make it easier to use, and how you can apply this modeling in your preferred modeling tools.

  • Conclusion

    • Take-Aways

    • Open questions




This class will be highly collaborative and interactive so please come prepared to discuss.


The first section will review the problem and the methods, and the second section will walk through the methods in practice.

Participants will walk through each sample data source as a group and discuss the assessment criteria, and how to interpret those criteria for your own use case.


Next, we will walk through taking sample data and creating a taxonomy, and then adding relationships to start a knowledge model. We will do this together using a shared model on WebProtégé.
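
To give a rough feel for this step outside of any particular tool, the sketch below stores a small taxonomy as "broader" triples and then adds one non-hierarchical relationship ("treats"). All terms, predicate names, and helper functions here are hypothetical and are not taken from the class materials or any real vocabulary. The rule that a term inherits relationships from its broader terms is an illustrative assumption.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples.
# The "broader" triples form the taxonomy; "treats" is an added relationship.
TRIPLES = [
    ("migraine", "broader", "headache"),
    ("headache", "broader", "symptom"),
    ("aspirin", "broader", "analgesic"),
    ("analgesic", "treats", "headache"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def objects(index, subject, predicate):
    """Return the objects linked to a subject by a predicate."""
    return index.get((subject, predicate), [])


def inherited(index, term, predicate):
    """Collect objects of `predicate` from the term and all its broader terms."""
    found, pending, seen = [], [term], set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        found.extend(objects(index, node, predicate))
        pending.extend(objects(index, node, "broader"))
    return found


index = build_index(TRIPLES)
treated = inherited(index, "aspirin", "treats")
```

In this toy model, `treated` contains "headache" even though no triple mentions aspirin and headache together: the link comes from aspirin's broader term, which is the kind of context a flat list of categories cannot provide.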


Lastly, we will discuss how you can implement models like this in your ML models and pipelines.




Beginner to Intermediate


Prerequisite Knowledge


Sample the videos below. They total 2.5 hours of watch time if each is watched from start to finish, but you shouldn’t need to review them all, nor should you feel the need to watch any of them in full.





