Tired of the AI-washing and buzzword dropping? Interested in finding the connections in your data and performing advanced analytics and reasoning?
Then Connected Data may be just what you need. Now all you need is to figure out what it is.
This is what i wrote last year, and these are words i still stand by. Which is why i am inviting you to join us in London in November, this time as a co-organizer of CDL18: Connected Data London 2018. The great community and feedback the event has garnered the last couple of years has been our inspiration, and there’s loads of interesting discussions to be had.
We’d love to have you with us in London, and we’ll be sharing thoughts on the topics we’ll be covering over coming months as well as announcing more speakers as we approach the event. But what better place to start than reacquainting ourselves with what Connected Data is about?
We think Connected Data is a broad church of technologies, roles, skillsets and philosophies – it’s rich and multifaceted. This is exactly why we are building on the success of CDL17’s themes by tweaking them for CDL18. This year’s event incorporates positive feedback from previous attendees and event partners to build on 4 themes : Graph Databases, AI & Machine Learning, Knowledge Graphs & Schema Management and Linked Data & Semantic Publishing.
But isn’t all data by definition connected data anyway? Well, kind of.
Is data in a spreadsheet connected? Yes, when we consider implicit connections such as conceptual links and cross-references between data in different cells, sheets and books. But establishing, maintaining and following these can be awkward. Spreadsheets are sometimes considered the most widely adopted database, but this flexibility comes at a cost in terms of reliability and scalability.. That’s why we have ‘proper’ databases with joins, constraints and other features to manage connections.
The most widely used database is still the relational database. Is data in a relational database connected? Sure they are – all those explicit foreign keys are clearly connections. Establishing, maintaining and following connections in a relational database works much better and faster. But it’s still not ideal for certain connection-centric use cases.
When the number of hops, or joins, one wants to do in a relational database exceeds a certain threshold, things start getting complicated and slow. Writing and executing that kind of queries is not fun. Not within a single database or schema, and certainly not across them. Relational modeling and databases work great for some use cases, less so for others.
Social networks and doing things like finding friends of friends is a typical example, but there’s more. What about a Content Management System (CMS)? That’s essentially like a big web file system, with files and folders and users and groups and permissions etc. And all the relations between them – what belongs where and who accesses what and so on.
That’s what got a certain software architect called Emil hooked on connected data and graphs some 10 years ago. That software architect decided to go for building a new database which he started calling a graph database. Graph databases are optimized for use cases that involve lots of hops, and they are one of the key pillars of Connected Data.
Graph databases greatly facilitate working with connected data. Image: vadis
When I first started working with graph databases in 2005, Emil had not built the graph database leading the market today. Graph databases were not even called by that name, and the term Connected Data was not used either. Only trailblazers and early adopters were around, but over time the ones that endured have been joined by the Microsofts and Amazons of the world.
One of the things CDL18 will cover in the Graph Databases track is precisely this, by yours truly. I will share my experience and try to shed some light on what types of graph database solutions are out there, what you should know about them and how to evaluate which might be right for you. If you want to keep up to date in this field till then, you may also want to check my newsletter.
What is the other hot topic these days? AI and machine learning. While the 2 are often used interchangeably, they are not the same. Let’s just say that machine learning is one of the approaches within AI. ML is getting the lion’s share of attention today, but other approaches based on knowledge representation and reasoning make for a big part of AI too.
Trends come and go, but we believe the work done in other subdomains of AI deserves to be highlighted too. Sir Nigel Shadbolt is a Professor of Artificial Intelligence, and a well known and respected authority on AI, among other things. Sir Nigel will present at CDL18 drawing on his vast experience on collective intelligence, open innovation and linked data.
The inherent structure in graphs can be leveraged in your machine learning algorithms. Image: Tomruen on Wikimedia Commons
We also want to be up to date with the way this technology is used in the real world, which is why we host practitioners from organizations such as the BBC and Babylon Health. Augustine Kwanashie and Mohammad Khodadadi are among those who get to apply real, every day AI at massive scale, and will share their approaches and findings with us.
That’s not all though. We are well aware and appreciative of the value machine learning has brought to help exit the hiatus of the AI winter. We want to give the stage to people who will elaborate on how the structure in graphs and connected data can be leveraged to boost machine learning approaches, and we will be announcing more speakers in the coming period.
Perhaps the connection between graph databases and AI is not obvious. Although there is a progression leading from storing and analyzing data to powering applications that exhibit intelligent behavior based on that data, there’s a lot of steps to be taken in between.
Speaking in terms of analytics, this would enable us to go from descriptive analytics (examining what happened), to prescriptive analytics (making a desirable outcome happen). The intermediate steps are diagnostic analytics (figuring out why something happened), and predictive analytics, which is a way of forecasting what will happen.
Enterprise Knowledge Graphs add metadata and meaning to connected data. Image: Tony Edward on Search Engine Land
This abstraction may help grasp the progression in bite-sized chunks, but what’s connected data got to do with this? In order to utilize vast amounts of data, some form of metadata is needed to describe data properties such as lineage or schema. What you get when you add metadata to connected data is knowledge graphs, and we’ll cover them in depth in CDL18.
Our first knowledge graph speaker is Panos Alexopoulos. Alexopoulos has been working at the intersection of data, semantics, language and software for years, and is leading a team in Textkernel developing a large cross-lingual Knowledge Graph for HR and Recruitment. Alexopoulos will share his experience on crafting a Knowledge Graph strategy in CDL.
What is the world’s most famous knowledge graph? Google’s, without a doubt. Everyone consuming or producing any kind of content on the web is using it, whether they’re aware of it or not. The story of how Google went from ignoring semantics to adopting and using this technology to boost its products is telling.
While not everyone needs to manage data at Google scale, everyone can learn to use the technology Google is using to their benefit. A set of standards and vocabularies brought forward by the Semantic Web have been successfully utilized in what Tim Berners Lee branded as Linked Data. The two most prominent Linked Data use cases are schema.org and JSON-LD.
Linked Data is a set of standards and vocabularies brought forward by Tim Berners. Image: Andreas Blumauer / The Semantic Web Company
JSON-LD is a pragmatic technology that’s available in the here and now. It can greatly benefit existing applications using JSON by adding a layer of semantics with minimum effort, and it also works with schema.org. One of JSON-LD’s architects, Google’s Markus Lanthaler, will be joining CDL18 to share how JSON-LD is utilized in Google and what’s in store for the future.
While the benefits of JSON-LD are rather clear, other parts of the Linked Data such as OWL/RDFS semantics and reasoning have a reputation for being more controversial and less approachable. But a new W3C standard, SHACL, used for describing and validating RDF graphs promises to change things. Dimitris Kontokostas from GeoPhy is one of its creators and will also be telling us all about it.
Does that whet your appetite? We hope so, and we’re already working to bring together even more topics and thought leaders in what we know will be a day to remember. If you’re already sold as we are, you may want to check our early bird prices which we’re keeping open for you till the beginning of May.
In any case, watch this space – there will be more to talk about and great guests to talk with in the coming months.