Big Data has transformed the world big time. It led many companies to focus strongly on data analytics, trying to collect and control gigantic amounts of data. After years in the rat race, several of them are slowly realizing that continuously striving to have more data than others may not be the most meaningful business objective for everyone.
In fact, the data collection craze is steadily killing innovation. This episode looks at post-Big Data thinking, in which data is controlled by people again, and outlines the goals and ambitions of the Solid project.
In the latest episode of the Connected Data Podcast, we present Ruben Verborgh discussing Solid, the technology Sir Tim Berners-Lee is pioneering to reorient the Web toward its original vision.
How do JavaScript and distributed querying make Linked Data usable? Ruben Verborgh has answers.
Ruben is a professor of Semantic Web technology at IDLab, Ghent University, and a research affiliate at the Decentralized Information Group of CSAIL at MIT. He acts as a technology advocate for Inrupt and the Solid ecosystem of apps, and has co-authored two books and numerous publications. He aims to build a more intelligent generation of clients for a decentralized Web, at the intersection of Linked Data and hypermedia-driven Web APIs.
Transcript
Connected Data London: Welcome to the Connected Data London podcast, brought to you by the Connected Data London team. Connected Data London is the leading conference for those who use the relationships, meaning and context in data to achieve great things. We have been connecting data people in London since 2016. We focus on knowledge graphs, linked data and semantic technology, graph databases and machine learning, technology use cases and educational resources. Big Data has transformed the world big time. It led many companies to focus strongly on data analytics, trying to collect and control gigantic amounts of data.
After years in the rat race, several of them are slowly realizing that continuously striving to have more data than others may not be the most meaningful business objective for everyone. In fact, the data collection craze is steadily killing innovation. In this talk, Ruben Verborgh discusses post-Big Data thinking, in which data is controlled by people again, and outlines the goals and ambitions of the Solid project.
George Anadiotis: OK, so this is the last part of the day. I'm happy to see you all made it in one piece, and this is going to be super interesting.
I'm really happy that Ruben Verborgh is here with us today. He's one of those people who wear many hats: he's a researcher at Ghent University, an affiliated researcher at MIT, and also a technology advocate for Inrupt. I guess this last bit may be the most relevant today, because he's going to be talking about decentralization, Solid, linked data, and how all of that comes together. So, over to you…
Ruben Verborgh: Big Data, we all know it, and I generally dislike it, because I think it's killing innovation.
The good news is that connected data can help us bring back the innovation we're missing today. I'll talk about Solid, and Solid as a way of redefining the relationship between people, companies, services and data. What we're going to do is move data closer to people, because that's good for many reasons: not just for privacy and control, but also for innovation, for doing meaningful things with data. So let's have a look at how this works. I'll be talking about three things today.
I'll start with redefining the relationship between apps, data and people. Then I'll go to the Solid ecosystem, explaining what the parts are, how it works and how you can use it. And finally, I'll zoom in on the technical side: how exactly connected data helps realize this vision. So first, let's talk a bit about this relationship. All of the examples I'll be giving will be in the context of social media.
However, I'm not selling you a social media platform. It is just one example that we all know, because we all use it or we all know someone who uses it. But as I give examples about social media, try to translate them to completely different domains, because this is a data story. Here is a picture you might be familiar with: the walled gardens we have in social media. But if we're honest, we have those walled gardens everywhere there is vendor lock-in, which is, as you know, in quite a lot of places.
What is the essence of the walled gardens? It means that there are barriers in between. The idea behind the open Web is that we can interact with anyone we want, with any piece of information we want. However, the walled gardens are the result of a wave of innovation that goes against that spirit. Within the walled gardens, you cannot easily share information among different systems. Simply said, you cannot share your Facebook photos with your LinkedIn colleagues. I mean, you can, but you have to either move the data or move the people.
So you end up with lots of copies of your data, or with people who are on multiple networks, and so on. It gets really complicated. And if you really think about it, this doesn't make sense. Just to give you one example: have you ever tried changing your profile picture? If you do that, you have to change it in every single place you use. And sometimes, years later, you still find your old profile picture.
This is a result of data being close to the application and not close to the person. This is what we want to change. Solid is not a platform, and it's not saying we should all do things differently. Well, it is a bit of that, but it is foremost about giving choice: telling people who want more control, more privacy or more innovation that they should have the choice to do so.
The Solid ecosystem is about enabling people to use the apps they want to use, while storing their data wherever they want to store it. This is not true today because, yes, you can use the applications you want, but they can only interact with people who are using the same applications. If those people are not, well, then you're out of luck. So we're going to decouple this. We're going to store data separately from applications, so people can freely choose the apps they want.
And we want people to control their data and share it with the apps and people they choose. It sounds very simple, and conceptually it is, but it has really strong consequences. The main concept behind it is something we call a data pod. It has different names, like data vault and so on. The idea is that every single one of us gets our own space, our own place where we can store our data.
That's it. So you're not storing it with LinkedIn or Facebook or anyone else; you're storing it in your own place. Conceptually, this is where it starts from: a very simple idea with huge consequences. For instance, here is a view you're very familiar with: a social media feed. This one view consists of thousands of pieces of data, and today all of those pieces come from a single place, the social media platform. They come from Facebook, let's say.
However, if every one of us gets their own data pod, then this view needs to get data from many different places. For instance, if this is my social media post, then my profile picture is stored in my data pod, and the text is also stored in my data pod. If you comment on my post, the comment is yours, so it is stored in your data pod. Even a piece of data as small as a like: if you like my post, that like is yours, and it is stored in your data pod. And then it gets really crazy, because that view can be built up from pieces of data coming from hundreds or thousands of different data pods.
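To make the idea of one view assembled from many small data pods concrete, here is a minimal Python sketch. The pods are hypothetical in-memory dictionaries keyed by made-up URLs (a real Solid pod is a web server you would talk to over HTTP), and the app is assumed to already know which pods to look in; discovering them is a separate problem.

```python
# Hypothetical pods: each maps resource URLs to small records.
# In reality, each pod would live on its own web server.
PODS = {
    "https://ruben.example/": {
        "https://ruben.example/posts/1": {
            "type": "Post",
            "author": "https://ruben.example/profile",
            "text": "Data belongs with people.",
        },
        "https://ruben.example/profile": {"type": "Profile", "name": "Ruben"},
    },
    "https://alice.example/": {
        "https://alice.example/comments/7": {
            "type": "Comment",
            "about": "https://ruben.example/posts/1",
            "text": "Nice!",
        },
        "https://alice.example/likes/3": {"type": "Like", "about": "https://ruben.example/posts/1"},
    },
    "https://bob.example/": {
        "https://bob.example/likes/9": {"type": "Like", "about": "https://ruben.example/posts/1"},
    },
}


def fetch(url):
    """Resolve a resource URL against the pod whose base URL it starts with."""
    for base, resources in PODS.items():
        if url.startswith(base):
            return resources[url]
    raise KeyError(url)


def build_post_view(post_url, pod_bases):
    """Assemble one feed item from data that lives in several different pods."""
    post = fetch(post_url)              # post text: the author's pod
    author = fetch(post["author"])      # profile: also the author's pod
    comments, likes = [], 0
    for base in pod_bases:              # reactions: each reader's own pod
        for resource in PODS[base].values():
            if resource.get("about") != post_url:
                continue
            if resource["type"] == "Comment":
                comments.append(resource["text"])
            elif resource["type"] == "Like":
                likes += 1
    return {"author": author["name"], "text": post["text"], "comments": comments, "likes": likes}


if __name__ == "__main__":
    print(build_post_view("https://ruben.example/posts/1", list(PODS)))
```

The app code never owns any of this data; it only reads pieces that other people keep in their own pods.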
So we're no longer talking about Big Data here. We're talking about a huge number of very small datasets, and they are connected. A totally different way of thinking. What consequences does this have? Well, on the left-hand side, you have today's landscape. Today, data and apps are tightly coupled; they're inseparable. So on Facebook, I have my contact list, my pictures and my agenda. LinkedIn has my contacts, and Doodle has my agenda as well.
There's no synchronization. So if I accept an event on Facebook, Doodle does not know about it. If I have a new colleague on LinkedIn, Facebook doesn't know about it. Maybe that's a good thing. But the thing is, even if we wanted to, we still couldn't do that. And this means we have constant duplication as well: we always have to keep things in sync, because the data resides with the applications, and this is what creates those data silos.
And again, it's not just social media. Basically every app we use will try to hold on to our data, because that's just how they work. On the right-hand side, you see what happens when we all store our own data in our data pods. Those rectangles you see are the same apps, except the data has been pushed out: the data is now in my personal data pod. This means I can start working on my data with one application.
And I can continue with the next one, with no need for synchronization, because my data is in only one place: my place. This makes a lot of sense, because why would we want to maintain all those copies? If you think about it, this is how it should have been all along, and technically, we've been ready for it. It also means we can make different choices. On the left-hand side, if we want to interact, we need to use the same apps, the same systems.
On the right-hand side, I can use one app, you can use another app, and it still works, because the integration does not happen through the application but through the data pods. And this is where it gets really interesting, because the left-hand side is a world in which there is only one app, built for the average user. The Facebook app, for instance, is built for the average user, who doesn't exist. On the right-hand side, you have a choice of apps.
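The decoupling of apps from data can be sketched in a few lines of Python: two independent apps operate on one pod, so what one app writes the other can read, with nothing to synchronize. The `Pod` class, the `/agenda/` path layout and both app functions are hypothetical illustrations, not part of any Solid library.

```python
class Pod:
    """A hypothetical personal data pod: the single place where the data lives."""

    def __init__(self):
        self._resources = {}

    def put(self, path, record):
        self._resources[path] = dict(record)

    def get(self, path):
        return dict(self._resources[path])

    def paths(self, prefix):
        return sorted(p for p in self._resources if p.startswith(prefix))


def social_app_accept_event(pod, event_id, title):
    """App 1 (say, a social network client) records an accepted invitation."""
    pod.put(f"/agenda/{event_id}", {"title": title, "status": "accepted"})


def scheduling_app_busy_titles(pod):
    """App 2 (say, a scheduling tool) reads the same agenda directly."""
    records = [pod.get(p) for p in pod.paths("/agenda/")]
    return [r["title"] for r in records if r["status"] == "accepted"]


if __name__ == "__main__":
    my_pod = Pod()
    social_app_accept_event(my_pod, "cdl2019", "Connected Data London 2019")
    print(scheduling_app_busy_titles(my_pod))  # the second app sees it right away
```

Because neither app keeps its own copy, there is no sync step to forget: the pod is the one shared place.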
You can have more specific apps: everyone can use their own thing, basically, and still collaborate. So this is a very fundamental change. This, of course, is great for people, because people get a choice they don't have today. However, it is also great for companies and for innovation. Let me explain that a bit more. Again, the left-hand side is today; this is what we have: a single market for centralized apps.
So you get competition that is not based on quality of service but on data ownership: the competition is about who has the most data. That's great if you're one of those having the most data; however, it kills innovation. Let me give you two examples. First, think about Facebook. Facebook is a major company. How do they innovate? Well, let me ask you: who can name one innovation that Facebook has made to your personal news feed in the past ten years, which is an eternity in technology?
You might not have noticed, but they did one major innovation: instead of just "like", you now have five different emotions. But that's about it. Facebook doesn't innovate. They don't have to, because they're in the data collection business; that's the game they're playing. Fair enough. However, it kills innovation, and it gets even worse. Second example: suppose that someone has a revolutionary idea, a new way to present information in social feeds, a brand-new concept. They cannot enter the market, because they don't have the data.
So not only are the big companies not innovating on that front; those who want to innovate can't, because they depend on data to do so. On the right-hand side, we get the new world we would have if people kept data closer to themselves, because then you get competition on two levels. On the one hand, there is competition between those offering data storage as a service: places where you can voluntarily store your data, or you can store your own data; it's up to you.
On the other hand, you get a separate competition between apps. And this is where it gets interesting, because, again, the left-hand side is where you have the one app for the average user. Are grandmothers and grandchildren literally using the same app because it gives the best experience? No, because that's where the data is. The right-hand side is a market where grandmother and grandchild have their own apps but can still interact directly, because data has been decoupled from apps. The left-hand side is a competition.
But there's only one winner. The right-hand side has two competitions, but many more winners, because everyone can pick the deals that are best for them, and the deals they pick for data and for apps can be different. So this is where the innovation will happen. That was the concept of how things work: the separation between apps and data. Now let's have a look at how this concretely translates into technology. We'll talk about the Solid ecosystem.
So Solid is not a company, and not a platform. It is foremost a set of standards that make all of this work together. If you compare it to anything, compare it to the Web. The Web is also an open platform with standards: you can choose your browser, you can choose whatever software you use to make websites, and they all work together. You can view a website on mobile, on desktop, on laptops and so on.
And it still works. So this is what we're talking about, and this is not a coincidence: Solid was started by Tim Berners-Lee, who invented the World Wide Web. Basically, what he wants to do is bring back the original principles: bring back the open, permissionless innovation where we don't depend on companies harvesting as much data as possible, because that kills innovation. So it's more than just software: it's an ecosystem, a movement and a community.
It's an ecosystem in the sense that it's based on standards; that is the key thing: contracts that anyone can implement. It's a movement in the sense that we want to build apps in different ways (we'll talk more about that later), because there's a really different approach to building apps. And it's a community in the sense that there are different people with different backgrounds, different companies and different organizations who want to contribute to Solid. The important thing is that anyone can build or host software for Solid, because it's not a platform.
If you implement the standards, then you can participate. For instance, you can have your own server at home or at your workplace; you don't have to connect to anything else. You can rent server space, or you can use one of the free servers. Basically, having a Solid data pod is like having a website: you can self-host, or someone else can host it for you. That can be a small or a large company; it is your choice.
So I'm not telling people they should have their own server. I'm telling people they need choice. If you want your own server, you can have one; if you don't, that's fine. I don't mind either way. What is a Solid server? It is the server that hosts the data pod, which will store and guard your data. It is by all means a regular web server: we're using the existing web stack and just adding a couple of layers on top. There's access control, and there's linked data.
The server itself is application-agnostic, which is really important. It means you can build any application, and all application-specific logic resides in clients. It is like a website again: just like a website can be opened with any browser, a Solid pod can be viewed with any app. What's in the data pod? Well, like a website, you name it: profile, pictures, comments, likes; whatever you want to put in it, you can put in it.
It is a website; your imagination is your only limit. What are clients? Clients are browser apps or native apps that read data from or write data to your pod. You give apps permission, and you can be very specific about what they get to access. You can say there are documents they can see and documents they cannot see. You can give friends permission, friends can give you permission, and so on: the things you already know. But this time, you control whatever happens, and it's up to apps to deliver a unified experience.
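Per-document permissions for people and apps can be sketched as a lookup from resource to agents and allowed modes. This is a simplified Python model with assumed rules and made-up agent URLs; Solid's actual access-control specifications (such as Web Access Control) express rules declaratively in RDF, with more access modes and inheritance details than shown here.

```python
from enum import Enum


class Mode(Enum):
    READ = "read"
    WRITE = "write"


# Hypothetical rules for one pod: resource path -> agent -> allowed modes.
# Paths ending in "/" are containers whose rules cover everything inside.
# Agents can be people or apps; "*" stands for anyone (public access).
RULES = {
    "/profile": {"*": {Mode.READ}},
    "/photos/holiday.jpg": {"https://alice.example/#me": {Mode.READ}},
    "/agenda/": {"https://calendar.example/app": {Mode.READ, Mode.WRITE}},
}


def is_allowed(rules, path, agent, mode):
    """Apply the most specific matching rule; with no matching rule, deny."""
    matches = [key for key in rules
               if key == path or (key.endswith("/") and path.startswith(key))]
    if not matches:
        return False
    grants = rules[max(matches, key=len)]
    return mode in grants.get(agent, set()) or mode in grants.get("*", set())


if __name__ == "__main__":
    print(is_allowed(RULES, "/photos/holiday.jpg", "https://alice.example/#me", Mode.READ))
    print(is_allowed(RULES, "/photos/holiday.jpg", "https://bob.example/#me", Mode.READ))
```

Denying by default when no rule matches is the conservative choice: the pod owner has to grant access explicitly.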
So, take the views we saw in the beginning: if you use an app, you shouldn't notice a difference whether it's Solid or not, except that, in reality, the data will come from many different places instead of from one single backend. What kind of apps can you build? Again, it's just like the Web: if you can envision it, you can build it. It's not a totally different platform.
You're just building on top of the Web; the Web is the platform. However, the difference is that if you start building, you don't need data. Right now, if you're a startup, the first question you face is how you're going to get people's data, because without data, you can't build apps. Here, you don't have to worry. People come with their own data: they will give you permission to see specific parts, and then you can start showing it, interacting with it, and so on.
So it's all based on features, not on data harvesting. Now, this is nice in theory, but does it work in practice? Yes, it does. However, it's not made for end users yet. It is in the prototyping and development stage. We're trying out our technology to see how it works, so you can use it if you're brave, but it's very rough around the edges. That said, there are several implementations of Solid servers.
You can store your data in a data pod already; my data is stored in one. There's some free storage available, or you can deploy your own as you wish. As for apps, there are a couple of them: there are data browsers, contacts and photo apps, and so on. And if you don't find what you want, you can build it yourself, which is a good thing: again, you don't need data to build something. There are also several libraries that help you build your own Solid apps, so you can try it out if you want.
But be warned: it's very rough around the edges. Now, how come? Well, it started as a research project at MIT, where the initial protocols and standards were designed, along with some of the prototypes. Those are what we have today. However, in computer science research, the problem is that once you've solved the technical challenges, you have to move on to the next challenge, and that way you don't get it to the end user.
So Tim Berners-Lee is currently behind a startup called Inrupt. What they really want to do is show the world that you can have a healthy company based on data without doing data harvesting. So he's trying to lead by example, showing that it's possible to do nice things on the Web without data harvesting. Inrupt is contributing to the open-source ecosystem with implementations, helping with the standards, and so on, and they also have a couple of products down the line.
However, it's very important to know that Solid and Inrupt are very different things: Inrupt is trying to be one company in a bigger ecosystem. Or, to say it differently: if Inrupt ends up being the only company, then they have lost, because it is all about competition. So what we're trying to do is open up the market to different players, because the more people start doing this, the more beneficial it is for everyone. So think of Inrupt as one of the players trying to open up the ecosystem for everyone and promoting this new way of thinking.
So I've shown you what the ecosystem looks like. Now, let's have a closer look at a deeper level: how does it work under the hood? Because there must be some magic technology behind this. Of course, there's never magic, but there are a couple of interesting things to be done, because the main difference is that decentralized apps will have many backends, and those backends need to work with many apps. On the left-hand side, you have the situation again with one backend and one app; you know how that works.
And this is really easy, because you know exactly what the contract is. And if you don't like the contract, you can change it, because often the back-end and front-end teams work closely together. So this is about as easy as it gets. On the right-hand side, we have the world we're preparing for, where many different data pods interact with different apps. Apps interact with different data pods, one app has to read data that has been written by another app, and you have to have this handoff.
So what we get is a really complex way of interaction, because you cannot predict anymore what a data pod will look like. This means we will have to completely change the way we think about building apps, because this is what it looks like right now: you have application-specific logic that goes to a specific server interface, and you hard-code the interactions. Every client has hard-coded HTTP interaction code for a specific server, which means that if you have servers that are kind of the same, but not quite,
they don't work together: they are incompatible. And this approach does not work here, for two reasons. First of all, you cannot make assumptions about what the server looks like, because everyone can do their own thing. And second, you don't even know which server you need to go to, because data can be distributed across one, ten, a hundred or a thousand different servers. So you cannot hard-code, because you simply don't know where the data will be located. So we have to evolve toward a different way of building apps.
And the way I think about it is a query-based contract. So instead of hard-coding what you want in your application-specific logic, you express it in a query: this is the shape of the data I'm looking for. That's all you do. And the interesting thing is that such a query captures what you want without being specific to a certain backend or a certain network. This also means that you need a query language with universal semantics, where a query has a meaning that is the same across all data pods in the entire world.
So things like GraphQL won't work, because one GraphQL query has different meanings depending on who you're asking. We're thinking about a global space where a query has a meaning, and the same meaning for everyone. So this is how you build apps: using only queries. You don't hard-code HTTP requests, because those will be different everywhere. Then it's up to a query engine on the client side to translate those queries into concrete requests to the interfaces of different data pods.
Now, there's one catch with this picture: it depends on a really strong query engine that will do all the magic. But the good news is that this engine is reusable across different applications, because this is all about making sure the application-specific code does not have to change: the contract is built against queries. And if interfaces change, that's all fine, because only the translation step has to change; the application itself does not need to adapt.
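The query-based contract can be illustrated with a toy Python engine: the app states a pattern of triples with `?variables`, and only the engine knows how to talk to each pod's interface. The two pod classes and their differing interfaces are hypothetical, and the matcher is a bare-bones basic-graph-pattern join, far simpler than a real SPARQL engine, with no network access or optimization.

```python
class TriplePod:
    """Hypothetical pod interface that returns raw (subject, predicate, object) triples."""

    def __init__(self, triples):
        self._triples = list(triples)

    def dump(self):
        return list(self._triples)


class DocumentPod:
    """Hypothetical pod interface that returns {subject: {predicate: object}} documents."""

    def __init__(self, documents):
        self._documents = documents

    def documents(self):
        return self._documents


def to_triples(pod):
    """The translation step: the only code that knows each interface."""
    if isinstance(pod, TriplePod):
        return pod.dump()
    if isinstance(pod, DocumentPod):
        return [(s, p, o) for s, props in pod.documents().items() for p, o in props.items()]
    raise TypeError(f"no adapter for {type(pod).__name__}")


def match(pattern, triple, binding):
    """Extend a variable binding so the pattern matches the triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, pods):
    """Evaluate a list of triple patterns over the union of all pods' data."""
    triples = [t for pod in pods for t in to_triples(pod)]
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    pods = [
        TriplePod([("ruben", "likes", "cdl2019"), ("cdl2019", "name", "Connected Data London 2019")]),
        DocumentPod({"alice": {"likes": "cdl2019"}}),
    ]
    # The app only states the shape of what it wants:
    for row in query([("?who", "likes", "?event"), ("?event", "name", "?title")], pods):
        print(row["?who"], "likes", row["?title"])
```

Supporting a third kind of pod interface would mean adding one branch to `to_triples`; the query the app sends stays the same.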
Now, of course, this comes with huge interoperability challenges, and we solve them with linked data. Our first challenge: if you store your own data, how do we connect it to other people's data? Next, how can apps share data without first having to agree on everything? And finally, how do we integrate data from multiple data pods? Well, the answer to all of these, for me, is RDF. For the first question, how do we connect it?
Well, using links. For instance, this is me liking Connected Data London 2019. And guess what? There was no like button on the website, but I liked it anyway. This is what I can do if I'm in control of my data: I just made a link to Connected Data London and I liked it. This is my like. So this is how connecting works. Next, how do apps interoperate? This is a complex topic that needs more than just two minutes to explain.
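A like stored in the liker's own pod is just a few statements that link to the thing being liked. The Python sketch below writes one as N-Triples using the W3C ActivityStreams 2.0 vocabulary (`Like`, `actor`, `object`); that vocabulary choice and all the `.example` IRIs are assumptions for illustration, not necessarily what Solid apps actually use.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AS = "https://www.w3.org/ns/activitystreams#"


def like_triples(like_iri, actor_iri, target_iri):
    """Statements stored in the actor's own pod; the target site needs no like button."""
    return [
        (like_iri, RDF_TYPE, AS + "Like"),
        (like_iri, AS + "actor", actor_iri),
        (like_iri, AS + "object", target_iri),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(like_triples(
        "https://ruben.example/likes/cdl2019#it",  # hypothetical IRI in my pod
        "https://ruben.example/profile#me",        # hypothetical WebID
        "https://cdl2019.example/",                # hypothetical event IRI
    )))
```

The event's own website is never modified; the link lives entirely in the liker's data.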
But the bottom line is this: we're basing it on shapes. So instead of training applications to know exactly what to expect, we train them on specific shapes, so applications know how to recognize a like, but they can also ignore all the things they don't understand. So it's based on shape matching in the query engine. And finally, how do we integrate data from many different sources? Well, this is easy because it's RDF: you can just merge the graphs.
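Shape-based recognition can be sketched as checking whether a node carries the properties an app needs, while ignoring any extra ones. This Python toy with short predicate names is a stand-in, under stated assumptions, for real shape languages such as SHACL or ShEx, which are far more expressive.

```python
# Hypothetical shape: what an app needs in order to recognize a like.
LIKE_SHAPE = {"type": "Like", "required": {"actor", "object"}}


def group_by_subject(triples):
    """Collect each subject's properties as {predicate: [objects]}."""
    nodes = {}
    for s, p, o in triples:
        nodes.setdefault(s, {}).setdefault(p, []).append(o)
    return nodes


def conforms(properties, shape):
    """True if the node has the right type and every required property.
    Extra properties the app does not understand are simply ignored."""
    return shape["type"] in properties.get("type", []) and shape["required"].issubset(properties)


def find_matches(triples, shape):
    return sorted(s for s, props in group_by_subject(triples).items() if conforms(props, shape))


if __name__ == "__main__":
    data = [
        ("like1", "type", "Like"), ("like1", "actor", "ruben"), ("like1", "object", "cdl2019"),
        ("like1", "mood", "enthusiastic"),                        # unknown extra: ignored
        ("like2", "type", "Like"), ("like2", "actor", "alice"),   # no object: not usable
        ("post1", "type", "Post"), ("post1", "text", "Hello"),
    ]
    print(find_matches(data, LIKE_SHAPE))
```

An app written against this shape keeps working when other apps add properties it has never heard of.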
And this is the reason why I'm insisting on universal semantics: we'll need to combine data from random places in the world, and for that data to work together, you need consistent semantics, a consistent meaning, across all those possible data pods. This also means that we need to do something for developers, because this completely changes the way they're used to developing things. I think developers are the most crucial success factor: they are the ones who build what people see, the UX and UI that are being sold to people.
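Merging data from several pods is, in RDF terms, a union of their triples, and it only connects the pieces when they use the same global identifiers. The Python sketch below assumes IRI-only triples with short hypothetical predicate names; a full RDF merge must also keep blank nodes from different graphs apart, which this sketch does not handle.

```python
EVENT = "https://cdl2019.example/"  # hypothetical global IRI for the event


def merge(*graphs):
    """Union of triple sets: duplicates collapse, nothing else is needed."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def who_likes(graph, target):
    return sorted(s for s, p, o in graph if p == "likes" and o == target)


ruben_pod = {("https://ruben.example/#me", "likes", EVENT)}
alice_pod = {("https://alice.example/#me", "likes", EVENT)}
# Same intended meaning, but a local identifier: it cannot join with the others.
bob_pod = {("https://bob.example/#me", "likes", "event-42")}
organizer_pod = {(EVENT, "name", "Connected Data London 2019")}

if __name__ == "__main__":
    graph = merge(ruben_pod, alice_pod, bob_pod, organizer_pod)
    print(who_likes(graph, EVENT))
```

Bob's like is still in the merged graph, but nothing connects it to the event, which is exactly why shared identifiers and shared meaning matter.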
We should not make the mistake, as Solid or as Inrupt, of saying: look, we're going to bring it all the way to the end user. What we should be doing is thinking in terms of enabling others. We want to enable developers to build things on top of it. So rather than doing it ourselves, we think about how to make it better for them. And the interesting thing is that if we help developers build better apps, we also help ourselves.
This is something I noticed, and I'll come back to it a bit later. To me, this is a crucial point, because you'll hear lots of things about RDF being complex. In fact, a very convincing argument was made this morning. However, Libby Miller had a beautiful quote: people think RDF is a pain because it is complicated, but the truth is even worse. RDF is painfully simplistic, but it allows you to work with real-world data and problems that are horribly complicated.
And this is a horribly complicated problem, because data is spread all over the world. How can you fix this? By giving people tools.
For instance, I have been working on simple React-based tools that give developers access to this whole world. This is just a component-based way of building, and it doesn't look too special; it looks quite regular. However, let's have a closer look at these things. They're small expressions; they're queries. And they are not queries over just one data source: they are queries over the full Web.
I call this LDflex: a language for Linked Data, and a flexible one. It works like this: you can write things like data.user.name, data.user.friends, and so on. But this one is interesting, because the first names of those friends, or their profile pictures, are not coming from the user's data space; they're coming from the friends' own data spaces. So this is a federated query, going out to fetch things like their first names. And you can even just have a random URL there.
For instance, this one will lead you to my blog. So these are expressions that go to the entire Web instead of just one local data source. How does this work at a low level? It's just regular JavaScript. It feels like a local object, but it's actually a proxy that will go to the whole Web and get data from there. So if you await it, if you go into asynchronous mode, it will start fetching things, and as developers, we don't even have to care where the data comes from.
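The proxy idea carries over to other languages. Below is a minimal Python analogue, not LDflex itself: attribute access only records a path, and awaiting the expression follows links from document to document. The `Path` class, the in-memory `WEB` dictionary and the simulated `dereference` function are all hypothetical; a real implementation would issue HTTP requests and parse RDF.

```python
import asyncio

# Hypothetical Web of documents: each IRI dereferences to its own properties.
# The friends' names live in *their* documents, not in the user's.
WEB = {
    "https://ruben.example/#me": {
        "name": ["Ruben"],
        "friends": ["https://alice.example/#me", "https://bob.example/#me"],
        "blog": ["https://ruben.example/blog/"],
    },
    "https://alice.example/#me": {"firstName": ["Alice"]},
    "https://bob.example/#me": {"firstName": ["Bob"]},
}
FETCHED = []  # records which documents were dereferenced


async def dereference(iri):
    """Simulated HTTP GET of one document."""
    await asyncio.sleep(0)  # stand-in for network latency
    FETCHED.append(iri)
    return WEB.get(iri, {})


class Path:
    """A lazily built property path; nothing is fetched until it is awaited."""

    def __init__(self, start, steps=()):
        self._start = start
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only called for unknown attributes: each one becomes a path step.
        if name.startswith("_"):
            raise AttributeError(name)
        return Path(self._start, self._steps + (name,))

    def __await__(self):
        return self._resolve().__await__()

    async def _resolve(self):
        nodes = [self._start]
        for step in self._steps:
            # Fetch all current nodes concurrently, then follow one property.
            docs = await asyncio.gather(*(dereference(n) for n in nodes))
            nodes = [value for doc in docs for value in doc.get(step, [])]
        return nodes


async def main():
    user = Path("https://ruben.example/#me")
    print(await user.name)
    print(await user.friends.firstName)
    print(await user.blog)


if __name__ == "__main__":
    asyncio.run(main())
```

The calling code reads like local attribute access, while resolving `user.friends.firstName` actually touches three separate documents.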
So thatโs level of abstraction that weโre talking about. So that if Iโm to you about a new relation between apps and people, Iโve explained how the ecosystem can work and also what underlying technologies are that connect all of this together. However, as a researcher, I have an important statement to make, and that is that in my semantic up community, we often think that problems are solved, like we focus on the heart of such problems, like Federated querying, and that with all the rest will be solved by the engineers.
But the state of the art in our community for federated querying is basically: let's take two servers, connect them over Ethernet, and then do a federated query. However, if you take that to the public Web, with 100 sources on global connections and different data models, our nice theories don't hold up. So the 80 percent we think we're solving is actually quite a small part of the problem. Bringing all of those theories into practice requires a lot of research and a lot of engineering to make it work at the scale of the Web.
Thank you. Thank you, then; I'm sure there are tons of questions. So, I would be wondering: what sort of applications do you think should be written in this way? All applications, in a sense; at least most applications working with personal data, because with personal data we have major problems. It is expensive right now: you have GDPR, you have social pressure to do it, and also business factors. But there are a couple of application scenarios that are particularly interesting for this.
E-government is one: governments are really asking for things like that, especially in Europe, where things get so complex. E-health is also a very prominent example, because different hospitals have records and so on, so if you can just put those records with the patients, that makes sense. So I expect those to be very prominent examples. What sort of applications would you not do this for? I think examples where there's no real personal data involved; more point-to-point things might be less interesting.
But we need to think of concrete examples. How difficult do you think it is to achieve universal agreement on semantics, and to what extent is that going to inhibit innovation if people just can't agree on what something means? Extremely difficult, to the point of impossible. However, the thing is, we don't need full agreement. We can use smaller, decentralized agreements. There's a very small core that we all need to agree on, and the more niche an issue gets, the smaller the set of people who actually need to agree.
And the good thing is that, because we have the RDF model, it is also possible to align different initiatives after the fact, even if they started independently. So the complexity of the whole stack is justified, because you don't want central agreement: it is so slow, and it's much easier to rely on decentralized agreement. Hey, this is more of a political question, but do you have a vision of bringing this technology to make the Googles and the Facebooks of the world change, or do we just start on our own?
That is impossible, and we have to create a parallel track, because there's no way these people are going to get it. We have to do it all at the same time. However, be aware that it might seem like we're the enemy of the big companies, but know that even big companies are struggling. Take Facebook, with the biggest data: even Facebook is somehow desperate to acquire more data. So I'm quite sure they know their model today is finite, and they're desperately looking for alternatives.
Take Google, for example: they have lost the social media market and would now like to be a provider of data pods in this ecosystem, and then maybe break the monopoly of Facebook. So even for big companies, it makes sense to think about these things. That's very important. Solid is very attractive from the point of view of users controlling their data, but often there are good performance-related reasons to centralize data. How do you speak to those concerns?
Yeah, so first of all, very good point, because when I explain this to people, some of them say: oh, this is nice, now you have new research to work on. But actually, setting the rules in this way is so meaningful that we believe it justifies looking for new solutions. So we'll have to look at those things precisely because the rules are meaningful. However, decentralization does not mean that there can be no centralization or performance optimization.
So, of course, to make it scale, you need caching, indexing and so on. The difference is that the source of truth is the individual data pods, and the caches are built on top. If the caches go down, you can still find the answer; it's just going to be painfully slow. That is very different from today, because if centralized systems go down, everything is gone. So yes, we will need massive caching, but those are only optimizations, which is a good thing.
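The point that caches are only an optimization, with the pods remaining the source of truth, can be sketched as a read-through cache that degrades to slow but correct answers when the cache is unavailable. The class names and the simulated delay are hypothetical, and cache invalidation after a pod changes is deliberately left out of this sketch.

```python
import time


class PodSource:
    """Stand-in for the data pods: authoritative but potentially slow."""

    def __init__(self, data, delay=0.0):
        self.data = dict(data)
        self.delay = delay
        self.calls = 0

    def read(self, key):
        self.calls += 1
        time.sleep(self.delay)  # simulated cost of a decentralized lookup
        return self.data[key]


class ReadThroughCache:
    """Optional acceleration layer built on top of the pods."""

    def __init__(self, source):
        self.source = source
        self.store = {}
        self.available = True  # set to False to simulate the cache going down

    def read(self, key):
        if self.available and key in self.store:
            return self.store[key]        # fast path
        value = self.source.read(key)     # slow path: ask the source of truth
        if self.available:
            self.store[key] = value
        return value


if __name__ == "__main__":
    pods = PodSource({"ruben/name": "Ruben"}, delay=0.1)
    cache = ReadThroughCache(pods)
    cache.read("ruben/name")   # miss: goes to the pod
    cache.read("ruben/name")   # hit: served from the cache
    cache.available = False
    print(cache.read("ruben/name"), pods.calls)  # still correct, just slower
```

Losing the cache costs speed, not data, which is the opposite of what happens when a centralized store goes down.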
This is going to be the last one, but don't go away, because there's a direct continuation of this. Have you done any thinking about what the incentives might be for companies to get into this? It is a bit more limiting in the kind of revenue-generation models they might have. Innovation. I'll give you a very short example. A couple of months ago, I had a company knocking on my door. They built AI to match people to jobs, which is really great, but the only way to make the algorithms better is more data.
And if they need more data, they're basically competing directly with LinkedIn. So they knocked on my door saying: look, we don't want this. We don't want to harvest data, because the moment we do, we've already lost. We want an ecosystem in which we don't depend on data harvesting. So instead of having 50 percent of our people working on data, we want 100 percent of our people working on innovation. And the data, we don't want it: please keep it.
So, to me, innovation is a driver for these things. OK, I know we could go on for a while, but we're going to have to wrap up, because there's a panel coming up.
Connected Data London: We hope you enjoyed the podcast. To get more of our material and to keep up with the latest industry and research news from our domain, we invite you to connect with us. Connected Data London has an omnichannel presence.
Besides all major podcast platforms, YouTube and SlideShare, you can find us on Twitter, LinkedIn, Facebook and Instagram. You can join our meetups, or you can keep up with our news and special offers by joining our mailing list.