We’d like to start 2021 on a positive note and share our impressions from the three days of learning and community building on all things Knowledge Graphs, Graph Databases, Semantic Technology, and Graph AI that Knowledge Connexions has been.
However, it seems that 2021 is picking up from where 2020 left off. So our welcome note will be bitter-sweet at best.
The good news is that Knowledge Connexions, the new, online version of Connected Data London, was a great experience. We learned a lot, we connected with everyone who joined, we had fun. But more on that on a separate note to follow soon.
Today’s note is dedicated to someone special we came to meet via Knowledge Connexions. We did not have to lose Hamlet Batista to appreciate him, but unfortunately that’s the way it is.
They say Hamlet Batista, CEO and founder of RankSense, was a legendary SEO. They say that because of the way he advocated for instant results and showcased how to achieve them. Because of the way he spearheaded the use of NLP (Natural Language Processing) in SEO, using Python to do things such as Generate Quality FAQs or Predict Content Success.
His work stood out, and Hamlet was a natural fit for our Workshop on Knowledge Graphs and SEO: The next chapter. We reached out to him, and asked him to join one of the finest groups of experts in this industry we could possibly think of: Dawn Anderson, David Amerland, Jason Barnard, and Andrea Volpini.
Hamlet was responsive and keen to join and share his insights. Interacting with him and the rest of the group in the context of preparing for the workshop was fun and productive. And the workshop itself was a blast.
They say Hamlet Batista, CEO and founder of RankSense, was a legendary SEO. But more importantly, it seems to us that Hamlet Batista was indeed, as they say: Ever encouraging. Ever generous. Ever respectful.
Unfortunately, Hamlet Batista passed away in January 2021, shortly after our encounter and his participation in the Knowledge Connexions Workshop Knowledge Graphs and SEO: The next chapter. He left behind not only a body of work, but more importantly, a family.
Today’s note is a tribute to the memory of Hamlet Batista. We have published the recording of the Knowledge Graphs and SEO: The next chapter Workshop as a video on YouTube and a podcast on all major platforms, and we are also sharing the transcript here. Everything is openly accessible to the community.
Sharing Hamlet Batista’s work with the world is one thing we can do to honor his legacy. More importantly, however, our thoughts are with his family. Lily Ray, Hamlet’s friend and colleague, has received the go-ahead from his family to start this fundraiser, and will ensure all donations go directly to
We have contributed our fair share to help his family out during this difficult time. Please consider donating as well.
As for our impressions of 2020 and plans for 2021... rest assured, they are well under way. Watch this space.
--- Transcript ---
George Anadiotis: Good morning or good afternoon, everyone, depending on which part of the world you are joining us from, and welcome to Knowledge Connexions 2020. I am George Anadiotis, part of the core organizing team for the event, which is a joint venture coming to you from Connected Data London and the Knowledge Graph Conference.
And this is actually the first session that we have in the whole event. This is the first day of workshops, and this is the first workshop, in which we have a great team of people who are going to be dissecting all the latest and greatest developments in SEO and Knowledge Graphs, with a little sprinkle of natural language processing and AI in the process. So you are in for a really good thing, if this is your thing, and obviously it should be, since you are here.
Just a few words, as you know, this is the first session for our event as well. You know, I don’t really have to explain to anyone, I guess, how hard 2020 has been for all of us. We’re trying to make the best of it. And this is traditionally the time of year when, in Connected Data London, we have our flagship event.
And so, we didn’t let this hard year keep us back. And this is our way of doing it again in the best way we can. So online instead of actually face to face. But, you know, on the upside, this also gives us the chance to have like a dream team together today with us.
So without further ado, I’m going to introduce David Amerland, who is the moderator for this workshop. And just in ways of introduction, I was just telling David earlier that it may be a good thing, actually, to refer a little to how we met, because I think it’s actually relevant for this workshop and it gives an idea of how Knowledge Graphs mean different things to different people and how you can approach it from a different angle.
So as some of you may know, one of the hats I wear is contributor for ZDNet, and at some point I had written an article which had to do with knowledge graphs. David read it, and I guess it kind of seemed interesting to him. And so he sent me a message basically saying that, well, you know, what you’re saying doesn’t really make much sense.
And I saw it and quickly checked his background, and I realized, okay, so this person actually knows what he’s talking about, but I also know what I’m talking about, so there must be some kind of misunderstanding here. So he probably means something else when he refers to knowledge graphs than what I refer to when I use the term knowledge graph. So this was the beginning of a wonderful friendship and also a very good example of how semantics come into play when you talk about knowledge graphs specifically, but also more broadly.
So David also wears a number of hats here. He’s an author. He’s also into data mining. He does consulting; a man of many talents, including SEO and knowledge graphs, which is why he’s here moderating this workshop. So I’m going to give him the floor. But before I do, one last thing, and then I’m going to quietly sit in the background from now on, just for the benefit of people who are watching.
So, you know, this being an online panel, we thought about the best way to design the interaction with you, because we want to keep things interactive, but we also don’t want to break the flow of people having a conversation. So my job here will be basically to try to serve both of these goals.
So while our guests are having their conversation, you can type in questions, comments or feedback or, you know, you can also feel free to chat amongst yourselves as well, using this nice little chat box that you see on the bottom right part of the screen. So this chat is moderated, you know, just in case people start typing crazy things here. I don’t really believe they will, but better safe than sorry.
And then basically what you can do is, you know, like I said, you can chat with each other, you can type in questions for the people in our panel or feedback or, you know, what have you. And my job will be basically to keep an eye on that. And at certain points, when a discussion theme has been completed, David is going to stop the conversation and converse with me.
And then at that point, I will basically relay questions and feedback and comments to David so that they can be taken up through the panel. So that’s it. My talking part basically ends here. David, the floor is yours, and you can introduce everyone in the panel.
David Amerland: OK, thank you very much for that introduction. And certainly when we talk about knowledge graphs, there’s always a lot of confusion. It’s been almost 20 years since the semantic web was first mentioned, and it’s been almost 10 years since knowledge graphs started coming into play. And we have now artificial intelligence across the Web. And nobody’s really any the wiser as to how all these things actually clarify at the point of practical application.
Now, we have a brilliant panel. I’m going to introduce it first, and we start with Andrea Volpini. He’s the CEO and founder of WordLift, a company that helps brands increase their visibility; they make use of semantic web technologies and artificial intelligence to promote that activity. We have Hamlet Batista. He’s the CEO at RankSense, an agile SEO platform for online retailers and manufacturers.
We have Dawn Anderson, the managing director of an SEO Consultancy and Digital Marketing agency, Bertey. And last but not least, Jason Barnard, founder of Kalicube, who specializes in brand SEO and knowledge panels. And personally, I’ve seen him run a number of very interesting experiments when it comes to SEO. Well, what we’re going to do is we’re going to start a discussion on this and field a few questions which have to do with knowledge graphs. We’ll go around the panel, get everybody’s experience. The focus here is on practicality.
And as George has already mentioned, as questions come up, let’s put them in the chat and we’ll address them to the panel.
So let’s start with the first one. Now, when we talk about knowledge graphs, what are we talking about actually? What are knowledge graphs? And let’s take this question with Andrea first, please.
Okay. A knowledge graph, in simple terms, is a data structure, and it’s a data structure that connects one node to another node through an edge. And this edge is what creates the information. So the way in which we specify the connection is the way in which we convey information into our graph.
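The node-edge-node definition above can be sketched in a few lines of Python. This is only an illustration, not anything the panel showed: the predicate names (`founder_of`, `is_a`) and the helper functions are hypothetical, while the facts themselves come from the introductions in this note.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple:
# two nodes joined by a labeled edge. Predicate names are illustrative.
triples = [
    ("Hamlet Batista", "founder_of", "RankSense"),
    ("Andrea Volpini", "founder_of", "WordLift"),
    ("RankSense", "is_a", "Company"),
]

def build_graph(facts):
    """Index triples by subject so outgoing edges are easy to look up."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return graph

def objects(graph, subject, predicate):
    """Return every node reachable from `subject` via edges labeled `predicate`."""
    return [obj for pred, obj in graph.get(subject, []) if pred == predicate]

graph = build_graph(triples)
print(objects(graph, "Hamlet Batista", "founder_of"))
```

The information lives in the edge label: the same two nodes connected by a different predicate would state a different fact.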
Okay. And that’s a great, almost textbook definition.
If we’re talking about the practicalities, though, if the average person is thinking, OK, I need to improve my SEO, I understand that knowledge graphs are really important now, and I need to grasp what that means in terms of what I do. So how would we translate a node and an edge to that person?
So in SEO terms, the edges would become the properties of the entity. And these properties are part of types, and each type has its own set of attributes. So, in SEO terms, you want to understand what type you are. Are you representing yourself as a person? Are you representing yourself as a doctor, as a company? And what type of assets do you want to show of your entity? Because in the end, SEO is about marketing whatever you do best.
OK. I mean, that’s a very practical example which you just gave.
I read something recently in a book about natural language and knowledge graphs. There’s a term, as we know, which is literally “is a”. So I think Andrea’s alluding to always considering what a thing “is a”. It’s literally the words “is a”. So whenever we produce content, we constantly think about this “is a”, if that makes sense. So yeah, it seems to be mentioned a lot in books on ontology, the words “is a” together. So literally the three letters.
Mm hmm. And I’m very glad you brought this up. Inevitably, when we start talking about knowledge graphs, we have to grapple with a little bit of terminology.
Already we’ve got entities and we’ve got ontologies coming up. So let’s take the next sort of demystifying question to Jason. Jason, what is an entity, please?
Oh, nice question. An entity is a thing. It’s something you can define. A person, a place, a road, a house, a dog, or in fact, a concept such as economics. And I think we tend to forget those entities. Economics is an entity. It’s something we can define and we can name. And the idea of entities, I mean, with Andrea of WordLift, we’ve been doing entity-based content models. And I hope we’re going to dig into that because it’s incredibly interesting, the idea that when we create our content, if we start thinking in terms of entities and their properties and their attributes and the relationships to other entities, we start to really, really get to the core of what it is we’re talking about: the topics, and where we are situated within our market and vis-à-vis our audience.
You mentioned properties when it comes to entities. Can you clarify a little bit how do we define that?
Well, I mean, it was Andrea who said properties earlier on. I was just stupidly repeating what he said. I have been using my podcast as an example because that’s part of this experiment. I have a series, a creative work series, and the type of series is a podcast series. And within that series I have multiple podcast episodes, and the episodes actually have properties, which Andrea can now explain.
OK, so let’s go to Andrea on this one, and then we’re going to have to go to Hamlet next, please. Again, Andrea: how do you define properties of an entity?
So the properties are, you know, the attributes that specify a set of information. So, for instance, in the case of the podcast, the core property, of course, is the title of the podcast; that’s the content that people will look for. But then again, it’s the edge that conveys the real value behind the podcast. And one of the most important entities is whoever Jason is interviewing. And so the relationship between the two persons, the author and the contributor, is what’s creating the uniqueness of that piece of content.
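The podcast example can be sketched as schema.org-style JSON-LD built from a plain Python dictionary. This is a minimal sketch under assumptions: the series name, episode title, and guest name are placeholders, and the choice of `author` and `contributor` simply mirrors the wording above rather than any markup Jason or Andrea actually published.

```python
import json

# Placeholder names: the real series and episode titles are not given in the transcript.
series = {"@type": "PodcastSeries", "name": "Example Podcast Series"}

episode = {
    "@context": "https://schema.org",
    "@type": "PodcastEpisode",
    "name": "Example Episode Title",       # the core property: the title
    "partOfSeries": series,                # edge from the episode to its series
    "author": {"@type": "Person", "name": "Jason Barnard"},
    "contributor": {"@type": "Person", "name": "Example Guest"},  # the interviewee
}

# Serialize to the JSON-LD text that would be embedded in a page.
jsonld = json.dumps(episode, indent=2)
print(jsonld)
```

The `name` is a literal attribute, while `partOfSeries`, `author`, and `contributor` are edges pointing at other entities, which is where the distinguishing information sits.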
And I think it’s interesting that we already moved from the knowledge graph in the context of Google to the knowledge graph in the context of Jason’s website. And I think that’s also an interesting jump, because when we talk about knowledge graphs, usually people tend to think about Google’s knowledge graph, especially in the context of SEO. But I think now we are mature enough that we also talk about our own knowledge graph. And I think Hamlet also has done a lot of interesting experiments to build knowledge graphs.
Yeah, I’m sorry Andrea, I thought that’s kind of where David was coming from. He was saying, like, what is the knowledge graph as applied to us personally? And for us personally, it comes down to WordLift today, which is saying build your internal knowledge graph. Sorry, go ahead.
I was going to say, ultimately knowledge graph as a term has been popularized by Google. I remember last year when I was at the Web conference in San Francisco, one of the Google ontologists was quite ranty when he was talking about how knowledge graphs per se, the whole semantic web, semantic data, have been around a very, very long time, as sort of linked, connected data. But obviously, Google massively popularized the whole thing.
So knowledge graph as a thing per se is quite a loose term really. On your point about attributes, I would compare that with classes, classes of things, you know. And that’s the whole thing, you know: a car is a vehicle and a bike is a vehicle. And I don’t know if that makes sense.
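The “is a” relationship and the classes just mentioned can be sketched as a small lookup table. This is an illustrative toy, not a real ontology: the table name, the function name, and the top-level class `thing` are assumptions added for the example.

```python
# Child -> parent class. "thing" as the root class is an assumption for illustration.
IS_A = {
    "car": "vehicle",
    "bike": "vehicle",
    "vehicle": "thing",
}

def is_a(entity, cls, parents=IS_A):
    """Walk up the class chain and report whether `entity` is a kind of `cls`."""
    current = entity
    while current in parents:
        current = parents[current]
        if current == cls:
            return True
    return False

print(is_a("car", "vehicle"))   # a car is a vehicle
print(is_a("vehicle", "car"))   # the relation does not run the other way
```

Because the relation is followed upward through parents, anything stated about a vehicle can be applied to both cars and bikes without repeating it for each.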
It does, in a way. And I’m going to take this question out to Hamlet. One of the things which we brought up and haven’t really dug into is ontologies, and Dawn’s point about classes comes into that. Hamlet, you’re going to talk a little bit about ontologies. What are they? How are they created? How do they differ from a taxonomy?
Yeah, I wanted to first contribute to the first conversation, the first interaction. I felt like we were too high level, too abstract in the definition. And I wanted to make a simple layman’s terms explanation of what knowledge graphs are, you know, what the semantics and ontology and all this stuff is. Right. And I think with examples, it’s very easy to understand.
And I want to mention a specific case, which is trying to address ambiguity. Right. So you have, for example, Abraham Lincoln. Right. And you could say, oh, is that the person, right, or is that the bridge, right? Or I mean, the tunnel or the university. Right. So the same name, the same label, right, could be used to express things that are completely different.
Right. So how do you disambiguate, right? And then we get into that and, you know, the uses of natural language and stuff like that, but think of that idea, right? For a human, you know that by the context you can understand what it is, because you’re using exactly the same label for completely different things, and therefore the attributes of that thing come into play. But instead of talking about attributes, let’s talk about what that thing is capable of doing or being. So a person, right, is going to be walking, and that’s what Andrea was talking about with the edges. Right.
So when you look at the entity, which is the label, and you look at the capabilities of that label, you can tell whether it is a bridge, whether it’s a university or whether it’s a person, because a bridge is not going to have the same capabilities as a person. Right. That’s how you, in simple layman’s terms, can understand it. Right. So you get from the academics into, well, how can I understand this now? This object, this attribute, these classes. All that stuff makes a lot more sense.
But sometimes, even when we’re talking about the same person, understanding the things they do or say or whatever is not always enough because, for instance, we say Mozart, whereas other people use the full name or initials. So there is, within the actual entity determination itself, of the thing.
Yeah, but you’re right, that’s a different problem, because it still is the same person.
Yeah. Then there’s the determination.
It doesn’t matter whether you use the abbreviation or anything. It is the same thing. Right. Because if you’re talking about Mozart, he’s going to have a birthday. You are going to be able to tell unambiguously that that’s a person because of the attributes of that person. Right. And it doesn’t matter how we label, with abbreviations or first or last name or whatever. You’re going to know that that’s the person because of all the attributes of that person.
And you’re going to be able to identify things by all the things that are around them; you’re able to tell what something is by the company that it keeps. Right. So that’s what I say, that’s a very simple way to understand the whole concept when you’re trying to disambiguate.
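The idea of telling a person from a tunnel or a university by attributes rather than by label can be sketched as a simple overlap score. This is a toy illustration only: the attribute lists, the candidate types, and the scoring rule are all assumptions, and real entity linking is far more involved.

```python
# Hypothetical attribute sets for things that could share one label.
CANDIDATES = {
    "person": {"birth date", "spouse", "occupation"},
    "tunnel": {"length", "opening date", "toll"},
    "university": {"campus", "students", "founding date"},
}

def disambiguate(observed, candidates=CANDIDATES):
    """Pick the candidate type sharing the most attributes with what was observed."""
    scores = {kind: len(attrs & observed) for kind, attrs in candidates.items()}
    best = max(scores, key=scores.get)
    # No overlap at all means the context gives no evidence either way.
    return best if scores[best] > 0 else None

print(disambiguate({"birth date", "spouse"}))
print(disambiguate({"toll", "length"}))
```

The label never enters the decision; only the attributes that accompany it in context do.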
David Amerland: Amongst the four of you, you’ve created a very multilayered approach that comes down to what it does. So if you’re creating content, for instance, that needs to be as unambiguous as possible, as clear as possible, clearly describing what that content, or rather the thing which you put in your content, actually does, as every one of the panelists has mentioned, will help create that clarity of approach, which helps create a particular entity out of your content, so that a search engine knows what you’re referring to, and an audience reading that can clearly understand what you’re actually talking about. That’s actually a very good way.
And we’ve seen in the discussion between Hamlet and Dawn and Andrea and Jason that, on top of all that, there are levels of what we could call perhaps academic interest, because when you dig into an entity at an SEO level, the technicalities are quite immense. Like Dawn mentioned Mozart. And she mentioned that when you actually create the entity, and create it properly at the technical level, there are a lot of things which you need to take into account. Hamlet’s approach, that you basically describe what that does, helps you understand how you connect the high-level approach to the practicalities of it.
And I think that’s brilliant. OK, let’s talk a little bit about context now, because essentially context is how search is evolving in a different sort of application, and that also helps the evolution of knowledge graphs. So really, the question, which is slightly broad and hopefully will narrow down in definitions, is how does context now begin to change the applications of knowledge graphs? How does that now affect search engine optimization in the broadest sense? So if we just sort of take it to each person in turn, I’ll start in reverse this time with Hamlet, please.
Hamlet Batista: Yeah, so we can’t talk about context and search without talking about BERT. You know, BERT is a natural language processing encoding mechanism, right, that allows more granular interpretation of text. So you can have a sequence of text, and what the machine does is it turns that into embeddings, into numbers. Once the text is turned into numbers, machines can work with it very easily. The simplest application is similarities, right? You can take two sentences: “How old are you?” and “What is your age?” Those are two phrases that are semantically similar.
But if you use a traditional approach of matching text, they wouldn’t look similar. If you’re trying to match word by word, they wouldn’t look similar. But a technique like BERT can identify that these two phrases are very closely related, and it is able to do that by context. Right. And BERT works at the word level. What I described is sort of sentence-level work, but it works on a word level.
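The gap between word matching and embedding similarity can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: it does not run BERT, the function names are hypothetical, and the three vectors are made-up toy numbers standing in for embeddings, not real model output.

```python
import math

def jaccard(a, b):
    """Word-by-word overlap between two sentences (0.0 means no shared words)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# Word matching sees nothing in common between the two questions.
overlap = jaccard("How old are you", "What is your age")

# Made-up toy vectors, NOT real BERT embeddings: they only illustrate that
# once sentences are numbers, closeness can be measured directly.
how_old = [0.9, 0.1, 0.3]
your_age = [0.8, 0.2, 0.4]
unrelated = [0.1, 0.9, 0.0]

print(overlap, cosine(how_old, your_age), cosine(how_old, unrelated))
```

In a real setup the vectors would come from an encoder model; the comparison step stays the same.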
And in that example that I gave of Abraham Lincoln in the text, BERT will be able to clearly label the person versus the bridge or the university by the other words in the sentence. So you have a full sentence. The other words in the sentence allow the search engine to know that Abraham Lincoln in this sentence means the man, and in this one means the tunnel. Right. Because of the other words that are in the sentence. Right. And the context, as I said, the words before and after, is what makes this possible.
BERT is able to check the context. Before BERT, it was only checking the words before or the words after. BERT is able to check both of them at the same time with a
David Amerland: So, to sort of popularize it a little bit, it essentially is a context sensitizer for search.
Hamlet Batista: Exactly. And the embeddings, the BERT embeddings, are context sensitive. So essentially it creates, and I use an analogy, which is using GPS, which is very simple to understand. Right. So you have a physical location. You can express a location by, you know, the name of the business or the street address, or you can use the GPS coordinates to know that location. It is very similar. Right. So BERT gives a coordinate for every word, and the coordinates change by the context. So depending on the context, it’s going to have a different location.
David Amerland: Excellent. Andrea, you can take it from here, if there is anything you can add to this.
Andrea Volpini: Well, I mean, the analysis of the context does also take a lot into account the geographical location of the user. And I think this is quite interesting in the knowledge graph, because we see the same entity that can be represented in different ways, depending on where the user is making the search. So if you look for my company here in Rome, then you will get the local panel and the directions to reach the office. But then if you look from abroad, then you will see, you know, the description of a software organisation. So I think that the context is, of course, a mix of the user intent, so what the user is trying to express, and a lot of other parameters that help Google understand which information is more effective at that point in time.
David Amerland: Excellent. Okay, that’s fantastic. And Dawn, anything you need to add to this in terms of context? I know you’ve written extensively on BERT, and it’s such an intriguing topic.
Dawn Anderson: I’m going a bit quicker, but, you know, rooting around the square, etc. But it seems, though, that there are even bigger and better things. A really interesting one is T5 by Google. The original BERT was trained on Wikipedia and, I think, a book corpus. So a lot of words. But Google’s T5, which is a more recent model, an extension of BERT, was trained on Common Crawl, which is petabytes of data going back seven years and much more aligned with natural language, and obviously all of these things are continually trained on question and answer, natural questions. So BERT’s great, but that’s really just the beginning of things to come, you know, especially when it comes to things like internationalization of search as well, because we all know about programmers’ English, the whole web, and the fact that English is far more popular than the number of people who are English. Yes.
And that’s a problem because it presents a massive bias. So in reality, now, the big focus, I would say, going into this year, really, is, I know Microsoft, Facebook and Google are doing a lot of stuff with these BERT models to translate into lots and lots of different languages, for instance, without having to rely on English as an intermediate translator. So I think there’s going to be a huge amount of stuff that will go well beyond that in the very, very near future.
David Amerland: Let’s get to Jason. Jason, please tell us a little bit again about context.
Jason Barnard: What struck me as we went through that is, coming back to what Hamlet was saying earlier on, however good they get, they’re never going to be able to sort out the fact that we as humans are incredibly confused in our own brains, and we have ambiguity baked into our system and we can’t help it.
I mean, I’m going to give you a couple of examples. And it’s something that I’m struggling with on multiple fronts.
But a company whose software has the same name as the company, that’s immediately difficult for them to disambiguate.
And then the company will create a company profile but talk about the software, or have a software profile and talk about the company. So however good all of this gets, the people who are actually putting this information out there are making it confusing on their own, because it’s confused in their own brains. I’ve got a few clients, actually, who have software and a company with the same name. And when I speak to them and ask, when you say that, do you mean the company or the software? They can’t actually tell me. They have to think about it. And that’s a problem. And we’re talking about a really expensive problem. As human beings, we don’t have much imagination.
That’s a problem from this point of view. And coming back to another example that I’m working on with my experiments on the knowledge graph, Google’s knowledge graph in particular: I created some characters with my wife years ago. We named a song after them, because they have to have a theme tune, then we named the movie after them.
Then we named a TV series after them, also called Boowa and Kwala. Then the song is in French, the song is in English.
So we have at least six entities with exactly the same name. And the context of each one of them, I’ve actually created a page for each on my site and even on the pages on my site, when I’m making an effort to make that context incredibly clear, it still isn’t clear.
And I read it back. I wrote them three months ago. I read them back this morning. Still rubbish. And so as a human being, I’m having trouble disambiguating and expressing clearly which one of those six very, very closely related entities that have the same name, or rather the same label, as Hamlet was saying earlier on, I mean. That label is exactly the same for each and every one, and I’m incapable of being clear about that.
David Amerland: That’s awesome. And again, for the audience, and I’m going to come back to each of you, basically what we are putting together here is a very complex picture, which shows why SEO and search are so difficult sometimes. They are driven by mathematics, which is very precise; they’re driven by definitions, which are scientific and also very precise. But the whole thing is used by humans, and human behavior is imprecise. And search engines, although they are mathematical in nature, are trying to understand human behavior, which is really difficult to understand at the best of times.
And I think we’re getting now to the stage, as technology gets better and better at basically indexing and understanding the context of the content we create, where we need to be able to clarify, perhaps, human behavior, where context comes into it. And behind context, of course, is intent.
So I’m going to go again, starting from Dawn this time. Dawn, tell us a little bit about human behavior, context and intent, and how these now drive the evolution of search and how they have an impact on the knowledge graph.
Dawn Anderson: Well, ultimately, I think much of what we’ve discussed here and the answer regarding intent comes down to what series of debates as well as called the knowledge of the vocabulary problem. And it kind of sounds a little like what Jason said. But touching on intent, so many people have a different way of saying exactly the same thing that they mean. So search is kind of not a solved problem. I saw a really interesting analogy the other day, written by somebody who writes a lot about search in the academic / practitioner field on semantics.
And he said search is not a solved problem. It is like trying to understand where somebody is going and what they want to do in New York from seeing them step off the plane at JFK. So I thought that was a really good analogy. And really, intent is about trying to guess, yes, with some educated probability determinations based on past data, etc., but also gathering all the signals as you go along. That’s why there’s the whole notion of search results diversification, in very broad terms, because if somebody types a single word into search, they give really no clue as to what their intent is. So somebody types dresses, and what do they want? Do they want black dresses? Do they want long dresses? So I know a lot of people get upset about Google keeping people in their search results, but that is user feedback. They need that, because that’s why universal search is there and that’s why they have search results diversification, a mix of different types of possible intent matches, so that people can give their explicit feedback in the things that they click around on, if you like, until finally they achieve a goal and hopefully somebody’s intent is met. So I hope that goes some way towards answering that.
David Amerland: Certainly. And I’ll take the same question to Andrea. And let’s keep in mind, Andrea’s company also deals with artificial intelligence, which even from a philosophical point of view has to try and guess human behavior and try to quantify it. So, Andrea, please.
Andrea Volpini: So I think that in the end, yes, information is ultra complex because humans are complex, and when you design a system, you always have to take into account that you need, you know, to create a feedback loop.
So my lesson is that whenever you’re talking about whatever type of AI approach you want to use, you always have to keep the human in the loop, because that’s what actually creates the value. And we can do this by creating ontologies. So as I was reading in some of the questions that we have, Larry asks about, you know, the definition of ontology. And I think ontology is somehow an answer, because from the philosophical point of view, it’s the study of things. So it’s the way in which you can study reality, the study of whatever exists. But then within the context of technology, an ontology is what allows you to create a system that gets the machine engaged with the human and creates a value that otherwise would not exist, which is given by the interaction of the human and the machine.
David Amerland: OK, brilliant. And I think since you mentioned ontology and we have a question from the audience, Larry Swanson is asking us to circle back a little bit to ontology and try to get a clearer definition.
Let’s take that to Hamlet. Perhaps you tell us.
Hamlet BatistaYeah, I like to be the layman’s translator here. Right.
So that we get from the nerdy, you know, level, you know, of the brilliant image here to, you know, a layman like me. You can translate that. So organization, right? So when you think about ontologies, when you think about how do you organize things when we have so much ambiguity, as we mentioned. Right. How do you know that when that person wrote that, what exactly in that organization
Think about a shelf in a library. Right. Where does that fit in if that can be expressed in a lot of different ways? And when we go about search, I wrote an article for SEJ a couple of weeks ago talking about the searches that Google has never seen. Google says 15 percent of the searches they see every day are completely new, ones they haven’t seen before. So how do you build a system that can answer questions that have never been asked before?
That’s how difficult it is. And I performed an experiment in an article that I published, and I’ll post a link, to find, you know, a sliver of those questions from Google Search Console data. And when I reviewed this for my client, which is a nonprofit, I reviewed the queries that were showing up that had never shown up in the Search Console before. I found some really interesting trends. And one of the things that I found is, for example, searches for businesses that didn’t exist before.
These are completely new businesses that came online. When the search was for a business, Google was able to show the local packs, the Google Maps results. The results were not really good, because there was not a lot of content around it. That’s why I highlighted that as an opportunity. But Google was able to identify that that query, even though it hasn’t seen it before, called for a maps listing, right, and another query, you know, that was related to news results, called for published articles to be showing up, right.
So even though it hasn’t seen it before, it knew to classify them in the right buckets, the ontology, to say these types of searches demand images. Right. These ones demand video. This one requires map links. Right? That’s why that is valuable. Being able to organize things into, you know, a hierarchy of information and being able to identify that quickly. Right. That’s the value of that. It’s a way to organize things in a way the machines can pull in really quickly.
Jason BarnardYeah, I was talking to Nathan Chalmers from Bing, the guy who runs the whole page algorithm, which, I mean, Dawn, you were talking about click-through data and user behavior. And I think Google kind of say, we’re not using click data, we’re not using your behavior. But what I found interesting about that
Dawn AndersonI think that they admit that they use it for feedback and.
Jason BarnardYeah, but sorry, what struck me was that they use it for the whole page algorithm, in terms of Bing at least. I mean, Google have never talked about it to me, at any rate. They’re saying we use it extensively to understand what the whole page should actually look like. So I think that whole idea of using that user data, I mean, there’s no way they can’t use that. And that comes back, from what Nathan was saying to me, to what Hamlet’s saying, which is how do they address something they’ve never seen before?
And I think a lot of us as human beings just cannot begin to understand how much data they have and how much they actually have a foot to stand on with the machine learning of recent years, to be able to second-guess what this is going to mean. I mean, it’s outside of anything I can begin to imagine, but I can get the concept. I’m sure one of you can explain it better than I can: the sheer mass of data and user behavior from previous examples will give them a good foothold to be able to predict what something they have never seen before is actually aiming to do.
And from Bing’s point of view, just from Bing, when I talk to them, they’re saying 15 percent, they feel, is vastly overestimated as a percentage. But that’s Bing.
Hamlet BatistaWell, I can tell you that I think you’re on the right track, thinking about using historical performance, because from the experiment that I performed, I said, look, how can I know, in the article that I published, that I found this 15 percent of searches? I mean, I found these queries that I hadn’t seen before in the Search Console data. In my case, my test was four percent. So maybe Bing is right there and Google might be overestimating it. I found it was about four percent of the data set, but it’s just a small example and was just a year’s worth of data, not the whole, you know, worth of data Google has.
But I said, look, you know, how can I predict whether these searches are going to be popular? And what I did is I looked at the historical data and I matched them semantically. So I said, OK, even though these phrases haven’t been seen before, I’m going to convert them into embeddings, and I’m going to look at the historical keyword phrases and also turn them into embeddings, and then do a similarity match numerically using, you know, the embeddings, to find phrases that showed up historically, not typed the same way, but that mean the same.
And then I use that to say, OK, this is what this is, because I’m matching it semantically to something that I have seen before.
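The matching step Hamlet describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from his article: `toy_embed` is a hypothetical stand-in that counts character trigrams, so it only captures surface similarity, whereas a real setup would use a sentence-embedding model to capture meaning. The sample queries are invented. Only the comparison step (cosine similarity between a new query vector and historical query vectors) mirrors what is described above.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def closest_historical(new_query, historical):
    """Return the historical query whose vector is most similar to the new one."""
    new_vec = toy_embed(new_query)
    return max(historical, key=lambda query: cosine(new_vec, toy_embed(query)))


# Invented sample data: queries seen before, and one never seen before.
historical_queries = [
    "donate to animal shelter",
    "volunteer opportunities near me",
    "adopt a rescue dog",
]
match = closest_historical("how to adopt rescue dogs", historical_queries)
print(match)
```

Once a new query is tied to its closest historical neighbour, the neighbour’s past performance can serve as a rough proxy for how the new query might perform.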
David AmerlandIf we were to take all of this into practicality and just sort of boil down to a couple of very practical things, and let’s suppose that for argument’s sake, that the audience we have isn’t able to devote the time or hasn’t got the necessary technical skills to do what you just mentioned, Hamlet, to sort of go into a lot of sort of technical SEO stuff.
And what they can do is usually create content. So if we take what you said and boil it down to content creation guidelines, I’ll take one point from each of you, starting with Dawn, please, and then I’m going to go to Andrea.
Dawn AndersonWell, really, building quickly on what Hamlet said is that whilst 15 percent of queries will never have been seen before, there will be some seasonal aspects that you can – I mean, for me, I would literally say, think about what’s happening throughout the whole of the year for your audience and, you know, you can you can map the content you’re going to need probably for much of the time with a great calendar.
David AmerlandAndrea, please.
Andrea VolpiniYes, seasonality is definitely one aspect. I believe that it’s also super important that you talk with your actual clients or readers every now and then, and shape the content model accordingly. That’s the practical thing that I would recommend today. Talk with the people that make the search, if you have the chance to do it.
David AmerlandHow would you do that? I mean, from a practical point of view?
Andrea VolpiniA lot of the time you really have the chance to speak with your consumers, with your clients, and they just arrived at you because they made a search. So try to ask everyone, you know, how they got to you and what they searched for. Because then you understand, you know, the type of mindset that people have, and then you can shape the content model that you’re working on.
And people rarely do that, because, I mean, we think that the Internet is something else, but it’s made of people. And then, you know, maybe you set up a contract and they jumped on your site because they made a query. And what was the query? Why did they arrive?
David AmerlandOK, great. Let’s take this to Jason, please
Jason BarnardFor content writing, my number one go-to thing is disambiguate. Take a big step back every time you write something, take the time to actually think about it and rewrite it to the point at which it isn’t ambiguous. Or you can make it as unambiguous as possible, at least for Google. But don’t ever forget, as Andrea said, the web is made up of people who are actually coming to your website.
So make sure that you keep it sexy, attractive for them. And I’m very much a believer in simplicity in text, or clarity in text. Disambiguation within your copy for the machine doesn’t mean that it has to be boring and dull and uninteresting for the human being. It’s a question of writing skills, and what I love is that great content writers are back.
David AmerlandAnd Hamlet. And, you know, for the audience, I’d like to say that Hamlet has also shared a link to an article he wrote on Search Engine Journal.
Hamlet BatistaYes, you can click on the link and scroll to about the middle, where I have some search features. My simple recommendation is this: we got used to using the keyword tools, and we spend a lot of time there. I’m actually older, so I remember a lot of people used to spend the time to do the search and look at what is showing up in the search. So something as simple as that. Just take that query, type it into Google, and if your audience is in a different location, right, you can use tricks with the developer tools to set locations. Right. So you can pretend to be in that location, type that search, even set your browser to that language. I was giving a presentation in Spanish, and Spanish is my first language, but I have all my stuff set in English. So I set up everything in Spanish, and I learned a lot just by typing, you know, and putting the interface in the language of my user, typing in the query and seeing what’s showing up, what Google thinks this is about, what people think this is about, because it’s going to give you an idea of what it is that they’re searching for, and also what the competitors are offering and whether they’re getting it right or wrong. If they’re not getting it right, it’s an opportunity for you to get it right.
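One lightweight way to approximate this check without touching the browser settings is to build the search URL with language and country hints. This is a different technique from the developer-tools location override mentioned above, and it is only a sketch: `hl` (interface language) and `gl` (country) are commonly used Google search URL parameters, but treat their exact effect as an assumption, since results can still differ from what a real user in that location sees. The function name and the sample query are made up for illustration.

```python
from urllib.parse import urlencode


def localized_search_url(query, language, country):
    """Build a search URL with language (hl) and country (gl) hints.

    Assumption: hl and gl are honoured as hints only; actual results may
    still be personalised or localised by other signals.
    """
    params = {"q": query, "hl": language, "gl": country}
    return "https://www.google.com/search?" + urlencode(params)


# Invented example: a Spanish-language query for an audience in Mexico.
url = localized_search_url("zapatos para correr", "es", "mx")
print(url)
```

Opening the resulting URL in a private browser window gives a quick first look at the features and competitors showing up for that query.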
David AmerlandExcellent. And sort of summarizing all this, know your audience, create your content and please test some of the search queries before you create your content and finalize it. I love this. I think you brought in very, very nice pointed sort of bullet points to this. OK, so on the basis of this now let’s ask a two prong question. What are the skills that SEO professionals need to have today? And as you answer that, hopefully we’ll be able to see some of the skills, perhaps, that non-SEO professionals who are on the Web try to do their own SEO, try to promote their business and sort of learn a little bit from this and see what they can do. And let’s start this one with Jason, please.
Jason BarnardI think the principal skill has to be communication, whether it’s writing or making videos. You need to communicate with your audience. You need to communicate with the machine, in the case of SEO, with Google and Bing. I wrote an article in Search Engine Journal, just like Hamlet, on understanding, credibility and deliverability. And for me, that’s the crux of everything we’re doing. We need to make sure that Google has understood who we are, what we offer, who our audience is.
We need to make sure that once it’s understood that we have an offer that’s appropriate for the intent of its user, that we’re the most credible solution for Google to present as a recommendation for its users, because that’s what it’s doing. It’s recommending a result or an answer or a solution.
And third, we make sure we have it in the format that’s deliverable. I think it was Dawn who mentioned earlier on the rich results, the universal results. If videos are appearing, it’s a great idea to have video. Sometimes video is the best format. Sometimes writing is the best format. Sometimes podcasts are the best format.
But if you can get those three pillars together, make sure Google’s understood who you are, what you offer, who your audience is, that you’re more credible as a solution, more reliable as a solution than your competition, and that you have the content in a deliverable format, either on the search, so that Google can deliver it, or so that Google believes you can deliver it, which will be all the technical SEO stuff on your site, so to speak and so forth, then you’re going to win the game.
David AmerlandRight. OK, Dawn, to you, please.
Dawn AndersonWell, first and foremost, I would say that I’m doing a lot of research at the moment into the notion of Zipf distributions and Zipf’s law. And I think that the principle of that is ultimately there will be a small number of things that are massively important and many, many, many long-tail things that are not all that important. But there are many of them, a little bit like the very long-tail queries, 15 percent, four percent, whatever that was. The key is to make sure you are really, really clear on what matters the most to your audience, focus very much on that, and, as Jason has alluded to, focus on the right medium for your audience. For instance, if you are working with a fashion brand or you’re in the fashion space, images are probably much more important than words because it’s very image-like. Fitness is video. It’s not always about words per se.
So consider Zipf’s law in all of the things that you do. Yeah.
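To make the head-versus-long-tail point concrete, here is a small sketch of a Zipf distribution, in which the item at rank r gets a weight proportional to 1 / r^s. The number of items (1,000) and the exponent (s = 1) are assumptions chosen purely for illustration and are not figures from the discussion.

```python
def zipf_shares(n_items, exponent=1.0):
    """Return the share of total volume for each rank under a Zipf distribution."""
    weights = [1 / rank ** exponent for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


# Assumed toy setup: 1,000 topics or queries, classic exponent of 1.
shares = zipf_shares(1000)
head_share = sum(shares[:10])    # the ten highest-ranked items
tail_share = sum(shares[100:])   # everything from rank 101 onwards

print(f"top 10 items: {head_share:.1%} of volume")
print(f"ranks 101-1000: {tail_share:.1%} of volume")
```

Under these assumed parameters the ten highest-ranked items account for roughly 39 percent of the total, while the 900 lowest-ranked items together account for a bit less than a third, which is the pattern described above: a few things matter a great deal, and there are many individually small long-tail things.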
David AmerlandExcellent. Brilliant. Andrea, please.
Andrea VolpiniSo data publishing is always my answer to these. So, you know, my obvious answer is publish data. Make sure that you’re dealing not only with the searcher in the form of a human being, but with the searcher in the form of a machine that will help the human being get to you. So in order for the machine to understand you, publish as much data as possible, in the cleanest and most accessible way, so that you can be understood by the searcher and by the agent that helps the searcher get to you.
David AmerlandExcellent. That’s brilliant. And now Hamlet, please.
Hamlet BatistaYeah, and I agree with Jason that I think the most important skill is communication. So you need to be able to communicate really clearly. And a lot of times one of the problems that I see is that we, as SEOs, even when talking to clients, talk about a bunch of different, you know, page authority and stuff, you know, terms that clients don’t understand or can’t connect directly with business value.
And I think of the clarity of communication in everything you’re doing, and also in the writing of the content. I will tell you a short story about a prospect I talked to just last week. He has all this content and he’s getting some performance, but he feels like something is missing. And it’s because he just went to a website. He is a manufacturer, and he went to a website about guitars, and he got excited about the content, and he saw the authors and the videos and stuff like that.
And he says, I don’t feel that way when I go to my website, with the content that I’m being told to put in. And you look at it, and there are not even authors associated with it. Right. And I say you have to communicate clearly and you have to deliver. Right. Make sure that you’re delivering to your clients. Make sure that when people search, when they click, they’re going to be satisfied, because that’s going to be the next step.
A lot of us SEOs, we say, oh, I just got the ranking, right, and the ranking is there. But nobody’s clicking to begin with, because I’m not even paying attention to the search snippet, what the search snippet says. Right, people have to click, and the search snippet has to be compelling. Now they get to the page. The page has to deliver. What you promised in the search snippet has to be delivered. And I think everything else, the technical side, on which I spend a lot of time, can be learned really easily. But those are the foundational things that you need to make sure that you have.
David AmerlandExcellent. Let’s bring them. So basically, in one word, summarizing everything. And again, you gave a fantastic sort of a layered approach to all this communication. Right. And communication. We know from the human level it’s really difficult when we factor in machine and machine interface that acts as a translator becomes even harder. So clarity and clarification are really, really important co.
Dawn AndersonAnd consistency.
David AmerlandAnd consistency. Thank you. That is very, very true and very good to add. So we see that in search, machine learning is increasingly being used. Artificial intelligence, in a very sort of constrained definition, is being employed. So search is accelerating in terms of how it changes, how it evolves, in the amount of indexing it does. How does this all affect the creation of knowledge graphs, what is the impact on knowledge graphs, and what is new in that arena? So let’s start with Andrea on this one, please.
Andrea VolpiniSo a lot of the new things are related to what we discussed, you know, the new language models that are capable of synthesizing information. And so we see that the knowledge graph is evolving very fast, because it now grabs unstructured information and can compile it into triples by just using, you know, something like T5, which we discussed, and other evolutions of these language models that can compress information from being, you know, unstructured and then structure it inside the knowledge graph. And that’s what we see also with the experiment that we do with Jason: a lot more information is coming from the open web, whereas before we’ve seen a higher importance for the structured sources like Wikipedia and, of course, Crunchbase and LinkedIn.
These are still relevant, but more information is being sourced from the open web. And this is an opportunity for everyone who is writing, you know, to be consistent and to help the machine understand what can be triplified, what is the information that really matters out there. And then, of course, as we see it, as a data publisher, it’s important that we get the information accessible and interoperable with other data sets so that, you know, graphs can evolve.
David AmerlandExcellent. And I think, again, I think the very fact that you mentioned consistency here is important and we’ll circle back to this because it’s a very sort of applicable sort of practice of the very sort of universal level. So let’s go to Dawn, please. Again, checking to see how machine learning and A.I. is influencing search and affecting the knowledge graphs.
Dawn AndersonI think, well, ultimately what will happen is, over time, things will just get increasingly more and more accurate. We talk about consistency there, and we do see a lot of inconsistency still. And I think, interestingly, the other week when Google had their event, the Search On event, they talked about Google data sets, and there is no shortage anymore, if you like, of data. It’s almost like data lakes. And in a way, that level of data makes things confusing and easier to be ambiguous in other ways, because you have all these different bodies now creating more and more data banks. And we often see issues, for instance, where search engines pull in a featured snippet, an image from one place. We all know about the issue last year with “how many legs does a horse have?” Lots of different answers were given about the number of legs.
David AmerlandI remember. Yes.
Dawn AndersonAnd that’s because obviously there’s lots of different data out there, but it’s being pulled from different places. So I think over time, one of the big focuses will probably be around this whole alignment of data. So accuracy will build. And, you know, natural language is complicated. But you hope that over time it gets less ambiguous.
David AmerlandCan I ask as a follow up on this just for you, you mentioned consistency earlier and Andrea also echoed that. How would that apply on a practical level, what we mean by consistency when it comes to content creation or data?
Dawn AndersonSo really, it goes back to the whole NAP thing, which is a really good example: the name, address, phone number type thing. You know, make it the same data for everything, if you like, online. I often see it when somebody changes a brand name. So somebody rebrands, and you end up with some results, some data, that still have the old name. Some have the new name. And you end up with this, like, confusion where a site seems to rank one minute for something, and then it gets switched out and it ranks for the other name, or it ranks for a different address. So I think it’s just really about, practically, keeping every mention of the entities that are out there consistent, and that includes all naming conventions. Would you agree, Jason?
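A consistency check of this kind can be sketched in a few lines. The listings below are invented sample data, and the field names (`source`, `name`, `address`, `phone`) and normalisation rules are assumptions for illustration only; a real audit would need to pull the values from wherever the business is actually mentioned.

```python
import re
from collections import defaultdict


def normalize(field, value):
    """Reduce cosmetic differences so that only real mismatches remain."""
    if field == "phone":
        return re.sub(r"\D", "", value)  # keep digits only
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def find_inconsistencies(listings):
    """Return, per field, the distinct normalized values and where they appear."""
    seen = defaultdict(lambda: defaultdict(list))
    for listing in listings:
        for field in ("name", "address", "phone"):
            seen[field][normalize(field, listing[field])].append(listing["source"])
    # Only fields with more than one distinct value are inconsistent.
    return {field: dict(values) for field, values in seen.items() if len(values) > 1}


# Invented sample data: the same business mentioned in three places.
listings = [
    {"source": "website", "name": "Acme Heating Ltd.",
     "address": "1 High Street, Leeds", "phone": "+44 113 496 0000"},
    {"source": "directory", "name": "ACME Heating Ltd",
     "address": "1 High Street, Leeds", "phone": "+44 (113) 496-0000"},
    {"source": "social", "name": "Acme Boilers",
     "address": "1 High Street, Leeds", "phone": "+44 113 496 0000"},
]
issues = find_inconsistencies(listings)
print(issues)
```

In this toy data only the name is flagged: punctuation and capitalisation differences are normalised away, but the old brand name on one profile remains a genuine mismatch.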
Jason BarnardYeah, you just inspired something. I think you saw me raising my little hand kind of halfway up like that. But even without changing your brand name, there’s an awful problem with consistency, in that you’ve got different employees who place information in different places, in different manners, especially over time when your employees have changed. And I get all my clients to do an audit, to go around the web and find all the information about them, just to see how inconsistent it all is. And it’s usually a bit of a shock. And then I go around and correct it all.
And the thing about consistency is, in my experience, part of it is I’ve created an event called Kalicube Tuesdays, and it’s an event every week that I live stream to YouTube, and I can push those into the knowledge graph in five minutes or less. They basically go straight into the knowledge graph, which is wonderful and terribly, terribly cool. But then I place it on different platforms, and I spent a lot of time making sure that every single time I mentioned the event, the title was exactly the same, the description was exactly the same, and Google was still creating duplicates, however careful I was.
The machine is obviously kind of still learning. So it’s obviously still a learning process. But one thing is, however consistent you are, the machine is still having a certain number of problems. But that doesn’t mean to say give up on being consistent, because it’s the only way we’re going to get anywhere near getting the machine to understand and disambiguate and make things clear for it. And in terms of machine learning, Frédéric Dubut from Bing talked to me about how he perceives it. I like the way he was saying it: what they are doing is they give the machine a big chunk of data that they’ve checked and sorted, and they’ve made sure that it’s all categorized correctly.
And then they say, OK, this data goes to this goal, and they give the machine a goal of great results, or a goal of whatever it might be, and then take the data that the machine has produced from trying to achieve that goal and feed it back in, having tagged it either good or bad. So they’re encouraging the machine with good examples and giving it corrective data, saying this is incorrect, please correct.
And generally going through that process of getting results
But I think I kind of like the idea of machine learning if you look at it that way and say that, in fact, the engineers who are building this stuff are actually just looking at the goal; they’re setting the metrics by which they’re judging the machine and then feeding the data back in to correct the machine as and when it makes mistakes and to encourage it when it’s doing really well.
And that, for me, makes machine learning, without understanding the details of it, much easier to grasp. And what’s easy for me as a professional in the marketing industry, and somebody was saying earlier on, I think Hamlet, basically, what you described earlier on was communicating clearly with our audience, is marketing.
David AmerlandAwesome. And with that, we go straight to Hamlet, please. Again, we’re talking about machine learning AI, how they affect.
Hamlet BatistaI’d like to be quick, so let me share my screen. Yes, I posted a link to the article, where I show you how to build a knowledge graph from scratch, and I’ll show you this link quickly, so you can open it. This is actual code, but you just have to hit the play buttons.
And you get this graph, right. What I did is I took the XML sitemap from Search Engine Journal, all the articles, and I filtered the articles for this year. Then I turned the slugs from the URLs into headlines, and from the headlines I pulled the entities and the relationships, and I built the knowledge graph from this. And these are the relationships, and I’ll walk you quickly through what it does.
But here you just hit play at the end. And for example, this is the relationship “launches”. So SEJ publishes articles, and quickly I’m able to learn what these articles are about. Look how powerful this is. And this is fully automatic. I didn’t have to type anything manually. I chose the sitemap. And for the relationship “launches”, I can see Google launches high demand fields, blog SEO. Microsoft marketing podcasts. Yelp launches local crowd. SEO coronavirus and coronavirus queries. So look how powerful this is, right, that relationship, and I can do this one, which is what we’re talking about.
We’re going to. Right. Look how beautiful this is. I can improve. Right. And you have the code to do this, look how cool this is, right? So there is code here, but you just have to hit play. And I go through the steps here. I wanted to highlight a couple of things here, right? Which is AI: what machine learning is, and how machine learning enabled building this graph. And there is the library, and this is using Python code. Python is a simple programming language. I’ll share a link for SEOs that has an introduction to Python to understand the basics.
But basically, this is where machine learning makes it a lot simpler to take unstructured text and do NER, named-entity recognition. And this is not optimal, it gets at about 60, 70 percent, which is better than trying to do it manually, and for the purposes of what I show you here it is very effective. It basically takes text, this is from the headlines, and labels the text with entities.
These are numbers. This is supposed to be a person, which is not correct. Organization, product, date. And I have this here. This is on this one. Look at that, they have the code here, and using machine learning I’m able to do NER. And then I use a syntactical, sort of natural language analysis to find the edges, to find the relationships between the subjects and objects, and with this, you see in practical terms that I can turn that into something practical that you can use.
All right, this is a crazy, scary KG with everything, so what I did is a filter here by a specific relationship. So hopefully you find that useful. So I wanted to show you with an example. This is the post of the article in SEJ, you have it here [https://www.searchenginejournal.com/natural-language-processing-python-seo/377051/]. See if you have any questions around it.
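The pipeline walked through above (URL slugs to headlines, headlines to subject-relation-object triples, triples to a graph that can be filtered by relationship) can be sketched in plain Python. This is not the code from the linked article: where the demo uses machine learning for named-entity recognition and relation extraction, this sketch swaps in a hypothetical fixed verb list (`RELATIONS`) and a simple split around the verb. The sample URLs are invented and use `example.com`.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical verb list standing in for ML-based relation extraction.
RELATIONS = {"launches", "adds", "updates", "tests"}


def slug_to_headline(url):
    """Turn the last non-numeric path segment of a URL into a headline-like string."""
    segments = [s for s in urlparse(url).path.split("/") if s and not s.isdigit()]
    return segments[-1].replace("-", " ") if segments else ""


def extract_triple(headline):
    """Split a headline around the first known verb: (subject, relation, object)."""
    words = headline.split()
    for i, word in enumerate(words):
        if word in RELATIONS and 0 < i < len(words) - 1:
            return " ".join(words[:i]), word, " ".join(words[i + 1:])
    return None  # no known relation found in this headline


def build_graph(urls):
    """Group (subject, object) edges by relation."""
    graph = defaultdict(list)
    for url in urls:
        triple = extract_triple(slug_to_headline(url))
        if triple:
            subject, relation, obj = triple
            graph[relation].append((subject, obj))
    return graph


# Invented sample URLs in a slug-then-id layout.
urls = [
    "https://example.com/google-launches-new-search-feature/100001/",
    "https://example.com/bing-updates-webmaster-guidelines/100002/",
    "https://example.com/guide-to-internal-linking/100003/",
]
graph = build_graph(urls)
launches = graph["launches"]  # filter the graph by one specific relationship
print(launches)
```

Filtering by a single relation, as in the last lines, is what keeps the result readable instead of showing the whole graph at once.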
David AmerlandOK that’s brilliant.
Hamlet BatistaYes. Larry is asking whether Python is important for technical SEOs, you know. Yeah, for technical SEO, I would say, you know, mastering the search console tools for Google and Bing. They have a series of videos in academies that you can take to get a good understanding of the fundamentals of how the search engines crawl, index, rank. Understanding those concepts and understanding structured data is also very important. A lot of this that we’re talking about, structured data, can be seen as an ontology, of how schema JSON standards can be used to categorise and organize information. And a number of those elements can power rich features in the search results.
David AmerlandRight. OK, so let’s now, as we begin to sort of get into past the halfway mark of what we’re doing, let’s ask for a couple of practical takeaways from all this. I mean, if we’re to advise the audience of something very specific they should be looking out for as they’re creating content. So let’s start with Jason.
Jason BarnardYou’re looking for practical advice about content for knowledge graphs. Oh, beyond keeping things simple, being consistent, not being ambiguous. I mean, one thing I would like to come back to and Andrea mentioned, and it was Dawn who mentioned to me for the first time semantic triples. And when she said the word to me, I thought, oh, that’s that’s a bit too complicated. I don’t really understand. And it turns out it’s just subject-verb-object. I mean, it’s really simple. The subject is related to the object with the verb.
And that idea of triples, if you look at it that way, as a normal human being who hasn’t been looking into all this for years and years and doesn’t read academic papers, is subject-verb-object. Try to keep your content writing with the subject, verb and object as close together as possible. However good BERT or T5 gets, the closer these are together, the clearer it is, not only for the machines but for your users.
David AmerlandExcellent. That’s actually very practical and applicable thinking.
Jason BarnardYeah, it was thanks to Dawn, because she confused me and then I had to think about it.
David AmerlandI think I think you made an important point here, inadvertently perhaps, that essentially our own confusion when it arises becomes a critical point for clarification, which then perhaps we can presume with our audience; something that is pretty good.
So it’s not sorry that you should be saying Dawn. It’s it’s my pleasure.
Dawn AndersonThank you. Thank you.
David AmerlandOkay, let’s take this to Andrea, please.
Andrea VolpiniSo my practical advice, given also what we discussed, is data syndication. So I first said data publishing. Now I say data syndication. So the more you can make this data available, the better. Jason gave an example before about the podcast. You publish the podcast and you are creating data on your side in the form of structured data. Now, this data is foundational for a lot of other platforms. So whether you’re publishing on your site or you are creating a feed for Amazon, or if you think about e-commerce, you know the importance of getting your product metadata in the form of structured data, but also in the form of a merchant feed.
So syndicating data is really essential, it’s practical, and it helps in disambiguating and in creating the consistency that the machine requires for promoting.
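As a rough sketch of what structured data for a podcast episode can look like, the snippet below builds a schema.org `PodcastEpisode` description as JSON-LD and wraps it in a script tag for a web page. The episode details and URLs are placeholders, the helper name is made up, and the choice of properties is an assumption for illustration; which fields any given platform actually consumes is not something this example claims.

```python
import json


def podcast_episode_jsonld(series_name, title, description, page_url, audio_url, published):
    """Describe one podcast episode using schema.org vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "PodcastEpisode",
        "name": title,
        "description": description,
        "url": page_url,
        "datePublished": published,
        "associatedMedia": {"@type": "MediaObject", "contentUrl": audio_url},
        "partOfSeries": {"@type": "PodcastSeries", "name": series_name},
    }


# Placeholder values for illustration only.
markup = podcast_episode_jsonld(
    series_name="Example Marketing Podcast",
    title="Knowledge Graphs and SEO",
    description="A conversation about entities, consistency and structured data.",
    page_url="https://example.com/podcast/knowledge-graphs-and-seo",
    audio_url="https://example.com/audio/knowledge-graphs-and-seo.mp3",
    published="2020-12-01",
)
payload = json.dumps(markup, indent=2)
script_tag = f'<script type="application/ld+json">\n{payload}\n</script>'
print(script_tag)
```

The same dictionary could also be reused to generate a feed entry, which is the syndication idea: describe the item once, in a clean and consistent form, and publish it in every place that needs it.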
David AmerlandThat’s really good. Dawn, could you please?
Dawn Anderson: Well, one thing I would say is that Hamlet gave a really great example there of how just linked pages and linked URLs together can create a form of knowledge graph. So I think people should really be very, very mindful of the way that they link their pages internally. Yeah, and it should be paramount in people’s minds. And, as I said, I am doing a lot of research at the moment into Zipf distribution. It really matters in the world of SEO, and it matters a lot, it seems, in how to read things like importance and the connection of important things together. Make sure that you are really strong on those relationships that you connect together, within a website, within the sections of a website. How you link things together can also help a lot with disambiguation. Be mindful of the terminology you use. For instance, one of the sectors that I’ve got a lot of experience in is local search, local services search, and there’s often a lot of ambiguity in the difference between a heating engineer and a plumber, because both of them deal with leaking radiators.
So, if you can, try and keep a very clear separation around the terminology that you use in different sections where you detect there could be two ways of saying the same thing, or calling something by the same name. Engineer, as we know, is a very, very ambiguous word. In computing, for instance, an engineer is really a developer, but everybody likes to be called an engineer because it’s smarter. An architect is really, historically in Google’s mind, somebody who designs houses, but everybody in the IT space is basically an architect.
So these things are very confusing, given that for many, many years an architect was, and Wikipedia calls it this, somebody who does a house. So be careful how you connect things in terms of internal linking structures. Utilize importance with all of these things as well; signify when something is more important than something else. For instance, there are many Swintons in the UK, but with some Swintons much more popular than many long-tail Swintons. So utilize internal linking to emphasize which is the most important of the Swintons.
Be careful about ambiguity in words that fit together, heating engineer, plumber, boiler, et cetera, and try to add a lot of these supporting terms to show search engines “well, this is actually a computer vs. a house architect”.
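To make Dawn’s point about internal linking concrete, the following Python sketch counts how many distinct internal pages link to each target page. The link list and page paths are hypothetical, and a raw inbound-link count is only a crude proxy for the emphasis she describes; it is not a claim about how any search engine computes importance.

```python
from collections import Counter

# Hypothetical internal links as (source_page, target_page) pairs.
links = [
    ("/", "/locations/swinton-a"),
    ("/locations", "/locations/swinton-a"),
    ("/blog/heating-tips", "/locations/swinton-a"),
    ("/locations", "/locations/swinton-b"),
    ("/", "/plumbers"),
    ("/", "/locations/swinton-a"),  # duplicate link, counted once
]


def inbound_counts(edges):
    """Count distinct internal pages linking to each target, ignoring self-links."""
    unique_edges = {(src, dst) for src, dst in edges if src != dst}
    return Counter(dst for _, dst in unique_edges)


counts = inbound_counts(links)
for page, n in counts.most_common():
    print(page, n)
```

In this toy data the page that the site treats as the main Swinton receives the most internal links, which is the kind of signal Dawn suggests making deliberate.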
David Amerland: Cool. And Hamlet, to you, please.
Hamlet Batista: Yeah, and I want to expand on what Andrea said about syndication, and I want to talk about content formats and repurposing content. So a lot of the time content writers were in this “let me publish four articles a day or X number of articles” mode, more of a quantity game. And we said, look, why don’t you step back? Instead of pushing these “me too” type of articles that just regurgitate something that somebody else said, spend a little bit more time, write something original.
And because you put so much effort into that, right, publish first in video format; it could be a webinar or an interview. Now, that video, right, you can repurpose the audio of that into a podcast. That podcast, you can add structured data to, to take advantage of that stuff. Now that video, that podcast, that audio, you can turn it into an article. You can transcribe that content in that article, right, the video content. You can also put slides in a presentation, right. So there are all these different ways that you can repurpose the same information and then get distribution in multiple social networks, multiple knowledge graphs; you can accomplish both things just by repurposing higher quality content in different formats, instead of the traditional approach of just churning and burning content.
David Amerland: That’s an important point. Thank you so much for making this. And essentially, here’s a question that arises out of this. I think, you know, what you said, Hamlet, is really important. And it begs the question: how can the average business person who’s got a small team or maybe not a team at all, you know, maybe just him and his dog working from a web page, develop that kind of skill set, which you just mentioned, which is really important now?
Hamlet Batista: Yeah. And I’ll tell you, because I’m a small business, right, and we do this. What I’m telling you is how we approach it. And it is because you have the higher value content in the experts. So in my business, I’m an expert. I have a few people who are experts in my business. But they don’t need to be the people that are writing and producing the content. You can hire journalists or interns, and they are the ones who interview the experts.
So the experts will block, you know, half an hour, an hour, a month, right. And you might have four experts, so you only need one per week. And the journalists, who can be interns, right, or recent grads, can research interesting questions and ask them in, you know, webinar format, right, to the experts, or in any format you decide internally, extract that expertise from the experts, and they do the write-up, because they’re journalists; that’s what they’re trained to do.
David Amerland: And there is a budget involved here, right. There’s obviously a budget involved, however small. And there’s also a certain level of expertise from you, in guiding what you want to achieve.
Dawn Anderson: You know, I completely agree with Hamlet on that, but I think there is a massive… And I hope you don’t mind me interrupting… That’s actually been a nightmare for quite a while. There is almost a stigma in the SEO world where if somebody doesn’t spend all of their time writing their own article, they should be cast out of the SEO world. But we never say that. I wrote my piece on passage indexing. That took me, like, over a week.
In reality, I couldn’t do one of those every week, because I need to earn a living. The point is, I think that there is a case to be made, for different types of content, for delegating parts of that to people. So they go off and do a lot of the research, gather the data, etc., as long as the full cooperation, or the full endorsement, of the content at the end is from an expert, if that makes sense.
David Amerland: It does, absolutely. And again, I think you made a very important point here. But simplifying it even more for an audience that is not really part of the SEO community: they are business people on the Web trying to earn a living. They understand the need to do some kind of SEO-related activities; they don’t have that kind of skill set; they don’t have the budget, perhaps.
So now what? Are they completely stuck? This is a question.
Andrea Volpini: No. I mean, if I can provide the answer, I think the real answer is that you need to experiment, whatever type of business you’re doing. You need to experiment, especially nowadays, where you have this evolution of machines. You know, if you want to do something, yeah, you need to build an organization, but above all, you really need to love to experiment with new things, because that’s what, you know, at least, keeps me alive.
And that’s what, I can imagine, keeps Hamlet, you know, writing code day and night, and Jason, you know, experimenting with a new way of shaping his family knowledge graph. And I think the real secret of the people that are on this panel is that we all love what we do and we put in a lot of effort that goes way beyond, you know, the expected outcome. It’s a lot of enthusiasm in experimenting with new stuff.
Hamlet Batista: Let me add another point, right? Another example, trying to take on other practical uses. I also like to recommend that clients leverage stuff that they already have. So, instead of saying, OK, I don’t have the time or the budget to allocate to stuff for SEO: look, you probably have customer success people or customer service people, or you have salespeople, and they have knowledge that is valuable, that they’re probably only sharing internally with emails or chats and stuff like that.
Something simple to get started with: create FAQs, right. Just give a task to your people that are frontline with your customers, to populate an FAQ in an internal wiki and have them process them. If you don’t have somebody to publish it on your website, create an automation with Zapier or something, so that it just gets updated internally and, once it’s updated, snap! It gets published on the website.
Maybe they don’t have the technical expertise, but they could have somebody that once a week, or once a month, takes the FAQs from the internal wiki, which people keep up internally because that FAQ is valuable for them to service their customers, not just for SEO (they need it to be able to answer questions from customers), and publishes them on the website. And now the next time someone has to answer the same question, instead of typing it again: “Oh, send them the link to the page on the website”! That is going to start building value. And they’re actually doing two things that are valuable for SEO: they’re writing content and they’re promoting it, because they’re sending the links to the customers.
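The publishing step Hamlet describes could look something like the Python sketch below, which turns question and answer pairs (as they might be exported from an internal wiki) into schema.org `FAQPage` markup. The sample entries and the function name are made up for the illustration; how the entries leave the wiki and reach the website (Zapier or otherwise) is outside the sketch.

```python
import json

# Hypothetical question/answer pairs, e.g. exported from an internal wiki.
faq_entries = [
    ("Do you repair leaking radiators?",
     "Yes, our heating engineers handle radiator leaks."),
    ("How do I book a visit?",
     "Use the booking form on the website or call the support line."),
    ("", "An answer with no question is skipped."),
]


def build_faq_page(entries):
    """Build schema.org FAQPage markup, skipping incomplete entries."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question.strip(),
                "acceptedAnswer": {"@type": "Answer", "text": answer.strip()},
            }
            for question, answer in entries
            if question.strip() and answer.strip()
        ],
    }


faq_page = build_faq_page(faq_entries)
# The JSON below would be embedded in a <script type="application/ld+json"> tag.
print(json.dumps(faq_page, indent=2))
```

The same list of entries can also feed the visible FAQ page, so the support team maintains one source and the markup follows it.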
David Amerland: Awesome! And I think that’s a really, really important point: creating a “frequently asked questions” page, even gathering together data from your customer service people based on, you know, what are the complaints that we’ve had, what are the biggest complaints? Look on Twitter; you’ll see a lot of brands get massive, massive amounts of people complaining on social media, but they do nothing about it, whereas they could actually build a knowledge center or help supplement or something.
And they could literally have a separate support Twitter account that actually deals with all the complaints, which would keep all the branches off their main social media channels.
But anyway, the point is, as Hamlet very well said, content takes many, many forms, and it’s not always the one thing that we expect it to be.
Jason Barnard: Can I add something, David, really quick?
It now goes even further, because in knowledge panels you’re now getting questions about the brands that are appearing, and brands are not looking after it. But if you have a great FAQ section and you’re answering the questions that people are truly asking, Google will put you in control of your own people.
And what I’m now calling entity questions, we could call them perhaps, in the knowledge panel, such as price, what’s your mission statement, where do you apply for a job in a company: those are appearing in knowledge panels. And a good FAQ that covers most of the common questions will dominate.
David Amerland: Excellent. OK, so we have a couple of questions from the stream. One is from Larry Swanson. He’s asking Hamlet in particular: is there any risk of duplicate content issues with repurposing? Let’s keep in mind that duplicate content as an issue has been one of the most frequently asked questions in the SEO world since the discovery of search engines…
Jason Barnard: Yeah, I want to clarify that there is this myth about duplicate content, “oh, you’re going to get penalized for the duplicated content”, which used to be the case really back in the day, when people were just doing it on purpose.
They were literally duplicating information, or copying it, or scraping sites. There’s still some of that. But for the most part, when you’re talking about duplicate content without, you know, that nefarious approach, you haven’t seen a penalty on duplicate content in many years. So what happens with this content is that you get your content diluted. So you have…
Dawn Anderson: Yeah, competing with yourself.
Jason Barnard: Yeah, when you canonicalize your content, your rankings get diluted and your performance is not optimal.
Hamlet Batista: Yeah, I was just going to say, the thing about the Panda updates, the Panda penalties, it’s all about, what’s it called? Oh, I forgot the word. Now, plagiarism. Plagiarism is the problem. It isn’t the fact that the contents are exactly the same, it’s that people are plagiarizing other people by exploiting their contents. And the other point is, if you’ve got a duplicate, one of which is a video and the other of which is text, it’s not a duplicate, in the sense that the video will sometimes be more appropriate and the text will sometimes be more appropriate, depending on the platform and the kind of content the person likes to consume.
Dawn Anderson: The issue really is not duplication; it’s the use of shingles. There’s a lot of stitching, there’s a lot of content now that is dynamically generated, just kind of trying to…
Hamlet Batista: Yeah, but that’s deliberate spam. Right?
Dawn Anderson: So that’s Web spam. That’s my point.
Hamlet Batista: We’re talking about what Jason said, right. Different people want to consume the content in different ways. I don’t necessarily want to watch a video. I don’t necessarily want to listen to it, right. You’re providing more convenience. You’re broadening your audience, because not everybody wants to consume the same content in exactly the same way. So you’re actually doing a good service to your audience by repurposing your content in different formats, including giving the content to the person that wants the presentation. “Here it is: the webinar, the podcast, the reading transcript.” You’re doing a favor to them. So that’s nice.
Jason Barnard: And also, breaking a big piece of content down into little pieces of content isn’t necessarily a bad idea, in the sense that the whole thing is not necessarily interesting to somebody. Just one small chunk of it might be interesting. And then once again, as Hamlet says, you’re doing your audience a favor by saving them the time of going through the whole lot to find that one important point that they were interested in.
David Amerland: OK, we have one more question from the audience. This is Jennifer, and she’s asking Hamlet again: how do you feel about using frequently asked questions in a nested schema, in addition to a dedicated FAQ page?
Hamlet Batista: Yes, and I will say that I haven’t run that experiment, but it’s fit for an experiment. When you say nested, I assume that you’re talking about inside another element, like a product page or web page, right? So you just have to test it, right.
Andrea Volpini: Well, it is kind of a problem, if I can say. The problem with nesting content inside different types is that, of course, an FAQ is a specific type. So if you match an FAQ, for instance, with a software app, you basically are adding value to the original content type. But if you do that, for instance, for a news article, so if you start kind of diluting your news article with a lot of FAQs, what will happen, or what I saw happening, is that Google will treat the content no longer as news.
So you will lose visibility: as soon as you start combining FAQ markup within the context of a news article, you will reduce the visibility across the news surface, so Google News and Google Discover, because the content is perceived as an FAQ or as a hybrid, which works perfectly for evergreen content or for things like a software app or local business, but it doesn’t work well with news types of content.
Hamlet Batista: Yeah. And that’s what that emphasizes, because maybe, you know, your site is a product site, right. So that’s where you have to test it, right. See how that affects it, and do it on a small scale, on a number of pages of your site, right. Thus, if you need help with the testing, that’s what my company does, on the edge. But basically, that’s the idea. So try it and see how it affects you, taking into consideration the experiments that Andrea has already run.
Dawn Anderson: Andrea, are you saying that ultimately, if somebody uses FAQ schema on a product page, it could be problematic?
Andrea Volpini: No. On a product page, I have always seen a good result, because, you know, an FAQ will expand the number of intents that the page can capture. But I have a lot of experience in news and media publishing. And so when FAQs started to appear on the SERP, of course, all of my clients started to use them in a very proactive form. And then we saw that, as the number of FAQs increased, the traffic coming from the news surface decreased.
So again, Google News and Google Discover and the top stories, because the content was no longer recognized as newsworthy as soon as it became more related to the FAQ part.
Dawn Anderson: So the intent is completely different.
Andrea Volpini: It’s slightly different; it’s good for a product, though, because, I mean, you may have a lot of questions for your product and you can use FAQs to respond to these questions. But for a news article, it’s not a good approach. So right now, my suggestion is to plug in FAQs after the first two or three weeks of the news; as you consolidate the content with other content pieces that you might have, you can then start adding FAQs.
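Andrea’s rule of thumb can be written down as a small check, sketched here in Python. The three-week threshold, the content type labels, and the function name are assumptions chosen for the example; the rule encodes what Andrea reports observing with his clients, not documented search engine behavior, and as the panel notes it may change over time.

```python
from datetime import date, timedelta

# Assumed threshold: the upper end of the "two or three weeks" Andrea mentions.
NEWS_FAQ_DELAY = timedelta(weeks=3)


def should_add_faq_markup(content_type: str, published: date, today: date) -> bool:
    """Hold back FAQ markup on fresh news; allow it on other content types."""
    if content_type == "news":
        return (today - published) >= NEWS_FAQ_DELAY
    return True


# A fresh news article, an older news article, and a product page.
print(should_add_faq_markup("news", date(2020, 12, 1), date(2020, 12, 5)))
print(should_add_faq_markup("news", date(2020, 12, 1), date(2020, 12, 29)))
print(should_add_faq_markup("product", date(2020, 12, 1), date(2020, 12, 5)))
```

A check like this would sit in the template or publishing pipeline, so the delay is applied consistently and can be adjusted as experiments suggest.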
David Amerland: Okay, I think it’s brilliant, and I think the point Andrea is making is that search is becoming increasingly good at recognizing the nature of a page.
So instead of just looking at the content on the page, it looks at its nature, which then determines how it treats that page. This is something we need to keep in mind.
Andrea Volpini: It forces you to think about the content model, right. So you think, OK, what am I building? Because FAQ is also good because, as Hamlet says, we can repurpose it across different, you know, applications. For instance, we have now developed a chatbot for our users and we feed it with the FAQs you find on the website. So it is all good, because it’s all about repurposing.
But you have to get the right intent.
Hamlet Batista: Another thing to note is that search engines are always experimenting, and this might be the case now, in the experiment Andrea described; that doesn’t mean that it’s going to be the same tomorrow.
Andrea Volpini: Yeah, totally.
Hamlet Batista: So that’s why, you know, testing things is a good thing, to make sure they’re beneficial for you.
Andrea Volpini: So, how-to. I see that Jeannie has another question about “how to” pages.
I think how-tos are less attractive, I mean, from my personal experience, in terms of, you know, enticing clicks, because they are, again, an even more specific content type.
Hamlet Batista: Yeah. They provide all the information up front, right.
David Amerland: Yes, and funnily enough, again, we went from the technical stuff to human behavior, then to the technical stuff and back to human behavior, talking about intent now and how content is perceived. Which leads me to another question about practicalities. If you’re creating content on your website, and I think Andrea made a very important point when he mentioned the content model: how do you go about deciding the content model?
That’s question one. And the follow-up question is: as you create content, how do you go about deciding how to create a taxonomy of that content, so it actually begins to make deeper sense? So let’s start off with Andrea on this one, please.
Andrea Volpini: Um, so I can quickly share the screen, and see if you can see my screen. And I will be quick, but I’ll show you a little bit of an anticipation of tomorrow. So this is the embeddings of a knowledge graph that I have peeled out of George Anadiotis’ latest article on ZDNet. OK, so this is a very fast and simple way for me to start looking at the way in which George perceives concepts. So I can see that knowledge, you know, within the context of the latest article that George has written on ZDNet, is highly related to artificial intelligence and to Gary Marcus, who is also participating at this event. And I can also use these, you know, embeddings, for instance, to start organizing the things that George writes about, and unbias George, and try to put, you know, for instance, in the context of an entity like knowledge, I can see on the right side, you know, the concepts that are most likely related to Google and on the left side, sorry, in this case, on the down side, Google, and on the up side, things that are related to Microsoft.
And so you see that Microsoft, when George writes about it, is more closely related to, let’s say, investment, whereas in regards to Google, of course, there is more reference to the Google Cloud Platform, “software as a service” and, of course, deep learning. So you can see that I can start shaping the content using the knowledge graph that I’m creating for the site.
David Amerland: That’s brilliant. And I think that’s a great question. And how would you go about creating an ontology? So if you start to create content, or if you’re guiding the creation of content, perhaps you have a clearer idea of how to structure it; what should be a starting point?
Andrea Volpini: So my starting point is to use the schema.org vocabulary and the interpretation that Google has of the vocabulary. So, as we said, how does Google see the FAQ content? What is an FAQ for Google? Because that’s the way in which I want to organize the content on my site. So if I have an area for users that contains all the frequently asked questions, then I’m going to use the FAQ markup and the structure of the FAQ markup that is built into schema, just in the way in which Google interprets it. And I would do the same with the “how-tos”. So what is a “how to” in the schema.org vocabulary, and how does Google see the “how to”? And that’s exactly what I want to do. You know, I just want to model my content on what the search engine is expecting from me.
David Amerland: Excellent; and for the audience, I’d say you practice empathy…
Andrea Volpini: In a way. That’s right.
David Amerland: So you’ve got to think, how does a search engine…
Andrea Volpini: You have to. You have to. You can start from the vocabulary, which is kind of, you know, the golden rule, but then the vocabulary is interpreted. And, as an example, let’s say this interpretation varies across time. So now FAQ is interpreted in one way; maybe tomorrow it is interpreted in a different way. And then, of course, it’s content specific. So in an e-commerce site an FAQ has one value; outside, it has a completely different value.
David Amerland: Excellent. Does anybody else want to add something from the panel?
Jason Barnard: Well, coming on from what Andrea was saying, it actually just brings to mind the experiment that Andrea mentioned earlier on, which is all around the podcast. And what I realized very quickly, when we sat down to say, how can we present this to Google, and I said, let’s look at this schema markup, let’s get that up on the table, was that I was presenting it very badly.
It wasn’t structured, it wasn’t clear. And I actually got an illustration done, and I was going to show it on screen, sharing my screen. And this illustration, basically, was to set out exactly what it is we’re doing, and this is what Andrea just mentioned about the schema markup: using the schema to define how it is we are presenting it. We have a podcast episode right there in the middle, “the lowdown on EAT strategy”. We have a contributor. We mention, we talk about, expertise, authority and trust. We have a production company. We have a person who’s the director, and it’s part of the Jason Barnard series. And when you look at all of that, you suddenly realize what the attributes and the properties of this specific episode are.
And it’s more complex than I had initially thought. And I had organized it badly. And it is now very well organized. We’ll see where the experiment goes with this. But the idea is to push this entire podcast series and all the people associated with it into Google’s knowledge graph.
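The structure Jason walks through maps fairly directly onto schema.org types. The Python sketch below assembles a `PodcastEpisode` with the relationships he lists (series, contributor, topics, production company, director). Only the episode title comes from the discussion; every other name is a placeholder, and the choice of properties is one plausible reading of his illustration, not his actual markup.

```python
import json

# Placeholder names throughout, except the episode title quoted by Jason.
episode = {
    "@context": "https://schema.org",
    "@type": "PodcastEpisode",
    "name": "The lowdown on EAT strategy",
    "partOfSeries": {
        "@type": "PodcastSeries",
        "name": "Example podcast series",
    },
    "contributor": {"@type": "Person", "name": "Example Guest"},
    "about": [
        {"@type": "Thing", "name": "Expertise"},
        {"@type": "Thing", "name": "Authority"},
        {"@type": "Thing", "name": "Trust"},
    ],
    "productionCompany": {
        "@type": "Organization",
        "name": "Example Production Company",
    },
    "director": {"@type": "Person", "name": "Example Director"},
}

print(json.dumps(episode, indent=2))
```

Writing the entity out this way makes the attributes and relationships explicit, which is the clarity Jason says the illustration gave him.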
Hamlet Batista: Yeah, I want to add something as well. Let me see if I can share my screen.
So this is another one. I’m trying to look at content production from the perspective of gaps. So this is about looking in the SERPs, right, at what people are searching, and using the SERP features; in this case, I use the SEMrush API. The presentation describes the process of how I came to this. But basically, for this client, I look at the structured data in the pages, at which content formats were marked up in the structured data, and at the SERPs for the keywords that page is ranking for, to look at what the demand is. And what this is doing is looking at the gap. Right?
And you see here, in yellow, coloring the opportunity, which means that there are SERPs for that format, but that page doesn’t contain the format. And that’s another way to look at your content development, because in this case, for this client, when I show them, look, you know, you have a lot of demand for media content, but you’re not satisfying it. These are product pages. This is an e-commerce Web site. And the biggest opportunities, as you can see here, are video.
And then you see images, high quality images as well, and FAQs. So with this process, I was able to determine, ahead of time, that when they write the content, there’s already demand for it, so that the chances of success are higher. Right? I wanted to share that. I posted a link, so you have it; it’s slide 39 of that presentation [https://www.slideshare.net/hamletbatista/scaling-keyword-research-to-find-content-gaps].
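The gap analysis Hamlet describes comes down to a set difference: the formats that appear in the SERPs for a page’s keywords, minus the formats the page already marks up. The Python sketch below shows that comparison on made-up data. The dictionaries stand in for what would come from a SERP data provider and from a crawl of the site’s structured data; none of the names here reflect Hamlet’s actual code or any particular API.

```python
# Hypothetical inputs: which keywords each page ranks for, which SERP
# features appear for each keyword, and which formats each page marks up.
page_keywords = {
    "/products/boiler-x": ["boiler x review", "boiler x questions"],
    "/products/valve-y": ["valve y"],
}
serp_features = {
    "boiler x review": {"video", "images"},
    "boiler x questions": {"faq"},
    "valve y": {"images"},
}
page_formats = {
    "/products/boiler-x": {"images"},
    "/products/valve-y": {"images"},
}


def find_gaps(page_keywords, serp_features, page_formats):
    """Return, per page, the formats seen in its SERPs but missing on the page."""
    gaps = {}
    for page, keywords in page_keywords.items():
        demanded = set()
        for keyword in keywords:
            demanded |= serp_features.get(keyword, set())
        missing = demanded - page_formats.get(page, set())
        if missing:
            gaps[page] = sorted(missing)
    return gaps


gaps = find_gaps(page_keywords, serp_features, page_formats)
for page, missing in gaps.items():
    print(page, "->", ", ".join(missing))
```

Pages with no missing formats drop out of the result, so what remains is the list of opportunities Hamlet highlights in yellow on his slide.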
David Amerland: That’s brilliant. And on the question of content and content creation, we have a question from Larry Swanson in the stream. He is addressing this to Andrea and is asking: it seems like designing content for Schema.org might not always be the best way to address user intent. How do we balance the needs of Google and schema with the needs of the user who comes to a website? Andrea?
Andrea Volpini: Yeah, I mean, I always start from the business perspective. So if the business perspective is to, you know, create a positive ROI from Google as a channel, then of course we want to look at the way in which, you know, schema is interpreted by Google; that varies over time. So schema is way too generic to be, you know, the sole structure for your content model.
You need to add your own business needs and understanding of the business personas. And then you need to understand how Google is interpreting a specific element. Just to give you an example, we are now starting to run an experiment with a team in Google that is trying to test the new markup for video objects outside of YouTube. And so it’s important that, as content owners, we try to understand the way in which Google will render this markup, because we want to understand if there is a point in taking advantage of it or not.
We also might decide that we want to draw a line and say, hey, this is only premium data and I’m not going to share it with Google, because it’s not going to benefit us. And we can see, for instance, in the case of “how to”, that we might not have a real advantage in building too many “how-tos”, because we might get fewer clicks. Rarely is that the case, but we have to interpret the way in which, you know, the platform that our user is using is interpreting the vocabulary. And so in the case of the video object, at the moment, you know, you are much more limited when you publish video outside of YouTube than when you’re using YouTube. And so this experiment that we’re now doing is trying to fill this gap and say, hey, what can we do for a content owner that doesn’t use YouTube, because that’s proprietary content? And how can we make this content more accessible to search?
David Amerland: Excellent. That’s brilliant. Now we’re getting near the close of this. So essentially, I’d like to open up a question to all of you, and we’ll go in turn. We’ve talked about intent and how we’re basically trying to understand it, trying to mine it. If you had to give a tip to content creators on how they would best understand user intent, what would that tip actually be? And let’s begin with Jason, please.
Jason Barnard: Well, I’m curious about the idea of kind of trying to second-guess Google on the idea of user intent. Theoretically, if you talk to the salespeople, if you talk to the marketing people, if you talk to the after-sales people and the client support people, you should really understand your audience and what they’re looking for.
What are their problems, what are their pain points? Where can you actually bring a solution that is viable for them? The intent is actually, as far as I’m concerned, at least within your business. And part of my approach would be to say understand your own business, which is what Andrea was saying: understand your business and push it out there. And you want to be there when your customer needs you or your potential customer needs you. And I tend not to over-obsess with how Google deals with content.
David Amerland: Okay. Hamlet, please.
Hamlet Batista: Yeah. And I like that recommendation from Jason. I’d like to expand on the same thing that I mentioned, right. Get out of the tools, right? You and your competitors are just obsessed with the tools and looking at all the information.
Spend time gathering data, inside knowledge, from within. You know, when you look at the bigger companies that you’re competing with, they don’t have the time or the scale to do one-on-ones, you know, internally, and gather the data, the inside knowledge, that you only get in one-on-one interactions, right. Give an incentive for your customers to get on a call with you every month, every week, so that you learn about what they’re caring about, or other things that they really, really, really need, and see if you can actually watch them.
There are tools that you can use to replay sessions; Microsoft launched Microsoft Clarity, with which you can replay sessions of users on the Web. You can learn from them when they type in your internal search; see, they came from Google, they came from wherever to your website, they use the internal search. Just by watching them and seeing what they’re doing, you can extrapolate what it is that they’re looking for and whether you are doing a good job of delivering it, which is what we were talking about before. So, intent: internal knowledge, watching your customers, talking to them or to your front-line reps, the employees that are dealing with them, and gathering that information. That’s going to give you far more information, and more valuable information, than what you’re going to get from just being stuck in the tools. I’m not saying that you don’t use the tools, but just spend time out of them as well.
David Amerland: OK, that’s a good tip. Dawn?
Dawn Anderson: I would say literally the SERPs, because search engines have enough data to have more or less figured it out, to a large extent, taking into consideration search results diversification. But even with search results diversification, when you start to look at the results returned in response to a query, you can usually work out the different intents in descending order. And given that that’s probably based on probability from past data and a prediction of future intent, I would say the SERPs are probably a really great place to spend a lot of your time. And I spend as much of my time in the SERPs, looking at search results, as I do in the tools. And it’s free. I would give that recommendation to anybody. See what search engines are returning, first and foremost.
David Amerland: That’s a very good tip. And Andrea, last one please?
Andrea Volpini: I think we have to be very critical of our own data. You know, a lot of marketing is about, you know, looking at the data as a way to confirm the theory. But I think when we start doing the opposite is when we really learn; when we start looking at things that don’t match, or things that don’t work the way we wanted them to, then we will learn something new that becomes effective in understanding the intent of the user.
David Amerland: That’s brilliant. That’s fantastic. And that’s a pretty good finishing tip. That’s excellent. We’ve basically covered a lot of stuff which has to do with knowledge graphs, semantic web, semantic search practices, ontologies, and a lot of the practical applications. Just checking to see if there are any final questions from the audience which we can address here. And if there aren’t, I would just say, guys, thank you so much. Where would we find you if we need to see where you are? Let’s start with Dawn.
Dawn Anderson: https://bertey.com/ and https://twitter.com/dawnieando
Andrea Volpini: https://wordlift.io/ and https://twitter.com/cyberandy
Hamlet Batista: https://twitter.com/hamletbatista and https://www.ranksense.com/
Jason Barnard: https://kalicube.pro/ and https://twitter.com/jasonmbarnard
George Anadiotis: OK. Well, thank you. Thank you all. Thanks, David, for moderating. That was really great. I learned a few things. And just wrapping up on behalf of the organization team, I wanted to point out, in addition to what you just shared, the Twitter handles and so on, so people can follow up and keep up with what you do, that I also just shared the link to the next workshop, which is, again, in extended family format, which is coming up, and which is on the intersection of knowledge graphs and AI, and how one can help the other, which is also going to be super interesting.
Thanks, everyone. Thanks for all of you who made it such a great conversation, thanks to the people who attended and everyone behind the scenes who made sure things worked. And I think they mostly did.
So if you don’t have anything else to add, I think we have to wrap up about here because we also have to start the next session.
David Amerland: Brilliant. Thank you very much. Bye.