Recently Dave Davies wrote about Google's BERT vs. SMITH and how they work and work together. With so much coming out in recent years around BERT, SMITH, and NLP in general, we thought this was a great time to get Dave on to dig into these topics a bit more.

Guest Host Dave Davies

To say he was happy to “geek” out over this topic was an understatement!

Dave Davies covers a ton during the episode, and rather than trying to summarize all of his points, much of what you see in this write-up comes from notes Dave Rohrer took before and during the episode. To get what Dave Davies actually said, scroll further down to the transcript; it will be your best bet for sure this time.

Update on SMITH/Passages

Barely an hour after we finished recording this podcast, Danny Sullivan, posting as Google SearchLiaison, tweeted that Passages had gone live.

What is NLP and Why Does an SEO/Marketer Care?

This was one of the questions Dave Rohrer wanted to dig into during the conversation, and we believe we did touch on it. The short answer: structure your content as you should have all along. The long answer: listen to the episode or read the full transcript, because along the way Dave gives some ways, and reasons, to optimize and think about NLP, BERT and SMITH.

What is BERT?

Search algorithm patent expert Bill Slawski (@bill_slawski of @GoFishDigital) described BERT like this:

“BERT is a natural language processing pre-training approach that can be used on a large body of text. It handles tasks such as entity recognition, part-of-speech tagging, and question-answering, among other natural language processes. BERT helps Google understand natural language text from the Web.
Google has open sourced this technology, and others have created variations of BERT.”

– Bill Slawski

What is SMITH?

SMITH stands for Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder. In a very simplified description, the SMITH model is trained to understand passages within the context of the entire document. SMITH and BERT are “similar” in a very simple way, but where SMITH gets involved is in understanding long, complex documents and long, complex queries. BERT, as you will learn from Dave Davies, is much better suited to shorter pieces of content.

At its core, SMITH takes a document through the following process (paraphrased mostly from Dave Davies' article; a toy code sketch follows the list):

  1. It breaks the document into grouping sizes it can handle, favoring sentences (i.e., if the document would allocate 4.5 sentences to a block based on length, it would truncate that to four).
  2. It then processes each sentence block individually.
  3. A transformer then learns the contextual representations of each block and turns them into a document representation.
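To make those three steps concrete, here is a toy sketch in Python. This is our simplification for illustration, not Google's SMITH code; it leans on the open source Hugging Face transformers library, and the mean-pooling in step 3 stands in for the second-stage transformer SMITH actually trains.

    # A toy sketch of SMITH's hierarchical idea (not Google's implementation):
    # split a long document into blocks of whole sentences, encode each block
    # independently with BERT, then combine the block vectors into a single
    # document-level representation.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    encoder = BertModel.from_pretrained("bert-base-uncased")

    def encode_document(text: str, sentences_per_block: int = 4) -> torch.Tensor:
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        # Step 1: blocks of whole sentences (a 4.5-sentence fit truncates to 4).
        blocks = [". ".join(sentences[i:i + sentences_per_block])
                  for i in range(0, len(sentences), sentences_per_block)]
        block_vectors = []
        for block in blocks:
            inputs = tokenizer(block, return_tensors="pt",
                               truncation=True, max_length=256)
            with torch.no_grad():
                output = encoder(**inputs)
            # Step 2: each block gets its own contextual representation
            # (the [CLS] vector here).
            block_vectors.append(output.last_hidden_state[:, 0, :])
        # Step 3: combine the block representations into one document
        # representation. SMITH runs a second transformer over the blocks;
        # mean-pooling keeps this sketch short.
        return torch.cat(block_vectors, dim=0).mean(dim=0)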

BERT vs. SMITH

  • BERT taps out at 256 tokens per document; past that, the computing cost climbs too high for it to be practical.
  • SMITH, on the other hand, can handle 2,248 tokens, so documents can be roughly 8x larger (see the tokenizer demo after this list).
  • SMITH is the bazooka. It will paint the understanding of how things are. It is more costly in resources because it’s doing a bigger job, but is far less costly than BERT at doing that job.
  • BERT will help SMITH do that, and assist in understanding short queries and content chunks.
  • That is, until both are replaced, at which time we’ll move another leap forward and I’m going to bet that the next algorithm will be:
    • Bidirected Object-agnostic Regression-based transformer Gateways. (So Dave, you are saying Google is going to create a BORG algo? Nice!!!)
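That 256-token ceiling is easy to see for yourself with the open source transformers library. This is a quick demo of ours, not anything Google ships:

    # BERT-style tokenizers simply drop everything past the configured cap.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    long_text = "passage ranking " * 1000  # stand-in for a long-form article

    token_ids = tokenizer.encode(long_text, truncation=True, max_length=256)
    print(len(token_ids))  # 256: the rest of the article never reaches the model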

Passage Ranking

The following passage about passages is from Google Passage Ranking Now Live in U.S.: 16 Key Facts:

Bartosz asks whether tightening up the use of heading elements, in order to better communicate what the different sections of a page are about, will help, or whether Google will understand the content regardless of the markup.
“It’s pretty much that. With any kind of content, some semantics and some structure in your content so that it’s easier for automated systems to understand the structure and, kind of like, the bits and pieces of your content.
But even if you would not do that, we would still be able to say, like… this part of the page is relevant to this query, where this other piece of your page is not as relevant to this query.”
Splitt then suggested that if you already have a grasp on how to organize content, this (passage ranking) is pretty much not something you need to act on.

Passages vs. Subtopics

He said Subtopics is a way of understanding things, while Passages is a ranking thing. For more on Subtopics, go read Google Confirmed Launching Subtopics Ranking In Mid-November 2020.


Full Transcript

Matt Siltala: [00:00:00] Welcome to another exciting episode of the Business of Digital podcast, featuring your hosts, Matt Siltala and Dave Rohrer.

Hey guys, excited for everybody to join us on another one of these Business of Digital podcast episodes. And today, well, today's going to confuse me, because we have a bunch of Daves in the house. First of all, I'd like to welcome Dave Davies, founder, or co-founder, of Beanstalk. And, uh, we got to make that clear, don't want to get anyone in trouble. But welcome, Dave, and, uh, the other Dave.

Hi, how are you guys doing?

Dave Davies: [00:00:39] I can speak for myself: we're doing well. And as Daves from our generation, I can say, and Mr. Rohrer, correct me if I'm wrong, you're, you're used to chatting with a multitude of other Daves at the same time,

Dave Rohrer: [00:00:50] there are a number of us out in the industry. So yes, it does happen from time to time.

Matt Siltala: [00:00:54] There's quite a few, um, Matts, but only, you know, very few with just one T. And I actually use two Ts sometimes because of, like, reputation management and how people search. Anyway, it's confusing, but welcome, Dave. I'm glad to have you on, and I'm going to kind of toss this over to the other Dave, Dave Rohrer.

And, uh, there was an article that you wrote, and I'm going to let him get into that and why he wanted to chat with you today. So, uh, let's do it

Dave Rohrer: [00:01:25] And I will let Mr. Davies give a quick intro, but also just a bit of background: you've written about not just this, but some other similar topics in the past.

So I don't know if you want to just kind of give, uh, your intro and then kind of give an intro into SMITH and BERT, but really start at the very beginning of what the heck NLP is. If I don't keep saying NPL.

Dave Davies: [00:01:48] Sure. Um, well, I guess Matt did a, did a good intro. Um, I’m Dave from, from Beanstalk internet marketing and a co-founder with, with my wife.

So, so that part's awesome, and I needed to get that in. So now, now I won't get in trouble

Matt Siltala: [00:02:02] And I must say that I really miss you guys. I miss seeing you guys at the conferences. I just, I have to throw that out there, because, you know, I was thinking about it. And it seems like, with the year that we've had, or last year, you just kind of, you know, blink and it's gone. But it's been so long. I was thinking about this, and the last time that I actually saw you guys was, um, two pumpkins ago.

It's crazy to think about that.

Dave Davies: [00:02:28] It really is. And it's, I think we're all sort of in that same boat, where it's like, but I want to see my friends again, I want to go out and play. It's like, yes, okay, there's the business aspect to these conferences, and I think, you know, a lot of us sort of missed those early on. And then all of a sudden it's like, now I'm at the part

I really miss, like, the part I really want to get back to. Um, yeah, it was even just like, okay, let's not do a conference, let's just get together for a period of time. Look, it's probably

Matt Siltala: [00:02:53] going to happen once we can travel. Yeah. Anyway, sorry for the sidetrack. Go ahead.

Dave Davies: [00:03:00] Um, and yeah, I mean, I've been interested in...

My, my beginning interest in, um, in machine learning and machine learning systems, which is what we're sort of talking about when we talk about BERT and SMITH, goes back to my initial interest in patents. And I, I've been reading them since, I don't know, Bill Slawski got me interested in them back in, like, 2006, 2007, uh, and reading their patents.

And then all of a sudden I started seeing entities come up more and more and more often in a variety of their patent applications, and I started writing about, uh, about entities. And now we all sort of know them as just sort of a given, um, aspect of, of SEO. Um, but early on it was, it was a little more, uh, finessed, because they hadn't actually launched a machine learning system right

at, at, at that time. Um, so, you know, RankBrain hadn't deployed and stuff like that, but it was, it was interesting to see. And now what we're seeing with, with BERT and SMITH... well, SMITH hasn't actually been rolled out into, into an algorithm, but we can talk about when I predict that's going to be happening, which is very, very soon, and why it hasn't yet, which...

that we can, we can get to when we get there. But, uh, um, but yeah, I mean, this is all really, really interesting stuff. And, um, natural language processing is, is my favorite branch of machine learning. So, um, this was just, just sort of a natural for me.

Dave Rohrer: [00:04:18] And do you want to talk about, just at a high level... um, we have, uh, our listeners are kind of across the board, but, um, for those that don't really know how machine learning and NLP fit into this, or what they even are, just a, you know, quick, quick

Dave Davies: [00:04:35] intro. Sure. Um, NLP, or natural language processing, is a branch of, um, of machine learning.

Um, and there's a bunch of different machine learning things that can go on, but natural language processing is probably the one that we would most think about when we think about search. Um, and it is pretty much exactly what it sounds like it would be. Um, and, and basically it's, it's Google, or, or Bing, you know, to, to their credit,

they have some, some strong systems themselves. Um, breaking down language... and the way they do it isn't, uh, as we'd think they would; um, you know, they use tokens and things like that. Anyway, we can get into the technical side or not, um, later, as to how this works. But basically: breaking down language into an understanding of how the pieces of it fit together and how it relates to other things.

And, and that's where this is all rooted. If we go back, if we want to really simplify, it's if we think about an entity. And an entity, for, for listeners who might not be familiar with them yet, um, you can think of it like a noun: a person, place or thing. I mean, it, it gets a little more broad. The color purple is an entity; the specific

Pantone color purple is also an entity unto itself; a pixel on a screen is an entity. But I, Dave, I'm an entity, right, as well. Um, and I'm different than the other Dave, right, as, as, as an entity. Um, part of what we would think of as NLP or natural language processing would be connecting these entities, but not really as entities, but as, as words. I'll, I'll, I'll simplify; it would actually be as tokens, but, but to simplify: connecting these words together and, and, and sort of understanding that when this, you know, when this

numerical value of a word is followed by this one, and it's followed by this one, and all of these connect together, um, then it tends to mean this thing, right? I mean, Google, when we're, when we're thinking about Google's understanding of language, they're not understanding language like you or I do. They, they don't get an image in their head when, when the word cloud appears.

Right? But you or I, or, or all the listeners: as soon as I say the word cloud, it actually becomes an image in our heads. We're picturing a cloud. All of us will picture a different one, but we're, we're picturing a cloud. Um, Google isn't able to do that, clearly, because it's a computer system.

Um, and so they have sort of a numeric value, or machine learning ID, um, tied to this term, and they're just connecting all these things together. Now, when, when BERT rolled out, the big advancement with BERT, uh, was that they were able to build their understanding in two directions. So where I would say I have a red car... I actually don't have a red car,

so I'll switch that: I have a gray car. Um, when I, when I say that term, before BERT, they could only follow in sequence. So when I said the word car, it could understand that the word red had existed, but, but when it understood red, it couldn't understand car, because it hadn't hit it yet. Now, that might not seem important, but when you're classifying things, it becomes incredibly important.

When I want to classify, what are all the things that are red, or gray, to, to use sort of my own personal example: if I want to classify things that are gray, if I don't yet know the word car exists, then I'm not going to include car in that classification. Whereas if I'm now qualifying things, you know, things that are cars, and I'm now trying to qualify them into colors, I will understand that it was gray only because of the way I worded that sentence.

I could have worded it differently and you would have understood the exact opposite. BERT fixed that; it allowed for bidirectional, and that's what the B stands for. Um, and the training methodology behind it is, is absolutely fascinating. Um, but SMITH builds on this. One of the problems with BERT, um, is that it is basically, it's, it's a, it's a resource pig.

Um, and, and just the way that it functions... I mean, all of these, all of these systems are, but you just need to build them in certain ways. Anyway, BERT was incredibly task intensive, um, and it sort of capped out at 256 tokens. And a token, there's 30,000 of them in, in the BERT, um, sort of model, um, and the way it works is it breaks down language: individual words will have their own token;

in some cases each character has its own token; you know, common parts of characters, like 'sh', will have their own token, for example. Um, and it basically builds, builds together these, these sort of tokens. Um, it capped out at 256 tokens, which doesn't work well for long-form content, if I want to understand a large passage. And I'm going to use the word passage purposefully, um, in this case, because we're all familiar with passage indexing; or, or, if listeners aren't, passage indexing is also rolling out and we can talk about that separately. But, uh, with SMITH, it gets around that: it does a lot of the processing offline.

Uh, I mean, when, when I say offline, I mean not actually sort of, like, live in real time; it's obviously connected to the other networks. Um, and so it's able to do... and, and just in the hierarchy, it sort of uses BERT: it breaks it up into individual sentences, or sentence segment blocks, um, breaks it up,

creates a numeric value for how all of that is functioning together, or, I guess, not a single numeric value, but it creates a structural value for how each sentence works, and then it combines all those. It basically, uh, runs BERT on each sentence segment, and then it looks at all the conclusions that it had and sort of runs...

it's not the exact same system, but it runs a similar system on top of that to figure out what that larger passage, um, would be like. So it's basically breaking it up into smaller, sort of, I'm simplifying, but, smaller BERT models here, and then doing an analysis at the top level. So it's going to be a little bit slower, which is why a lot of this stuff is done

sort of offline. It, it is slower, but it's allowing them to get a much better understanding of larger segments of, of content. It's able to read larger passages. And, and to illustrate where the difference would come in, um, in SMITH: it's, it's really quite visible in their training model. Where with BERT, as part of their training model, what they would do is, is sort of feed it a sentence and omit a word,

and then the better that BERT became at predicting what that word would be, they knew that the, the model was more successful. So they'd sort of fire up a sentence, omit a word, give it, like, I don't know how many; in the examples it's two, but I'm sure it's more; give it a bunch of examples where one of these is right,

figure out which one it is. And the more reliably it would guess, the better the model was trained. And then once it hit a success rate, um, that surpasses their, their current, uh, engineered, built systems, they would deploy it. With SMITH, they're omitting entire sentences. So they're going: here's the long passage of content, we're omitting a sentence,

here's a bunch of sentences that it could be, um, fill that in. And then training it on larger, larger passages, um, of information. Um, so it's allowing Google to understand large chunks of content, which to me has impact on not just their ability to surface content that they might not have been able to do before,

but also understanding the internal linking structures, and how, um, links between, between documents might, uh, might relate stronger to each other than they might've known just in the context of, you know, the couple words on either side of it.
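If you want to see the bidirectional, masked-word prediction Dave describes in action, the open source transformers library exposes it as a fill-mask pipeline over a public BERT checkpoint. This is a toy demo on our part, not anything from Google:

    from transformers import pipeline

    # BERT predicts the [MASK] using context on BOTH sides of it.
    fill = pipeline("fill-mask", model="bert-base-uncased")

    for guess in fill("I have a [MASK] car."):
        print(guess["token_str"], round(guess["score"], 3))
    # A left-to-right-only model would see just "I have a" when guessing the
    # blank; BERT also sees "car." to the right, which is the advance Dave
    # describes. SMITH-style pretraining masks whole sentence blocks instead.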

Dave Rohrer: [00:11:53] Well, and if you look at, if you think about the SERPs and how that impacts things: I just thought of some, um, examples, and Roger Montti had an interesting example.

He likes to go cow fishing, which is an example he used in one of his articles that we'll reference. It's what they call, up in the Northeast where he lives, fishing for a certain fish; it's called a cow, and so it's called cow fishing. Now, pre-BERT, something like that would confuse the heck out of Google. Same thing with, you know, if I was looking for a red Mustang or a brown Mustang:

it doesn't understand whether you're looking for the car or the horse. And even then, without additional context, there's different problems. There are other keywords in searches, like around Apple: 'I need a new Apple.' Like, there's a lot of keywords without context that us humans can easily understand and easily decipher and say, oh yeah, they're talking about... you know, it mentioned keyboard further down in the paragraph, or the page title or something,

it's like, oh, they're talking about Apple, or the iPad or something. But for Google and, you know, these algos, they don't, they don't get that

Dave Davies: [00:12:57] well. And that's, that's very true. And you bring up a really interesting point; if people are curious, um, you can actually see how they fixed part of that problem. And it definitely is BERT enhancing it.

But if you ever want to know what your entity is, and not everybody gets one, um... but the, the bane of my existence, I mean, in, in sort of a fun way, 'cause he also serves as an example, is this guy from the Kinks, and his name's Dave Davies, right? He is the reason I will never rank for my name. Um, if you ever feel so inclined and you go to Google and you enter... I'll refer to Dave Davies, but I could do Apple; um, the Apple as, as the fruit is different than the Apple as a company.

Right? Clearly, that was, that was the example here. Just like the Dave Davies of the Kinks is different than me as Dave Davies, or any of the other, like, six that are in my city. Um, but I can tell you that what Google sees when it sees Dave Davies, like, from the Kinks, um, is 'g slash 1 q 5 j 4 c y s t'.

That's their machine ID for that Dave Davies. That's how they differentiate that one from me. Um, and they will do the same with Apple: the fruit will have its own machine ID, entity machine ID, um, versus, versus the company. If you're ever curious to find out what it is, um, if you just go to Google and you Google anything, if it has a knowledge panel, you can just view the page source.

And if it was an entity that was created after, give or take, about 2015, it'll start with slash G slash. Um, and before that it was slash M slash, and the Ms were from Freebase. And then, uh, the Gs, uh, were generated by Google after, after Freebase was phased out. But, um...
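A side note from us: if you'd rather look an ID up than dig through page source, Google's public Knowledge Graph Search API accepts these machine IDs. A minimal sketch; the ID below is a placeholder, and you need your own API key:

    # Look up an entity by its machine ID (placeholder shown) via Google's
    # public Knowledge Graph Search API. Requires your own API key.
    import requests

    response = requests.get(
        "https://kgsearch.googleapis.com/v1/entities:search",
        params={"ids": "/m/EXAMPLE_ID", "key": "YOUR_API_KEY", "limit": 1},
    )
    print(response.json())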

Dave Rohrer: [00:14:39] It's funny you mention Freebase, 'cause I just had that opened up a little bit ago; I was going to bring it up, but you already beat me to it. 'Cause that was kind of, for me, that was my initial, you know... like Wikipedia. I always kind of looked at it like categorization; I looked at Wikipedia. But then when I started learning and digging into Freebase some years ago, I was like, this is

probably going to be the future. This is Google, you know, buying this because they wanted the entities. They wanted to understand what all of these things were. And it's like Wikipedia, but it was, um... sorry, you know, it was like DMOZ, but deeper.

Dave Davies: [00:15:15] Yeah. Yeah, it was. And without the fluff, right? Like, I mean, yeah. Um, Yeah.

Okay, 'cause, I won't pretend there wasn't fluff there, but it's irrelevant now. Um, yeah, exactly. And you can just sort of buy it. Um, but yeah, I mean, it, it was an incredibly important move, if people were paying attention, and clearly you were, um, you know, which is why you're sort of one of the people that's on the

sort of forefront understanding of, of sort of what's going on in the world of search. Depends on the...

Dave Rohrer: [00:15:46] depends on the day.

Dave Davies: [00:15:48] I mean, now, you know, you, you've been doing this long enough to remember way back when we could all just know sort of everything that was going on in search. Now

Dave Rohrer: [00:15:55] we could just scrape whatever was number one and throw it up there and make it fresher.

We could be number one. Yeah. Yeah: entities and natural language processing and internal link structures. And, like, you know,

Matt Siltala: [00:16:09] I remember the days of, I remember the days of just putting a keyword in the title and ranking. Come on. Yeah.

Dave Davies: [00:16:17] Yeah. Well, I mean, there used to be all sorts of fun stuff we could do.

Right. Like, I remember, I was an affiliate marketer. It's like, well, if I just create a page and then I turn that page into a PDF, I get positions one and two. Okay, that was a bad call, Google. As an SEO, I miss those days; as somebody who actually needs to find stuff on Google, thank goodness jerks like me, in my affiliate marketing days in the early two thousands,

uh, can't get away with what they were doing. Um, or at least not as, not as easily; they need to be smarter to do it now. Um, but, uh, but yeah, I mean, this is all a super fascinating area, right? I mean, you can chat about, uh, about machine learning and sort of the, the future, uh, of machine learning and search.

Um, one of the things that I found fascinating about SMITH specifically, um, once I was, I was reading the documents around it, um, was not just what it accomplishes, but what it means for us. Because at the end of the day, we can talk about algorithms; we can talk about RankBrain or BERT or SMITH, um, what they do, which I personally and academically find fascinating.

But at the end of the day, it doesn't really matter, um, you know, what they're doing structurally. What matters is: what is it going to do? Right? Like, what, what does this mean to your listening audience? What does this mean to my clients? Right? Like, that's actually what, what matters here. Um, and one of the things that I found specifically interesting about SMITH, and I'm guessing, and of course I'm guessing, right?

Like, I mean, we don't know what's actually going on over at Google. But if I'm guessing, they had some problems configuring SMITH to actually operate in their, in their, um, you know, overarching algorithm. Uh, and the reason I'm guessing at that is that it lends itself, by what it does, it lends itself very well to passage indexing. Passage indexing being the thing that was supposed to roll out in December,

and didn't, right? And we're still sort of waiting on it. Um, so I suspect what we're seeing is Google hit some problems with SMITH. Uh, they're, they're working out the bugs; they, they sort of ran through the tests all properly and everything, it's just, I, I have a feeling that it's, it's taking more resources than they anticipated.

I'm just guessing at what was going on, but understanding how the mechanics of it work, as far as, you know, me as, as a layperson in machine learning, um, you know, sort of, sort of can predict. Um, but I, I suspect that once they get that fine-tuned and they actually have SMITH working right, they'll have passage indexing.

They probably started it with a smaller, um, sort of model, um, went, oh, okay, we can do this by December, and then they opened it up to the web and went, wow, we need a lot more hard disks. Right? I'm oversimplifying, clearly. But, um, but, you know, I think that's one of the things that's coming, and that ties nicely to, um, some of the comments from John just a bit ago, talking about the different speeds of disk that they use. With something like SMITH,

a lot of things, I suspect, would be stored on sort of old-school, um, you know, quote-unquote slower, um, hard disks, you know, the, the standard, whereas things that are more important and need to surface would be on, you know, faster drives. Um, I suspect that that's probably, I'm just guessing, but probably where some of their kinks are: the algorithm itself works,

right? Now try and actually apply that to the entire web. Yeah. And as soon as you hit bottlenecks, all of a sudden it, it doesn't work anymore, right? You have to be able to apply something equally with something like this. It's not like RankBrain, where you can apply it to just some queries; you're going to end up punishing or, or, or damaging your web results in a lot of core areas

if you can't cover the entire web with it. And the amount of data processing they would need, and the amount of data storage they would need, um, for something like this is, is astronomical. So I suspect we're not going to see passage indexing until we see SMITH, uh, rolling out. And I think they've got a big infrastructure change. Now, if I'm guessing,

right, and I'm probably not, but there we go: I'm going to guess that the core update, um, that we saw in December was probably some sort of preconditioning, um, to, to getting things ready. But that's, that's just my shot-in-the-dark guess. Um, well, we know it wasn't SMITH, um, because they've confirmed that.

So my guess, though, is it was sort of like a, like a Hummingbird-esque kind of algorithm, sort of conditioning for what was going to be coming next.

Dave Rohrer: [00:20:30] Yeah. Yeah. The, um, do you, either of you watch Snowpiercer

Dave Davies: [00:20:35] no shit.

Dave Rohrer: [00:20:37] It's an interesting... it's based on... yeah, it was a movie before, and now it's a TNT show.

Yeah, my wife's been watching it, but it makes me think of, it's a very simplified version of what Google is trying to do. Um, or, you know, even if you're just running, you know, a race; or, in Snowpiercer's case, it has to keep going around the circle, 'cause everything else is frozen and dead, and it just keeps going around in the circle,

this train. Meanwhile, while the train is running full steam, everyone's living on it, and you have to keep making repairs. Now, the complexity of that is, you know, if it fails, if you break something while you're running around like that, everyone's going to die. Now, Google is not life or death, but their business would be.

So Google is here trying to implement BERT, SMITH and all of these other algorithm changes, and all these other, you know, levers it's pulling, now trying to, you know, on a small scale say, oh, well, we're just going to fix this one little piece. How it plays into everything else and all those other levers,

I don't think they get a good... I don't think they always have a good estimation or an understanding of how, when it gets thrown into the mix, what it's going to do.

Dave Davies: [00:21:49] Yeah, well, that’s, that’s very true. I mean, and, and these things are hard. Right? And I, I try to remember that, like every time

Dave Rohrer: [00:21:54] Google does a small subset of: here's, here's a thousand queries, or 10,000 queries, that we know are common, and some weird ones.

Now here's the data, and here's all of the processing we do. Now, if we multiply this by, uh, you know, a million: what does this do to our infrastructure? What does this do to everything? I think if you're off just a smidge in your estimations, it's ridiculous, the different, you know, numbers you're looking at.

Oh,

Dave Davies: [00:22:21] well, I think you hit the nail on the head, and a lot of these systems don't scale well. Um, or they scale unpredictably, right? Where, exactly to your point, it's like, okay, you've, you've, you've got a thousand, but now, once you're trying to connect those thousand all together as well... like, okay, now you've got 10 million, but that's not just that thousand multiplied up to 10 million,

right? It's... yep, okay, but now all those things need to connect. So it's, it's exponential, um, in how these things grow. And I'm sure, like, clearly I'm not the first person to think of that, and I'm sure they have engineers that do, but there's going to be a lot of variables in there. That, oh, okay, we missed this glut of pages,

um, right? Like, we missed... okay, we, we missed all these kinds that are hard to index; they're, they're harder to get through. A lot of their working models are Wikipedia; that makes great sense from, like, a semantic connectivity kind of way, but buckle up and try and actually apply that to some JavaScript sites.

Yeah, you can get through it, but it's harder, and they're slower, and you need to render now, right? Right. All of these sorts of things are going to add to the complexity, and

Dave Rohrer: [00:23:29] how many languages; all the different languages, too. So, yeah, we got it to work in English, but then we moved it to Spanish or German or Chinese or, you know, something else,

and we're like, oh, well, that didn't quite work that way. Like, there's, like, eight different ways to say, um, certain things in some languages. Like, I'm not going to say the F-word, but, you know, there's people that could actually say the F-word 12 times in a row and each one has a different meaning behind it.

I mean, in an English example.

Dave Davies: [00:23:58] Well, and that's, that's actually a great example, because, being Canadian myself, I learned French as a second language. You don't curse up there... wait, wait, wait,

Dave Rohrer: [00:24:07] when you’re in America,

Dave Davies: [00:24:10] when you’re in grade six and you’re learning French, you sure do, because that word is actually the proper word for a baby seal.

So all of a sudden, now you're just speaking French, right? Like, okay. So now you're, now you're gonna have fun with that, right? Like, as a, as a grade-six Dave who's

Dave Rohrer: [00:24:27] just discovered that who still does it from time to time? Just for giggles.

Dave Davies: [00:24:31] Yeah, exactly. And I’m just referring to the mammal, right? Like that’s, that’s it?

Um, I, I'm not, I'm not screaming at that kid who just, just no-scoped me on Call of Duty. Um, it's not... um, anyway. So, um, yeah, I mean, one of the things, for folks who want to look it up, one of the very interesting things you, you could look up, and I happened to be really lucky and got a client in the space that I, that I didn't know:

um, but if you want to understand a little bit about Google, look into microservices. It's incredibly interesting; you, you don't have to look into, like, specifically how, how they work and stuff like that, but just, uh, gathering an idea for how they structure them, and why they're used in software development, um, is absolutely a fascinating area.

And, and thinking of how Google structures them is actually built into the name of one of the, the basic technologies behind microservices. The idea of microservices: I mean, if we, if we go to early software development, it was, it was monolithic, right? Like, they, they wrote Windows XP and it was this massive program, and it sat on your machine, and if one piece broke, the entire thing broke. Whereas with microservices, Netflix is one of the companies built on it:

if their logging system failed, everything else still succeeds. Each piece is compartmentalized, so that if one piece fails, there's redundancy built in, and another one will take over, like a dumbed-down version, or, or whatnot. Google's algorithm is built on the same basic premise,

um, as in each piece can fail unto itself. Okay, I mean, I'm sure that there'll be, like, one sort of, like, master piece that's needed to sort of hold all the pieces together, but one piece can fail and it doesn't collapse the entire system. Um, and one of the ways that we sort of can have this reinforced: one of the core technologies of microservices is named Kubernetes.

Um, it was built by folks from Google; that in and of itself is telling. And what it does is, it's one of the technologies of, like, okay, this is how the pieces connect together, right? Like, you can build this, you can build this, you can build this, and then you just program them to work together through load balancers and stuff like that.

Um, fun fact, just 'cause I, I'm going to hope it's, it's okay to go a little bit off topic here, but the nerds in the crowd will understand it. Um, Kubernetes was originally named, um, Project Seven, after Seven of Nine, and then went to Kubernetes; and guess where the cube came from, right? So these are my favorite developers over at Google, who named an entire thing after Star Trek.

But, uh, but, but that's where that sort of came from. But for anybody who wants to understand how algorithms work, it actually really... uh, like, a light bulb went off. And, and I was lucky because I had a client in the space, so I was sort of forced, um, in, into, quote-unquote 'forced', I got to, um, learn about it. But the more I was sort of digging into Kubernetes and the history of it,

um, and, and trying to gain that, that understanding to assist a client, the more I was sort of piecing together: okay, this is how these algorithms function, this is why this can fail with this, and this is how they sort of have a way back. Like, you, you've been an SEO long enough to remember the old Google dances, like, back in, like, 2006.

I don't mean, like, the party, um, you know, at, you know, at the conferences, the

Dave Rohrer: [00:27:48] Google dances... well, depending on which way your traffic went, you either were having a party or... So yeah, I mean, it was a monthly party or a monthly cry. So yeah,

Dave Davies: [00:27:57] exactly. And then you sort of like, okay, [00:28:00] buckle up because this is what you’ve got for us

Dave Rohrer: [00:28:01] is going to be, uh, not a fun one.

Dave Davies: [00:28:03] Exactly. Whereas now they have this little piece where they can, like, okay, hit a way-back button really fast, and stuff like that, right? So, um, it's, it's a much more resistant, or resilient, I guess, is the word to use, um, sort of algorithm than they used to have. Like, we've seen that so many times, where it's like, oh, okay...

and then the pendulum goes really, really fast, where it's like, okay, they do a core update, it kind of breaks a few things. And now we don't have to tell clients, well, I'm going to fix a bunch of stuff, but four to six weeks, and, you know, eventually we'll see if that worked. Um, now it's sort of, oh, okay, this sort of either hit the fan or got really good,

and then I'll have to tell a client, like, don't celebrate too much, we'll see in a week where things settle to, because they can, they can respond a lot faster. And part of that has to do with microservices, and just the way these, these algorithms compartmentalize, so that they can fix one thing; they don't have to overhaul an entire system,

Um, like they used to do.

Dave Rohrer: [00:28:58] So, trying to bring this back, 'cause I know, I want to be conscious of your time: when we're talking about SMITH and BERT, is there anything... and I know Google says, and even you and I, we'll all say, no, you can't really do anything, but there probably are some things that, at least

from a basic level, on the on-site, off-site... What are some things that someone should be thinking about, you know, worrying about; and we talked about it briefly, you mentioned it, like, passages is coming up. Is there something that someone can do to at least start trying to get their content, you know, short content, long-form content, optimized? At least start thinking about it if they're adding new content or new pages?

Is there anything that they should be thinking about, or looking to do better or differently, in the next couple of months, or this year, or just going forward in general, than they have in the past?

Dave Davies: [00:29:48] I think so. When I'm thinking of things like BERT, or like SMITH, uh, or, or any algorithm, when I'm thinking about how do I respond to it,

I, I try and not so much... I mean, I read the question, and

Dave Rohrer: [00:30:00] we try not to talk about, you shouldn't respond, you should just, you know, plow forward and do what you do. But is there a change to, like, best practices, ish?

Dave Davies: [00:30:08] I think there is; I think it's actually quite dramatic, um, with, with SMITH. There wasn't so much with BERT,

um, mainly because it was just a better-understanding sort of scenario: it gave Google the ability to better understand.

Dave Rohrer: [00:30:22] Yeah. I think with BERT, there were a lot of sites where Google didn't quite know what to show for certain searches, and you just, over the years, got a lot of bad traffic that really wasn't supposed to come to you.

Right. And then that just went away, is what I saw

Dave Davies: [00:30:36] Me too. And that was great. And you'd get some insights:

Dave Rohrer: [00:30:39] 'Our traffic's down.' I go, but none of it converted.

Dave Davies: [00:30:42] Your cost per acquisition is still better. Yeah. You were, you

Dave Rohrer: [00:30:45] were ranking for keywords that really you shouldn't have, and, you know, none of it ever converted anyway.

Yeah, well, that's when traffic's down, I'm like, but you're not listening to me. Right. But yeah,

Dave Davies: [00:30:57] for a publisher that matters, who's making money off ad revenue. But if you're just, yeah...

Dave Rohrer: [00:31:02] a B2B sales company, yeah, it doesn't. Yeah. How were your

Dave Davies: [00:31:05] sales doing? Like, that's all that actually really matters: it's the cash register ringing. Um, with SMITH,

I think there's a lot more that we can glean from it, but it's not SMITH itself. Like, SMITH is, like BERT, just the mechanics behind, um, behind the system. But with SMITH they unlock new capabilities, and it's those capabilities that we can now optimize for. And to me, two things happened very, very close together; well, one of them hasn't happened, but it will be coming, and that's SMITH and passage indexing.

Um, and again, just full disclosure, I'm assuming these two things come together. I, I may be wrong; so, just for the listening audience, if I'm wrong, it'll be passage indexing I'm referring to here. Um, but then they also were announcing, and they discussed it, um, in January, but it had actually rolled out in November: the subtopics.

So these two things merged together. The idea that subtopics sort of cluster: when they were talking about subtopics, when, when Google launched the subtopics algorithm, uh, or, or that understanding into their algorithm, they were talking about understanding that when this word is used, it, that is a subtopic of a larger thing, or related to queries.

But we saw the changes in rankings roll out when the, when it was introduced into the algorithm, in some cases quite dramatically for some clients. So what that tells me, from, from the analysis, and, and not just mine, but other people's, is it's helping them understand how pages connect together. Now we take a thing like passages, for example, and what SMITH will be introducing into the algorithm,

and now we're looking at: if I had one long page, read Wikipedia, or a lot of, a lot of websites that, that sort of write longer-form content to answer bigger, generalized queries, it now gives them the capability to rip smaller subsets of content out of that and go, oh, okay, halfway down the page we see this H2 come in, and...

the matching of a heading tag, to me, is important; this one, because these are the action items. Uh, you'll see this H2 going, okay, what is... I dunno, I wrote an article on, on RankBrain, and it, it, it sort of ties in to this, 'cause I was actually writing it as a test for this, and we'll see when it rolls out; um, you know, okay:

'How to optimize for RankBrain.' Right. Okay, we saw that coming in here, here's the section on that, and it allows it to rip that out. So when somebody looks up how to optimize for RankBrain, it will be able to scroll them down, like it does with featured snippets, to that; which is how I'm anticipating, um, passages will roll out.

It'll now just sort of, like, take you to that section of a page, but it understands that piece of the page in isolation, and then surfaces that content directly to the user. So what we have is two sort of functions going on simultaneously, or very close together, um, that are almost sort of opposite, but with the exact same purpose. And one is being able to surface specific segments of content in a long form.

If somebody searched, generally, RankBrain, I want them to see all the information and the queries that they might have. But if they're just looking for how to optimize for it, I just want to surface this one individual thing. Is that a better format than if I had structured it as I would for different types of pieces, or in different types of content: one master piece on the generalized...

bad example, but, RankBrain, and then individual sub-pieces on how to optimize for it, and, you know, what's the history of algorithms related to machine learning; you know, sort of breaking it out that way. Allowing us as users, I mean, users as in sort of SEOs and business owners, to now look and go, okay, what is the best form of content?

That's number one: is, is the user better served with smaller pieces of content in, in a cluster of terms? And then, how do we cluster those structurally to make sure Google understands that this is a hierarchy; and John Mueller, just last week, was talking about the pyramid structure, right? How do we, how do we cluster those together for logical understanding?

Or is the user better served by long-form content, and then sub-queries breaking out just those, those snippets of content, and us making sure, again, structurally... how do I really focus Google in on this? How do I structure it so there's an H2 here, and it actually defines what the query is, and then I have this, this section of content? And, in my mind, starting it out with something about 160 characters long,

yeah, the length of a description tag; that's going to be really easy for them to pull out and drop into a description tag. 'Cause you don't get to write the description tag for each subset, and they're going to need one for the

Dave Rohrer: [00:35:27] SERP. Guys are really going to hate us, by the way; they really are going to hate us.

Dave Davies: [00:35:32] And unless they charge by the hour, in which case they'll love us so much. So, I think there are some takeaways; in the Coles Notes, or the Cliff's Notes, of things, I think there are some action items we can take. But it's really just what we should've been doing anyway; it's a case of catching up. And it's putting the content in the form that the user needs, but now, especially with long-form content, making it clear:

this is what this section is. And, this is just a rant, regardless, for anybody listening: don't use your heading tags as formatting elements, and make sure that you are using them. Don't use a span element to define your sections here, because we really need to isolate, to Google,

this is what it is, to allow them to surface the content that they're going to be looking for. Exactly how that's going to surface visibly: if I'm predicting, I'm going to say, down the road, it's going to be a lightbox. That's, that's my guess, is that they're going to surface, like, a lightbox, and hopefully they'll give us some controls over what it looks at to pull up the content from, from, from the page they're wanting to surface. But, uh, we'll, we'll, we'll see if I'm right or wrong on that one.
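To make Dave's structural advice concrete, here is a small sketch of ours (not anything Google publishes): each H2 names the sub-query, and the paragraph right after it opens with a roughly 160-character, description-tag-ready summary. The BeautifulSoup parsing just shows how cleanly such sections separate:

    # Well-structured long-form content: H2s name the sub-queries, and each
    # section opens with a self-contained summary of about 160 characters.
    from bs4 import BeautifulSoup

    html = """
    <article>
      <h2>How to optimize for RankBrain</h2>
      <p>RankBrain rewards content that answers the query directly; open each
      section with a self-contained summary of roughly 160 characters.</p>
      <h2>History of machine learning in search</h2>
      <p>A second, clearly labeled section Google could surface on its own.</p>
    </article>
    """

    soup = BeautifulSoup(html, "html.parser")
    for heading in soup.find_all("h2"):
        opener = heading.find_next_sibling("p").get_text(" ", strip=True)
        print(heading.get_text(strip=True), "->", opener[:160])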

Dave Rohrer: [00:36:38] As things go... uh, the, the feature that they came out with, and I had to look up when it came out:

so it was, um, the text fragment links that Google does now. Because I keep seeing those more and more for some clients; I just, yeah, all the time. And there were a couple of long pieces of content; one of my clients has, like, I swear, a fourth or a fifth of the, the Google Search Console links, the stuff that I see in analytics, is always those fragments.

And I think they could use that technology along with passages; it'd be very interlinked into it, which would be interesting, and it'd straight-up link you right to that passage.
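For reference, those text fragment links use the #:~:text= URL syntax; a URL built like the hypothetical one below scrolls a supporting browser straight to the quoted phrase:

    from urllib.parse import quote

    # Hypothetical page; the #:~:text= fragment deep-links to the exact phrase.
    phrase = "how to optimize for RankBrain"
    print(f"https://example.com/long-guide#:~:text={quote(phrase)}")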

Dave Davies: [00:37:15] Alright. Isn't it fun? I love that we're recording this, too, 'cause now we can all reference back and go, what did I think it was?

Dave Rohrer: [00:37:21] There's a reason Matt and I only do one...

uh, at the end of each year, we always do one, uh, prediction, and we're usually wrong-ish, right? Yeah.

Matt Siltala: [00:37:31] Yeah. We’ll go with Randy’s right. If that’s okay.

Dave Davies: [00:37:34] Yeah.

Dave Rohrer: [00:37:34] There were some times where... right-ish.

Matt Siltala: [00:37:38] Well, Dave Davies, thank you very much for joining us. This has been super informative. I do apologize for keeping you a little bit longer than you probably thought we were going to, but, uh, this has been fantastic information.

So thank you.

Dave Rohrer: [00:37:53] It's like, I can go for another hour; let's go.

Dave Davies: [00:37:56] It's been so long... I like chatting with good friends about fun

Dave Rohrer: [00:37:59] topics. Maybe we'll have Mary and you back on. It's lunchtime; it's okay.

Matt Siltala: [00:38:05] Hi everybody, thanks for joining us again on one of these episodes, for Dave Davies with, uh, Beanstalk Internet Marketing, and Dave Rohrer with, uh, North...

Dave Rohrer: [00:38:15] ...Side Metrics.

It's like, I don't know what day it is. Yeah.

Matt Siltala: [00:38:20] What day is it? With NorthSide Metrics. And I'm Matt Siltala with Avalaunch Media. Thanks guys, we'll talk to you later.

Dave Rohrer: [00:38:25] Thanks all.

Dave Davies: [00:38:25] Thanks Dave. Thank you.

E184 – NLP, BERT vs. SMITH and What it Means for Marketers w/ Dave Davies



