And now, if you're interested in going deeper down the rabbit hole...
What Do Regression Models & On-Page Optimization Have in Common?
The Lost Art of On-Page Optimization
by: Jeremy S Bittle, 10/18/18
On-Page Optimization is a lost art these days. In our practice, it is the most important, most foundational factor to take control of when conducting SEO implementation.
Why is it lost?
Why does any art become lost? It's hard to say.
There are a multitude of factors at play here. First, I'd posit that it simply requires too much: measurement, knowledge, an understanding of HTML/CSS, good research, a grasp of LSI. All in all, it's too much energy to spend, especially when you're scaling an agency and hiring offshore development work to handle the implementation. We could go on and on, but eventually we'd arrive at the end of the list: some SEOs are just downright lazy. They want shortcuts and fast results, because that approach is very scalable in a traditional agency model.
The real shortcut:
At the end of the day, this lost art creates a 'shortcut' of its own.
Most SEOs have sought a quicker route for so long that the real one sits directly under their nose and they don't see it. All this attention and focus on Off-Page, link building, and near or full-on black hat tactics leaves room for us to compete easily on Relevancy territory, often without needing backlinks (Authority) in the first place. It becomes a shortcut in and of itself.
Google's Algorithm, simplicity revealed:
We should backtrack for a second and talk about what really makes up Search Algorithms.
The first thing to consider is that Google's algorithm really isn't that advanced. The notion that it is comes mostly from their secrecy, mystique, and excellent PR team, the Webmaster Trends Analysts.
Through research and testing, we have determined Search to be fundamentally made up of three buckets.
The (3) buckets of search algorithms:
1. Relevance
2. Authority
3. UX
Let's go through this list backwards.
Bucket #3: UX:
This one is listed last not because it is least important, but because it is least understood.
I used to sing this tune in meetings all the time: make a better website, make better content, yada yada yada, my hat is very white.
It's true, it's important, but as an SEO I think it's more important to be able to do my work in a vacuum.
While this is an excellent foundation, and the beginning of any good SEO campaign, some businesses just won't have resources to conduct an entire web redesign just to see some action in the search results.
Further, with a marked lack of transparency from Google's Trends Analysts, the only things we really know about this bucket are that it's mostly powered by AI (RankBrain) and that it's becoming more important. Realistically, it will probably replace the majority of the Authority bucket's weight in the algorithms.
Make a good site. Make good content. You should do that anyway.
Bucket #2, Authority:
Off-page fundamentals. As we know, it's all backlinks. It still works.
Penguin 4 made it easier to conduct (there are no more automatic algorithms to take you down), but far harder to recoup losses if you get caught (manual reviewers will destroy your brand).
To me, it's a big risk-vs-reward issue. Frankly, when the competitive landscape is saturated with this strategy, and the broader search algorithms are dialing its weight back anyway, I skip this step entirely until everything else is dialed in.
Usually by that time, I'm already on top.
Bucket #1: Relevance:
TF-IDF, term frequency-inverse document frequency. This is an old algorithm, dating back to the 1970s, that determines the relevancy of a term (keyword) to a document (result) within a set of documents (index).
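The idea is small enough to sketch in a few lines. Here's a minimal, hand-rolled TF-IDF score (the document set, keyword, and weighting scheme are all illustrative; production scoring uses smoothing and normalization this sketch omits):

```python
import math

def tf_idf(term, doc, docs):
    """Score how relevant `term` is to `doc` within the index `docs`.

    Assumes `term` appears in at least one document (df > 0).
    """
    # term frequency: share of this document's words that are `term`
    tf = doc.count(term) / len(doc)
    # document frequency: how many documents in the index contain the term
    df = sum(1 for d in docs if term in d)
    # inverse document frequency: rare terms score higher
    idf = math.log(len(docs) / df)
    return tf * idf

# Toy "index" of three tokenized documents
docs = [
    ["plumber", "services", "chicago", "plumber"],
    ["chicago", "weather", "forecast"],
    ["plumber", "repair", "guide"],
]
score = tf_idf("plumber", docs[0], docs)  # tf = 2/4, df = 2, idf = ln(3/2)
```

Note how a term that appears in fewer documents earns a higher IDF, so repeating it in your own document moves the needle more: that asymmetry is what on-page tuning exploits.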
Google is just a big fancy billion dollar CTRL-F:
Yep, that's right. And if you can pick up the lost art of On-Page, you can manipulate it that easy, too.
Factor Measurement + Regression Analysis = Correlation Data
(On-Page done right):
We use highly sophisticated software that condenses what would be 3+ months of research, data gathering, and calculation down to roughly 15 minutes of computer time.
We look at the best of both correlation coefficients from two different regression models to determine which SEO factors are most statistically significant for a specific keyword or query. Of the two models, Pearson's correlation tends to fit well against linear trend lines, and Spearman's correlation tends to fit well against curved (monotonic) trend lines. The data set is fetched by a spider from the top 100 websites in a given SERP for the target keyword.
Correlation Data, not Causation Data:
Practical Maximums and Overall Maximums:
We have to remember that these calculations are correlation studies; we cannot determine from them alone which relationships are truly causal in nature. It is only when we look at the data in context, and conduct some field testing, that we can prove any of the data supports a causal relationship.
Contextually speaking, this means we have to look at the entire report, the actual websites in the field, and run these calculations week over week.
Along with correlative factors, we have the measurements themselves. We tune each factor to competitive parity, from 'Word Count' on the page to the number of 'Leading Keywords in H3 Tags.'
In this way, we never over-invest resources in a particular area (in the case where we wouldn't exceed the Overall Maximum of 'Word Count' in the SERP), while diversifying the way in which we 'keyword stuff' (for lack of a better word) across different zones in the HTML document (in the case of maintaining a Practical Maximum on our H3 tunings).
We've found the Practical Maximum to be the most relevant metric for tuning tags. This metric is taken from whichever is highest: the tune of Result #1, #2, or #3, or the Page 1 Average for that keyword. And of course, if we know a particular result to be an outlier website (like Yelp, or another QDD situation), we can adjust our new Practical value by weighting those four measures differently.
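The Practical Maximum rule above can be sketched as a small function. The domains and measurements here are hypothetical, and outlier handling is simplified to "drop the outlier entirely" rather than the reweighting described:

```python
from statistics import mean

def practical_maximum(top3, page1, outliers=()):
    """Practical Maximum for one factor: the highest of Result #1,
    #2, #3, or the Page-1 average, skipping known outlier domains.

    top3:  list of (domain, measurement) for the top 3 results
    page1: list of (domain, measurement) for all page-1 results
    """
    keep3 = [m for d, m in top3 if d not in outliers]
    keep1 = [m for d, m in page1 if d not in outliers]
    return max(keep3 + [mean(keep1)])

# Hypothetical measurements of 'Leading Keywords in H3 Tags'
top3 = [("yelp.com", 14), ("siteA.com", 6), ("siteB.com", 5)]
page1 = top3 + [("siteC.com", 4), ("siteD.com", 3)]
pm = practical_maximum(top3, page1, outliers={"yelp.com"})  # 6, not 14
```

Dropping the Yelp-style outlier pulls the target from 14 down to 6, which is the point: you tune toward what beatable competitors are doing, not toward a domain you'll never out-tune.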
What does this mean?
Alright nerd, give it to me in English.
This effectively becomes a map on how to keyword-stuff the page, and how to do it right.
The wrong way, back in the day, was to stuff it all in the footer, white text on a white background, stuff it all in P tags, and repeat it over and over. The right way is: diversify & measure.
Now, we measure how many keywords exist in P tags, but we also measure how many exist in 500+ other tags and factor measurements. By diversifying these keywords across different zones of the HTML file, while never exceeding the Overall Maximum across the top 100 sites for the SERP, we become the car on the highway never speeding faster than our competition next to us.
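Measuring keyword counts per HTML 'zone' is straightforward to sketch with the standard-library parser. This toy version only attributes text to the innermost open tag and tracks a single keyword; a real measurement pass covers hundreds of factors, as described above:

```python
from collections import Counter
from html.parser import HTMLParser

class KeywordZoneCounter(HTMLParser):
    """Count occurrences of a keyword inside each HTML tag 'zone'."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword.lower()
        self.stack = []          # currently open tags
        self.counts = Counter()  # tag name -> keyword hits

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            hits = data.lower().count(self.keyword)
            if hits:
                self.counts[self.stack[-1]] += hits

# Hypothetical page fragment
page = ("<h1>Chicago plumber</h1>"
        "<p>Best plumber in town. Call a plumber.</p>"
        "<h3>Plumber FAQ</h3>")
counter = KeywordZoneCounter("plumber")
counter.feed(page)  # counts: {'p': 2, 'h1': 1, 'h3': 1}
```

The resulting per-tag tallies are exactly the kind of factor measurements you'd then compare against the SERP's Overall and Practical Maximums.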
What's more, the running theory is that there is no absolute speed limit: The 'limit' is actually relative to the average tuning across the market landscape. AI could easily determine this the same way our software reverse-engineers it, which would account for the 'weight' that various factor measurements have over others that directly coincide with the correlation coefficients calculated by these regression models.
The Big CTRL-F Bucket:
Yep, TF-IDF is older than me, maybe you too. It's the skeleton of the algorithm and Google is scared to admit it. Their PR team does a great job spinning it and scaring everyone into submission. They say over and over, don't stuff your keywords. But over and over, it works.
So long as you measure, you keep measuring, you do not exceed Overall Maximum on the SERP (realistically Practical Maximums as well), you're mathematically good.
You also want to, of course, make sure that the document is not spammy to Users.
So, that is to say, don't be an idiot about it. If it starts to feel spammy, get rid of it and move on to another factor. The point of taking all these measurements is to diversify across the HTML document; you don't need to achieve competitive parity on all 540 factors, but you might want to consider the ones that are statistically significant.
Remember Bucket #3, UX is still important:
Even if it weren't a ranking factor, it takes hold on your conversions. Ranking is great, but your conversion rate optimization is more important. This is where you really have to weigh the opportunity-cost on changes made, and it differs case by case.
This is why diversification is so integral to SEO.
Algorithm Changes? Who cares?
When you measure like we do, you don't fret the latest shakeup on Search Engine Journal. It's pretty nice; all we have to do is take another round of calculations. When the correlation coefficients change but the factor measurements don't, you're probably looking at an algorithm-level change, and we implement site changes accordingly.
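That diagnostic boils down to a simple comparison of two week-over-week snapshots. A rough sketch, with a made-up tolerance and hypothetical factor names (real calls would be more careful about noise and sample size):

```python
def classify_shift(coeffs_prev, coeffs_now, meas_prev, meas_now, tol=0.1):
    """Heuristic: if correlation coefficients moved but the field's
    measurements didn't, the algorithm itself probably changed."""
    coeff_moved = any(
        abs(coeffs_now[f] - coeffs_prev[f]) > tol for f in coeffs_prev
    )
    meas_moved = any(
        abs(meas_now[f] - meas_prev[f]) / max(meas_prev[f], 1) > tol
        for f in meas_prev
    )
    if coeff_moved and not meas_moved:
        return "algorithm-level change"
    if meas_moved:
        return "competitor-level change"
    return "stable"

# Hypothetical week-over-week snapshots for two factors
coeffs_prev = {"word_count": 0.40, "h3_keywords": 0.10}
coeffs_now = {"word_count": 0.10, "h3_keywords": 0.35}
meas_prev = {"word_count": 1800, "h3_keywords": 4}
meas_now = {"word_count": 1810, "h3_keywords": 4}
verdict = classify_shift(coeffs_prev, coeffs_now, meas_prev, meas_now)
```

Here the coefficients swing while the competitive field barely moves, which is the signature of Google reweighting factors rather than competitors retuning their pages.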
Efficiency is Effectivity:
My favorite part about these processes is that they are incredibly efficient. Efficiency doesn't mean I spend less time on a project; it means I can spend more time being effective at it. This process condenses huge amounts of market research down into a few minutes, so it allows me to build a huge foundation on a hyper-competitive keyword or query right out of the gate.