How Google’s AI Works (December 2022)
Search Engine Optimization (SEO) is the process of optimizing on-page and off-page factors that impact how high a web page ranks for a specific search term. This is a multi-faceted process that includes optimizing page loading speed, generating a link building strategy, as well as learning how to reverse engineer Google’s AI by using computational thinking.
Computational thinking is an advanced type of analysis and problem-solving technique that computer programmers use when writing code and algorithms. Computational thinkers will seek the ground truth by breaking down a problem and analyzing it using first principles thinking.
Since Google does not release their secret sauce to anyone, we will rely on computational thinking. We will walk through some pivotal moments in Google’s history that shaped the algorithms that are used, and we will learn why this matters.
How to Create a Mind
We will begin with a book published in 2012, “How to Create a Mind: The Secret of Human Thought Revealed”, by renowned futurist and inventor Ray Kurzweil. The book dissects the human brain and breaks down the ways it works. We learn from the ground up how the brain trains itself using pattern recognition to become a prediction machine, always working at predicting the future, even predicting the next word.
How do humans recognize patterns in everyday life? How are these connections formed in the brain? The book begins with hierarchical thinking: understanding a structure composed of diverse elements arranged in a pattern. This arrangement represents a symbol such as a letter or character, which is further arranged into a more advanced pattern such as a word, and eventually a sentence. Eventually these patterns form ideas, and these ideas are transformed into the products that humans are responsible for building.
By emulating the human brain, the book reveals a pathway to creating an advanced AI beyond the capabilities of the neural networks that existed at the time of publication.
The book was a blueprint for creating an AI that can scale by vacuuming up the world’s data, using multi-layered pattern recognition to parse text, images, audio, and video. Such a system is well suited to scaling up thanks to the cloud and its parallel processing capabilities. In other words, there would be no maximum on data input or output.
This book was so pivotal that soon after its publication Ray Kurzweil was hired by Google as a Director of Engineering focused on machine learning and language processing, a role that perfectly aligned with the book he had written.
It would be impossible to deny how influential this book was to the future of Google, and how they rank websites. This AI book should be mandatory reading for anyone who wishes to become an SEO expert.
Launched in 2010, DeepMind was a hot new startup using a revolutionary type of AI algorithm that was taking the world by storm: reinforcement learning. DeepMind described it best as:
“We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards.”
Fusing deep learning with reinforcement learning produced a deep reinforcement learning system. By 2013, DeepMind was using these algorithms to rack up victories against human players on Atari 2600 games, and this was achieved by mimicking how the human brain learns from training and repetition.
Similar to how a human learns by repetition, whether it is kicking a ball, or playing Tetris, the AI would also learn. The AI’s neural network tracked performance and would incrementally self-improve resulting in stronger move selection in the next iteration.
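To make learning by repetition concrete, here is a minimal sketch of tabular Q-learning, the family of algorithm named in the DeepMind quote above. It is not DeepMind's system, which used a convolutional neural network over raw pixels. The five-cell corridor, the reward, and the hyperparameters are all assumptions chosen for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, reward at cell 4
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] estimates the future reward of each move.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Explore occasionally (or on ties), otherwise take the best-known action.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if nxt == N_STATES - 1 else 0.0
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            q[state][a] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    # For each non-terminal cell, report the preferred direction.
    return ["right" if row[1] > row[0] else "left" for row in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
```

After enough repetitions the agent prefers stepping right in every cell, because that is the only direction that ever leads to a reward.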
DeepMind was so dominant in its technological lead that Google had to buy access to the technology. DeepMind was acquired for more than $500 million in 2014.
After the acquisition the AI industry witnessed successive breakthroughs of a type not seen since May 11, 1997, when chess grandmaster Garry Kasparov lost the deciding game of a six-game match against Deep Blue, a chess-playing computer developed by scientists at IBM.
In 2015, DeepMind refined the algorithm to test it on Atari’s suite of 49 games, and the machine beat human performance on 23 of them.
That was just the beginning. Later in 2015 DeepMind began focusing on AlphaGo, a program with the stated aim of defeating a professional Go world champion. The ancient game of Go, which was first seen in China some 4000 years ago, is considered to be the most challenging game in human history, with an estimated 10^360 possible moves.
DeepMind used supervised learning to train the AlphaGo system by learning from human players. Soon after, DeepMind made headlines after AlphaGo beat Lee Sedol, the world champion, in a five-game match in March 2016.
Not to be outdone, in October 2017 DeepMind released AlphaGo Zero, a new model with the key differentiator that it required zero human training. Since it did not learn from human games, it also required no labeled data; the system taught itself through reinforcement learning from self-play. AlphaGo Zero rapidly surpassed its predecessor, as described by DeepMind:
“Previous versions of AlphaGo initially trained on thousands of human amateur and professional games to learn how to play Go. AlphaGo Zero skips this step and learns to play simply by playing games against itself, starting from completely random play. In doing so, it quickly surpassed human level of play and defeated the previously published champion-defeating version of AlphaGo by 100 games to 0.”
In the meantime, the SEO world was hyper focused on PageRank, the backbone of Google. The story begins in 1995, when Larry Page and Sergey Brin were Ph.D. students at Stanford University. The duo began collaborating on a novel research project nicknamed “BackRub”. The goal was to rank web pages by importance using their backlink data. A backlink is quite simply any link from one page to another.
The algorithm was later renamed to PageRank, named after both the term “web page” and co-founder Larry Page. Larry Page and Sergey Brin had the ambitious goal of building a search engine that could power the entire web purely by backlinks.
And it worked.
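The core idea of ranking pages by their backlinks can be sketched with the published power-iteration form of PageRank. This is a minimal illustration, not Google's production code. The four-page toy web and the damping factor of 0.85 are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: every other page links to "home".
toy_web = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy web, "home" ends up with the highest score because it collects the most backlinks, which is the behavior that link buyers later tried to exploit.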
PageRank Dominates Headlines
SEO professionals immediately understood the basics of how Google calculates a quality ranking for a web page by using PageRank. Some savvy black hat SEO entrepreneurs took it a step further, understanding that to scale it might make sense to buy links instead of waiting to acquire them organically.
A new economy emerged around backlinks. Eager website owners who needed to impact search engine rankings would buy links, and website owners desperate to monetize their sites would sell them.
The websites that purchased links often invaded Google overnight, outranking established brands.
Ranking using this method worked really well for a long time, until it stopped working, probably around the same time machine learning kicked in and solved the underlying problem. With the introduction of deep reinforcement learning, PageRank would become a ranking variable, not the dominant factor.
By now the SEO community is divided on link buying as a strategy. I personally believe that link buying offers sub-optimal results, and that the best methods to acquire backlinks are based on variables that are industry specific. One legitimate service that I can recommend is called HARO (Help a Reporter Out). The opportunity at HARO is to acquire backlinks by fulfilling media requests.
Established brands never had to worry about sourcing links, since they had the benefits of time working in their favor. The older a website, the more time it has had to collect high quality backlinks. In other words, a search engine ranking was heavily dependent on the age of a website, if you calculate using the metric time = backlinks.
For example, CNN would naturally receive backlinks for a news article due to its brand, its trust, and because it was listed high to begin with. It then gained even more backlinks from people researching an article and linking to the first search result they found.
This meant that higher ranked webpages organically received more backlinks. Unfortunately, it also meant new websites were often forced to abuse the backlink algorithm by turning to a backlink marketplace.
In the early 2000s, buying backlinks worked remarkably well and it was a simple process. Link buyers purchased links from high authority websites, often sitewide footer links, or perhaps on a per article basis (often disguised as a guest post), and the sellers, desperate to monetize their websites, were happy to oblige, unfortunately often at the sacrifice of quality.
Eventually the Google talent pool of machine learning engineers understood that coding search engine results by hand was futile, and a lot of PageRank was handwritten code. They understood instead that the AI would eventually become responsible for fully calculating the rankings with little to no human interference.
To stay competitive Google uses every tool in their arsenal, and this includes deep reinforcement learning, one of the most advanced types of machine learning algorithm in the world.
This system layered on top of Google’s acquisition of Metaweb was a gamechanger. The reason the 2010 Metaweb acquisition was so important is that it reduced the weight that Google placed on keywords. Context was all of a sudden important, and this was achieved by using a categorization methodology called ‘entities’. As Fast Company described:
“Once Metaweb figures out to which entity you’re referring, it can provide a set of results. It can even combine entities for more complex searches– ‘actresses over 40’ might be one entity, ‘actresses living in New York City’ might be another, and ‘actresses with a movie currently playing’ might be another.”
This technology was rolled into a major algorithm update called RankBrain that was launched in the spring of 2015. RankBrain focused on understanding context versus being purely keyword based, and RankBrain would also consider environmental contexts (e.g., searcher location) and extrapolate meaning where there had been none before. This was an important update especially for mobile users.
Now that we understand how Google uses these technologies, let’s use computational thinking to speculate on how it’s done.
What is Deep Learning?
Deep learning is the most commonly used type of machine learning. It would be impossible for Google not to use this algorithm.
Deep learning is influenced significantly by how the human brain operates and it attempts to mirror the brain’s behavior in how it uses pattern recognition to identify, and categorize objects.
For example, if you see the letter a, your brain automatically recognizes the lines and shapes and identifies it as the letter a. The same applies to the letters ap: your brain automatically attempts to predict the future by coming up with potential words such as app or apple. Other patterns may include numbers, road signs, or identifying a loved one in a crowded airport.
You can think of the interconnections in a deep learning system to be similar to how the human brain operates with the connection of neurons and synapses.
Deep learning is ultimately the term given to machine learning architectures that join many multilayer perceptrons together, so that there isn’t just one hidden layer but many hidden layers. The “deeper” the neural network is, the more sophisticated the patterns it can learn.
Fully connected networks can be combined with other machine learning functions to create different deep learning architectures.
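The idea of stacking several hidden layers can be sketched as a forward pass through a small fully connected network. This is an illustration only. The layer sizes, the sigmoid activation, and the random untrained weights are assumptions, and a real system would learn its weights from data.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    # Each neuron holds one weight per input plus a trailing bias (random, untrained).
    return [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(network, inputs):
    activations = list(inputs)
    for layer in network:
        # Each layer turns the previous layer's outputs into new activations.
        activations = [
            1.0 / (1.0 + math.exp(-(sum(w * a for w, a in zip(neuron, activations)) + neuron[-1])))
            for neuron in layer
        ]
    return activations


rng = random.Random(42)
sizes = [4, 8, 8, 8, 2]   # assumed: 4 inputs, three hidden layers, 2 outputs
network = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward(network, [0.5, -0.2, 0.1, 0.9])
```

Adding more entries to `sizes` makes the network deeper without changing any other code, which is the sense in which depth is just more layers of the same building block.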
How Google Uses Deep Learning
Google spiders the world’s websites by following hyperlinks (think neurons) that connect websites to one another. This was the original methodology that Google used from day one, and is still in use. Once websites are indexed various types of AI are used to analyze this treasure trove of data.
Google’s system labels the webpages according to various internal metrics, with only minor human input or intervention. An example of an intervention would be the manual removal of a specific URL due to a DMCA Removal Request.
Google engineers are renowned for frustrating attendees at SEO conferences, and this is because Google executives can never properly articulate how Google operates. When questions are asked about why certain websites fail to rank, it’s almost always the same poorly articulated response. The response is so frequent that often attendees preemptively state that they have committed to creating good content for months or even years on end with no positive results.
Predictably, website owners are instructed to focus on building valuable content. This is an important component, but far from comprehensive.
This lack of answer is because the executives are incapable of properly answering the question. Google’s algorithm operates in a black box. There’s input, and then output – and that is how deep learning works.
Let’s now return to a ranking penalty that is negatively impacting millions of websites often without the knowledge of the website owner.
Google is not often transparent, but PageSpeed Insights is the exception. Websites that fail this speed test will be sent into a penalty box for loading slowly, especially if mobile users are impacted.
What is suspected is that at some point in the process there’s a decision tree that parses fast websites, versus slow loading (PageSpeed Insights failed) websites. A decision tree is essentially an algorithmic approach which splits the dataset into individual data points based on different criteria. The criteria may be to negatively influence how high a page ranks for mobile versus desktop users.
Hypothetically a penalty could be applied to the natural ranking score. For example, a website that without penalty would rank at #5 may have a -20, -50, or some other unknown variable that will reduce the rank to #25, #55, or another number as selected by the AI.
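The arithmetic of this hypothesis can be sketched in a few lines. Everything here is speculative: the 0 to 100 speed score, the pass threshold, and the penalty sizes are assumptions for illustration, not documented Google behavior.

```python
def passes_speed_test(score, threshold=50):
    # A single decision-tree split; the 0-100 score and threshold are assumed.
    return score >= threshold


def displayed_rank(natural_rank, score, penalty=20):
    """Position after a hypothetical speed penalty is added to the natural rank."""
    if passes_speed_test(score):
        return natural_rank
    return natural_rank + penalty


fast_site = displayed_rank(natural_rank=5, score=92)
slow_site = displayed_rank(natural_rank=5, score=31)
very_slow_site = displayed_rank(natural_rank=5, score=31, penalty=50)
```

With these assumed values a page that would naturally rank #5 stays at #5 when it passes, and drops to #25 or #55 when it fails, matching the example above.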
In the future we may see the end of PageSpeed Insights, when Google becomes more confident in their AI. This current intervention on speed by Google is dangerous as it may potentially eliminate results that would have been optimal, and it discriminates against the less tech savvy.
It’s a big request to demand that everyone who runs a small business have the expertise to successfully diagnose and remedy speed test issues. One simple solution would be for Google to release a speed optimization plug-in for WordPress users, as WordPress powers 43% of the internet.
Unfortunately, all SEO efforts are in vain if a website fails to pass Google’s PageSpeed Insights. The stakes are nothing less than a website vanishing from Google.
How to pass this test is an article for another time but at a minimum you should verify if your website passes.
Another important technical metric to worry about is a security protocol called SSL (Secure Sockets Layer). This changes the URL of a domain from http to https, and ensures the secure transmission of data. Any website that does not have SSL enabled will be penalized. While there are some exceptions to this rule, ecommerce and financial websites will be most heavily impacted.
Low cost webhosts charge an annual fee for SSL implementation, meanwhile good webhosts such as Siteground issue SSL certificates for free and automatically integrate them.
Another important element on the website is the Meta Title and Meta Description. These content fields have an outsized importance and may contribute as much to the success or failure of a page as the entire content of that page.
This is because Google has a high probability of selecting the Meta Title and Meta description to showcase in the search results. And this is why it is important to fill out the meta title and meta description field as carefully as possible.
The alternative is that Google may choose to ignore the meta title and meta description and instead auto-generate text that it predicts will result in more clicks. If Google predicts poorly what title to auto-generate, this will contribute to fewer click-throughs by searchers, and consequently to lost search engine rankings.
If Google believes the included meta description is optimized to receive clicks, it will showcase it in the search results. Failing this, Google grabs a chunk of text from the page. Sometimes Google selects the best text on the page, but this is a lottery system and Google is often bad at choosing what description to select.
Of course if you believe the content on your page is really good, sometimes it makes sense to allow Google to pick the optimized meta description that best matches the user query. We will opt for no meta description for this article as it is content rich, and Google is likely to select a good description.
In the meantime, billions of humans are clicking on the best search results. This is the human-in-the-loop, Google’s last feedback mechanism, and this is where reinforcement learning kicks in.
What is Reinforcement Learning?
Reinforcement learning is a machine learning technique that involves training an AI agent through the repetition of actions and associated rewards. A reinforcement learning agent experiments in an environment, taking actions and being rewarded when the correct actions are taken. Over time, the agent learns to take the actions that will maximize its reward.
The reward could be based on a simple computation that calculates the amount of time spent on a recommended page.
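A reward based on time spent on a page can be sketched as an epsilon-greedy bandit, one of the simplest reinforcement learning setups. This is a toy illustration and not how Google is known to work. The three pages, their average dwell times, and the simulated searcher are all assumptions.

```python
import random

# Hypothetical average dwell time (seconds) for three competing pages.
TRUE_DWELL = {"page_a": 20.0, "page_b": 90.0, "page_c": 45.0}


def simulate_visit(page, rng):
    # Simulated searcher: dwell time varies by up to 10 seconds either way.
    return TRUE_DWELL[page] + rng.uniform(-10.0, 10.0)


def learn_ranking(rounds=2000, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    pages = list(TRUE_DWELL)
    estimate = {p: 0.0 for p in pages}   # running average reward per page
    shown = {p: 0 for p in pages}        # how often each page was recommended
    for _ in range(rounds):
        if rng.random() < epsilon:
            page = rng.choice(pages)                 # explore: try any page
        else:
            page = max(pages, key=estimate.get)      # exploit: best page so far
        reward = simulate_visit(page, rng)
        shown[page] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        estimate[page] += (reward - estimate[page]) / shown[page]
    return sorted(pages, key=estimate.get, reverse=True)


ranking = learn_ranking()
```

The agent never sees `TRUE_DWELL` directly. It only observes the reward from each visit, and over many repetitions the page that holds attention longest rises to the top of `ranking`.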
If you combine this methodology with a human-in-the-loop sub-routine, this sounds an awful lot like the existing recommender engines that control all aspects of our digital lives, such as YouTube, Netflix, and Amazon Prime. And if it sounds like how a search engine should operate, you are correct.
How Google Uses Reinforcement Learning
The Google flywheel improves with each search: humans train the AI by selecting the result that best answers their query and the similar queries of millions of other users.
The reinforcement learning agent continuously works on self-improving by reinforcing only the most positive interactions between a search and the delivered search result.
Google measures the amount of time it takes for a user to scan the results page, the URL they click on, and they measure the amount of time spent on the visited website, and they register the return click. This data is then compiled and compared for every website that offers a similar data match, or user experience.
A website with a low retention rate (time spent on site) is then fed a negative value by the reinforcement learning system, and other competing websites are tested to improve the offered rankings. Google is unbiased: assuming there’s no manual intervention, Google eventually provides the desired search results page.
Users are the human-in-the-loop providing Google with free data and become the final component of the deep reinforcement learning system. In exchange for this service, Google offers the end user an opportunity to click on an ad.
Outside of generating revenue, the ads serve as a secondary ranking factor, supplying more data about what makes a user want to click.
Google essentially learns what a user wants. This can be loosely compared to a recommender engine by a video streaming service. In that case a recommender engine would feed a user content that is targeted towards their interests. For example, a user who habitually enjoys a stream of romantic comedies might enjoy some parodies if they share the same comedians.
How Does this Help SEO?
If we continue with computational thinking we can assume that Google has trained itself to deliver the best results, and this is often achieved by generalizing and satisfying human biases. It would in fact be impossible for Google’s AI not to optimize results that cater to these biases; if it did not, the results would be sub-optimal.
In other words there is no magic formula, but there are some best practices.
It is the responsibility of the SEO practitioner to recognize the biases that Google seeks that are specific to their industry, and to feed into those biases. For example, someone searching for election poll results without specifying a date is most likely searching for the most recent results; this is a recency bias. Someone searching for a recipe most likely does not need the most recent page, and may in fact prefer a recipe that has withstood the test of time.
It is the responsibility of the SEO practitioner to offer visitors the results they are looking for. This is the most sustainable way of ranking in Google.
Website owners must abandon targeting a specific keyword with the expectation that they can deliver whatever they want to the end user. The search result must precisely match the need of the user.
What is a bias? It could be having a domain name that looks high authority, in other words does the domain name match the market you are serving? Having a domain name with the word India in it may discourage USA users from clicking on the URL, due to a nationalism bias of trusting results that originate from a user’s country of residence. Having a one word domain may also give the illusion of authority.
The most important bias is the format a user wants to match their search query. Is it an FAQ, a top 10 list, a blog post? This needs to be answered, and the answer is easy to find. You just need to analyze the competition by performing a Google search in your target market.
Black Hat SEO is Dead
Compare this to Black Hat SEO, an aggressive method of ranking websites that exploits devious SPAM techniques, including buying backlinks, falsifying backlinks, hacking websites, auto generating social bookmarks at scale, and other dark methodologies that are applied via a network of black hat tools.
These tools are often repurposed and resold on various search engine marketing forums as products with next to no value and few odds of succeeding. At the moment these tools enable the sellers to become wealthy while offering minimal value to the end user.
This is why I recommend abandoning Black Hat. Focus your SEO on viewing it from the lens of machine learning. It’s important to understand that every time someone skips a search result to click on a result buried underneath, it’s the human-in-the-loop collaborating with the deep reinforcement learning system. The human is assisting the AI with self-improving, becoming infinitely better as time progresses.
This is a machine learning algorithm that has been trained by more users than any other system in human history.
Google handles 3.8 million searches per minute on average across the globe. That comes out to 228 million searches per hour, or roughly 5.5 billion searches per day. That is a lot of data, and this is why it is foolish to attempt black hat SEO. Assuming Google’s AI is going to remain stagnant is foolish; the system is using the Law of Accelerating Returns to exponentially self-improve.
Google’s AI is becoming so powerful that it is conceivable that it could eventually become the first AI to reach Artificial General Intelligence (AGI). An AGI is an intelligence that is able to use transfer learning to master one field to then apply that learned intelligence across multiple domains. While it may be interesting to explore Google’s future AGI efforts, it should be understood that once the process is in motion it is difficult to stop. This is of course speculating towards the future as Google is currently a type of narrow AI, but that is a topic for another article.
Knowing this, spending one second more on black hat is a fool’s errand.
White Hat SEO
If we accept that Google’s AI will continuously self-improve, then we have no choice but to give up on attempting to outsmart Google. Instead, focus on optimizing a website to provide Google specifically what it is looking for.
As described, this involves enabling SSL, optimizing page loading speed, and optimizing the Meta Title and Meta Description. To optimize these fields, compare the Meta Title and Meta Description to those of competing websites, and identify the winning elements that result in a high click-through rate.
Once you have optimized for being clicked on, the next milestone is creating the best landing page. The goal is a landing page that optimizes user value so much that the average time spent on page outperforms similar competitors who are vying for the top search engine results.
Only by offering the best user experience can a webpage increase in ranking.
So far we have identified these metrics to be the most important:
- Loading Speed
- SSL Enabled
- Meta Title and Meta Description
- Landing Page
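Two items on this list, SSL and the meta fields, are easy to check mechanically. The sketch below uses only Python's standard library to test whether a URL uses https and whether a page's HTML contains a title and a meta description. The `audit` helper and the sample page are hypothetical, and loading speed and landing page quality need separate tools.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = (attrs.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def audit(url, html):
    # Hypothetical helper: reports three yes/no checks for one page.
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "https": urlparse(url).scheme == "https",
        "has_title": bool(parser.title),
        "has_description": bool(parser.description),
    }


sample_html = (
    "<html><head><title>Fresh Pasta Recipe</title>"
    '<meta name="description" content="A simple pasta recipe.">'
    "</head><body></body></html>"
)
report = audit("https://example.com/pasta", sample_html)
```

Here `report` marks all three checks as passed. Running `audit` with an `http://` URL or a page that lacks the meta tag flips the corresponding entry to `False`.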
The landing page is the most difficult element as you are competing against the world. The landing page must load quickly, and must serve everything that is expected, and then surprise the user with more.
It would be easy to fill another 2000 words describing other AI technologies that Google uses, as well as to dig deep further into the rabbit hole of SEO. The intention here is to refocus attention on the most important metrics.
SEO practitioners are so focused on gaming the system that they forget that at the end of the day, the most important element of SEO is giving users as much value as possible.
One way to achieve this is by never allowing important content to grow stale. If in a month I think of an important contribution, it will be added to this article. Google can then identify how fresh the content is, matched with the history of the page delivering value.
If you are still worried about acquiring backlinks, the solution is simple. Respect your visitors’ time and give them value. The backlinks will come naturally, as users will find value in sharing your content.
The question then shifts to the website owner on how to provide the best user value and user experience.