OVERVIEW

Understanding Google’s RankBrain

Why it matters

2015 marked a turning point in Google’s search engine history: for the first time, the company relied on artificial intelligence to interpret and handle search queries. While the average user may not have noticed anything out of the ordinary, technical marketers recognized the widespread impact it would have on content and search optimization in the years to come.

Defining RankBrain

RankBrain uses artificial intelligence to interpret never-before-seen search queries. The AI was rolled out in the spring of 2015 and announced in the fall of the same year. Initially, RankBrain affected only about 15% of all searches, but it now affects all results.

RankBrain builds on the Hummingbird update: it moves the search engine from analyzing literal strings of characters and words toward analyzing the overall subject matter or intent behind a query. For every word or phrase that the search engine does not recognize, RankBrain converts the language into vectors, mathematical representations that help the machine infer meaning from contextual information and similar language.
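
To make the idea of vectors concrete, here is a minimal Python sketch. It is not Google’s implementation: the three-dimensional vectors, the TOY_VECTORS table, and the function names are all invented for illustration. It shows only the general technique of comparing an unfamiliar term with known terms by cosine similarity, so that terms used in similar contexts end up close together.

```python
import math

# Toy 3-dimensional vectors. The numbers are invented for illustration;
# real systems learn vectors with hundreds of dimensions from text.
TOY_VECTORS = {
    "sneakers": [0.9, 0.1, 0.0],
    "trainers": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_known_term(unknown_vector, known=TOY_VECTORS):
    """Return the known term whose vector points in the most similar direction."""
    return max(known, key=lambda term: cosine_similarity(unknown_vector, known[term]))
```

In this toy table, "sneakers" and "trainers" score as far more similar to each other than either does to "banana", which is the kind of signal a search engine could use to guess at the meaning of a term it has not seen before.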

A Shift in Search Engine Approach

Google and Machine Learning

Google’s research in machine learning and artificial intelligence dates back to 2011, when the company assembled a team of researchers for the Google Brain project. The idea was to use deep learning techniques to tackle the challenge of building artificial intelligence, and to deploy a large-scale machine learning system on top of Google’s cloud computing infrastructure. One of the team’s first projects was to teach Google Brain to identify a cat using a neural network of 16,000 computer processors.

Over time, Google Brain expanded and applied machine learning to other projects. In 2016, the team developed an experiment involving two AIs tasked with creating their own cryptographic algorithm and encrypting their communication from a third AI tasked with decoding the message. The project was successful and set a precedent for AI-based message encryption.

Google Brain has also been applied to low-resolution image enhancement, Google Translate, speech recognition, algorithmic recommendations, and even robotics. The company’s research has proven so successful that in 2017, CEO Sundar Pichai announced the formation of a new division, Google AI, focused solely on artificial intelligence projects.

Hummingbird

Artificial intelligence has been integral to the growth of Google’s search engine. Before RankBrain’s introduction, the original algorithm was updated to “Hummingbird”, a major change that focused on speed and accuracy. The Hummingbird update centered on more natural language queries, placing greater importance on context and intent than on keyword density.

The goal was to “reward” more human search interactions and conversations rather than search-optimized language. Long-tail keywords (long phrases or sentences that a person might say to a voice assistant like Siri or Alexa) started to take precedence over simple keywords. That same strategy has carried over with RankBrain.

Post-RankBrain

In the five years since its deployment, RankBrain has become the third-most important signal contributing to a search result, behind content and links. Although invisible to the average user, the Google search engine has become more effective at interpreting the overall intent behind previously unseen phrases or sentences. For example, stop words (such as “the” and “and”) were typically ignored by the search engine, but are now taken into account when they are important to the query.

RankBrain has also influenced how marketers attempt to rank for certain queries. Instead of trying to game the system through specific signals (for example, keyword stuffing), marketers must focus on creating and delivering high-quality content from a human perspective.

Google’s decision to use RankBrain allows the search engine not only to return more relevant, high-quality, reader-friendly content, but also to evaluate new, emerging queries more reliably.

Functionality

Traditional search engines operate using an algorithm, a complex equation that takes into account numerous qualities such as domain authority, number of links pointing to and from the page, keyword density, and many other factors. In the case of Google’s original algorithm, PageRank, Google analyzed the interconnectedness of pages, with each link to a particular page counting as a “vote”. However, some votes were viewed as more important than others, for example those cast by a webpage owned by a government or educational institution. This system created a more accurate measurement of a search result’s relevance, trustworthiness, and quality.
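
The voting idea can be illustrated with a simplified version of the published PageRank iteration. This is a sketch under stated assumptions, not Google’s production algorithm: the four-page link graph, the page names, the damping factor of 0.85, and the iteration count are illustrative choices. In this version a vote’s weight comes from the rank of the page casting it, so a link from a highly ranked page counts for more than a link from an obscure one.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate page ranks by repeatedly passing each page's rank along its links.

    `links` maps every page to the list of pages it links to. Every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its "vote" evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph used only for this example.
EXAMPLE_LINKS = {
    "gov": ["blog", "shop"],
    "blog": ["shop"],
    "shop": ["gov"],
    "fan": ["shop"],
}
```

Calling pagerank(EXAMPLE_LINKS) ranks "shop" highest because three pages link to it, while "fan", which receives no links at all, ends up with only the baseline share.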

With the introduction of the RankBrain AI, Google’s search engine capability is enhanced through the power of machine learning. RankBrain’s main focus is on previously unseen queries, and on serving up the most relevant information using historical search data (even data that is only a few months old), as opposed to relying solely on landing page data. A good example is local or world news. While typing in “pandemic” may previously have returned definitions or past case studies, in early 2020 RankBrain learned to identify searchers’ intent to understand more about the novel coronavirus.

The AI has become particularly adept at interpreting the relationships between certain keywords and phrases within specific contexts. One granted Google patent lends insight into how RankBrain works: rather than focusing on the keywords or phrases themselves, Google uses a substitution system that looks at overall context and “concepts” to provide more accurate answers. For example, the term “New York Times Puzzle” can be read as a collection of multiple concepts: “Puzzle”, “New York”, and “New York Times”. Google uses a substitution engine to check whether any of the terms in the query can be substituted with similar concepts. Substitute terms are essentially Google’s way of using synonyms, but applied to much larger concepts and with more accurate results.

Patent US9104750B1 showing an example system that can revise search queries using substitute terms.
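
The substitution idea can be sketched in a few lines of Python. This is a loose illustration, not the system described in the patent: the SUBSTITUTES table, its entries, and both function names are hypothetical. The sketch finds every known concept inside a query, longest phrase first, and then produces revised queries by swapping in a substitute term where one is available.

```python
# Hypothetical concept table: each known concept maps to its substitute terms.
# An empty list means the concept is recognized but has no substitute here.
SUBSTITUTES = {
    "new york times": ["nyt"],
    "new york": [],
    "puzzle": ["crossword"],
}


def find_concepts(query, known_concepts=SUBSTITUTES):
    """Return the known concepts that appear in the query, longest phrase first."""
    words = query.lower().split()
    found = []
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase in known_concepts and phrase not in found:
                found.append(phrase)
    return found


def revise_query(query, substitutes=SUBSTITUTES):
    """Return the original query followed by variants with one concept substituted."""
    base = query.lower()
    revised = [base]
    for concept in find_concepts(query, substitutes):
        for alternative in substitutes[concept]:
            revised.append(base.replace(concept, alternative))
    return revised
```

For the query “New York Times Puzzle”, find_concepts recognizes all three concepts from the example, and revise_query yields the original query plus one variant using “nyt” and one using “crossword”. A real system would also need to judge whether each substitution preserves the searcher’s intent, which this sketch does not attempt.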

Technical Description

Back in 2012, Amit Singhal, Google’s SVP of Engineering, penned a post titled “Introducing the Knowledge Graph: things, not strings”. Besides unveiling the Knowledge Graph feature, the post highlighted Google’s new strategy in search: focusing on entire concepts or topics as opposed to exact keywords and phrases. Today, typing in a broad term like “nike” will present a panel with various information such as stock price, customer service numbers, location, key figures, and more. The panel also adjusts depending on the information requested, providing scores for sports teams, screening times and reviews for films, and so on.

Google’s video explaining the Knowledge Graph.

That same concept of “things, not strings” has clearly influenced the development of RankBrain. Rather than parsing traditional keywords, Google evaluates the best results by observing what it refers to as “entities”.

Understanding entities poses a challenge, as the documentation is limited and usually relegated to patents and analyses from various marketers. A patent filed by Google back in 2012, titled “Ranking search results based on entity metrics”, offers one of the earliest explanations of how entities work. Based on the patent, entities are ranked in search according to four criteria:

  1. Relatedness – The relationship between two different entities on the web, such as “Nike” and “sneakers”. It can also apply to the relationship between an entity and several other entities, such as “sneakers” and “Nike”, “Adidas”, “Puma” etc. Each of these brands is its own entity, but Google is able to see the big picture and present all of them in relation to “sneakers”.
  2. Notability – The measure of importance or fame relative to the entity’s umbrella topic. In the patent, Google mentions that the greater the value of an entity (more trustworthy backlinks, reviews, or mentions), and the lower the value of the category it’s in, the greater the notability. In other words, how well-known is the entity relative to its topic or niche? The smaller the niche, the higher the notability.
  3. Contribution – The measure of the entity’s contribution to the topic. This includes any links, reviews, media, or any other content that brings something to the overall discussion. A website dedicated to product guides and reviews will have a greater entity contribution than someone posting a single review on their Facebook page.
  4. Prizes – The quantity or quality of relevant prizes an entity has received. Whether it’s a Webby or a Grammy, awards assign value and trust to a particular entity relative to similar entities. 

For every query, Google assigns a value for each criterion, then adjusts the weight of each score depending on the type of query. The resulting score then helps Google determine which results are most relevant, particularly for never-before-seen words or phrases.
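
The weighting step can be sketched as a simple weighted sum. The query types, the weights in QUERY_TYPE_WEIGHTS, and the metric values below are hypothetical and are not taken from the patent; they exist only to show how the same four values can produce different scores depending on the type of query.

```python
# Hypothetical weights per query type. Each set of weights sums to 1.0.
QUERY_TYPE_WEIGHTS = {
    "product": {"relatedness": 0.4, "notability": 0.2, "contribution": 0.1, "prizes": 0.3},
    "person": {"relatedness": 0.2, "notability": 0.5, "contribution": 0.2, "prizes": 0.1},
}


def entity_score(metrics, query_type, weights_by_type=QUERY_TYPE_WEIGHTS):
    """Combine the four entity metrics into one score for the given query type."""
    weights = weights_by_type[query_type]
    return sum(weights[name] * metrics[name] for name in weights)


# Made-up metric values for a single entity, each on a 0-to-1 scale.
EXAMPLE_METRICS = {
    "relatedness": 0.8,
    "notability": 0.5,
    "contribution": 0.4,
    "prizes": 1.0,
}
```

With these made-up numbers, the same entity scores higher for a "product" query than for a "person" query, because the product weights favor relatedness and prizes, where this entity is strongest.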

Another patent, titled “Question answering using entity references in unstructured data”, offers insight into how entities differ from traditional keywords. According to this patent, each entity is given a unique identifier. So instead of reading “Arnold Schwarzenegger” as a string, the machine refers to the entity as “9202a8c04000641f8000000000006567”. This allows Google to interpret the query as an entity with relationships to various other topics, as opposed to just “reading” it.

Google’s patent depicting entity references associated with a search query.
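
The difference between a string and an entity can be sketched with two small lookup tables. Only the identifier for Arnold Schwarzenegger comes from the text above; the second identifier, the alias table, the relation names, and the function names are hypothetical. The point is that several different strings resolve to one identifier, and relationships are stored against identifiers rather than against text.

```python
ARNOLD_ID = "9202a8c04000641f8000000000006567"  # identifier quoted in the text above
TERMINATOR_ID = "hypothetical-id-the-terminator"  # invented for this example

# Different surface strings can point at the same entity.
ALIAS_TO_ID = {
    "arnold schwarzenegger": ARNOLD_ID,
    "schwarzenegger": ARNOLD_ID,
    "the terminator": TERMINATOR_ID,
}

# Relationships are recorded between identifiers, not between strings.
RELATIONS = {
    ARNOLD_ID: {"acted_in": [TERMINATOR_ID]},
    TERMINATOR_ID: {"stars": [ARNOLD_ID]},
}


def resolve_entity(text):
    """Return the identifier for a piece of query text, or None if unknown."""
    return ALIAS_TO_ID.get(text.strip().lower())


def related_entities(entity_id, relation):
    """Return identifiers linked to the entity by the named relation."""
    return RELATIONS.get(entity_id, {}).get(relation, [])
```

Both “Arnold Schwarzenegger” and “schwarzenegger” resolve to the same identifier here, so anything known about the entity applies regardless of how the searcher phrased the query.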

As with PageRank, entities are ranked by a score, which is determined by freshness (how new the entity is), previous selection by users, and incoming and outgoing links. The most common entities are logged in Google’s database, which saves the system from reprocessing the top search results for each query.

So, for example, here is how RankBrain works in its simplest form:

  1. Google receives a query for something it’s never seen before (like a new movie title or a phrase connecting two topics, like “which country has the best cars”).
  2. Google assigns the entity a unique identifier, like 9202a8c04000641f8000000000006567.
  3. Google determines the entity’s relatedness to other entities, then assigns it a value.
  4. Google determines the entity’s notability, then assigns it a value. 
  5. Google determines the entity’s contribution, then assigns it a value. 
  6. Google evaluates any awards the entity received, then assigns a value. 
  7. Each value is weighted according to the entity’s query type. For example, in the “best cars” example, Google may prioritize relatedness and awards for particular brands, and return the result as a carousel of options rather than a single webpage.

The beauty of RankBrain lies not in its speed and accuracy, but in its autonomy in handling novel queries. Prior to RankBrain, 15% of queries had never been seen before, and Google had trouble returning relevant results for them. With RankBrain, Google is able to apply a more efficient process while also developing a framework for continual learning of entities.

Ramifications for Businesses and Digital Marketers

RankBrain, as well as Google’s other projects in AI and machine learning, will continue to be a driving force in Google’s search engine for the foreseeable future. Marketers should adjust their content and search marketing strategies accordingly to have a greater likelihood of ranking highly on SERPs.

Keywords

First, the nature of keyword usage in online content has changed. Pre-RankBrain, keywords were used repeatedly, often in unnatural or robotic writing styles, to artificially drive up the score of a page. Certain methods such as hidden text or keyword stuffing were penalized over time, but that did not stop content writers from forcing keywords into landing pages and blog posts.

While keywords remain an essential aspect of SEO strategy, writers no longer need to use exact-match keywords in their content. Instead, it is more beneficial to write in a natural, reader-friendly style. RankBrain doesn’t change these rules, which were originally introduced with the Hummingbird update. As Google’s Gary Illyes puts it:

“It’s a new signal. [You don’t optimize] for RankBrain … It’s about making sure the user gets the result that is deserved for the query. If you write in natural language, you’re all set. If you keyword-stuff your content, that will almost certainly not be good for you.”

Tags and meta-descriptions

Metadata continues to play an indirect, yet important role in SEO. Optimizing title tags with the right keywords may provide marginal SEO benefit, but more importantly, title tags can influence the click-through rate, which in turn has a greater impact on search engine rankings. Tags should continue to be informative and relevant to the webpage content.

The same applies to meta-descriptions and alt text. Adding keywords to these elements has been shown to have positive SEO effects, but it also helps readers better understand content relevant to their query and situation. Content writers should continue to use meta-descriptions to add relevant context around their webpages and articles, and use alt text to help visually impaired searchers (and crawlers) better understand the content they discover and determine whether it is right for them.

User Experience

Back in 2019, a Reddit user asked Gary Illyes, “Lots of people keep saying that part of the RB system includes UX signals, including Dwell Time, Bounce Rate, Click Through Rate, etc … Can you please confirm/deny whether RB uses UX signals of any kind?”

In response, Illyes wrote, “Dwell time, CTR, whatever [Moz founder Rand Fishkin’s] new theory is, those are generally made up…”

While most people take Illyes’s comment at face value (even Google’s Matt Cutts made a similar comment back in 2008), other marketers argue that his reply applies specifically to RankBrain. Even if RankBrain does not consider these signals, marketers speculate that they remain important to the broader search algorithm, given Google’s apparent tendency to track dwell time and reward high CTR. In essence, regardless of whether RankBrain uses these signals, it is worthwhile for businesses and marketers to optimize content for long dwell times, low bounce rates, and high CTR.

RankBrain’s impact on search and content marketing will continue to be the subject of debate and research for at least the next decade. As Google’s user base increases, there will undoubtedly be a rise in queries that have no precedent in terms of meaning or context. With RankBrain, Google will be able to more accurately identify complex correlations across a vast range of entities, and deliver more useful insights even when information is scarce.

But there is still much to learn about the inner workings of the RankBrain AI. How does RankBrain teach itself? How might RankBrain change and grow in the next five years? How might people search differently as a result of RankBrain? And what greater plans does Google have for RankBrain in the future? Only time and analysis will tell.