Google admits massive document leak related to search algorithm is authentic (2024)

Thomas Barrabi

Google has confirmed that a massive leak of some 2,500 internal documents related to its search engine is authentic – and one expert said the trove shows that “Google tells us one thing and they do another” when it comes to its mysterious algorithms.

The tech giant has been secretive about how its search engine works even as it has wielded outsize influence over the flow of information, traffic and ad revenue online.

Some details appeared to contradict past public statements by Google employees regarding which factors are and are not used to calculate rankings.

For example, a Google Search employee said in 2016 that the company doesn’t “have a website authority score.”

The company has also explicitly denied using Chrome data in search rankings.

Information in the documents, however, suggests that Google considers click rates, data from its Chrome web browser, website size and a factor called “domain authority” – a measure of a website’s importance or relevance on a particular subject – to guide rankings.

“The main takeaway here is Google tells us one thing and they do another,” iPullRank CEO Michael King, who published the first analysis of the trove, told The Post.

“These documents give us clarity on that,” King added. “We don’t have the recipe that Google is using for search, but we now have a really clear indication of what the ingredients are.”

Some experts, as well as the trade publication Search Engine Land, have noted that the documents mention modules suggesting Google implements "whitelists" for certain topics, including searches related to elections (IsElectionAuthority) and the COVID-19 pandemic (IsCovidLocalAuthority).

King said the references are likely Google’s attempt to identify “quality sources” on a given subject.

Details about how the whitelists may operate are scant, but Google has faced allegations of exhibiting a left-wing bias for years. A recent analysis by media company AllSides found that 63% of articles on Google News were from left-leaning outlets, compared to just 6% from right-leaning sources.

An analysis by right-leaning watchdog Media Research Center detailed 41 alleged instances of “election interference” at the online search giant since 2008.

The report cited data from Dr. Robert Epstein, who once testified to the Senate Judiciary Committee that "biased search results generated by Google's search algorithm" shifted "at least 2.6 million votes to Hillary Clinton."

Google has long denied that it is biased against conservative viewpoints and has said Epstein's research is "widely debunked."

The leaked search documents allegedly contain more than 14,000 ranking factors that Google considers when organizing websites – from news outlets like The Post to small business owners and beyond.

The internal data reportedly surfaced on the online code repository GitHub in March, but it did not receive public scrutiny until search engine optimization (SEO) experts Rand Fishkin and King obtained and posted separate breakdowns.

Google tacitly confirmed that the documents are real – though it warned that they lacked important context and shouldn’t be used by the public to glean any insights about how search works.

“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated or incomplete information,” Google spokesperson Davis Thompson said in a statement.

“We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation,” the statement added.

Google also warned that the documents are not a comprehensive, relevant or up-to-date view of its Search ranking algorithm.

It's still unclear whether Google has actually implemented any of the ranking factors detailed in the documents or was merely testing or experimenting with them. Some may never have been used at all.

Even if they were in use, it’s essentially impossible to assess how important they are in crafting what users see in search results.

The documents did not reveal how the ranking features are weighted.

The leaked documents provide an interesting, yet incomplete view of the company’s inner workings on search, according to Barry Schwartz, a prominent SEO expert and owner of the web consultancy RustyBrick.

Schwartz said the documents are best seen as a signal of “what Google is thinking about” as it relates to online search.

“How Google does that around certain factors like links and content quality and authority and authors – all of that’s in there,” Schwartz said. “The question is, we don’t know what they’re weighted, how important are these signals, are they used at all. That’s the issue with this.”

Nevertheless, the documents amount to “the biggest leak that we’ve ever seen come out of Google for search,” according to King.

“This is the biggest, most transparent that we’ve ever seen into how Google functions,” King said.

FAQs

What did the Google search algorithm leak reveal?

The Google algorithm leak (more precisely, a leak of internal API documentation) describes more than 14,000 features, some of which may serve as ranking signals in Google's search engine. Google has confirmed that the documents are authentic, and they offer insight into many aspects of search ranking, from PageRank variants to user-interaction metrics.

What was Google's response to the leak?

Google said in a statement: "We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information. We've shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation."

Does Google leak your search history?

Google can access your search history, especially if you're signed in to your Google account. Internet service providers can see the domain names of the websites you visit. Some apps on your phone might ask permission to access your internet browsing history. If you grant it, they'll be able to view it.

Were 2,500 pages of Google documents leaked?

Google has indirectly acknowledged the authenticity of 2,500 leaked internal documents detailing the data it collects. The documents have created ripples in the SEO and publishing industries because they could reveal the data Google uses to rank pages and websites.

What does the huge Google Search document leak reveal?

Internal Google documents leaked on GitHub describe how Google's search ranking may work, and some details appear to contradict the company's public statements. The 2,500-page "Google API Content Warehouse" documentation was shared by Rand Fishkin and has since been analysed by other experts for SEO insights.

How did the Google leak happen?

On March 13, 2024, an automated bot called yoshi-code-bot published thousands of documents from Google's internal Content API Warehouse on GitHub. The leak was then shared in an email on May 5 with Rand Fishkin, co-founder of SparkToro, and Michael King, CEO of iPullRank.

What SEO secrets does the Google leak reveal?

The Google algorithm leak appears to confirm what many in the SEO community suspected: Google places a high premium on content quality and relevance. The documentation includes detailed metrics for assessing the depth and usefulness of content, although it does not show how, or whether, those metrics are weighted in rankings.

When was the Google algorithm leak?

The documentation, which spans more than 2,500 pages and describes more than 14,000 attributes related to Google's search ranking, was publicly accessible from March 27 to May 7, 2024. During that period, third-party services indexed the data, so it remained available even after Google removed it from GitHub.

What is the biggest threat Google faces today?

OpenAI is a bigger threat to Google than US regulators are, according to Reuters.

Can police access your Google search history?

If you're charged with a crime, the police don't even need an actual warrant to get the data. Generally speaking, while they might eventually move to get a warrant, most of the time a user's search records and other data can be obtained from tech companies with nothing more than a subpoena.

Can anyone see my history after I delete it?

Not entirely. Deleting your history removes it only on the surface: your internet provider collects and stores this information for a period that depends on data retention laws (often six months to a year). The best way to protect your data is to prevent your provider from seeing your history at all.

How do I stop Google from tracking my searches?

Turn "Do Not Track" on or off
  1. On your Android device, open Chrome.
  2. To the right of the address bar, tap More > Settings.
  3. Tap Privacy and security.
  4. Tap Send a "Do Not Track" request. Tip: If you are part of the Tracking Protection test group, follow the "Tracking Protection" instructions instead.
  5. Turn the setting on or off.

Who leaked the Google documents?

A bot called yoshi-code-bot reportedly leaked documents related to the Content API Warehouse on GitHub on March 13, 2024. The documents may have appeared earlier in other repositories, but this was the first one to be discovered.

Can Google access your documents?

Google says it respects your privacy and that it accesses your private content "only when we have your permission or are required to by law."

Can Google Docs be leaked?

There are many reasons that Google Docs may not be the most secure or private home for your data. Google does not use end-to-end encryption, which means the company has access to your data and that the data could be exposed in a breach.

What algorithm does Google use for search results?

PageRank (PR) is an algorithm used by Google Search to rank web pages in its search results. It is named after both the term "web page" and co-founder Larry Page.

What is Google's original search algorithm?

When Google launched in 1998, it used an algorithm called PageRank, created by Google founders Larry Page and Sergey Brin. According to Google, PageRank worked "by counting the number and quality of links to a page to determine a rough estimate of how important the website is."
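To make the link-counting idea concrete, here is a minimal Python sketch of the textbook PageRank computation by power iteration. It assumes the simplified formulation popularized by Page and Brin, with a damping factor of 0.85, and it assumes that pages with no outgoing links spread their score evenly over all pages (one common convention). The function name `pagerank` and the four-page example graph are hypothetical. This illustrates the classic algorithm only; it is not how Google ranks pages today, which the leaked documents suggest involves thousands of other attributes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives the "random jump" share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumed convention: a page with no outlinks spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical example graph: C is linked to by A, B and D; nothing links to D.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In the example graph, page C collects links from A, B and D, so it ends up with the highest score, while D, which no page links to, keeps only the baseline share of (1 - 0.85) / 4 = 0.0375. The scores always sum to 1, so they can be read as the probability that a "random surfer" is on each page.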

What is the Google search engine ranking leak?

The leak details more than 14,000 attributes that Google might consider when ranking a search result, including content quality, user-engagement metrics, backlinks (potentially) and the author's expertise.
