Google's Leaked Document: Ranking Factor Breakdown

Tatiana Colligan
Jun 3, 2024
5 min read

Updated: Jul 22, 2024

A Snippet of Google's Leaked API Reference Document

A few days ago, a Google API Reference document leaked. It includes thousands of ranking factors covering on-page and off-page SEO, video, local search and more.

Now, this is not a ranking factor weight document: it does not show how the mentioned ranking factors compare to each other or which ones are more or less important. The document does not contain any weights at all, not even vague ones like those in the Yandex ranking factors that leaked a few years ago.

To clarify further, the document is a repository of all the APIs available to Google from various sources, apps and databases, either its own or ones it has access to. It includes, but is far from limited to:

User location
User device data
User third-party app settings and what the user allowed or did not allow within those apps
User's calendar data and events on it
All kinds of data about videos and what is in them.
Image attributes Google has access to (location where taken, timestamp etc.)
Various ways to record, timestamp and analyze anchor text, both within the same domain and from other domains.
Originality of content (when first published)
Detailed understanding of the Comment section of blogs and social media threads.
Its own way to determine how inspirational a featured image is on a scale from 0 to 1 (What?) ... Lots of questions here, or rather a big surprise for me.
Navigational and map data from red light cameras that can tell if a red light is frequently being run through, so that users can be warned to watch out for a red light (You can imagine this is/can be used for Google Maps).

Now, let's try to bring some order to this doozy and identify which of those data points are most likely to be leveraged by Google's ranking algorithm.

Site Speed

Google stores Core Web Vitals signals — CWV stands for "Core Web Vitals"

Google counts the distinct resources that are fetched to load a webpage. The page itself might hold only a few dozen kilobytes of text, but to get that text Google might make many "trips" to various places and sources.
It registers the total number of bytes that must be downloaded to render a webpage. This includes images, HTML, JavaScript etc. (Time to size down that 4 MB marketing banner image on your homepage.)
Google stores Core Web Vitals signals, secure signals (probably referring to HTTPS) and mobile-friendliness signals, and uses most of them for ranking.
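To make the "distinct resources" and "total bytes" ideas concrete, here is a minimal sketch of how you might tally them for your own page. The `Resource` record, the `summarize_page_weight` function and the example URLs are all hypothetical; nothing here is taken from the leaked document, which does not describe how Google computes these counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One fetched resource (hypothetical record, not a Google structure)."""
    url: str
    kind: str        # e.g. "html", "image", "script"
    size_bytes: int


def summarize_page_weight(resources):
    """Return (distinct resource count, total bytes, bytes per kind)."""
    # De-duplicate by URL so a resource fetched twice is counted once.
    distinct = {r.url: r for r in resources}
    by_kind = {}
    for r in distinct.values():
        by_kind[r.kind] = by_kind.get(r.kind, 0) + r.size_bytes
    total = sum(by_kind.values())
    return len(distinct), total, by_kind


page = [
    Resource("https://example.com/", "html", 40_000),
    Resource("https://example.com/app.js", "script", 350_000),
    Resource("https://example.com/banner.jpg", "image", 4_000_000),
    Resource("https://example.com/app.js", "script", 350_000),  # duplicate fetch
]
count, total, by_kind = summarize_page_weight(page)
print(count, total, by_kind)
```

In this made-up example the banner image alone accounts for roughly 90% of the page weight, which is exactly the kind of thing worth checking on a real homepage.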

Site-Level Signals

Google collects various site-level signals that influence search rankings, which can include factors such as site structure, content quality, and user engagement metrics.

Anchor Spam and Statistics

Signals are used to identify spikes of spammy anchor phrases and penalize them. These signals help in maintaining the quality of content by demoting spammy content, which can influence search rankings.
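The document does not say how a "spike" is detected, so the following is only a rough sketch of one way to think about it: compare how often an anchor phrase shows up in recently discovered links against its share in an older baseline. The function name `find_anchor_spikes`, the `min_count` and `ratio` parameters and their default values are all assumptions made for illustration.

```python
from collections import Counter


def find_anchor_spikes(baseline, recent, min_count=5, ratio=3.0):
    """Flag anchor phrases whose share of recent links is at least
    `ratio` times their (smoothed) share of baseline links.

    Thresholds are illustrative assumptions, not values from the leak.
    """
    base_counts = Counter(a.lower().strip() for a in baseline)
    recent_counts = Counter(a.lower().strip() for a in recent)
    base_total = max(len(baseline), 1)
    recent_total = max(len(recent), 1)

    spikes = []
    for phrase, n in recent_counts.items():
        if n < min_count:
            continue  # too few links to call it a spike
        recent_share = n / recent_total
        # Add-one smoothing so phrases unseen in the baseline don't divide by zero.
        base_share = (base_counts[phrase] + 1) / (base_total + 1)
        if recent_share / base_share >= ratio:
            spikes.append(phrase)
    return sorted(spikes)


baseline = ["Acme Tools"] * 40 + ["acme.com"] * 30 + ["cheap power drills"] * 2
recent = ["Acme Tools"] * 10 + ["cheap power drills"] * 30
spikes = find_anchor_spikes(baseline, recent)
print(spikes)
```

Here the branded anchor stays within its usual share while the money-keyword anchor suddenly dominates the recent links, so only the latter is flagged.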

Core Web Vitals

The document includes references to Core Web Vitals, which are critical for assessing user experience metrics such as loading performance, interactivity, and visual stability. These metrics are directly tied to user engagement and are used in ranking algorithms.
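For reference, here is a small sketch that rates a page against the Core Web Vitals thresholds Google publishes in its own documentation (LCP, INP and CLS). These thresholds come from Google's public guidance, not from the leaked document, and the helper names `rate_metric` and `rate_page` are made up for this example. How these ratings feed into ranking is not something the leak spells out.

```python
# Publicly documented thresholds: (upper bound for "good", upper bound for
# "needs improvement"). Anything above the second value is "poor".
THRESHOLDS = {
    "lcp_seconds": (2.5, 4.0),   # Largest Contentful Paint
    "inp_ms": (200, 500),        # Interaction to Next Paint
    "cls": (0.1, 0.25),          # Cumulative Layout Shift
}


def rate_metric(name, value):
    """Classify a single metric value as good / needs improvement / poor."""
    good, needs_improvement = THRESHOLDS[name]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"


def rate_page(metrics):
    """Rate every metric in a {name: value} dict."""
    return {name: rate_metric(name, value) for name, value in metrics.items()}


ratings = rate_page({"lcp_seconds": 3.1, "inp_ms": 180, "cls": 0.31})
print(ratings)
```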

Quality Labels and Click Signals

Click and impression signals for CRAPS (a feature in Google's quality assessment) and Quality Labels are used to evaluate user engagement and interaction with the site. This data helps in scoring and ranking the relevance and quality of a page.

There is also such a thing as a "site quality score", at least for travel sites specifically, although we don't know everything that goes into it or the weights involved.

"Raw signals that determine the travel site quality score"

Linking

Google counts the number of pages that contain a URL in their breadcrumbs.
Google uses page anchors to enhance link structures within documents. This aids in providing more accurate search results based on content relationships and document structure.
The document also mentions: "If we receive a large number of anchors from a particular domain, then we'll throw out all but 200 of them from that domain."
Links to spammy domains bring the site quality down.
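The "all but 200" rule quoted above can be sketched in a few lines. The document does not say which 200 anchors are kept, so this sketch simply keeps the first 200 seen per domain; that choice, the `cap_anchors_per_domain` name and the use of the URL host as "the domain" are all assumptions.

```python
from collections import defaultdict
from urllib.parse import urlparse

MAX_ANCHORS_PER_DOMAIN = 200  # the cap quoted in the document


def cap_anchors_per_domain(anchors, limit=MAX_ANCHORS_PER_DOMAIN):
    """Keep at most `limit` anchors from each source domain.

    `anchors` is an iterable of (source_url, anchor_text) pairs.
    Assumption: the first `limit` anchors seen per domain are kept.
    """
    seen = defaultdict(int)
    kept = []
    for source_url, text in anchors:
        # Simplification: treat the URL host as the domain (subdomains count separately).
        domain = urlparse(source_url).netloc.lower()
        if seen[domain] < limit:
            seen[domain] += 1
            kept.append((source_url, text))
    return kept


anchors = [(f"https://big-site.example/page{i}", "buy widgets") for i in range(500)]
anchors.append(("https://small-site.example/review", "widget review"))
kept = cap_anchors_per_domain(anchors)
print(len(kept))
```

The practical takeaway is the same either way: 500 sitewide links from one domain do not count as 500 independent anchors.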

Blog and Social Data

For blog posts and microblogs, Google collects data such as author information, creation date, and outlink spamscore. This helps in determining the credibility and relevance of social content in search results.

1. BlogPerDocData

Purpose: Captures additional data for blogs and posts, enabling more nuanced analysis and indexing.
Key Attributes:
Author Information: Includes the username of the author of the microblog post, which helps identify the content creator.
Creation Date: Tracks when the content was created, useful for understanding the timeliness and relevance of the post.
Document ID (DocID): Unique identifier for each microblog post, facilitating easy retrieval and management of the content.

2. BlogPerDocDataOutlinks

Purpose: Focuses on the outlinks within blog posts and updates (microblogs), providing insights into the external connections and influences of a blog post.
Key Attributes:
Resolved URLs: Captures the actual URLs that the outlinks point to, which can be analyzed for content and context.
Site Spamscore: Measures the spamminess of the linked sites, helping to assess the quality and trustworthiness of the outlinks.

3. BlogsearchConversationNode

Purpose: Represents nodes in a conversation thread within blogs or microblogs, facilitating the analysis of discussion structures and dynamics.
Key Attributes:
Author Name: Stores the username of the author of the specific microblog post.
Child Nodes: Lists the docids of child nodes, enabling the reconstruction of conversation threads.
Parent Node: Contains the docid of the parent node, which helps in understanding the hierarchy and flow of the conversation.

4. BlogsearchConversationTree

Purpose: Represents the entire conversation as a tree structure, capturing the interconnectedness of the conversation nodes.
Key Attributes:
Conversation ID (ConvID): Unique identifier for the conversation, ensuring each conversation tree is distinct.
Nodes: Contains all the conversation nodes, which can be analyzed for engagement patterns and information dissemination.
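To picture how the node and tree structures fit together, here is a minimal sketch that models them as Python dataclasses and rebuilds a thread from parent and child docids. The class and field names are loosely based on the attributes listed above; they are not the exact names or types from the leaked document, and `thread_outline` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConversationNode:
    docid: str                      # unique ID of the post
    author_name: str                # username of the post's author
    parent: Optional[str] = None    # docid of the parent post, if any
    children: List[str] = field(default_factory=list)  # docids of replies


@dataclass
class ConversationTree:
    conv_id: str                    # unique ID of the whole conversation
    nodes: List[ConversationNode]


def thread_outline(tree):
    """Return the conversation as indented lines, one per post."""
    by_id = {node.docid: node for node in tree.nodes}
    lines = []

    def walk(docid, depth):
        node = by_id[docid]
        lines.append("  " * depth + f"{node.author_name} ({node.docid})")
        for child_id in node.children:
            walk(child_id, depth + 1)

    # Start from every post that has no parent (the thread roots).
    for node in tree.nodes:
        if node.parent is None:
            walk(node.docid, 0)
    return lines


tree = ConversationTree(
    conv_id="conv-1",
    nodes=[
        ConversationNode("d1", "alice", children=["d2", "d3"]),
        ConversationNode("d2", "bob", parent="d1", children=["d4"]),
        ConversationNode("d3", "carol", parent="d1"),
        ConversationNode("d4", "alice", parent="d2"),
    ],
)
outline = thread_outline(tree)
print("\n".join(outline))
```

Storing both parent and child docids, as the attribute list suggests, means a thread can be walked in either direction: up from a reply to the original post, or down from the original post to every reply.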

Local Signals

The document also includes a very large variety of local signals. None of them appear to be groundbreaking, though, or outside of what most SEOs already know and do. In short, Google collects data points about the user's location to serve better results, and it also collects and builds out a lot of data points about the website itself, including language, the language around anchor text, links, and language and locality indicators in the page URLs.

The document provides a comprehensive reference for Google's API, detailing various modules, attributes, and models relevant to content management and search engine optimization. It highlights critical aspects such as site-level signals that influence search rankings, including site structure, content quality, and user engagement metrics. Key components like Core Web Vitals, spam assessment, and quality labels for evaluating user interaction are extensively covered. Additionally, the document addresses metadata handling, local search signals, and the technical nuances of sitelink scoring, offering a robust framework for understanding how different elements contribute to optimizing web presence and improving search engine performance.
