How Google Discover qualifies, ranks, and filters content: Research

Google Discover has become one of the most significant yet enigmatic traffic drivers for modern publishers. Unlike traditional search, which relies on user queries, Discover is a highly personalized feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many digital media outlets, a single article “going viral” on Discover can result in hundreds of thousands of visits in a matter of hours. However, this traffic is notoriously volatile and unpredictable.

Recent SDK-level research conducted by Metehan Yesilyurt has pulled back the curtain on the internal architecture of Google Discover. By analyzing the observable signals within the Google app framework, this research maps out the multi-stage pipeline that dictates how content is qualified, filtered, and eventually ranked for individual users. Understanding this pipeline is essential for SEOs and content creators who want to move beyond guesswork and align their strategies with the technical realities of Google’s recommendation engine.

The Nine-Stage Google Discover Pipeline

The research identifies a structured, nine-stage flow that every piece of content must navigate before appearing in a user’s feed. The process is highly automated and relies on a combination of real-time classifiers and server-side models. The journey of a URL through Discover looks like this (a condensed code sketch of these gates follows the list):

1. Crawling and Content Understanding

The process begins with Google’s ability to find and index your content. This is not fundamentally different from standard search indexing, but for Discover, the speed of crawling is paramount. Google must understand the core topic and entities within the article almost immediately after publication to determine if it meets the criteria for “fresh” content.

2. Meta Tag Extraction

Once the content is crawled, the system extracts critical metadata. This stage focuses heavily on Open Graph tags (og:title and og:image). This information is used to build the visual “card” that the user sees. If these tags are missing or improperly formatted, the content may fail to move to the next stage.
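
As a concrete illustration, the extraction step can be pictured as a strict lookup: if either Open Graph field is absent, there is nothing to build a card from. The sketch below is a minimal Python approximation, not Google’s actual extractor; the card_fields helper and its hard-fail behavior are assumptions for illustration.

```python
# Minimal sketch of stage 2: pull the two Open Graph fields a Discover
# card needs from raw HTML. card_fields and its hard-fail behavior are
# illustrative assumptions, not Google's actual extractor.
from bs4 import BeautifulSoup

def card_fields(html: str) -> dict | None:
    """Return the title/image pair a feed card needs, or None if either is absent."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for prop in ("og:title", "og:image"):
        tag = soup.find("meta", attrs={"property": prop})
        if tag is None or not tag.get("content"):
            return None  # missing metadata: the URL drops out of the pipeline here
        fields[prop] = tag["content"]
    return fields

html = """<head>
<meta property="og:title" content="Example headline">
<meta property="og:image" content="https://example.com/hero.jpg">
</head>"""
print(card_fields(html))
# {'og:title': 'Example headline', 'og:image': 'https://example.com/hero.jpg'}
```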

3. Content Classification

Google classifies the content type. Is it a breaking news story, a “how-to” guide, or an evergreen piece of long-form journalism? These classifications help the system determine which “bucket” the content belongs in and how long its shelf life should be.

4. Block List Verification

This is one of the most critical stages for publishers. Before any interest matching or ranking occurs, the system checks for blocks. If a user has previously indicated they do not want to see content from your domain, your URL is filtered out immediately. There is no opportunity to “out-rank” a publisher-level block.

5. Interest Matching

The system attempts to align the content’s topic with the user’s established interests. This is based on the user’s Search history, YouTube activity, and previous interactions within the Discover feed itself.

6. Predicted Click-Through Rate (pCTR) Modeling

Google applies a sophisticated, server-side pCTR model. The system evaluates how likely a specific user is to click on your card compared to other available options. This model considers historical engagement data for your domain and the specific URL.
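
The shape of such a model can be sketched as a simple logistic scorer over a few engagement features. Everything below is an assumption for illustration: the feature names, weights, and bias are invented, and the real server-side model and its inputs are not public.

```python
# Hypothetical pCTR sketch: a logistic model over a few engagement
# features. Feature names, weights, and bias are invented for
# illustration; the real model is server-side and not public.
import math

WEIGHTS = {"domain_ctr": 1.5, "url_ctr": 2.0, "interest_match": 1.0}
BIAS = -3.0

def predicted_ctr(features: dict[str, float]) -> float:
    """Squash a weighted sum of engagement signals into a 0-1 click probability."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1 / (1 + math.exp(-score))

# Candidates compete on predicted probability, not just raw history.
candidates = {
    "/story-a": {"domain_ctr": 0.04, "url_ctr": 0.08, "interest_match": 0.9},
    "/story-b": {"domain_ctr": 0.04, "url_ctr": 0.02, "interest_match": 0.3},
}
ranked = sorted(candidates, key=lambda u: predicted_ctr(candidates[u]), reverse=True)
print(ranked)  # ['/story-a', '/story-b']
```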

7. Feed Layout Construction

At this stage, the system decides how the feed will look. It selects which cards will be rendered large (a slot that requires high-resolution images) and which will appear as smaller thumbnails, ensuring a diverse and visually appealing mix of content.

8. Content Delivery

The content is finally pushed to the user’s device. This happens in real time, and the feed can be updated even while the user is actively scrolling.

9. Feedback Recording

The final stage is the loop. Every action the user takes—clicking, dismissing, saving, or ignoring—is recorded and fed back into the system to refine future ranking and filtering decisions.
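
Taken together, the nine stages behave like a chain of gates: a URL that fails any early gate never reaches ranking at all. The sketch below condenses that control flow; the stage names follow the article, while the gate functions and data shapes are illustrative stubs.

```python
# The nine-stage flow condensed into sequential gates. Stage names follow
# the article; gate implementations and data shapes are illustrative stubs.
from typing import Callable, Optional

Gate = Callable[[dict, dict], bool]  # (candidate, user) -> passes?

def run_pipeline(candidate: dict, user: dict,
                 stages: list[tuple[str, Gate]]) -> Optional[str]:
    """Return the name of the stage that rejected the candidate, or None."""
    for name, gate in stages:
        if not gate(candidate, user):
            return name  # hard stop: later stages never see this URL
    return None

stages = [
    ("block_list", lambda c, u: c["domain"] not in u["blocked_domains"]),  # stage 4
    ("interest_match", lambda c, u: c["topic"] in u["interests"]),         # stage 5
]
user = {"blocked_domains": {"spammy.example"}, "interests": {"tech"}}
candidate = {"domain": "news.example", "topic": "tech"}
print(run_pipeline(candidate, user, stages))  # None -> survives into pCTR ranking
```

The important property is the ordering: a publisher-level block at stage 4 short-circuits everything that follows, which is exactly why it is examined next.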

The Silent Killers: Why Content Fails to Qualify

One of the most striking findings of the research is the existence of “hard blocks” that prevent content from even entering the ranking competition. Many publishers focus on keywords and engagement, but technical oversights can disqualify a page before it ever reaches a user.

Two specific meta tags can act as total suppressors: “nopagereadaloud” and “notranslate”. If these tags are detected, the system may interpret the content as restricted or unsuitable for the Discover environment, leading to an automatic exclusion. While these tags have legitimate uses for accessibility or technical reasons, their presence is a red flag for the Discover pipeline.
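
A quick self-audit for these two values is straightforward. Both are conventionally declared through a meta tag with name="google"; that declaration form is an assumption of the sketch below, and only the two value names come from the research.

```python
# Audit a page for the two suppressor values named in the research.
# Assumes the conventional <meta name="google" content="..."> form.
from bs4 import BeautifulSoup

SUPPRESSORS = {"nopagereadaloud", "notranslate"}

def find_suppressors(html: str) -> set[str]:
    soup = BeautifulSoup(html, "html.parser")
    found: set[str] = set()
    for meta in soup.find_all("meta", attrs={"name": "google"}):
        content = (meta.get("content") or "").lower()
        found |= SUPPRESSORS & set(content.replace(",", " ").split())
    return found

page = '<head><meta name="google" content="notranslate"></head>'
print(find_suppressors(page))  # {'notranslate'} -> a Discover red flag
```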

Furthermore, image requirements are non-negotiable. Google Discover is a visual-first medium. To qualify for the large, high-engagement cards that drive the most traffic, images must be at least 1200 pixels wide. The system also requires the max-image-preview:large robots directive (or the use of AMP) to display these high-resolution visuals. If your images are small or fail to load correctly during the delivery stage, your visibility will be severely limited, often resulting in small thumbnail displays that suffer from significantly lower click-through rates.
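
Both image gates can be checked together: the robots directive that permits large previews, and the 1200px minimum width. The sketch below uses requests and Pillow for the fetch-and-measure plumbing; the helper name and everything other than the sourced 1200px threshold are illustrative.

```python
# Check the two image gates: the max-image-preview:large robots directive
# and the 1200px minimum width. Fetch/parse plumbing is an assumption.
from io import BytesIO

import requests
from bs4 import BeautifulSoup
from PIL import Image

MIN_WIDTH = 1200  # sourced from the research

def large_card_eligible(page_html: str, image_url: str) -> bool:
    soup = BeautifulSoup(page_html, "html.parser")
    robots = soup.find("meta", attrs={"name": "robots"})
    directives = (robots.get("content") or "").lower() if robots else ""
    if "max-image-preview:large" not in directives:
        return False  # large previews not permitted for this page
    image = Image.open(BytesIO(requests.get(image_url, timeout=10).content))
    return image.width >= MIN_WIDTH  # narrower images fall back to thumbnails
```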

The Power of the Publisher-Level Block

The research highlights a sobering reality for publishers: the “Don’t show content from this site” action is incredibly powerful. Because this block happens at the fourth stage of the pipeline—long before ranking models are applied—it acts as a permanent barrier between your domain and that specific user.

Currently, there is no equivalent “sitewide boost” mechanism. While a user can “follow” a publisher, the research suggests that a single negative action (a dismissal or a block) carries more weight in the filtering process than a single positive action. This creates a high-stakes environment where clickbait or misleading titles might drive short-term clicks but result in long-term domain suppression if users feel deceived and choose to block the source.

The Freshness Decay: Understanding the Visibility Windows

Time is the most influential factor in Discover visibility. Unlike traditional search, where a high-quality guide can remain at the top of the SERPs for years, Discover content has a distinct and rapid decay cycle. The research identifies four primary windows of visibility (condensed into a small lookup after this list):

  • 1 to 7 days: This is the “golden window.” Freshly published content receives the strongest boost and the highest likelihood of appearing in the top positions of the feed.
  • 8 to 14 days: Visibility begins to moderate. Unless the content is seeing exceptional engagement or is tied to a developing story, it will start to be replaced by newer URLs.
  • 15 to 30 days: Visibility becomes limited. Content in this window usually only appears for users with very niche, specific interests that haven’t been satisfied by newer content.
  • 30+ days: A gradual decline into total inactivity. Only content specifically classified as “evergreen” by the system survives past this point, and even then, its appearance is sporadic.
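
Those four windows reduce to a simple lookup, shown below. The day ranges come from the research; the short labels are paraphrases of the descriptions above.

```python
# The four visibility windows as a lookup. Day ranges are from the
# research; the labels paraphrase the descriptions above.
def visibility_window(age_days: int, evergreen: bool = False) -> str:
    if age_days <= 7:
        return "golden window: strongest boost"
    if age_days <= 14:
        return "moderating: being displaced by newer URLs"
    if age_days <= 30:
        return "limited: niche interests only"
    return "sporadic (evergreen only)" if evergreen else "inactive"

for age in (3, 10, 21, 45):
    print(age, "->", visibility_window(age))
```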

This decay emphasizes the need for a consistent publishing cadence. For news and lifestyle sites, the goal is to keep a “pipeline” of fresh content moving into the first window to maintain steady traffic levels.

The Predicted Click-Through Rate (pCTR) Model

Ranking in Google Discover isn’t just about what is in your article; it’s about what the system *predicts* will happen when a user sees it. The pCTR model is a server-side evaluation that calculates the probability of engagement. While the exact weights of the model are hidden, the signals sent from the app to the server provide a clear picture of what matters:

Meta Tag Integrity

The system prioritizes og:title. If that is missing, it falls back to twitter:title, and finally to the standard HTML title element. Publishers who optimize their Open Graph titles specifically for engagement (without crossing into “clickbait” territory) often see better results.
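
That priority order amounts to a three-step fallback, sketched below. The resolver helper is illustrative; the chain itself (og:title, then twitter:title, then the plain title element) is as described above.

```python
# The title fallback chain: og:title -> twitter:title -> <title>.
# The resolver helper is illustrative; the order is from the research.
from bs4 import BeautifulSoup

def resolve_card_title(html: str) -> str | None:
    soup = BeautifulSoup(html, "html.parser")
    og = soup.find("meta", attrs={"property": "og:title"})
    if og and og.get("content"):
        return og["content"]
    tw = soup.find("meta", attrs={"name": "twitter:title"})
    if tw and tw.get("content"):
        return tw["content"]
    return soup.title.string if soup.title else None  # last resort

print(resolve_card_title("<head><title>Plain title</title></head>"))  # Plain title
```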

Historical Performance

The model looks at past click and impression data for the specific URL. If an article starts strong, the system is more likely to show it to a broader audience. This creates a “snowball effect” common in viral Discover content.

Image Quality and Reliability

The system tracks whether images load successfully. Technical performance—specifically related to the delivery of visual assets—is a direct ranking signal. Slow-loading images or broken tags will result in the URL being deprioritized in favor of more “reliable” content.

Personalization and User Feedback Loops

Google Discover is not a static list; it is a living feed that responds to user behavior in real time. The personalization layer is deep, integrating data from across the Google ecosystem. The research notes several ways user actions permanently alter the feed:

  • Individual Dismissals: If a user swipes away a specific story, that URL is stored as “dismissed” for that user. It will never reappear, regardless of how much engagement it gets from others (a minimal filter sketch follows this list).
  • Time Spent Reading: Engagement isn’t just about the click. The system monitors how much time a user spends on a page after clicking through from Discover. High bounce rates can signal to the pCTR model that the content didn’t satisfy the user’s curiosity, leading to lower reach.
  • Topic Affinity: By clicking on a story about a specific video game or tech brand, the user “trains” their feed to prioritize that topic in the future.
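
The dismissal behavior, in particular, is easy to picture as a per-user ledger consulted before every feed build. The data shapes below are illustrative assumptions; only the never-reappear rule comes from the research.

```python
# Per-user dismissal ledger: once swiped away, a URL is filtered out of
# every future feed for that user. Data shapes are illustrative.
from collections import defaultdict

dismissed: dict[str, set[str]] = defaultdict(set)  # user_id -> dismissed URLs

def record_dismissal(user_id: str, url: str) -> None:
    dismissed[user_id].add(url)

def filter_feed(user_id: str, ranked: list[str]) -> list[str]:
    """Drop dismissed URLs no matter how well they rank globally."""
    return [url for url in ranked if url not in dismissed[user_id]]

record_dismissal("user42", "https://example.com/story-a")
print(filter_feed("user42",
                  ["https://example.com/story-a", "https://example.com/story-b"]))
# ['https://example.com/story-b']
```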

A Culture of Experimentation

One of the most revealing aspects of the research is the sheer scale of experimentation happening within the Discover framework. During the observation period, approximately 150 server-side experiments were running simultaneously. Additionally, over 50 feature controls were active, affecting everything from card layout to how different content types are blended.

This explains why Discover traffic is so volatile. Two users with nearly identical interests might see completely different feeds because they belong to different experiment groups. For publishers, this means that sudden drops in traffic are not always the result of a Google algorithm update or a “penalty,” but may instead reflect a shift in how Discover is testing content delivery to a specific segment of the audience.
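
One common way such systems assign users to experiment arms is deterministic hashing of a user identifier with an experiment name. To be clear, this mechanism is an assumption for illustration, not something the research confirms about Google; it simply shows why two similar users can land in different groups and stay there.

```python
# Illustrative (not confirmed) experiment bucketing: hash a user ID with
# an experiment name into a stable arm, so assignment is consistent.
import hashlib

def experiment_arm(user_id: str, experiment: str, arms: int = 2) -> int:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % arms

for uid in ("alice", "bob"):
    print(uid, "->", experiment_arm(uid, "card_layout_v3"))
# The same user always lands in the same arm; two similar users may not.
```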

Practical Strategies for Discover Optimization

Based on these research findings, how should publishers adjust their strategy? The focus should shift from traditional keyword targeting to eligibility and engagement signals.

Ensure Large Image Qualification

Check your technical setup to ensure every article includes an image at least 1200px wide. Verify that your robots.txt allows Googlebot-Image to crawl these assets and that the max-image-preview:large robots directive is correctly implemented.
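
The robots.txt part of that audit can be automated with the standard library. The sketch below assumes placeholder URLs; urllib.robotparser itself is stock Python.

```python
# Verify Googlebot-Image may crawl a hero image. urllib.robotparser is
# stdlib; both URLs below are placeholders for your own.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt

image_url = "https://example.com/images/hero-1200.jpg"
print(rp.can_fetch("Googlebot-Image", image_url))  # True -> image is crawlable
```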

Optimize Open Graph Data

Treat your og:title and og:description as your primary “ad copy.” These are the elements that drive the pCTR model. They should be compelling and accurate, focusing on the “curiosity gap” without being misleading.

Avoid Content Blocks

Audit your site for the nopagereadaloud and notranslate tags and remove them unless they are strictly necessary. More importantly, focus on building trust with your audience to minimize “Don’t show content from this site” actions, which are the most damaging long-term signals for a domain.

Maintain a High-Frequency Freshness Cycle

Since the strongest boost occurs in the first seven days, a consistent publishing schedule is vital. For evergreen content, consider “refreshing” and republishing older high-performing guides to bring them back into the initial visibility window—provided the updates are substantial and provide new value to the reader.

Monitor Core Web Vitals and Image Delivery

Because the Discover pipeline checks for successful image loading and delivery before final ranking, technical performance is paramount. A fast, responsive site ensures that when the “delivery” stage of the pipeline occurs, your content is ready to be shown without errors.

The Future of Discovery-Based Traffic

The research into Google Discover’s architecture confirms that it is a system built on a foundation of eligibility and trust. While traditional SEO focuses on intent, Discover focuses on affinity. By understanding the nine-stage pipeline—from the initial crawl to the final feedback loop—publishers can better position their content to survive the filters and excel in the ranking models.

Success in Discover is not about “tricks” or “hacks.” It is about meeting strict technical requirements, providing high-quality visual assets, and creating content that users genuinely want to engage with. In an environment defined by heavy experimentation and permanent user blocks, the most sustainable strategy is one that prioritizes the user experience and remains technically flawless.
