Understanding the Black Box of Google Discover
For many digital publishers and SEO professionals, Google Discover remains one of the most mysterious and volatile sources of organic traffic. Unlike traditional search, where intent is driven by specific user queries, Discover is a proactive, “query-less” feed that anticipates what a user wants to see before they even ask for it. While it can drive millions of sessions in a matter of hours, it is also notorious for sudden traffic drops and unpredictable behavior.
Recent SDK-level research by Metehan Yesilyurt has provided a rare look behind the curtain, revealing the architectural framework that governs how Google Discover qualifies, ranks, and filters content. By analyzing the observable signals within the Google Discover app framework and telemetry data, we can now map out a sophisticated nine-stage pipeline that determines the lifecycle of every piece of content on the platform.
This research highlights that Discover is not just a simplified version of Google Search; it is a complex ecosystem driven by predictive modeling, real-time feedback loops, and strict technical eligibility requirements that can disqualify a publisher before the ranking process even begins.
The Nine-Stage Content Pipeline
The journey from a published article to a user’s Discover feed involves a structured, multi-stage pipeline. Understanding where your content sits in this flow is essential for diagnosing visibility issues.
1. Crawling and Semantic Understanding
Before anything else, Google must discover the URL. Through its standard crawling mechanisms, Google parses the page to understand its core subject matter. This involves more than just reading keywords; the system uses advanced natural language processing to categorize the content into specific interest clusters.
2. Metadata Extraction
The system looks for specific signals that define how the content will be presented. The research indicates that Discover relies heavily on Open Graph (OG) tags. It reads the title, the primary image, and the description to build the “card” that users see.
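This card-building step can be sketched with a small parser. This is a simplified illustration, not Google's implementation: the tag names come from the Open Graph protocol, and the `OGCardParser` class is hypothetical.

```python
from html.parser import HTMLParser

class OGCardParser(HTMLParser):
    """Collects the Open Graph tags a Discover-style card could be
    built from. Illustrative sketch, not a confirmed implementation."""
    def __init__(self):
        super().__init__()
        self.card = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property", "")
        if prop in ("og:title", "og:image", "og:description"):
            self.card[prop] = a.get("content", "")

html = """
<head>
  <meta property="og:title" content="Example Headline">
  <meta property="og:image" content="https://example.com/hero.jpg">
  <meta property="og:description" content="A short summary.">
</head>
"""
parser = OGCardParser()
parser.feed(html)
print(parser.card["og:title"])  # Example Headline
```

If any of these three fields are missing or empty, the resulting card is incomplete, which is why OG tag hygiene matters so much for this stage.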
3. Content Classification
Once understood, the content is categorized. Is this a piece of breaking news, a trending topic, or a “long-tail” evergreen guide? This classification is critical because it determines which “decay” model will be applied to the content’s visibility over time.
4. The Qualification and Block Check
This is a “hard” filter stage. Before the system even considers whether a user might like your content, it checks for publisher-level blocks. If a user has previously selected “Don’t show content from [Site Name],” the content is instantly discarded from that user’s potential pool.
5. Interest Matching
Google compares the content’s semantic clusters against the user’s documented interests. These interests are derived from Search history, YouTube activity, and previous interactions within the Discover feed itself.
6. Predicted Click-Through Rate (pCTR) Modeling
This is the heart of the ranking engine. Using a server-side model, Google estimates the likelihood of a user clicking on a specific card. This isn’t just based on the user’s past behavior, but on how similar users have interacted with the same content and how the publisher has performed historically.
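A toy version of this blending can make the idea concrete. The three inputs mirror the signal families described above; the weights, bias, and logistic form are purely illustrative assumptions, not Google's actual model.

```python
import math

def predicted_ctr(user_affinity, cohort_ctr, publisher_ctr,
                  weights=(1.5, 2.0, 1.0), bias=-3.0):
    """Toy logistic model blending three signal families:
    - user_affinity:  this user's interest in the topic (0..1)
    - cohort_ctr:     how similar users engaged with this card (0..1)
    - publisher_ctr:  the publisher's historical performance (0..1)
    All weights are invented for illustration."""
    z = (bias
         + weights[0] * user_affinity
         + weights[1] * cohort_ctr
         + weights[2] * publisher_ctr)
    return 1 / (1 + math.exp(-z))  # squash to a 0..1 probability
```

The key property to notice is that a card can score well even for a first-time reader of a publisher, because the cohort signal carries weight independently of the individual user's history.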
7. Feed Layout Construction
The system decides how to arrange the cards. It considers variety, ensuring the user isn’t overwhelmed by a single topic, and determines whether to use a large image format or a smaller thumbnail layout based on the available assets.
8. Content Delivery
The content is pushed to the user’s device. Interestingly, the research found that this feed is dynamic; it can be updated in real-time while a user is actively scrolling without requiring a manual refresh.
9. Feedback Recording
Every interaction—or lack thereof—is recorded. If a user clicks, scrolls past, dismisses, or reports a card, that data is sent back to Google’s servers to refine the pCTR model for the next session.
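The hard-filter stages that precede ranking (stages 4 and 5 above) can be summarized as a simple gate: blocked publishers are discarded outright, and only interest-matched content survives into scoring. The data shapes below are hypothetical; only the filtering order reflects the research.

```python
def eligible_for_user(card, user):
    """Hard filters from the early pipeline stages. A blocked publisher
    is discarded before any ranking happens (stage 4); what remains must
    overlap the user's interest clusters (stage 5)."""
    if card["domain"] in user["blocked_domains"]:
        return False
    return bool(card["topics"] & user["interests"])

user = {"blocked_domains": {"spam.example"},
        "interests": {"technology", "space"}}
cards = [
    {"domain": "spam.example", "topics": {"technology"}},
    {"domain": "news.example", "topics": {"space", "science"}},
    {"domain": "food.example", "topics": {"recipes"}},
]
feed = [c for c in cards if eligible_for_user(c, user)]
print([c["domain"] for c in feed])  # ['news.example']
```

Note that `spam.example` never reaches the pCTR model at all, which is the structural point behind the publisher-block findings discussed next.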
The Critical Role of Publisher-Level Blocks
One of the most significant findings in recent research is the power of the publisher block. In the Discover interface, users have the option to suppress content from an entire domain. According to the SDK analysis, this block occurs very early in the pipeline.
Unlike Google Search, where a site might rank lower but still appear for specific queries, a block in Discover is binary. If a user blocks a publisher, that domain effectively ceases to exist for that user’s feed. Furthermore, the research notes that there is no equivalent “sitewide boost” mechanism. While you can be suppressed instantly at a domain level, you must earn your way into every single user’s feed through individual interest matching and engagement.
This asymmetry makes brand trust and user experience paramount. Publishers that rely on “clickbait” titles leading to low-quality content risk permanent exclusion from a user’s feed that no amount of SEO optimization can fix.
The Predicted Click-Through Rate (pCTR) Model
Ranking in Discover is largely governed by a pCTR model. Because Discover is a visual medium, Google’s servers must predict engagement before the content is even shown. While the exact weights of the model are proprietary, the research identified several key signals that the app sends to Google to inform these decisions:
Title and Meta Tag Integrity
The system primarily looks for the “og:title” tag. If this is missing, it cascades to secondary options like the “twitter:title” or the standard HTML title tag. A clear, compelling title that accurately reflects the content is essential for a high pCTR.
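The fallback cascade described here is straightforward to express in code. The function and key names below are hypothetical; the ordering is what the research reports.

```python
def resolve_card_title(meta):
    """Fallback order reported by the research: og:title first,
    then twitter:title, then the plain HTML <title> element."""
    for key in ("og:title", "twitter:title", "html:title"):
        title = meta.get(key)
        if title:
            return title
    return None  # no usable title found

print(resolve_card_title({"twitter:title": "Fallback Headline"}))
# Fallback Headline
```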
Image Quality and Dimensions
Visuals are perhaps the most important factor for Discover success. To qualify for large, high-engagement cards, images must be at least 1200px wide. The research confirmed that smaller images are often relegated to thumbnail views, which naturally receive lower click-through rates and, consequently, lower priority in the ranking model.
Historical Performance Data
The model considers the past click and impression data for the specific URL. If a piece of content starts strong and maintains a high engagement rate, the system will continue to “push” it to wider audiences. Conversely, if initial engagement is low, the content is quickly cycled out.
The Science of Content Freshness and Decay
Google Discover is heavily biased toward “the now,” but it handles evergreen content through a separate classification system. The research identified specific time windows that impact visibility:
1 to 7 Days: The Peak Performance Window
New content receives the strongest algorithmic boost. Most news-oriented content lives and dies within this seven-day window. This is where the highest volume of traffic typically occurs.
8 to 14 Days: Moderate Visibility
After the first week, content begins to see a natural decay in its “freshness score.” It may still appear to users with very high interest in the topic, but its overall reach starts to contract.
15 to 30 Days: Limited Reach
By this stage, only exceptionally high-performing or highly niche content remains in the general feed.
30+ Days: Gradual Decline or Evergreen Status
While most content disappears after 30 days, Google has a separate classification for evergreen content. If a piece of content is deemed “timeless” and continues to receive steady engagement, it can bypass the standard decay model and resurface months or even years later when relevant to a user’s current interests.
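The four windows above can be encoded as a simple classifier. The day boundaries are the ones identified in the research; the phase names and the evergreen-bypass condition are simplifications for illustration.

```python
def visibility_phase(age_days, is_evergreen=False, engagement_steady=False):
    """Classifies content into the decay windows the research identified.
    The evergreen bypass is a simplified stand-in for 'deemed timeless
    with steady engagement'."""
    if age_days <= 7:
        return "peak"                 # strongest algorithmic boost
    if age_days <= 14:
        return "moderate"             # freshness score decaying
    if age_days <= 30:
        return "limited"              # only top performers remain
    if is_evergreen and engagement_steady:
        return "evergreen_resurface"  # bypasses the standard decay model
    return "decayed"

print(visibility_phase(3))    # peak
print(visibility_phase(90))   # decayed
```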
Technical Disqualifiers: The Meta Tags That Can Kill Traffic
While most SEOs focus on what to include, this research highlights what to avoid. There are specific meta tags that can act as “kill switches” for Google Discover visibility.
The “nopagereadaloud” and “notranslate” meta tags are particularly dangerous for Discover optimization. If these tags are present, the research suggests the system may exclude the page from the Discover pipeline entirely. This is likely because Discover aims to provide a versatile, accessible experience, and content that restricts these features doesn’t fit the platform’s delivery goals.
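A site-wide audit for these kill switches is easy to automate. The sketch below flags the two tags named in the research; note that `notranslate` commonly appears as `<meta name="google" content="notranslate">`, so both the `name` and `content` attributes are checked.

```python
from html.parser import HTMLParser

# The two tags the research associates with Discover exclusion.
KILL_SWITCHES = {"nopagereadaloud", "notranslate"}

class MetaAudit(HTMLParser):
    """Flags meta tags that may exclude a page from the Discover pipeline."""
    def __init__(self):
        super().__init__()
        self.flags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # 'notranslate' is often declared via name="google" content="notranslate",
        # so inspect both attributes.
        candidates = {a.get("name", "").lower(), a.get("content", "").lower()}
        self.flags.extend(sorted(candidates & KILL_SWITCHES))

audit = MetaAudit()
audit.feed('<head><meta name="google" content="notranslate"></head>')
print(audit.flags)  # ['notranslate']
```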
Additionally, the failure of images to load successfully is a major red flag. If Google’s crawler or the Discover app itself cannot successfully render the primary image asset, the card will not be generated. In a visual-first feed, “no image” means “no entry.”
The Personalization Layers
Google Discover is one of the most personalized surfaces in the Google ecosystem. This personalization happens across multiple layers:
Interest-Based Targeting
Google uses a user’s “Topic Graph”—a collection of interests ranging from broad (e.g., “Technology”) to hyper-specific (e.g., “Nvidia RTX 5090 leaks”). Content is mapped to these graphs with high precision.
Direct User Actions
The system prioritizes content from publishers that a user has explicitly “followed” or whose content they have “saved.” Conversely, “dismissals”—where a user swipes a card away or selects “Not interested”—are treated as permanent feedback for that specific URL. Once a user dismisses a story, it is stored on the server side and will not be shown to that user again.
Engagement Metrics
The research points to “time spent” as a significant signal. If users click on a card but immediately bounce back to the feed, the system perceives this as a mismatch between the card’s promise (the title/image) and the content’s value. High dwell time signals to the algorithm that the content is fulfilling the user’s curiosity.
A Constant State of Experimentation
One of the most revealing aspects of the SDK-level research is the sheer scale of experimentation occurring within the app. During a single observed session, researchers found approximately 150 server-side experiments running simultaneously, along with over 50 feature controls.
These experiments can affect anything from card layout and font size to the weight given to certain ranking signals. This explains why two users with identical interests might see completely different feeds, or why a publisher might see a sudden surge or dip in traffic without any changes to their content strategy. Volatility is a built-in feature of Google Discover, not a bug. The system is constantly testing and iterating to maximize user retention and engagement.
Practical Strategies for Publishers
Based on these architectural insights, publishers can move away from guesswork and toward a more technical approach to Discover optimization.
Optimize for Large Assets
Ensure every article has a high-quality, relevant image that is at least 1200px wide. Use the “max-image-preview:large” setting to signal to Google that you want your content to occupy the most prominent card layouts.
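The setting lives in the robots meta tag, e.g. `<meta name="robots" content="max-image-preview:large">`. A quick check for it in an audit script, assuming the robots `content` string has already been extracted from the page, might look like this:

```python
def robots_allows_large_preview(robots_content):
    """Checks a robots meta 'content' string for the max-image-preview:large
    directive Google documents for large preview treatments."""
    directives = [d.strip().lower() for d in robots_content.split(",")]
    return "max-image-preview:large" in directives

print(robots_allows_large_preview("index, follow, max-image-preview:large"))
# True
```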
Focus on First-Touch Engagement
Since pCTR is a primary ranking factor, your titles must be compelling without being deceptive. If the “og:title” doesn’t create immediate interest, the content will fail to gain the initial momentum required to survive the first 24 hours in the feed.
Monitor Technical Health
Regularly audit your site for meta tags that could interfere with Discover. Ensure that your images are not just large, but also optimized for fast loading and served via a reliable CDN. If an image fails to load during the feed layout construction stage, your content is effectively invisible.
Build Brand Affinity
Because publisher-level blocks are so powerful and permanent, focusing on high-quality, trustworthy reporting is a long-term SEO strategy. Encouraging users to “Follow” your brand in the Google app can provide a significant boost that bypasses some of the volatility of the general interest-matching algorithm.
Understand the Lifecycle
Accept that Discover traffic is front-loaded. Plan your content calendar around the 1-7 day freshness window for news, but maintain a secondary strategy for evergreen content that can provide long-tail Discover hits months after publication.
The Future of Discovery
As Google continues to integrate more AI-driven features and personalized clusters, the Discover feed will likely become even more specialized. The research into its architecture reveals a system that is increasingly self-correcting—relying on real-time telemetry and massive experimentation to stay ahead of user needs.
For publishers, the message is clear: success in Google Discover is a combination of technical eligibility, visual excellence, and a deep alignment with user interests. By respecting the rules of the pipeline—from the 1200px image requirement to the avoidance of negative meta tags—sites can position themselves to catch the “Discover wave” and sustain it through high-quality engagement.