Understanding the Mechanics of Google Discover
For many digital publishers and SEO professionals, Google Discover remains one of the most significant yet unpredictable sources of organic traffic. Unlike traditional search, which relies on active user queries, Discover is a highly personalized “query-less” feed that anticipates user needs based on their interests and past behavior. However, the exact mechanics of how content qualifies for this feed, how it is ranked, and why it is sometimes filtered out have largely been a matter of speculation—until now.
Recent SDK-level research conducted by Metehan Yesilyurt has provided a rare look under the hood of the Google Discover architecture. By analyzing the observable signals within the Google app framework, the research reveals a complex, nine-stage pipeline that dictates the lifecycle of a piece of content within the feed. This research confirms that Discover is not just a simplified version of Google Search; it is a distinct ecosystem with its own set of rules, technical requirements, and “hard” filters that can make or break a publisher’s visibility.
The Nine-Stage Pipeline: How Content Moves Through Discover
The journey from a published article to a user’s mobile feed involves a structured sequence of events. Understanding these stages is critical for diagnosing why certain content performs well while other pieces fail to gain traction. According to the research, the pipeline follows these steps:
1. Crawling and Content Understanding
The process begins with Google’s standard crawling mechanisms. Before content can even be considered for Discover, the automated systems must index the page and understand its topical relevance. This stage leverages Google’s large-scale language models to categorize the content into specific “interest clusters.”
2. Meta Tag Extraction
Once crawled, the system specifically looks for structured data and meta tags. This is a technical checkpoint where Google identifies the primary visual and textual elements that will represent the article in the feed. This includes the Open Graph (og) tags that define titles and images.
3. Content Classification
The system then classifies the content type. Is it a breaking news story, an evergreen guide, or a localized update? This classification determines which “bucket” the content falls into and influences the “freshness” decay model that will be applied later.
4. The Publisher Block Check
In one of the most critical stages, Google checks for publisher-level blocks. If a user has previously indicated they do not want to see content from a specific domain, that content is immediately discarded from that user’s potential feed. This happens before any ranking or interest matching takes place.
5. Interest Matching
Google compares the content’s topic against the user’s individual interest profile. This profile is built from a massive array of signals, including search history, YouTube activity, and previous interactions within the Discover feed itself.
6. Predicted Click-Through Rate (pCTR) Modeling
Before the feed is rendered, Google runs a server-side prediction model. It estimates the likelihood of a user clicking on a specific card based on historical data from that URL, the publisher’s reputation, and the visual appeal of the card layout. This is where “ranking” truly begins.
7. Feed Layout Construction
The system decides how to present the qualified content. It balances different types of media, such as standard articles, YouTube videos, and “Shorts,” to create a visually diverse and engaging feed layout.
8. Content Delivery
The finalized cards are pushed to the user’s device. This delivery is often dynamic; the research indicates that the feed can update in real-time as a user scrolls, adding or reordering content without requiring a manual refresh.
9. Feedback Recording
The cycle closes with the user’s reaction. Whether a user clicks, dismisses, or follows a topic, every action is fed back into the system to refine future ranking decisions for that user and the content itself.
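The early, user-level stages of this pipeline can be sketched as a simple candidate filter. This is an illustrative model only: the `Candidate` fields and function names are assumptions, not Google's internal structures, but the ordering mirrors the stages above (the image-tag check, then the publisher block, then interest matching, all before any ranking).

```python
from dataclasses import dataclass


# Hypothetical candidate item; field names are illustrative, not Google's.
@dataclass
class Candidate:
    url: str
    publisher: str
    topics: set
    has_og_image: bool = True


def discover_pipeline(candidates, user_blocks, user_interests):
    """Sketch of the early per-user stages: meta-tag check (stage 2),
    publisher-block filter (stage 4), then interest matching (stage 5)."""
    eligible = []
    for c in candidates:
        if not c.has_og_image:          # stage 2: no image tag, no card
            continue
        if c.publisher in user_blocks:  # stage 4: hard filter, pre-ranking
            continue
        if c.topics & user_interests:   # stage 5: topical overlap required
            eligible.append(c)
    return eligible
```

Note how the publisher block sits ahead of interest matching: a blocked domain is discarded no matter how well its topics match.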
Technical Prerequisites: The Role of Meta Tags and Images
One of the standout findings of Yesilyurt’s research is the absolute necessity of specific page-level meta tags. Google Discover heavily relies on Open Graph (OG) tags to build the visual “cards” that users see. If these tags are missing or improperly configured, the content may be disqualified entirely.
The research identified six key tags that Google Discover prioritizes, with og:title and og:image being the most vital. If an article lacks an image tag, it simply will not appear; there is no such thing as a "text-only" card in the current Discover framework. Furthermore, Google has a fallback hierarchy: if og:title is missing, the system attempts to use the twitter:title tag, then the standard HTML title tag. Relying on these fallbacks, however, can lead to suboptimal presentation in the feed.
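The reported title fallback order can be expressed as a small lookup helper. The function name and the `meta` dictionary are illustrative assumptions; only the fallback order itself comes from the research described above.

```python
def pick_card_title(meta):
    """Sketch of the reported fallback order: og:title first, then the
    Twitter Card title, then the standard HTML title. `meta` maps tag
    names to their values as extracted from the page."""
    for key in ("og:title", "twitter:title", "title"):
        value = meta.get(key)
        if value:
            return value
    return None  # no usable title; the card cannot be built cleanly
```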
The 1200px Rule for Large Cards
Visual prominence is a major factor in Discover success. For a piece of content to qualify for the high-engagement “large card” format, the featured image must be at least 1200 pixels wide. Images smaller than this are relegated to small thumbnail layouts. Data shows that large cards generally achieve significantly higher click-through rates, making the 1200px threshold a mandatory technical requirement for any publisher looking to maximize Discover traffic.
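The 1200px gate is a simple threshold check, sketched below. The constant reflects the threshold reported in the research; the function name is an assumption for illustration.

```python
LARGE_CARD_MIN_WIDTH = 1200  # reported threshold, in pixels


def card_format(image_width):
    """Images at or above 1200px wide qualify for the high-engagement
    large card; anything smaller falls back to the thumbnail layout."""
    return "large" if image_width >= LARGE_CARD_MIN_WIDTH else "thumbnail"
```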
Tags That Can Kill Your Traffic
While some tags help you get in, others are designed to keep you out. The research highlighted two specific meta tags that act as “poison pills” for Discover: nopagereadaloud and notranslate. If these tags are detected, the system may exclude the page from the Discover pipeline entirely. Publishers should audit their CMS and SEO plugins to ensure these tags aren’t being added unintentionally, especially on mobile-optimized versions of their pages.
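A CMS audit for these exclusion tags can be done with the standard-library HTML parser. This is a minimal sketch: Google documents these directives as `<meta name="google" content="...">`, but some plugins emit the directive as the `name` attribute directly, so the scan below checks both positions.

```python
from html.parser import HTMLParser

# The two exclusion tags reported in the research.
BLOCKING_TAGS = {"nopagereadaloud", "notranslate"}


class MetaAudit(HTMLParser):
    """Collects any meta tag carrying a Discover-blocking directive."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Check both the name attribute and the content value, since the
        # directive can appear in either position depending on the plugin.
        values = {a.get("name", "").lower(), a.get("content", "").lower()}
        self.found |= values & BLOCKING_TAGS


def audit_page(html):
    """Returns a sorted list of blocking directives found in the markup."""
    parser = MetaAudit()
    parser.feed(html)
    return sorted(parser.found)
```

Running this against your templates is a quick way to confirm no SEO plugin is injecting these tags silently.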
Ranking Factors: The pCTR Model and Historical Performance
Unlike traditional search, where “backlinks” and “keyword density” are king, Discover ranking is driven by a predicted Click-Through Rate (pCTR) model. This model is housed on Google’s servers and evaluates content based on its potential to engage the user before the user even sees it.
The pCTR model analyzes several observable signals:
- Title Clarity: Is the title engaging without being “clickbaity” in a way that violates Google’s policies?
- Image Quality: Does the image load correctly, and is it high-resolution?
- Historical Performance: How have previous users interacted with this specific URL? High dismissal rates in the past will negatively impact future visibility for that URL.
- Domain Trust: While not explicitly a “score,” the system tracks the publisher’s overall engagement history within the Discover app.
This suggests that Discover operates on a “momentum” basis. Content that starts strong and maintains a high CTR is more likely to be shown to a wider audience, whereas content that fails to engage its initial small “test” group will be quickly phased out of the pipeline.
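The momentum dynamic can be illustrated with a toy weighted score. To be clear, the weights and the linear form below are invented for illustration; only the input signals (title clarity, image quality, historical engagement, dismissals) come from the research. The key property shown is that engagement history dominates and dismissals subtract from the prediction.

```python
def predicted_ctr(title_clarity, image_quality, past_ctr, past_dismissal_rate,
                  weights=(0.2, 0.2, 0.5, 0.4)):
    """Toy pCTR score, not Google's model. All inputs are in [0, 1].
    Historical CTR carries the largest positive weight; past dismissals
    actively pull the prediction down."""
    w_title, w_image, w_hist, w_dismiss = weights
    score = (w_title * title_clarity
             + w_image * image_quality
             + w_hist * past_ctr
             - w_dismiss * past_dismissal_rate)
    return max(0.0, min(1.0, score))  # clamp to a valid probability range
```

Under this toy model, a URL with a strong click history outranks an identical-looking URL with a high dismissal rate, which is the "momentum" effect described above.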
The Impact of Freshness and Content Decay
Timeliness is a cornerstone of the Discover experience. The research mapped out how Google views the “age” of content and how it impacts visibility. While Google does surface evergreen content, there is a clear bias toward newer material.
The Lifecycle of a Discover Post
The decay of content visibility in Discover generally follows a tiered timeline:
- 1 to 7 Days: This is the “Golden Window.” Content in this age bracket receives the strongest ranking boost and is most likely to appear at the top of a user’s feed.
- 8 to 14 Days: Visibility begins to taper. Unless the content is exceptionally high-performing or tied to a recurring trend, it will start to be replaced by newer stories.
- 15 to 30 Days: Content in this bracket sees limited visibility, often only appearing for users with very niche, specific interests related to the topic.
- 30+ Days: At this stage, most content enters a gradual decline. Only highly authoritative evergreen content remains in circulation after the one-month mark.
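The tiers above map naturally to a step function. The tier boundaries come from the research; the numeric multipliers attached to each tier are assumptions added purely for illustration, since the actual weighting is not public.

```python
def freshness_tier(age_days):
    """Maps article age to the decay tiers described above. Multipliers
    are illustrative placeholders, not published values."""
    if age_days <= 7:
        return "golden_window", 1.0   # strongest ranking boost
    if age_days <= 14:
        return "moderating", 0.6      # being displaced by newer stories
    if age_days <= 30:
        return "niche_only", 0.3      # shown only for narrow interests
    return "declining", 0.1           # only strong evergreen survives
```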
For news-driven sites, this reinforces the need for a consistent publishing cadence. For evergreen sites, it highlights the importance of updating “old” content with new dates and fresh information to potentially re-trigger the freshness boost within the Discover algorithm.
The Power of the Publisher Block
One of the most sobering findings of the research is the mechanics of user-level blocking. In the Discover interface, users have the option to select “Don’t show content from [Publisher Name].” The research confirms that this is a “hard” filter that occurs at the beginning of the pipeline.
Unlike search, where a user might scroll past a result they don’t like, a block in Discover is proactive and permanent across the entire domain. There is currently no equivalent “sitewide boost” mechanism that a user can trigger to ensure they always see a specific site. This puts publishers in a precarious position: one piece of polarizing or low-quality content can lead to a user blocking the entire domain, permanently cutting off that traffic source for that specific individual.
Furthermore, user dismissals (swiping a card away) are tracked and stored. If a user consistently dismisses content from a certain topic or publisher, the interest matching engine will eventually stop serving that content to them, even if the user hasn’t gone through the formal “block” process.
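This softer dismissal-based suppression can be sketched as a per-publisher counter. The threshold of three dismissals is an assumption for illustration; the research only establishes that repeated dismissals eventually stop delivery, not the exact count.

```python
from collections import Counter


class DismissalTracker:
    """Sketch of soft suppression: repeated dismissals of a publisher
    eventually stop its content being served, short of a formal block.
    The default threshold is an illustrative assumption."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.dismissals = Counter()

    def record_dismissal(self, publisher):
        """Called when the user swipes away a card from this publisher."""
        self.dismissals[publisher] += 1

    def should_serve(self, publisher):
        """Content keeps flowing until dismissals reach the threshold."""
        return self.dismissals[publisher] < self.threshold
```

Unlike the stage-4 hard block, this suppression is gradual and topic- or publisher-scoped, which is why traffic from a disengaged audience fades rather than stopping overnight.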
Experimentation and Volatility: Why Your Traffic Fluctuates
Many publishers express frustration over the extreme volatility of Discover traffic. One day an article might receive 100,000 visitors, and the next day, a similar article might receive zero. The research provides a technical explanation for this: massive, constant experimentation.
During the SDK analysis, it was observed that Google was running approximately 150 server-side experiments simultaneously. These experiments can affect everything from the size of the cards and the font used in titles to the weight given to freshness versus interest matching. Additionally, there were over 50 “feature controls” active that dictated how specific cards were displayed.
This means that two users with identical interests might see completely different versions of the Discover feed because they are in different experiment groups. For publishers, this means that some traffic drops are not the result of a “penalty” or a drop in content quality, but rather a shift in the UI/UX experiments Google is conducting at that moment.
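Deterministic experiment bucketing of this kind is commonly implemented by hashing a user identifier with the experiment name. The sketch below is a generic illustration of the technique, not Google's implementation: it shows how two users with identical interests can land in different arms of the same experiment and therefore see different feed layouts.

```python
import hashlib


def experiment_arm(user_id, experiment_name, num_arms=2):
    """Generic deterministic bucketing sketch: hashing the user and
    experiment name gives each user a stable arm per experiment, so
    assignments persist across sessions without server-side state."""
    key = f"{user_id}:{experiment_name}".encode()
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % num_arms
```

Because the arm depends on both inputs, a single user sits in many independent experiments at once, which is consistent with the roughly 150 concurrent experiments observed in the research.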
Personalization Layers and the “Naiades” Framework
The research delves into the personalization layers that Google uses to tailor the feed. This involves a system dubbed “Naiades,” which helps manage how user feedback is processed and how content is clustered. Personalization is not just about what you search for; it’s about a holistic view of your digital footprint.
Google uses several data streams for personalization:
- Direct Actions: Following a topic, saving an article, or sharing a card.
- Implicit Signals: The amount of time a user spends reading an article after clicking it. If a user clicks a Discover card but immediately returns to the feed, the system views this as a “bounce” and may lower the pCTR for that article for similar users.
- Cross-Platform Behavior: Activities on YouTube, Maps, and Chrome all feed into the interest matching stage of the Discover pipeline.
- Publisher Center Registration: While not a direct ranking factor, being registered in the Google Publisher Center helps Google better categorize and trust the source of the content.
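The implicit "bounce" signal described above can be sketched as a dwell-time classifier. The 10-second threshold is an assumption for illustration; the research establishes only that an immediate return to the feed is treated as a negative signal.

```python
def classify_engagement(dwell_seconds, bounce_threshold=10.0):
    """If the user returns to the feed almost immediately after clicking
    a card, treat it as a 'bounce' (a negative pCTR signal for similar
    users). The threshold is an illustrative assumption."""
    return "bounce" if dwell_seconds < bounce_threshold else "engaged"
```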
Practical Strategies for Optimizing for Google Discover
Based on these findings, how can publishers and SEOs improve their chances of success? The research suggests that technical compliance is just as important as content quality.
Prioritize High-Resolution Visuals
Since the 1200px image rule is a hard gate for large card qualification, ensure your CMS automatically serves high-resolution featured images. Avoid generic stock photos that users are likely to ignore or dismiss; unique, high-contrast images tend to perform better in the pCTR model.
Optimize for Open Graph Tags
Treat your og:title and og:image as the most important elements of your page for Discover. The title should be descriptive and compelling. Unlike Search titles, which often focus on keywords for ranking, Discover titles should focus on triggering curiosity and engagement. Avoid “nopagereadaloud” and “notranslate” tags unless absolutely necessary for your site’s functionality.
Monitor User Engagement, Not Just Clicks
Because Google tracks “time spent reading” as a feedback signal, it is vital that your mobile site is fast and user-friendly. If a page takes five seconds to load, a Discover user will likely bounce back to their feed, signaling to Google that your content was not a good match. High-performance mobile pages (Core Web Vitals) are indirectly essential for maintaining Discover visibility.
Understand the Freshness Window
If you have an evergreen article that performed well in the past, consider updating it with significant new information and a new “last updated” date. This can sometimes re-enter the content into the 1-7 day freshness window, giving it a second life in the Discover feed.
Build Brand Loyalty to Avoid the Hard Block
Because publisher-level blocks are so destructive, it is safer to avoid “rage-bait” or highly sensationalized content that might prompt a user to block your entire domain. Sustained success in Discover comes from being a trusted source that users are happy to see in their feed every morning.
The Future of Discovery-Based Traffic
The research into Google Discover’s architecture reveals a system that is incredibly sophisticated, highly experimental, and deeply focused on user experience. It operates on a “high-risk, high-reward” model where a single viral hit can drive more traffic than months of traditional SEO, but where a few technical errors or negative user signals can lead to a complete “blackout” of visibility.
As Google continues to integrate more AI-driven classification and real-time feedback loops, the barrier to entry for Discover will likely rise. Success will belong to those who can master the technical requirements of the nine-stage pipeline while consistently delivering content that resonates with the specific, shifting interests of their audience. In the world of Google Discover, you aren’t just competing for a keyword; you are competing for a moment of a user’s attention in a crowded and constantly evolving digital landscape.