How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Ecosystem

For many digital publishers and SEO professionals, Google Discover remains one of the most significant yet unpredictable drivers of organic traffic. Unlike traditional search, which relies on active queries, Discover is a highly personalized feed that anticipates user needs based on their interests and past behavior. However, the exact mechanics of how content surfaces in this feed have often been shrouded in mystery.

Recent SDK-level research by Metehan Yesilyurt has provided a rare, behind-the-curtain look at the internal architecture of Google Discover. By analyzing the data signals and telemetry within the Google app framework, this research maps out a sophisticated nine-stage pipeline that dictates how content is qualified, filtered, and eventually ranked. This discovery is pivotal for anyone looking to stabilize their traffic from a platform known for its extreme volatility.

The Nine-Stage Pipeline of Google Discover

Google Discover does not simply pick a random article and show it to a user. It follows a structured, multi-stage process that filters out millions of pages before a single card is rendered on a mobile device. Understanding these stages is essential for diagnosing why content may be failing to gain traction.

1. Crawling and Content Understanding: The process begins with Google’s standard crawling infrastructure. Before a page can even be considered for Discover, the system must parse the text, structure, and intent of the content. This is where Google determines what the article is actually about.

2. Meta Tag Extraction: The system specifically looks for Open Graph (OG) tags and other structured data. It prioritizes the og:title and og:image to understand how the content should be presented visually.

3. Content Classification: Content is categorized into specific buckets, such as breaking news, evergreen guides, or niche interest topics. This classification helps the system decide which “decay” model to apply to the content’s visibility.

4. The Block Check: This is a critical gatekeeper. The system checks if the publisher has been blocked by the user or if the domain has been flagged for policy violations. If a block exists, the process stops here.

5. Interest Matching: Google compares the content’s topic clusters against the user’s individual interest profile, which is built from search history, app usage, and location data.

6. pCTR Prediction Model: A server-side model computes the predicted click-through rate (pCTR): the likelihood that a specific user will click on a specific card, based on historical performance and visual cues.

7. Feed Layout Construction: The system determines where the card will sit in the feed and whether it will be a large, high-engagement card or a smaller thumbnail.

8. Delivery: The content is pushed to the user’s device.

9. Feedback Loop: The system monitors user interaction. Did they click? Did they dismiss the card? Did they spend time reading? This data is fed back into the model for future ranking decisions.

The Pre-Ranking Filter: The Power of Publisher Blocks

One of the most significant findings in the recent research is the placement of the publisher-level block in the pipeline. In Google Discover, the decision to filter out a publisher happens before the ranking engine even considers the content’s quality or relevance. This creates a “hard wall” for certain domains.

When a user selects “Don’t show content from this site,” it isn’t just a temporary preference; it is a powerful suppression signal. Unlike traditional search, where a user might still see a site they dislike if it is the most relevant result, Discover treats a block as an absolute exclusion. There is currently no equivalent “sitewide boost” mechanism that publishers can trigger to counter these blocks. This makes brand reputation and user trust an effective prerequisite for ranking.

If your domain has high “dismissal” rates or has been frequently blocked, your content may be technically eligible and high-quality, but it will never reach the ranking stage because the filter triggers first.

Technical Requirements: Images and Meta Tags

The visual nature of Google Discover means that technical SEO for this platform is heavily focused on assets rather than just text. The research highlights six key page-level tags that Discover reads, with the Open Graph (OG) tags being the most vital. If a page lacks a valid og:image, it is effectively disqualified from appearing as a card.

The 1200px Threshold

To qualify for the large, high-performing cards that drive the vast majority of Discover traffic, images must be at least 1200 pixels wide. The system is designed to favor high-resolution visuals. While smaller images may still allow a page to appear as a small thumbnail, these smaller cards consistently earn lower click-through rates and are often deprioritized by the pCTR model.
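Checking this threshold is easy to automate in a CMS pre-publish step. As a minimal sketch (assuming PNG assets; the width of a PNG is stored big-endian in the IHDR chunk at bytes 16–20, so no imaging library is needed):

```python
import struct

MIN_CARD_WIDTH = 1200  # reported threshold for large Discover cards

def png_width(data: bytes) -> int:
    """Read the width from a PNG file's IHDR chunk (bytes 16-20)."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    return struct.unpack(">I", data[16:20])[0]

def qualifies_for_large_card(data: bytes) -> bool:
    """True if the image meets the reported 1200px minimum width."""
    return png_width(data) >= MIN_CARD_WIDTH
```

For JPEG or WebP assets a library such as Pillow would be the practical choice; the point is that the width check can gate publication automatically.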

Fallback Mechanisms

Interestingly, the Discover architecture is built with several redundancies. If the og:title tag is missing, the system will attempt to fall back to the twitter:title tag or the standard HTML title tag. However, relying on fallbacks is risky, as the system may choose a less-optimized title that fails to entice clicks.

The “Kill Switch” Meta Tags

The research identified two specific meta tags that act as an accidental “kill switch” for Discover visibility: “nopagereadaloud” and “notranslate.” If these tags are present, they can prevent the page from entering the Discover pipeline entirely. While these tags have legitimate uses in web development, publishers should use them with extreme caution if they rely on Discover for traffic.
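A lightweight audit can flag these tags before they reach production. The sketch below scans a page's meta tags with Python's stdlib parser; because the exact markup varies in the wild (notranslate, for example, commonly appears as the content of a meta tag rather than its name), it checks both the name and content attributes:

```python
from html.parser import HTMLParser

# Values the research flags as accidental "kill switches" for Discover.
KILL_SWITCHES = {"nopagereadaloud", "notranslate"}

class MetaScanner(HTMLParser):
    """Collect the name and content attribute values of all meta tags."""
    def __init__(self):
        super().__init__()
        self.values = set()

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            for key, value in attrs:
                if key in ("name", "content") and value:
                    self.values.add(value.strip().lower())

def discover_kill_switches(html_text: str) -> set:
    """Return any kill-switch values found in a page's meta tags."""
    scanner = MetaScanner()
    scanner.feed(html_text)
    return scanner.values & KILL_SWITCHES
```

Running this as part of a publishing checklist turns an invisible eligibility failure into an explicit warning.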

The Freshness Decay Model

Google Discover is fundamentally a “freshness” engine. While evergreen content can and does surface, the system is biased toward newness. The research reveals a specific decay schedule that governs how long a piece of content remains viable in the feed.

1. The Golden Window (1 to 7 days): This is when content receives its strongest boost. If an article is going to “go viral” on Discover, it usually happens within this first week.

2. The Moderate Phase (8 to 14 days): Visibility begins to taper off. Only content with exceptionally high engagement signals stays prominent during this window.

3. The Limited Visibility Phase (15 to 30 days): Most content falls off the radar here. The system begins to prioritize newer updates on similar topics.

4. Gradual Decline (30+ days): Content is moved into a secondary evergreen pool. At this stage, it will only surface if it perfectly matches a very specific niche interest or if there is a sudden resurgence in the topic’s popularity.
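The schedule above maps cleanly to a simple age-to-phase function. The boundaries below are taken directly from the research summary; the phase names are illustrative labels, not internal Google identifiers:

```python
def freshness_phase(age_days: int) -> str:
    """Map an article's age to the decay phase described above."""
    if age_days <= 7:
        return "golden_window"       # strongest boost; viral potential
    if age_days <= 14:
        return "moderate"            # visibility tapers; high engagement only
    if age_days <= 30:
        return "limited_visibility"  # newer updates take priority
    return "evergreen_pool"          # surfaces only on precise niche matches
```

A function like this is useful for reporting: grouping Discover clicks in analytics by phase makes it obvious how front-loaded the traffic curve really is.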

For publishers, this underscores the importance of a consistent publishing cadence. Because the decay is built into the system’s architecture, relying on a few high-performing older articles is not a sustainable strategy for Discover traffic.

The pCTR Model and Ranking Signals

At the heart of the ranking process is the predicted click-through rate (pCTR) model. This is a server-side calculation that estimates how likely a user is to engage with a piece of content. Because Google wants to keep users in the app, the system prioritizes content that is likely to be clicked.

Several observable signals are sent to the pCTR model before a ranking decision is made:

Content Authority and Engagement History

The system evaluates the past performance of the specific URL. If the URL had a high click-through rate and long dwell time in its first few hours of life, the model will likely boost its reach. It also considers the domain’s historical performance. If a publisher consistently produces content that users engage with, their new articles start with a slight advantage in the pCTR estimation.
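One common way to combine a URL's own history with a domain prior is Bayesian-style smoothing. The formula below is a hypothetical sketch of that idea, not Google's actual model: a brand-new URL with no impressions starts at the domain's average CTR, matching the "slight advantage" described above, and the estimate shifts toward the URL's own performance as impressions accumulate:

```python
def pctr_estimate(clicks: int, impressions: int,
                  domain_ctr: float, prior_weight: int = 20) -> float:
    """Hypothetical smoothed pCTR: blend the URL's own click history with
    the domain's historical CTR, weighted as `prior_weight` pseudo-impressions."""
    return (clicks + prior_weight * domain_ctr) / (impressions + prior_weight)
```

With this shape, a domain with a strong track record lifts every new article's starting estimate, while a URL's own data quickly dominates once real impressions arrive.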

Visual and Structural Quality

Beyond just having a 1200px image, the system checks if the image loads successfully and quickly. Broken images or slow-loading assets can lead to a card being pulled from the feed in real time. The clarity of the title also plays a role; titles that are too vague or overtly “clickbaity” in a way that leads to high bounce rates may eventually be suppressed by the feedback loop.

Personalization and User Behavior

Discover is not a monolithic feed; it is a reflection of the individual. The research clarifies how Google uses personalization layers to curate the experience. This personalization is driven by more than just search history.

Individual Actions: Actions like following a topic, saving an article, or explicitly dismissing a story are weighted heavily. If a user dismisses a story, that action is stored permanently for that specific URL. It will not resurface for that user, even if it is updated or becomes more relevant later.

Time Spent Reading: Google monitors the engagement after the click. If a user clicks a Discover card and immediately returns to the feed, it signals that the content did not meet their expectations. Conversely, long dwell times signal high-quality content, which helps the article stay in the “Golden Window” of freshness for a longer period.

Publisher Center Registration: While not a guaranteed ranking factor, being registered in the Google Publisher Center provides more signals to the system regarding the legitimacy and niche of the publisher, potentially aiding in better interest matching.

The Role of Server-Side Experiments

One of the most enlightening aspects of the research is the sheer scale of experimentation running behind the Discover feed. During a single observed session, roughly 150 server-side experiments were running simultaneously. Furthermore, over 50 feature controls were active, affecting everything from card layout to font sizes and recommendation algorithms.

This explains the “Discover Rollercoaster” that many publishers experience. You might see a massive spike in traffic followed by a total blackout, even if you haven’t changed your content strategy. In many cases, these fluctuations are not a result of your actions but are due to Google placing a segment of users into an experiment group where your content type or niche is being weighted differently.

Volatility is a feature, not a bug, of the Discover system. Because the feed is constantly reordering itself in real time—often adding or removing content while the user is actively scrolling—there is no such thing as a “stable” ranking in Discover.

Actionable Takeaways for Content Strategy

Based on these architectural insights, publishers can refine their approach to Google Discover to maximize their eligibility and ranking potential.

Prioritize Visual Assets

Never publish an article intended for Discover without a high-quality, 1200px wide image. Ensure your CMS is correctly outputting the og:image tag. Experiment with different visual styles—research shows that original photography often outperforms generic stock images in pCTR models.

Optimize for the “Golden Window”

Since the first seven days are the most critical, ensure your technical SEO is flawless at launch. Submit your URLs to Search Console immediately to speed up crawling. If a story is breaking, every hour counts before the freshness decay begins to set in.

Avoid the “Block” at All Costs

Because publisher blocks happen before ranking, you must avoid alienating your audience. Misleading titles might drive short-term clicks, but if they lead to users blocking your domain, you are effectively ending your chances of future Discover traffic. Focus on building a brand that users want to see in their feeds.

Monitor Core Web Vitals and Page Experience

Since the system records user feedback like dwell time, a poor mobile experience can kill an article’s momentum. If your page is slow, cluttered with intrusive ads, or difficult to navigate, users will bounce, signaling to the pCTR model that your content is low quality.

The Future of Content in Discover

The research into Google Discover’s architecture reveals a system that is far more complex than a simple recommendation engine. It is a multi-layered pipeline that prioritizes user preference, visual quality, and extreme freshness. By understanding the mechanics of pCTR, the impact of meta tags, and the reality of constant server-side experimentation, publishers can move away from guesswork and toward a data-driven strategy.

While Discover will likely always remain somewhat volatile due to its personalized nature, focusing on the technical requirements and engagement signals identified in this research provides the best possible path to sustained visibility and traffic.
