Google Discover has long been one of the most mysterious and volatile sources of traffic for digital publishers. Unlike traditional Google Search, which relies on a user entering a specific query, Discover is a proactive, “query-less” feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many news organizations and tech blogs, a single hit in Discover can result in hundreds of thousands of visits in a matter of hours, yet the mechanisms behind how content is selected have largely remained a black box.
Recent SDK-level research by Metehan Yesilyurt has finally pulled back the curtain on this system. By analyzing the observable signals within Google’s Discover app framework and telemetry data, Yesilyurt mapped out the intricate pipeline that content must navigate before it ever reaches a user’s screen. This research reveals a structured, nine-stage flow governed by strict technical requirements, predictive modeling, and aggressive filtering. Understanding this pipeline is no longer optional for SEOs; it is the blueprint for survival in a push-based content economy.
The Nine-Stage Google Discover Pipeline
The journey from a published article to a Discover card is not a simple linear path. Instead, it is a high-speed filtering process designed to eliminate low-quality or irrelevant content as early as possible. According to the research, the process can be broken down into nine distinct phases.
First, Google must crawl and understand the content. This is the foundation of all Google products, but in Discover, the emphasis is heavily placed on semantic understanding and classification. Once crawled, the system moves to metadata extraction, where it looks specifically for key tags such as the Open Graph image and title. Following this, the content is classified into categories, such as “breaking news” or “evergreen,” which dictates how the system handles its “freshness” decay later on.
The fourth stage is perhaps the most critical for publishers: the block check. Before any ranking or interest matching occurs, the system checks whether the user or the platform has blocked the publisher. If a user has previously selected “Don’t show content from this site,” the content is discarded immediately. If the content survives the block check, it moves to interest matching, where Google’s interest vectors map the article’s topics to the user’s documented interests.
The final stages involve server-side predictive modeling (pCTR), feed layout construction, content delivery, and the recording of user feedback. This feedback loop is continuous; if a user engages with the content, it reinforces the publisher’s standing. If they dismiss it, the system learns to suppress similar content in the future.
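To make the flow concrete, here is a minimal Python sketch of the pipeline as the research describes it. The stage order (block check before interest matching, feedback recorded last) comes from the research; every class name, field, and check below is an illustrative assumption, not Google’s actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    url: str
    domain: str
    og_title: str | None
    og_image_width: int
    topics: set[str]
    category: str  # e.g. "breaking_news" or "evergreen"

@dataclass
class UserProfile:
    blocked_domains: set[str] = field(default_factory=set)
    dismissed_urls: set[str] = field(default_factory=set)
    interests: set[str] = field(default_factory=set)

def discover_pipeline(article: Article, user: UserProfile) -> bool:
    """Walk one article through the nine-stage flow; True = card shown."""
    # Stages 1-3 (crawl, metadata extraction, classification) happen
    # upstream; here we only verify the extracted metadata is usable.
    if not article.og_title or article.og_image_width == 0:
        return False
    # Stage 4. Block check: runs BEFORE any ranking or interest matching.
    if article.domain in user.blocked_domains:
        return False
    if article.url in user.dismissed_urls:  # dismissals are permanent
        return False
    # Stage 5. Interest matching: article topics must overlap user interests.
    if not article.topics & user.interests:
        return False
    # Stages 6-8: pCTR scoring, feed layout, delivery (stubbed here).
    # Stage 9: user feedback is recorded after delivery and feeds back in.
    return True
```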
The Power of the Publisher Block
One of the most striking findings of the research is the hierarchy of filtering. Many SEOs believe that ranking factors like page speed or keyword density are the primary drivers of visibility. However, the research shows that publisher-level blocks happen long before the ranking engine even looks at your content.
The “Don’t show content from this site” action is an incredibly powerful tool in the user’s hands. When a user blocks a domain, that content is suppressed across the board for that individual. There is currently no equivalent “sitewide boost” mechanism that rewards a publisher as aggressively as a block punishes them. This creates a high-stakes environment where a single piece of misleading or “clickbaity” content can lead to a permanent loss of a potential reader’s entire lifetime value on the platform.
This “hard block” logic underscores the importance of brand trust. In Discover, you aren’t just competing for a click; you are competing to remain in the user’s ecosystem. If your content consistently fails to deliver on the promise of its headline, users will eventually exercise their right to block your domain, effectively “de-indexing” you from their personal feed.
The Ranking Model: pCTR and Server-Side Logic
Once an article passes the initial eligibility and block filters, it enters the ranking phase. The research highlights the use of a predicted click-through rate (pCTR) model. This model resides on Google’s servers and estimates the likelihood of a user clicking on a specific card based on several variables.
While the internal weights of the pCTR model are not visible to the public, the SDK telemetry shows which signals the app sends to Google’s servers to inform these decisions. These include:
- The Page Title: Extracted primarily from the Open Graph title tag (og:title).
- Image Quality and Dimensions: The system checks if the image is large enough and if it has loaded successfully in the past.
- Content Recency: A “freshness” score is applied based on the publication timestamp.
- Historical Engagement: Previous click and impression data for that specific URL.
- Technical Reliability: Signals indicating whether images or snippets are failing to render properly.
The pCTR model is dynamic. If an article begins to perform well (i.e., its actual CTR exceeds its predicted CTR), it can “trend,” causing the system to push it to a much wider audience of users with similar interest profiles. Conversely, if a story has a high impression count but very few clicks, the pCTR model will quickly deprioritize it, leading to the “traffic cliff” many publishers experience after a successful run.
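The internal weights are invisible, but the trend-or-suppress behavior described above amounts to comparing observed CTR against the prediction. The sketch below illustrates that comparison; the 500-impression floor and the 1.2x/0.5x thresholds are invented for illustration, since the research documents the behavior, not the numbers.

```python
def update_distribution(predicted_ctr: float, impressions: int,
                        clicks: int, min_impressions: int = 500) -> str:
    """Illustrative re-ranking rule: compare actual CTR to pCTR."""
    if impressions < min_impressions:
        return "keep_testing"        # not enough data yet
    actual_ctr = clicks / impressions
    if actual_ctr > predicted_ctr * 1.2:
        return "expand_audience"     # "trending": push to similar profiles
    if actual_ctr < predicted_ctr * 0.5:
        return "deprioritize"        # the post-spike "traffic cliff"
    return "hold_steady"
```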
The Critical Role of Image Requirements
In Google Discover, visuals are not just an aesthetic choice; they are a technical requirement. The research confirms that Google Discover reads specific page-level tags, with a heavy reliance on Open Graph metadata. If a page lacks a high-quality image, it is often disqualified from appearing as a prominent card, or it may not appear at all.
To qualify for the high-engagement “large card” format, images must be at least 1,200 pixels wide. Content with smaller images is typically relegated to small thumbnail layouts, which have significantly lower click-through rates. Furthermore, the research indicates that Google monitors whether images load successfully. If your site has technical issues such as slow-loading images or metadata images that return 404 errors, the system may filter your content out entirely to preserve the user experience of the feed.
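You can audit the large-card requirement yourself by fetching a page’s og:image and measuring its width. Here is a minimal sketch using the third-party requests, BeautifulSoup, and Pillow libraries; the 1,200-pixel threshold comes from the research, everything else is illustrative.

```python
from io import BytesIO

import requests
from bs4 import BeautifulSoup
from PIL import Image

def og_image_width(page_url: str) -> int:
    """Return the pixel width of a page's og:image, or 0 if missing/broken."""
    html = requests.get(page_url, timeout=10).text
    tag = BeautifulSoup(html, "html.parser").find("meta", property="og:image")
    if tag is None or not tag.get("content"):
        return 0
    resp = requests.get(tag["content"], timeout=10)
    if resp.status_code != 200:      # a broken image is likely filtered out
        return 0
    return Image.open(BytesIO(resp.content)).width

# width = og_image_width("https://example.com/article")
# print("large-card eligible" if width >= 1200 else "thumbnail at best")
```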
Publishers should also be aware of fallback mechanisms. If the og:title is missing, Google will attempt to use the twitter:title or the standard HTML <title> tag. However, relying on fallbacks is risky. For the best results, publishers should ensure that their Open Graph tags are explicitly defined and optimized for a “discovery” mindset rather than a “search” mindset.
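That fallback order is straightforward to replicate in an audit script. The helper below is a sketch of the og:title → twitter:title → <title> chain, assuming well-formed markup:

```python
from bs4 import BeautifulSoup

def effective_title(html: str) -> str | None:
    """Resolve a page title using the fallback order described above."""
    soup = BeautifulSoup(html, "html.parser")
    og = soup.find("meta", property="og:title")
    if og and og.get("content"):
        return og["content"]                      # preferred source
    tw = soup.find("meta", attrs={"name": "twitter:title"})
    if tw and tw.get("content"):
        return tw["content"]                      # first fallback
    if soup.title and soup.title.string:
        return soup.title.string.strip()          # last resort
    return None
```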
Silent Killers: Meta Tags That Block Content
Perhaps the most surprising technical detail in Yesilyurt’s research is the existence of specific meta tags that act as “off switches” for Google Discover. While most SEOs are familiar with noindex, two other tags can quietly remove your content from the Discover pipeline:
- nopagereadaloud: This tag prevents Google’s Assistant or other services from reading the page content aloud. While intended for accessibility control, the research suggests it can interfere with the content’s eligibility for certain Discover features.
- notranslate: If this tag is present, it signals to Google that the content should not be translated. This can limit the content’s reach in multi-lingual regions or prevent it from entering the Discover pipeline in specific localized feeds.
Checking for these tags is a vital part of any technical SEO audit for Discover. If your CMS or a third-party plugin is inadvertently inserting these tags, you could be blocking yourself from millions of impressions without ever knowing why.
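A few lines of Python cover this check. The sketch below scans the content attribute of every meta tag for the two directives (plus noindex for completeness), since depending on the CMS they may appear under name="robots" or name="google":

```python
from bs4 import BeautifulSoup

BLOCKING_DIRECTIVES = {"noindex", "nopagereadaloud", "notranslate"}

def discover_blockers(html: str) -> set[str]:
    """Return any Discover-blocking directives found in the page's meta tags."""
    soup = BeautifulSoup(html, "html.parser")
    found: set[str] = set()
    for meta in soup.find_all("meta"):
        content = (meta.get("content") or "").lower()
        directives = {d.strip() for d in content.split(",")}
        found |= directives & BLOCKING_DIRECTIVES
    return found

# blockers = discover_blockers(html)
# if blockers: print("Silent killers found:", blockers)
```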
The Freshness Decay: The Life Cycle of a Discover Story
Freshness is a core component of the Discover algorithm, but it isn’t applied equally across all content. The research identified specific time windows that dictate how much of a “boost” a story receives based on its age.
The “Golden Window” is 1 to 7 days. During this period, content receives its strongest visibility boost, as the system prioritizes “what’s happening now.” Between 8 and 14 days, visibility begins to moderate, though it remains relatively high for trending topics. From 15 to 30 days, content enters a “limited visibility” phase, where it is only shown to users with a very high interest match. After 30 days, most content faces a gradual but steady decline in impressions.
However, there is an exception for content classified as “Evergreen.” The research notes that Google’s classifiers can identify content that remains relevant over long periods (such as tutorials or deep-dive explainers). Evergreen content bypasses the aggressive freshness decay, allowing it to resurface months or even years later if a user shows a renewed interest in the topic. This highlights the need for a dual strategy: publishing high-velocity news to capture immediate traffic, while maintaining a library of evergreen assets for long-term Discover stability.
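These windows translate naturally into a step function. In the sketch below, the window boundaries come from the research, while the multiplier values and the evergreen bypass weighting are assumptions made for illustration:

```python
def freshness_multiplier(age_days: int, evergreen: bool = False) -> float:
    """Illustrative visibility multiplier for the decay windows above."""
    if evergreen:
        return 1.0                 # evergreen bypasses the decay entirely
    if age_days <= 7:
        return 1.0                 # the "Golden Window"
    if age_days <= 14:
        return 0.7                 # moderated but still strong
    if age_days <= 30:
        return 0.3                 # limited: high-interest matches only
    return max(0.05, 0.3 * 0.9 ** (age_days - 30))  # gradual decline
```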
Personalization and the User Feedback Loop
Discover is a deeply personal experience. The research identifies four primary layers of personalization that Google uses to curate a feed for an individual user.
The first layer is Google’s broader interest data. This is gathered from the user’s search history, YouTube viewing habits, and location data. The second layer involves publisher signals. If a user frequently visits a specific site via search, Google is more likely to show that site in their Discover feed. Interestingly, the research mentions that Publisher Center registration can play a role in how Google identifies and trusts a publisher’s entity.
The third and fourth layers involve individual actions. “Follows” and “Saves” are strong positive signals that tell Google to prioritize a specific topic or publisher. On the flip side, “Dismissals” are permanent. If a user swipes away a story or clicks “Not interested,” that specific URL is stored as a negative signal and will never be shown to that user again. This reinforces the idea that every impression is a test; if your content is irrelevant to the audience it’s being served to, the feedback loop will eventually choke off your reach.
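The four layers can be read as a simple scoring rule with one hard stop. The function below is a sketch of that shape; the weights are invented, and only the ordering (dismissals as a permanent veto, follows and saves as strong positives) reflects the research:

```python
def personalization_score(
    topics: set[str], domain: str, url: str,
    user_interests: set[str], visited_domains: set[str],
    followed: set[str], dismissed_urls: set[str],
) -> float | None:
    """Blend the four personalization layers; None means never show."""
    if url in dismissed_urls:                      # layer 4: permanent veto
        return None
    score = float(len(topics & user_interests))    # layer 1: interest data
    if domain in visited_domains:                  # layer 2: publisher signals
        score += 0.5
    if domain in followed or topics & followed:    # layer 3: follows/saves
        score += 1.5
    return score
```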
A World of Experiments: Why Discover is Volatile
One of the most frustrating aspects of Google Discover for publishers is its inherent volatility. You can have a record-breaking day on Monday and nearly zero traffic on Tuesday with no changes to your content strategy. The research provides a clear explanation for this: constant experimentation.
During the observed sessions, Yesilyurt found that approximately 150 server-side experiments were running simultaneously. These experiments can affect anything from the ranking weights of certain categories to the visual layout of the cards. Additionally, over 50 “feature controls” were identified that dictate which UI elements are shown to specific users.
This means that no two users see the same feed, and your content might be performing differently simply because it is being tested in different experiment groups. This level of constant A/B testing by Google makes it nearly impossible to “game” the system. Instead, publishers must focus on long-term trends and technical health rather than obsessing over day-to-day fluctuations.
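Deterministic experiment bucketing explains the per-user variance: each user hashes into an arm of each live experiment, so two users browsing at the same moment can be ranked under different weights. Here is a standard bucketing sketch; the experiment names are invented, not drawn from the ~150 experiments observed in the research:

```python
import hashlib

def experiment_bucket(user_id: str, experiment: str, buckets: int = 2) -> int:
    """Deterministically assign a user to an experiment arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

# Two users can land in different arms of every live experiment:
for exp in ("card_layout_v2", "freshness_weight_test"):
    print(exp, experiment_bucket("user_a", exp), experiment_bucket("user_b", exp))
```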
Real-Time Feed Updates
Unlike a traditional search results page, which is relatively static once loaded, Google Discover is a living feed. The research reveals that the system can add, remove, or reorder cards in real time while a user is browsing, without requiring a manual refresh. If a breaking news story emerges that matches a user’s high-interest profile, the system can inject it into the feed instantly.
This real-time capability explains why “breaking news” publishers often see such sudden spikes in Discover traffic. The system is designed to be as current as possible, and the infrastructure supports near-instantaneous content delivery for high-authority sources.
Actionable Insights for Publishers and SEOs
Based on this comprehensive research, what should publishers do to maximize their chances of success in Google Discover? The strategy should be built on three pillars: technical eligibility, visual excellence, and user trust.
First, audit your technical setup. Ensure that your Open Graph tags are perfect. Your og:image should be high-resolution (at least 1200px wide), and your og:title should be engaging without being deceptive. Check your site for any accidental meta tags like nopagereadaloud that could be filtering you out of the pipeline.
Second, prioritize the user experience. Because dismissals are permanent and publisher-level blocks happen before ranking, you cannot afford to annoy your audience. Avoid intrusive ads that hinder content readability, and ensure your headlines accurately reflect the content of the article. High engagement signals, such as time spent on page, are recorded and fed back into the system to inform future ranking.
Third, understand the importance of freshness. If you are a news site, speed is your greatest asset. Getting a story indexed and classified within the first few hours of a trending event is the key to entering the “Golden Window” of visibility. If you are a lifestyle or tech blog, focus on creating evergreen content that can bypass the standard decay and provide a steady stream of “long-tail” Discover traffic.
Google Discover remains a complex and ever-evolving platform, but this research proves that it is not entirely random. It is a highly engineered pipeline that values quality, relevance, and technical precision. By aligning your content strategy with the nine-stage flow and avoiding the “silent killers” of the algorithm, you can turn the “black box” of Discover into a predictable engine for growth.