How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Pipeline

Google Discover has evolved into one of the most significant drivers of organic traffic for publishers, often rivaling or even surpassing traditional search results. However, for many digital strategists, it remains a “black box”—an unpredictable engine that grants massive traffic spikes one day and total silence the next. Recent SDK-level research by Metehan Yesilyurt has finally shed light on the internal mechanics of this system, revealing a structured, nine-stage pipeline that dictates how content is qualified, ranked, and occasionally filtered out entirely.

Unlike Google Search, which relies on user queries to pull relevant information, Discover is a proactive “push” system. It anticipates what a user might want to see based on their interests, browsing history, and engagement patterns. The research indicates that this process is far more mechanical and filtered than previously thought, involving strict technical prerequisites and real-time feedback loops.

The Nine Stages of Content Delivery

The journey from a published article to a user’s Discover feed involves a complex series of checkpoints. If a piece of content fails at any of these stages, it is discarded before it even has a chance to compete for a spot in the feed.

1. Crawling and Semantic Understanding

The process begins with Google’s standard crawling infrastructure. Before a story can appear in Discover, Google must first discover the URL and parse its content. During this phase, the system identifies the core topic, the entities mentioned (people, places, brands), and the overall sentiment of the piece.

2. Meta Tag Extraction

Google Discover relies heavily on specific metadata to build its visual cards. The system looks for Open Graph (OG) tags, specifically og:title and og:image. This stage is critical because it determines the visual “packaging” of your content. If these tags are missing, the system looks for fallbacks, such as Twitter card tags or the standard HTML title tag.
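The fallback chain described above can be sketched with Python's standard-library HTML parser. This is an illustrative audit script, not Google's actual extraction logic: it prefers `og:title`, then a Twitter card title, then the plain `<title>` tag.

```python
from html.parser import HTMLParser

class CardMetaParser(HTMLParser):
    """Collect the tags a Discover-style card builder could draw from."""
    def __init__(self):
        super().__init__()
        self.meta = {}        # property/name -> content
        self.title = None     # plain <title> text
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and "content" in a:
                self.meta[key] = a["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data

def card_title(html):
    """og:title first, then twitter:title, then the HTML title tag."""
    p = CardMetaParser()
    p.feed(html)
    return p.meta.get("og:title") or p.meta.get("twitter:title") or p.title
```

Running `card_title` against your own pages is a quick way to see which string Discover would most likely use as the card headline.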

3. Content Classification

At this stage, the system assigns the content to specific categories. Is this a piece of breaking news? Is it evergreen lifestyle content? Is it a product review? This classification helps the algorithm match the content with the appropriate audience segments and determines which “freshness” rules apply to the article.

4. Filtering and Block Checks

One of the most significant findings of the recent research is that publisher-level blocks happen very early in the pipeline. If a user has previously selected “Don’t show content from this site,” that publisher is effectively dead to that user. This filter is applied before any interest matching or ranking occurs, meaning no amount of “high-quality content” can overcome a manual block.

5. Interest Matching

Google compares the classified content against the user’s “Interest Graph.” This graph is built from a user’s search history, YouTube watch patterns, and previous interactions within Discover. The goal is to find a conceptual overlap between what the publisher has written and what the user has historically enjoyed.

6. Predicted Click-Through Rate (pCTR) Modeling

Before the feed is rendered, Google runs a server-side model to predict the likelihood of a user clicking on a specific card. This pCTR model factors in the historical performance of the URL, the domain’s reputation, and how similar users have interacted with the content.

7. Feed Layout Construction

Google Discover doesn’t just list articles; it builds a visual experience. This stage determines whether an article gets a “large card” (with a full-width image) or a “small card” (with a thumbnail). The layout is influenced by the quality of the image provided and the predicted importance of the story.

8. Content Delivery

The content is finally pushed to the user’s device. This happens through the Google App on iOS and Android, as well as the mobile home screen on many Android devices.

9. Feedback Recording

The pipeline doesn’t end when the user sees the card. Every action—clicking, dismissing, sharing, or ignoring—is recorded and fed back into the system to refine future ranking and matching.
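The nine stages above behave like a chain of filters: a story that fails any checkpoint is discarded before the next stage runs. A minimal sketch of that control flow, with a hypothetical block-check stage (the stage names come from the research; the logic here is purely illustrative):

```python
def run_pipeline(story, stages):
    """Pass a candidate story through each stage in order.

    A stage returns the (possibly annotated) story, or None to discard it —
    mirroring the "fail at any checkpoint and you're out" behaviour above.
    """
    for stage in stages:
        story = stage(story)
        if story is None:
            return None  # discarded before reaching the feed
    return story

def block_check(blocked_sites):
    """Hypothetical stage 4: publisher blocks apply before interest matching."""
    def stage(story):
        return None if story["site"] in blocked_sites else story
    return stage
```

The key property this models is ordering: because the block check runs early, no later stage (interest matching, pCTR) ever sees a blocked publisher's content.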

The Power of the Publisher Block

A critical takeaway from the research is the asymmetry between positive and negative signals. While there are many ways for a user to “suppress” a site, there are very few ways for them to “boost” it with equal force.

When a user selects “Don’t show stories from [Site Name],” it creates a hard filter at the server level. This is not a temporary demotion; it is a permanent exclusion for that user. Interestingly, there is no sitewide “Always show stories from this site” button that carries the same weight. While a user can “Follow” a topic, the “Block” function remains the most powerful tool in the user’s arsenal, making it vital for publishers to avoid “clickbaity” or polarizing content that might trigger a manual block.

Technical Gatekeepers: Large Images and Meta Tags

The research confirms that Discover has strict technical requirements that act as gatekeepers. If your site does not meet these standards, it may be filtered out of the most lucrative “large card” positions.

The 1200px Image Requirement

To qualify for the high-engagement large cards, images must be at least 1200 pixels wide. This is not just a suggestion; it is a technical threshold. If an image is smaller, Google may still show the content, but it will likely be relegated to a small thumbnail card, which statistically receives significantly lower click-through rates. Furthermore, if an image fails to load or returns a 404 error, the entire card is usually pulled from the pipeline.
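You can verify the width threshold yourself without any imaging library: a PNG stores its width as a big-endian integer at bytes 16–19 of the file (inside the IHDR chunk). A small stdlib-only check, with the 1200px cutoff as reported for large cards:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_width(data):
    """Read the width from a PNG's IHDR chunk (bytes 16-19, big-endian)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    return struct.unpack(">I", data[16:20])[0]

def qualifies_for_large_card(data, min_width=1200):
    """The 1200px width threshold reported for Discover's large-card format."""
    return png_width(data) >= min_width
```

For JPEG or WebP assets the header layout differs, so in practice you would use an image library; the point is that the check is a hard numeric threshold, not a judgment call.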

The Danger of “No-Go” Tags

Certain meta tags can act as an accidental “off switch” for Google Discover. Specifically, the tags “nopagereadaloud” and “notranslate” were found to interfere with the Discover pipeline. While these tags are often used for accessibility or technical reasons on specific page types, their presence can signal to the Discover algorithm that the page is not suitable for the standard feed experience, leading to exclusion.
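A simple audit can flag these values before they cost you visibility. The sketch below only scans `<meta>` tags for the two strings named in the research; it deliberately ignores other places (classes, attributes) where `notranslate` can legitimately appear, so treat it as a starting point rather than a complete check:

```python
import re

NO_GO_MARKERS = ("nopagereadaloud", "notranslate")

def find_no_go_tags(html):
    """Flag meta values reported to interfere with the Discover pipeline.

    Simplified audit: scans <meta> tags only, case-insensitively.
    """
    found = []
    for tag in re.findall(r"<meta\b[^>]*>", html, flags=re.I):
        for marker in NO_GO_MARKERS:
            if marker in tag.lower() and marker not in found:
                found.append(marker)
    return found
```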

The Freshness Decay: A Race Against Time

Freshness is perhaps the most influential factor in Discover ranking. The research identified four distinct windows of visibility that determine the lifespan of a story in the feed.

1 to 7 Days: The Power Window

The vast majority of Discover traffic occurs within the first week of publication. During this time, the content receives its strongest “freshness boost.” For news-heavy sites, the peak usually occurs within the first 24 to 48 hours.

8 to 14 Days: The Moderate Phase

As content hits its second week, the freshness boost begins to wane. Traffic usually settles into a steady decline unless the content is picked up by a new cluster of users with specific interests.

15 to 30 Days: Limited Visibility

At this stage, only highly relevant or “trending” evergreen content maintains visibility. Most standard news items will have completely dropped out of the feed by this point.

30+ Days: Gradual Decline to Zero

While some evergreen content can survive for months, the research indicates a steep decline after 30 days. For content to reappear after this period, it usually requires a “re-triggering” event, such as a resurgence in search volume for that topic or a manual update to the article.
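The four windows above can be summarized as a step function. The multipliers below are invented for illustration only (Google's actual freshness weights are not public); what matters is the shape: a strong first week, a marked drop in week two, and a slide toward zero after a month.

```python
def freshness_multiplier(days_since_publish):
    """Illustrative step function for the four visibility windows above.
    The specific values are assumptions, not published weights."""
    if days_since_publish <= 7:
        return 1.0   # power window: full freshness boost
    if days_since_publish <= 14:
        return 0.5   # moderate phase: boost begins to wane
    if days_since_publish <= 30:
        return 0.2   # limited visibility
    return 0.0       # 30+ days: effectively out of the feed
```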

The Role of pCTR and Engagement Signals

Because Google Discover is an engagement-driven platform, its ranking model is heavily weighted toward Predicted Click-Through Rate (pCTR). This model attempts to answer one question: “If we show this to the user, will they click it?”

Several signals contribute to this prediction:
– **The og:title:** Is the headline compelling?
– **Image Quality:** Do the visual assets encourage engagement?
– **Historical Data:** How did this specific URL perform in its first few hours of life?
– **Domain Authority:** Does the domain have a history of high engagement in Discover?

It is important to note that a high pCTR alone isn’t enough; the engagement must be “high quality.” If users click but immediately bounce back to the feed, the system recognizes this as a poor match or “clickbait,” which can lead to the content being suppressed.
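As a mental model, the four signal families can be combined into a toy logistic score. The weights and the functional form here are pure invention for illustration; the real model is proprietary and far more complex.

```python
import math

def toy_pctr(headline_score, image_score, early_ctr, domain_score):
    """Toy logistic model over the four signal families listed above.
    All weights are invented for illustration; inputs are assumed in [0, 1].
    """
    z = (1.2 * headline_score + 0.8 * image_score
         + 1.5 * early_ctr + 0.6 * domain_score - 2.0)
    return 1.0 / (1.0 + math.exp(-z))
```

Even this toy version captures the point in the paragraph above: early URL performance carries the largest weight, so a story's first hours largely determine its trajectory.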

Experimental Volatility: Why Feeds Differ

If you’ve ever noticed that your Discover feed looks completely different from a colleague’s—even if you share similar interests—the research explains why. Google is constantly running hundreds of experiments simultaneously.

At any given moment, there are roughly 150 server-side experiments active within the Discover ecosystem. These can affect everything from the size of the cards to the weight given to certain interests. Additionally, there are over 50 feature controls that manage how cards are displayed. This heavy experimentation explains the inherent volatility of Discover traffic. A site might see a sudden drop in traffic not because of a change in content quality, but because Google is testing a new layout or ranking weight that unfavorably impacts that publisher’s niche.

Optimizing for Discovery: Actionable Takeaways

Based on the SDK telemetry and pipeline research, publishers can take several concrete steps to improve their chances of appearing and staying in Google Discover.

Prioritize Visual Assets

High-resolution, compelling imagery is non-negotiable. Ensure that your primary images are at least 1200px wide and that your `og:image` tags are correctly implemented. Avoid generic stock photos; unique, high-quality visuals are more likely to drive the clicks necessary to satisfy the pCTR model.

Master the Headline (Without Being Clickbait)

Your `og:title` should be descriptive and engaging. It needs to tell the user exactly what they are getting while creating enough curiosity to warrant a click. However, avoid misleading headlines; if the content doesn’t deliver on the title’s promise, negative engagement signals will quickly kill the story’s momentum.

Audit Your Meta Tags

Check your site’s header for any “nopagereadaloud” or “notranslate” tags that might be inadvertently blocking your content from the Discover pipeline. Ensure your Open Graph tags are clean and that your site’s structured data is up to date.

Focus on Topical Authority

Because Discover relies on interest matching, it rewards publishers that have a clear “niche” or topical authority. If your site consistently produces high-quality content about a specific subject—like gaming hardware or mobile tech—Google is more likely to trust your content when matching it to users who follow those interests.

Understand the Lifecycle of a Story

Don’t expect a single article to drive traffic forever. Since freshness is a major component, the best strategy is a consistent publishing cadence. For evergreen content, consider updating the article with new information and a new “last updated” date to potentially trigger a freshness refresh in the pipeline.

The Future of Personalization

The research underscores that Google Discover is moving toward a more personalized, real-time experience. The feed is no longer static; it can reorder itself while a user is scrolling based on what they just interacted with. As Google continues to integrate its “Interest Graph” more deeply into the Discover experience, the importance of understanding these architectural stages will only grow.

For publishers, the message is clear: success in Discover is a combination of technical excellence, visual appeal, and a deep understanding of user interest. While the algorithm may seem like a mystery, the research into its pipeline provides a roadmap for those looking to master this powerful traffic source.
