The Mechanics of Google Discover: A Deep Dive into Content Qualification and Ranking
Google Discover has become one of the most significant yet unpredictable drivers of organic traffic for modern publishers. Unlike traditional search, where users enter specific queries, Discover is a push-based system that anticipates user interests. For years, the inner workings of this feed have remained a “black box” for SEOs and digital marketers. However, recent SDK-level research by Metehan Yesilyurt has provided an unprecedented look at the structured, multi-stage pipeline Google uses to qualify, rank, and filter content before it ever reaches a user’s screen.
Understanding this architecture is vital for any digital publisher looking to stabilize their traffic. The research reveals that Discover is not just a simplified version of Google Search; it is a complex ecosystem governed by hard publisher blocks, strict technical requirements, rapid freshness decay, and a massive layer of ongoing experimentation. By dissecting the nine-stage flow of content through the Discover pipeline, we can better understand why some articles go viral while others never gain traction.
The Nine-Stage Content Pipeline
Google Discover operates through a sophisticated delivery framework that filters out noise and prioritizes high-quality, relevant content. According to the research, content passes through nine distinct stages. Failure at any single point can lead to a complete lack of visibility.
The process begins with crawling and understanding. Google’s bots must first find the content and parse its meaning. This is followed by reading meta tags, where the system identifies critical elements like the title and the lead image. The third stage involves classification, where the system determines if the content is breaking news, a feature story, or evergreen content.
Crucially, the fourth stage is a block check. Before any ranking or interest matching occurs, Google determines if the publisher or the specific URL has been blocked by the user or flagged for policy violations. If the content passes this gate, it moves to interest matching, where it is paired with users based on their browsing history and preferences.
The final stages involve a server-side predicted click-through rate (pCTR) model, feed layout construction, delivery, and finally the collection of user feedback. This feedback loop ensures that the system learns from every swipe, click, or dismissal, refining future recommendations in real time.
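As a mental model, the pipeline behaves like a chain of pass/fail gates. The sketch below is illustrative only: the stage names follow the research’s description, but the Candidate fields and function signatures are our own assumptions, not Google internals.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    url: str
    meta: dict = field(default_factory=dict)  # parsed meta tags (og:title, og:image, ...)
    content_class: str = ""                   # breaking news / feature / evergreen
    pctr: float = 0.0                         # filled in at stage six

STAGES = [
    "crawl_and_understand",  # 1. fetch the page and parse its meaning
    "read_meta_tags",        # 2. title, lead image, and other card elements
    "classify_content",      # 3. breaking news, feature, or evergreen
    "block_check",           # 4. hard gate: publisher or URL blocked?
    "interest_matching",     # 5. pair content with user interests
    "pctr_prediction",       # 6. server-side engagement estimate
    "feed_layout",           # 7. card size and feed position
    "delivery",              # 8. push to the user's feed
    "feedback_collection",   # 9. clicks, swipes, dismissals feed back in
]

def run_pipeline(candidate: Candidate, stage_fns: dict) -> bool:
    """Run a candidate through all nine gates in order."""
    for stage in STAGES:
        if not stage_fns[stage](candidate):
            return False  # failure at any single point = no visibility
    return True
```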
The Pre-Ranking Filter: Why Some Publishers Are Invisible
One of the most striking findings of the research is that publisher-level blocks happen remarkably early in the process. In traditional Search, a user might see a site they dislike and simply choose not to click. In Discover, a single user action can have a devastating impact on a domain’s visibility for that individual.
When a user selects “Don’t show content from [Site Name],” that publisher is effectively removed from the user’s Discover ecosystem. This block is processed before the ranking model even looks at the content. This means that even if you produce the most relevant, high-quality article for a specific user interest, you will never reach them if you have been previously blocked. Unlike many social media platforms, Discover offers no “sitewide boost” mechanism to counter these blocks. The system is designed to be highly sensitive to negative user signals in order to maintain the quality of the personal feed.
The Power of User Dismissals
User feedback in Discover is often permanent. If a user dismisses a specific story, that action is recorded on Google’s servers and linked to that URL. The research indicates that once a story is dismissed, it will not resurface for that user, regardless of how much engagement it receives from others. This puts immense pressure on publishers to ensure that their headlines and images are not just enticing, but also highly accurate to avoid the “click-and-dismiss” behavior that can signal low quality to the algorithm.
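Conceptually, blocks and dismissals act as a per-user gate that runs before relevance is ever computed. A minimal sketch, assuming simple set-based storage (the real server-side representation is unknown):

```python
def passes_pre_ranking_gate(url: str, domain: str,
                            blocked_publishers: set[str],
                            dismissed_urls: set[str]) -> bool:
    """Return False if this user has blocked the publisher or dismissed the URL."""
    if domain in blocked_publishers:
        return False  # "Don't show content from [Site Name]" removes the whole domain
    if url in dismissed_urls:
        return False  # dismissals are recorded server-side and persist for this user
    return True
```

Because this check precedes ranking, no amount of relevance or engagement elsewhere can rescue a blocked domain or a dismissed story for that user.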
The pCTR Model: The Brain of Discover Ranking
Ranking in Google Discover is heavily influenced by a Predicted Click-Through Rate (pCTR) model. This is an AI-driven evaluation that happens on Google’s servers, estimating the likelihood of a specific user engaging with a specific piece of content. While the model itself is not visible to the public, the signals sent to the model have been mapped.
The pCTR model considers several key factors, with a toy scorer sketched after the list:
- The og:title: The primary headline used for the card.
- Image Quality and Dimensions: High-resolution visuals are prioritized.
- Freshness: How recently the content was published or updated.
- Historical Performance: Previous click and impression data for the URL and the domain.
- Technical Stability: Whether the images load correctly and the page is mobile-friendly.
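To make these signals concrete, here is a toy linear scorer over the five inputs. The weights, the linear form, and every parameter name are invented for illustration; the real model is server-side and unpublished.

```python
import math

def predicted_ctr(title_appeal: float, image_width_px: int,
                  hours_since_publish: float, historical_ctr: float,
                  loads_cleanly: bool) -> float:
    """Combine the mapped signals into a 0-to-1 engagement estimate."""
    features = {
        "title": title_appeal,                                # og:title appeal, 0..1
        "image": 1.0 if image_width_px >= 1200 else 0.3,      # large-card eligibility
        "freshness": math.exp(-hours_since_publish / 168.0),  # fades over roughly a week
        "history": historical_ctr,                            # prior URL/domain performance
        "stability": 1.0 if loads_cleanly else 0.0,           # images load, mobile-friendly
    }
    weights = {"title": 0.3, "image": 0.2, "freshness": 0.2,
               "history": 0.2, "stability": 0.1}
    score = sum(weights[k] * features[k] for k in features)
    return max(0.0, min(1.0, score))
```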
If the pCTR model predicts a low engagement rate, the content will be pushed down in the feed or omitted entirely in favor of content with a higher predicted success rate. This creates a “rich get richer” scenario where high-performing content continues to gain momentum while underperforming content is quickly phased out.
The Crucial Role of Image Quality and Meta Tags
Technical SEO in Discover is heavily reliant on Open Graph (OG) tags. The research highlights six key page-level tags that Google Discover reads to build the visual cards. The most critical of these is the og:image. Without a valid image, a page will typically not appear in the Discover feed at all.
The 1200px Requirement
To qualify for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1200 pixels wide. While smaller images may still be eligible for Discover, they are often relegated to small thumbnails. These thumbnails have significantly lower click-through rates compared to large-form cards. For publishers, ensuring that every article has a high-quality, 1200px-wide featured image is perhaps the single most effective technical optimization for Discover.
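The threshold is easy to audit automatically. A quick check using requests and Pillow; the helper name is ours, and only the 1200px figure comes from the research:

```python
from io import BytesIO

import requests
from PIL import Image

def large_card_eligible(image_url: str) -> bool:
    """True if the image is at least 1200 pixels wide."""
    resp = requests.get(image_url, timeout=10)
    resp.raise_for_status()
    width, _height = Image.open(BytesIO(resp.content)).size
    return width >= 1200
```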
Backup Tags and Metadata
Google’s system is designed to be resilient. If the og:title tag is missing, the system will look for backups, such as the Twitter title tag or the standard HTML title tag. However, relying on backups is risky, as it may not present the content in the most optimized way. Furthermore, specific meta tags can act as “kill switches.” The research found that the “nopagereadaloud” and “notranslate” tags can prevent a page from entering the Discover pipeline entirely. Publishers should audit their CMS settings to ensure these tags are not accidentally enabled.
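Both the fallback chain and the kill switches can be checked in a single pass over the HTML. A hedged audit sketch using BeautifulSoup; the function and its return structure are ours, and how a CMS actually emits these tags varies:

```python
from bs4 import BeautifulSoup

def audit_discover_tags(html: str) -> dict:
    """Resolve the card title via the fallback chain and flag kill-switch tags."""
    soup = BeautifulSoup(html, "html.parser")

    def meta(attr: str, name: str) -> str | None:
        tag = soup.find("meta", attrs={attr: name})
        return tag.get("content") if tag else None

    # Fallback order described above: og:title, then twitter:title, then <title>.
    title = (meta("property", "og:title")
             or meta("name", "twitter:title")
             or (soup.title.string if soup.title else None))

    # Tags the research found can keep a page out of the pipeline entirely.
    kill_switches = [name for name in ("nopagereadaloud", "notranslate")
                     if soup.find("meta", attrs={"name": name}) is not None]

    return {"resolved_title": title, "kill_switches": kill_switches}
```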
The Freshness Decay: Understanding Content Lifespans
One of the most frequent questions publishers ask is why their Discover traffic suddenly vanishes after a few days. The research provides a clear breakdown of the “freshness boost” and how it decays over time. Google Discover categorizes content into specific time windows that dictate its visibility potential; these windows are translated into a small helper after the list.
- 1 to 7 Days: This is the “golden window.” New content receives the strongest visibility boost and has the highest chance of appearing at the top of a user’s feed.
- 8 to 14 Days: Visibility begins to taper. Content in this window can still perform well if it has high engagement, but it no longer receives the primary freshness boost.
- 15 to 30 Days: The system identifies this as older content. Visibility becomes limited, typically surfacing only for very niche interest matches.
- 30+ Days: Most content enters a stage of gradual decline. While Google does have a separate classification for “evergreen” content that can resurface months later, the vast majority of Discover content has a lifespan of less than a month.
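These windows translate directly into a lookup helper. The tier labels below are descriptive shorthand, not Google terminology:

```python
def freshness_tier(days_old: int) -> tuple[str, str]:
    """Map content age to the visibility windows described above."""
    if days_old <= 7:
        return ("golden_window", "strongest visibility boost")
    if days_old <= 14:
        return ("tapering", "can still perform with high engagement")
    if days_old <= 30:
        return ("limited", "surfaces mainly for niche interest matches")
    return ("declining", "gradual decline unless classified as evergreen")
```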
This decay explains the “spike and cliff” pattern seen in many Google Search Console reports. For news-heavy sites, the window is even tighter, often lasting only 24 to 48 hours.
Personalization and User Behavior Signals
The “personalization layer” of Discover is what makes it unique. Google pulls from a massive repository of user data to tailor the feed. This includes broader interest data from search history, YouTube viewing habits, and even location data. However, publisher-specific signals also play a major role.
Registration in the Google Publisher Center is a key signal of trust. While it doesn’t guarantee ranking, it helps Google identify the publisher and its primary topics. Additionally, individual user actions such as “following” a topic or a specific publisher, “saving” a story, or spending significant time reading an article (dwell time) act as positive reinforcement for the algorithm. If a user consistently spends time on a particular site’s articles, the pCTR model will adjust upward for that publisher in that specific user’s feed.
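One way to picture this per-user adjustment is as a multiplicative boost on top of the base pCTR estimate. Every multiplier below is invented for illustration; only the signal types (follows, saves, dwell time) come from the research.

```python
def personalized_pctr(base_pctr: float, follows_publisher: bool,
                      saved_stories: int, avg_dwell_seconds: float) -> float:
    """Nudge a base estimate using per-user affinity signals."""
    boost = 1.0
    if follows_publisher:
        boost *= 1.2                             # explicit "follow" signal
    boost *= 1.0 + 0.02 * min(saved_stories, 5)  # saved stories, capped
    if avg_dwell_seconds > 60:
        boost *= 1.1                             # sustained reading time
    return min(1.0, base_pctr * boost)
```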
The Volatility of Experiments
Many SEOs find Discover frustrating because a strategy that works one week may fail the next. The research highlights that this volatility is often by design. During the observation period, there were approximately 150 server-side experiments running simultaneously. Furthermore, over 50 different feature controls were active, affecting how cards were displayed and which signals were prioritized.
This means that no two users see exactly the same feed, even if they have identical interests. One user might be in a group testing a new “video-first” layout, while another is in a group testing a minimalist text-only card. This environment of constant testing makes it difficult to pinpoint a single “correct” way to optimize, as the “rules” of the feed are effectively in a state of constant flux.
Real-Time Feed Updates
Unlike a static search results page, Google Discover is dynamic. The research found that the feed can add, remove, or reorder content cards while a user is actively browsing, even without a manual refresh. This real-time updating is driven by the feedback loop: if a user engages with a specific topic, the system can immediately inject more of that topic into the feed. Conversely, if a user ignores a certain type of content, the system can prune similar items from the active session. This underscores the importance of initial engagement; the first few seconds of a user’s session shape what they see for the rest of their browsing time.
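The inject-and-prune behavior can be pictured as a session-level rebuild of the visible card list. A sketch, assuming cards are dicts with a "topic" key and that a backlog of held-back candidates exists; both assumptions are ours:

```python
def update_live_feed(feed: list[dict], engaged_topics: set[str],
                     ignored_topics: set[str], backlog: list[dict]) -> list[dict]:
    """Rebuild the visible feed mid-session from live engagement signals."""
    # Prune cards on topics the user has been ignoring this session.
    kept = [card for card in feed if card["topic"] not in ignored_topics]
    # Inject held-back cards on topics the user just engaged with.
    injected = [card for card in backlog if card["topic"] in engaged_topics]
    return injected + kept  # fresh matches surface first
```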
Actionable Strategies for Success in Google Discover
Based on these findings, publishers can move beyond guesswork and implement a data-driven strategy for Discover. Success in this ecosystem requires a combination of technical precision and high-quality storytelling.
Prioritize Visual Assets
Ensure every article features a compelling, high-resolution image at least 1200px wide. Avoid generic stock photos when possible, as unique and striking visuals tend to have higher pCTR scores. The image is the first thing a user sees; if it doesn’t grab attention, the rest of the metadata doesn’t matter.
Optimize Meta Tags for Clarity and Engagement
Your og:title should be written for human curiosity, not just keyword density. While SEO titles for search often focus on specific queries, Discover titles should be evocative and promise value. However, avoid “clickbait” that leads to high dismissal rates, as this will result in a long-term penalty in the pCTR model.
Maintain a High Cadence of Fresh Content
Because of the aggressive freshness decay, publishers who want consistent Discover traffic must maintain a regular publishing schedule. Since content begins to lose its boost after seven days, a “burst” strategy—where a site publishes several articles and then goes silent—will lead to inconsistent traffic patterns. A steady flow of new content keeps the domain active in the “golden window.”
Focus on E-E-A-T
Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T) are critical in Discover. Because the system filters content before ranking, being recognized as a trusted publisher is essential. Registration in the Google Publisher Center and maintaining a clear topical authority help the system classify your content correctly and match it with the right audience.
Monitor Negative Signals
Keep a close eye on engagement metrics. While Google Search Console provides click and impression data, it doesn’t show “dismissals.” However, a sudden drop in impressions for a high-performing URL often indicates that the pCTR model has adjusted downward based on negative user feedback. Analyze these instances to see if your headlines or images were misleading.
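Because dismissals are invisible in the data, the practical proxy is a sharp week-over-week impressions drop per URL. A sketch with pandas over a Search Console performance export; the column names are assumptions about your export format:

```python
import pandas as pd

def flag_impression_drops(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Flag URLs that lost more than `threshold` (e.g. 50%) of their
    impressions versus the prior week. Expects columns: url, week,
    impressions, with at least two weeks of data."""
    pivot = df.pivot_table(index="url", columns="week",
                           values="impressions", aggfunc="sum").fillna(0)
    last, prev = pivot.columns[-1], pivot.columns[-2]
    change = (pivot[last] - pivot[prev]) / pivot[prev].clip(lower=1)
    return pivot[change < -threshold]
```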
Final Thoughts
The research into Google Discover’s architecture reveals a system that is far more structured and rigorous than previously thought. It is an ecosystem that prioritizes the user’s immediate experience and long-term preferences over the publisher’s desire for traffic. By understanding that publisher blocks happen before ranking, that freshness is a ticking clock, and that technical meta tags can be the difference between a viral hit and total invisibility, publishers can better navigate the complexities of modern digital distribution.
While the heavy experimentation and real-time updates make Discover a volatile source of traffic, they also represent an opportunity. Publishers who consistently provide high-quality, visually appealing, and timely content will find themselves favored by a system that is constantly looking for the next best thing to show its users. In the world of Google Discover, there are no shortcuts—only the continuous pursuit of engagement and technical excellence.