Understanding the Google Discover Mechanism
For digital publishers and SEO professionals, Google Discover has long been something of a “black box.” Unlike traditional Google Search, which relies on explicit user queries, Discover is a highly personalized, query-less feed that anticipates what a user wants to see based on their past behavior, interests, and engagement patterns. Because of this proactive nature, Discover can drive massive surges in traffic—often referred to as “spikes”—that can dwarf traditional search traffic overnight.
However, this traffic is notoriously volatile. One day a site may receive hundreds of thousands of clicks, and the next, it may see nothing. New SDK-level research by Metehan Yesilyurt has finally pulled back the curtain on the technical architecture of Google Discover. By analyzing the signals within the Google app framework, this research reveals a structured, nine-stage pipeline that dictates how content is qualified, ranked, and ultimately filtered before it ever reaches a user’s screen.
The Nine-Stage Content Pipeline
The journey of an article from a publisher’s CMS to a user’s Google Discover feed is not instantaneous. It follows a rigorous technical workflow designed to ensure quality, relevance, and safety. Understanding these stages is essential for diagnosing why certain content fails to surface.
1. Crawling and Extraction
The process begins with Google’s standard crawling infrastructure. Before a piece of content can be considered for Discover, Google must first discover the URL and parse its content. This is why standard SEO practices, such as maintaining a healthy XML sitemap and ensuring fast crawlability, remain the foundation of Discover success.
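If you want to verify that this first stage can even begin, a quick script can confirm your sitemap is reachable and carries fresh “lastmod” dates. The sketch below uses only Python's standard library; the sitemap URL is a placeholder.

```python
# Minimal sitemap sanity check: confirm the sitemap is reachable and
# list its most recent entries. The URL below is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # hypothetical
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urllib.request.urlopen(SITEMAP_URL, timeout=10) as resp:
    tree = ET.parse(resp)

for url in tree.getroot().findall("sm:url", NS)[:10]:
    loc = url.findtext("sm:loc", default="(missing <loc>)", namespaces=NS)
    lastmod = url.findtext("sm:lastmod", default="(no <lastmod>)", namespaces=NS)
    print(f"{lastmod}  {loc}")
```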
2. Meta Tag Analysis
Once crawled, the system focuses on specific meta tags. Unlike Search, which relies heavily on the HTML title tag and meta description, Discover prioritizes Open Graph (OG) tags. The research indicates that Discover specifically looks for “og:title” and “og:image.” If these are missing, the system falls back to alternatives such as Twitter Cards or the standard HTML title, but the absence of high-quality metadata can cause content to fail in the later stages.
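The exact fallback order inside Google's pipeline is not public, but the behavior described above can be sketched directly. The snippet below, using Python's standard-library html.parser, mirrors the documented preference: OG tags first, Twitter Cards next, then the plain HTML title.

```python
# Illustrative sketch of the og:title fallback chain described above.
# The exact order Google uses internally is not public; this mirrors
# the documented preference: OG tags, then Twitter Cards, then <title>.
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}          # meta property/name -> content
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and "content" in attrs:
                self.meta[key] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def discover_title(html: str) -> str | None:
    p = MetaExtractor()
    p.feed(html)
    p.close()
    return p.meta.get("og:title") or p.meta.get("twitter:title") or p.title or None
```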
3. Content Classification
Google classifies the content into distinct buckets. Is it breaking news? Is it an evergreen “how-to” guide? Is it a product review? This classification determines which “freshness” rules apply and which user interest groups the content will be tested against.
4. The Eligibility Gate (Publisher Blocks)
This is one of the most critical findings of the research. Before the system even attempts to match content to a user’s interests, it checks for publisher-level blocks. If a user has previously selected “Don’t show content from this site,” that publisher is effectively dead to that user. This filter happens server-side and is a hard block that prevents the content from even entering the ranking pool.
5. Interest Matching
Google compares the classified topics of the article against the user’s “Interest Graph.” This graph is built from search history, YouTube views, and previous interactions within the Discover feed itself. If there is no topical alignment, the content proceeds no further.
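Taken together, stages 4 and 5 behave like two successive filters. The toy model below assumes both can be reduced to simple set lookups, which is certainly a simplification of the real signals, but it makes the order of operations concrete: the hard block fires before any interest matching happens.

```python
# Toy model of stages 4 and 5: a hard publisher block removes the
# candidate outright; otherwise the article must share at least one
# topic with the user's interest graph. Both structures are simplified
# assumptions, not Google's actual data model.
def passes_eligibility(article: dict, user: dict) -> bool:
    # Stage 4: hard, per-user publisher block. No ranking, no recovery.
    if article["domain"] in user["blocked_publishers"]:
        return False
    # Stage 5: require topical overlap with the user's interest graph.
    return bool(set(article["topics"]) & set(user["interests"]))

user = {"blocked_publishers": {"clickbait.example"},
        "interests": {"seo", "python", "cycling"}}
article = {"domain": "news.example", "topics": {"seo", "publishing"}}
print(passes_eligibility(article, user))  # True
```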
6. The pCTR (Predicted Click-Through Rate) Model
The ranking isn’t just about what you like; it’s about what Google thinks you are most likely to click. Using a sophisticated server-side model, Google assigns a predicted CTR (pCTR) to each eligible article. This model weighs the headline’s “clickability,” the visual appeal of the image, and the historical performance of similar topics.
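The model's actual form and weights are not public. As a purely illustrative stand-in, the sketch below combines the three signal families named above into a logistic score; every weight here is an assumption.

```python
# Toy pCTR scorer over the three signal families named above. The real
# model's form and weights are not public; this logistic combination is
# purely illustrative.
import math

WEIGHTS = {"headline_clickability": 1.2,   # assumed weights
           "image_appeal": 0.9,
           "topic_history_ctr": 2.0}
BIAS = -3.0

def predict_ctr(signals: dict) -> float:
    z = BIAS + sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))  # squash to a 0-1 probability

print(round(predict_ctr({"headline_clickability": 0.8,
                         "image_appeal": 0.7,
                         "topic_history_ctr": 0.4}), 3))  # 0.352
```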
7. Layout Construction
Once the top-ranking articles are selected, the system decides how to display them. This includes choosing between large hero cards or smaller thumbnail cards. The research notes that this stage is heavily influenced by image dimensions.
8. Delivery
The content is pushed to the user’s device. Interestingly, the feed is not static. It can update in real-time, adding or reordering cards while the user is actively scrolling, without requiring a manual refresh.
9. The Feedback Loop
Every action the user takes—clicking, dismissing, reporting, or sharing—is recorded. This data is fed back into the pCTR model and the Interest Graph, refining the feed for the next session.
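As a rough sketch of that loop, the snippet below nudges per-topic interest weights after each action and treats a block as the hard, domain-wide suppression described in the next section. The specific deltas are invented for illustration.

```python
# Sketch of the feedback loop: each user action nudges the per-topic
# interest weights that feed the next session's ranking. The deltas are
# invented; a block is modeled as the hard suppression described below.
from collections import defaultdict

FEEDBACK = {"click": +0.10, "share": +0.20,
            "dismiss": -0.15, "report": -0.30}  # assumed deltas

interests = defaultdict(float)
blocked_publishers = set()

def record(action: str, article: dict) -> None:
    if action == "block":
        blocked_publishers.add(article["domain"])  # hard, domain-wide
        return
    for topic in article["topics"]:
        interests[topic] += FEEDBACK[action]

record("click", {"domain": "news.example", "topics": {"seo"}})
record("dismiss", {"domain": "other.example", "topics": {"crypto"}})
print(dict(interests))  # {'seo': 0.1, 'crypto': -0.15}
```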
The Invisible Gatekeeper: Why Publisher Blocks Matter
One of the most sobering revelations from Yesilyurt’s research is the power of the publisher-level block. In the world of SEO, we are used to “penalties” or “ranking demotions” that result in lower positions. Google Discover is different.
When a user interacts with a card in their feed, they have the option to tell Google they are not interested in the topic or the publisher. If they choose to block the publisher, it creates a permanent, domain-wide suppression for that specific user.
Critically, there is no inverse “boost” mechanism. While a user can “Follow” a publisher, the research suggests that a “Follow” does not guarantee visibility in the same way a “Block” guarantees invisibility. For publishers, this means that clickbait or misleading headlines are a dangerous game. While they might drive a short-term spike in clicks, they increase the probability of users blocking the domain, which permanently shrinks the potential audience size in Discover.
Technical Requirements: Images and Meta Tags
If you want your content to occupy the most valuable real estate in the Discover feed—the large, high-engagement image cards—you must meet specific technical thresholds.
The 1200px Rule
The research confirms that image size is a primary filter for card layout. To qualify for a large image display, the “og:image” or the image specified in the Article schema must be at least 1,200 pixels wide. If the image is smaller than this, Google will often default to a small thumbnail or may choose not to show the article at all. High-resolution images are not just a cosmetic preference; they are a technical requirement for eligibility in the highest-performing segments of the feed.
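Auditing this is straightforward. The sketch below uses the third-party Pillow library (not a Google tool) to check whether an image file clears the 1,200-pixel threshold; the file path is a placeholder.

```python
# Quick audit of the 1200px rule using Pillow (pip install Pillow).
# The path is a placeholder; swap in your article's og:image file.
from PIL import Image

MIN_HERO_WIDTH = 1200

def hero_eligible(path: str) -> bool:
    with Image.open(path) as img:
        width, _ = img.size
    return width >= MIN_HERO_WIDTH

print(hero_eligible("og-image.jpg"))  # True only if width >= 1200px
```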
The Role of Open Graph Tags
Google Discover leans heavily on the Open Graph protocol. The research identified six key tags that the system prioritizes:
1. og:title
2. og:image
3. og:description
4. og:url
5. og:site_name
6. og:type
While Google is capable of finding fallbacks, relying on the system to “guess” your title or image is a risk. Publishers should ensure their OG tags are optimized specifically for engagement, as these are the signals the pCTR model uses to determine ranking.
Tags That Stop Content Cold
Perhaps most surprisingly, the research highlighted two specific meta tags that can act as a “no-entry” sign for Discover:
1. **nopagereadaloud**: Using this tag can prevent the page from being processed for certain Discover features.
2. **notranslate**: This tag can prevent the content from appearing in Discover feeds for users whose primary language differs from the content’s language, effectively cutting off international traffic.
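A single audit can cover both lists: confirm the six prioritized OG tags are present and flag either blocker. The sketch below assumes the page's meta tags have already been extracted into a dictionary, as in the earlier parser sketch; note that these blocking directives can be declared in more than one way, so the check is deliberately loose.

```python
# Combined audit: are the six prioritized OG tags present, and does
# either "no-entry" tag appear? `tags` maps meta property/name -> content,
# as produced by the MetaExtractor sketch earlier.
REQUIRED_OG = ["og:title", "og:image", "og:description",
               "og:url", "og:site_name", "og:type"]
BLOCKING = {"nopagereadaloud", "notranslate"}

def audit(tags: dict) -> None:
    for tag in REQUIRED_OG:
        print(f"{tag:16s} {'ok' if tags.get(tag) else 'MISSING'}")
    # The blockers may appear as a meta name or inside a directive value
    # (e.g. a robots-style content attribute), so check both places.
    flagged = {b for b in BLOCKING
               if b in tags or any(b in v for v in tags.values())}
    if flagged:
        print("WARNING: blocking tag(s) found:", ", ".join(sorted(flagged)))
```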
The Freshness Decay: The Life Cycle of a Discover Post
Discover is fundamentally a “current” feed. While evergreen content can and does surface, the system is biased toward newness. The research mapped out a specific decay timeline that governs how long a post remains viable in the feed:
1 to 7 Days: The Power Window
During the first week after publication, content receives its strongest visibility boost. Most “spikes” occur within the first 48 to 72 hours. During this window, the pCTR model is most aggressive in testing the content against various user segments.
8 to 14 Days: Moderate Visibility
After the first week, visibility begins to decline. Only content with exceptionally high engagement rates, or content covering topics that are still trending, will maintain a significant presence in the feed.
15 to 30 Days: The Tail Phase
By the third and fourth weeks, the content is largely phased out for most users, appearing only in highly specific interest-based scenarios.
30+ Days: Effective Expiry
After 30 days, most content disappears from Discover entirely. The exception is “Evergreen” content, which Google classifies differently. For a post to survive beyond 30 days, it must demonstrate consistent, long-term relevance to a specific user interest that isn’t tied to a news cycle.
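One way to internalize this timeline is to encode it as a piecewise visibility multiplier. In the sketch below the phase boundaries follow the research, while the multiplier values themselves are invented to make the shape concrete.

```python
# The decay timeline above, encoded as a piecewise visibility multiplier.
# Phase boundaries follow the research; the multiplier values are
# invented for illustration.
def freshness_multiplier(age_days: int, evergreen: bool = False) -> float:
    if evergreen:
        return 0.5          # assumed steady evergreen baseline
    if age_days <= 7:
        return 1.0          # power window
    if age_days <= 14:
        return 0.5          # moderate visibility
    if age_days <= 30:
        return 0.2          # tail phase
    return 0.0              # phased out entirely

for day in (2, 10, 20, 45):
    print(day, freshness_multiplier(day))  # 1.0, 0.5, 0.2, 0.0
```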
The Science of Volatility: Experiments and A/B Testing
Publishers often complain about the “Discover Rollercoaster”—massive traffic one day, and zero the next. The research explains why this happens: Google is constantly experimenting.
During the observation period, researchers found approximately 150 server-side experiments running simultaneously. These experiments can change anything from the weight of pCTR to the way images are cropped or how “breaking news” is prioritized over “evergreen” content. Furthermore, there are over 50 feature controls that affect UI elements.
This means that two users with identical interests might see completely different feeds because they are in different experiment groups. For a publisher, this means that a sudden drop in traffic might not be a reflection of your content quality, but rather a shift in an internal Google experiment that changed how your specific niche is surfaced.
Optimizing for the pCTR Model
Since ranking is heavily dependent on a predicted click-through rate, publishers must think like data scientists. The pCTR model evaluates:
– **Visual Appeal**: Is the image clear, high-resolution, and relevant?
– **Headline Clarity**: Does the headline promise value without being deceptive?
– **Historical Context**: How have users interacted with this URL in the first few minutes of its life?
If the initial group of users who see your article in Discover dismisses it or scrolls past it without clicking, the pCTR score drops, and the system stops showing it to wider audiences. This “initial testing phase” is why the first few hours of a post’s life are so critical.
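A standard way to model this cold-start sensitivity (not confirmed as Google's method) is Beta-Binomial smoothing: with few impressions, the estimate sits near a prior baseline, and every early click or skip moves it disproportionately.

```python
# Why the first hours matter: with few impressions, each early click or
# skip moves the CTR estimate a lot. Beta-Binomial smoothing is a
# standard technique for this; it is not confirmed as Google's method.
def smoothed_ctr(clicks: int, impressions: int,
                 prior_ctr: float = 0.05, prior_weight: int = 100) -> float:
    # Start from a prior (an assumed 5% baseline CTR) and let real
    # impressions pull the estimate toward the observed rate.
    return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)

print(round(smoothed_ctr(0, 50), 3))   # 0.033: early skips drag it down
print(round(smoothed_ctr(12, 50), 3))  # 0.113: early clicks lift it fast
```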
Practical Takeaways for Digital Publishers
The insights from this research allow for a more tactical approach to Google Discover optimization. Instead of guessing, publishers can focus on the technical and behavioral signals that matter.
1. **Prioritize Large Images**: Ensure every article has a high-quality “og:image” that is at least 1200px wide. Avoid generic stock photos; unique, compelling visuals drive higher CTR, which feeds the pCTR model.
2. **Audit Meta Tags**: Use a debugger to ensure your “og:title” and “og:image” tags are correctly implemented. Remove any “nopagereadaloud” or “notranslate” tags unless they are absolutely necessary for your business model.
3. **Monitor User Feedback**: While you cannot see individual user blocks, you can monitor your CTR in Google Search Console. A declining CTR in Discover is a leading indicator that your content may be alienating users, potentially leading to domain-level blocks.
4. **Focus on the First 48 Hours**: Since freshness is a major ranking factor, ensure your content is promoted across other channels (social, newsletters) immediately after publication. This initial engagement can provide the positive signals Google needs to “validate” the content for a wider Discover audience.
5. **Understand the “Naiades” System**: The research mentions “Naiades,” a system involved in real-time feed updates. This suggests that Discover is getting faster at reacting to trends. If a topic is trending, getting your high-quality take published quickly is more important than ever.
6. **Respect the User’s Choice**: Avoid “click-baity” tactics that lead to high bounce rates. If a user feels tricked into clicking, they are more likely to dismiss the card or block the publisher. In the Discover ecosystem, a “dismissal” is a permanent signal that hurts your long-term reach.
Conclusion
Google Discover is no longer a complete mystery. The research by Metehan Yesilyurt proves that it is a highly engineered, multi-stage system that prioritizes technical eligibility and user sentiment over traditional keyword matching. By focusing on high-resolution visuals, proper meta tag implementation, and maintaining a high standard of content that discourages user blocks, publishers can move away from “chasing the algorithm” and toward a sustainable Discover strategy.
While volatility remains an inherent part of the platform due to constant server-side experimentation, understanding the 1200px rule, the 30-day freshness decay, and the critical importance of the pCTR model gives publishers the tools they need to stay competitive in one of the most powerful traffic drivers on the modern web.