How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Pipeline

For many digital publishers, Google Discover represents a volatile yet indispensable source of organic traffic. Unlike traditional Search, which relies on active queries, Discover is an interest-based feed that pushes content to users before they even know they want it. However, the mechanics behind why one article goes viral while another fails to gain a single impression have long remained a “black box” for SEOs.

Recent SDK-level research by Metehan Yesilyurt has pulled back the curtain on this mysterious system. By analyzing the observable signals within Google’s Discover app framework, the research reveals a highly structured, multi-stage pipeline. This system isn’t just about what a user likes; it involves hard publisher blocks, strict technical requirements, freshness decay, and a massive layer of server-side experimentation.

Understanding this architecture is critical for any site looking to stabilize its Discover traffic. If your content is failing to surface, the issue might not be your topic—it might be a failure at one of the early qualification stages that happens before ranking even begins.

The Nine-Stage Flow of Content Discovery

The journey from a published article to a user’s mobile feed involves nine distinct stages. Each stage acts as a filter or a processor, ensuring that only the most relevant and high-quality content reaches the end user.

1. Crawl and Analysis: Google must first discover and crawl the page. During this phase, the system builds an initial understanding of the content’s topic and entity relationships.
2. Metadata Extraction: The system reads key meta tags, specifically focusing on the Open Graph (OG) tags for titles and images.
3. Content Classification: Content is categorized by type. This determines if the piece is “Breaking News,” “Evergreen,” or “Niche Interest.”
4. Block Filtering: The system checks if the publisher or the specific URL has been blocked by the user or by Google’s internal safety filters.
5. Interest Matching: Google matches the content’s entities and topics against the user’s historical browsing data, search history, and app activity.
6. pCTR Prediction: A server-side model estimates the Predicted Click-Through Rate. This is a crucial gatekeeper for visibility.
7. Feed Construction: The layout is built, determining whether the content appears as a large hero card or a smaller thumbnail.
8. Delivery: The content is pushed to the user’s device.
9. Feedback Loop: The system records how the user interacts with the card—whether they click, dismiss, or ignore it.
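To make the flow concrete, the nine stages can be pictured as a chain of filters and gates. The sketch below is a toy model: every function, field name, and threshold is invented for illustration and mirrors only the *shape* of the pipeline described in the research, not Google's internal code.

```python
# Toy model of the nine-stage Discover pipeline. All names, weights,
# and thresholds are hypothetical illustrations of the described flow.

VISIBILITY_THRESHOLD = 0.05  # invented pCTR cut-off

def estimate_pctr(page, user):
    # Toy stand-in for stage 6: blend a global prior with the user's
    # historical click rate on this publisher.
    prior = 0.03
    history = user["publisher_ctr"].get(page["domain"], prior)
    return 0.5 * prior + 0.5 * history

def run_pipeline(page, user):
    # Stages 1-3 (crawl, metadata extraction, classification) are
    # assumed done; the page arrives here as a pre-analyzed dict.
    if page["domain"] in user["blocked_domains"]:         # 4. hard block
        return None
    if not set(page["topics"]) & set(user["interests"]):  # 5. interest match
        return None
    pctr = estimate_pctr(page, user)                      # 6. pCTR gate
    if pctr < VISIBILITY_THRESHOLD:
        return None
    # 7. feed construction: large images earn the hero layout
    layout = "hero" if page["image_width"] >= 1200 else "thumbnail"
    return {"url": page["url"], "layout": layout, "pctr": pctr}  # 8. deliver
    # 9. feedback (clicks/dismissals) would update user["publisher_ctr"]

page = {"url": "https://example.com/a", "domain": "example.com",
        "topics": ["tech"], "image_width": 1400}
user = {"blocked_domains": set(), "interests": ["tech", "gaming"],
        "publisher_ctr": {"example.com": 0.12}}
card = run_pipeline(page, user)
# card["layout"] == "hero", card["pctr"] == 0.075
```

Note how the block check at stage 4 short-circuits everything downstream, which is exactly the point the research makes about publisher blocks.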

The Barrier to Entry: Publisher Blocks and Filters

One of the most significant findings of the research is the placement of the publisher block in the pipeline. In Google Discover, a user has the option to select “Don’t show content from [Site Name].” According to the SDK analysis, this block happens at a foundational level, well before interest matching or ranking occurs.

If a user has blocked your domain, your content is effectively dead to them. There is no amount of “SEO optimization” or high-quality reporting that can bypass this. Furthermore, while a user can suppress an entire domain with one click, there is no equivalent “boost” mechanism. Following a site helps, but the negative signal of a block is far more powerful and permanent in the system’s logic.

This highlights the importance of maintaining brand trust. If a publisher consistently uses “clickbait” headlines that frustrate users, they risk being blocked at the domain level, which leads to a permanent decline in Discover reach for that specific user and potentially influences the broader algorithmic perception of the site.

The Technical Architecture: Meta Tags and Image Standards

Google Discover is a visual-first medium. The research confirms that the app framework looks for six specific page-level tags to generate a card. The most critical of these are og:image and og:title. If these tags are missing, the system falls back to alternatives such as the twitter:title tag or the standard HTML title. If no suitable image is found, the content is often disqualified from appearing entirely.

Image quality is a primary ranking factor for visibility. To qualify for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1,200 pixels wide. Small images or thumbnails are not only less visually appealing but are also deprioritized by the layout engine.

Additionally, certain technical meta tags can act as a “kill switch” for Discover eligibility. The tags “nopagereadaloud” and “notranslate” have been observed to stop pages from entering the Discover pipeline. While these tags are often used for specific accessibility or regional reasons, publishers should be aware that their inclusion may inadvertently choke off Discover traffic.
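The fallback order and the kill-switch check can be sketched with the standard library's HTML parser. This is a plausible reconstruction for auditing your own pages, not Google's actual extractor; note that in practice notranslate often appears as the content of a `name="google"` meta tag, so the sketch checks both forms.

```python
# Audit sketch: extract card metadata with the fallback order
# og:title -> twitter:title -> <title>, and flag the two meta tags
# observed to block Discover eligibility. Illustrative, not Google's code.
from html.parser import HTMLParser

KILL_SWITCH_TAGS = {"nopagereadaloud", "notranslate"}

class CardMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}          # meta name/property -> content
        self.title = None       # contents of <title>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key:
                self.meta[key.lower()] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data

def extract_card(html):
    p = CardMetaParser()
    p.feed(html)
    title = p.meta.get("og:title") or p.meta.get("twitter:title") or p.title
    image = p.meta.get("og:image")
    google_meta = p.meta.get("google", "")
    blocked = {t for t in KILL_SWITCH_TAGS if t in p.meta or t in google_meta}
    eligible = bool(title) and bool(image) and not blocked
    return {"title": title, "image": image,
            "blocked_by": sorted(blocked), "eligible": eligible}

html = """<html><head><title>Fallback</title>
<meta property="og:image" content="https://example.com/hero.jpg">
<meta name="twitter:title" content="Twitter Title">
</head></html>"""
card = extract_card(html)
# og:title is absent, so the title falls back to "Twitter Title";
# the page stays eligible because og:image exists and no kill switch is set.
```

Running a check like this across a site's templates is a quick way to catch a missing og:image or a stray notranslate before it silently removes pages from the pipeline.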

Ranking and the pCTR Model

Ranking in Discover is not a static score but a predictive calculation. Google utilizes a Predicted Click-Through Rate (pCTR) model on its servers. This model estimates the likelihood of a specific user clicking on a specific story based on several variables:

User Engagement History: Has this user clicked on similar topics or this specific publisher before?
Article Title: The og:title is heavily scrutinized for relevance and “clickability.”
Image Success: Does the image load quickly and meet quality standards?
URL Performance: The past click and impression data for that specific URL. If a story starts strong with a high CTR, the model is more likely to broaden its distribution.

The pCTR model is hidden from the user and the publisher, but the telemetry data shows that these signals are transmitted to Google’s servers before any ranking decision is finalized. This confirms that early engagement is vital; if the first group of users who see the content doesn’t interact with it, the pCTR score drops, and the content is phased out of the feed.
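One way to see why early engagement matters is to model pCTR as a prior built from user and article features, blended with the URL's observed click rate as impressions accumulate. Everything below is invented for illustration (the weights, the logistic form, the shrinkage constant); it only demonstrates the dynamic the research describes.

```python
# Toy pCTR: a feature-based prior shrunk toward the URL's observed CTR.
# All coefficients are invented; this models the dynamic, not the model.
import math

def pctr(user_affinity, title_score, image_ok, url_clicks, url_impressions):
    # Prior from per-user and per-article signals (logistic form).
    z = -5.0 + 2.0 * user_affinity + 1.5 * title_score \
        + (0.5 if image_ok else -1.0)
    prior = 1 / (1 + math.exp(-z))
    if url_impressions == 0:
        return prior
    observed = url_clicks / url_impressions
    # Trust the observed CTR more as impressions grow (shrinkage).
    weight = url_impressions / (url_impressions + 100)
    return (1 - weight) * prior + weight * observed

cold = pctr(0.6, 0.7, True, 0, 0)      # no history yet: prior only
hot  = pctr(0.6, 0.7, True, 40, 200)   # strong early CTR (20%)
dead = pctr(0.6, 0.7, True, 1, 200)    # weak early CTR (0.5%)
# hot > cold > dead: early clicks broaden reach, early ignores shrink it
```

The shrinkage term is the key: with few impressions the prior dominates, but within a few hundred impressions the URL's real performance takes over, which matches the observation that content is phased out quickly when its first audience doesn't click.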

The Decay of Freshness: Timing Your Content

Freshness is perhaps the most aggressive filter in the Discover ecosystem. While Search can surface content from years ago, Discover favors the “now.” The research identifies specific time windows that dictate how content is treated:

1 to 7 Days: This is the “golden window.” Content in this age range receives the strongest visibility boost. This is where news and trending topics live.
8 to 14 Days: Visibility begins a moderate decline. The content is still eligible but requires very high engagement to maintain its position.
15 to 30 Days: Visibility becomes limited. Only exceptionally high-performing or highly niche-relevant content survives in this window.
30+ Days: A gradual but certain decline into obscurity.
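The age windows above can be expressed as a simple step-decay multiplier on a content score. The window boundaries come from the research; the multiplier values themselves are illustrative guesses, not published figures.

```python
# Freshness decay as a step function over the reported age windows.
# Boundaries follow the research; the weights are invented placeholders.
def freshness_multiplier(age_days, evergreen=False):
    if evergreen:
        return 0.5   # hypothetical flat weight when resurfaced by interest
    if age_days <= 7:
        return 1.0   # golden window: strongest visibility boost
    if age_days <= 14:
        return 0.6   # moderate decline
    if age_days <= 30:
        return 0.3   # limited visibility
    return max(0.05, 0.3 * 0.9 ** (age_days - 30))  # gradual fade-out

# A ranking score would then be damped by age, e.g.:
# visible_score = pctr * freshness_multiplier(age_days)
```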

While there is a separate classification for “Evergreen” content—which allows older articles to resurface when a user shows a new interest in a topic—the default state of the algorithm is heavily biased toward newness. For publishers, this means that the “half-life” of a Discover hit is very short, necessitating a consistent output of new content to maintain traffic levels.

Personalization and the NAIADES System

The research references “NAIADES,” a term associated with the telemetry of user interests and personalization. Google Discover doesn’t just look at what you’ve searched for; it looks at a holistic view of your digital behavior.

Personalization is layered:
Broad Interest Data: General topics like “Gaming,” “Tech,” or “Politics.”
Publisher Signals: Whether you have registered with the Google Publisher Center or if the user “Follows” your site.
Individual Actions: Explicit signals like saves, shares, and dismissals.
Engagement Signals: How long a user stays on a page after clicking.

A critical finding in the feedback loop is that user dismissals are permanent. If a user swipes a story away or explicitly dismisses it, that URL will never resurface for that user. This is a “one-shot” opportunity for publishers to capture attention.
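The layered signals and the one-shot dismissal rule could be combined as follows. The layer weights and field names are all assumptions made for illustration; the only structural claims taken from the research are the four layers and the hard filter on dismissed URLs.

```python
# Sketch of layered personalization: a dismissed URL is filtered out
# before any scoring, then the four signal layers are summed.
# All weights and field names are invented for illustration.
def personalization_score(page, user):
    if page["url"] in user["dismissed_urls"]:   # one-shot: never resurfaces
        return 0.0
    score = 0.0
    if set(page["topics"]) & set(user["broad_interests"]):     # broad interests
        score += 0.3
    if page["domain"] in user["followed_sites"]:               # publisher signal
        score += 0.3
    if user["saves_shares"].get(page["domain"], 0) > 0:        # explicit actions
        score += 0.2
    dwell = user["avg_dwell_sec"].get(page["domain"], 0)       # engagement
    score += 0.2 * min(dwell / 60, 1)
    return min(score, 1.0)

user = {"dismissed_urls": {"https://example.com/old"},
        "broad_interests": ["tech"],
        "followed_sites": set(),
        "saves_shares": {"example.com": 3},
        "avg_dwell_sec": {"example.com": 90}}
fresh = {"url": "https://example.com/new", "domain": "example.com",
         "topics": ["tech"]}
dismissed = {"url": "https://example.com/old", "domain": "example.com",
             "topics": ["tech"]}
# fresh scores on three of four layers; dismissed scores exactly 0.0
```

Note the asymmetry the research highlights: following a site only adds one bounded layer of positive weight, while a dismissal (and, one stage earlier, a domain block) zeroes the content out entirely.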

Constant Volatility: The Role of Server-Side Experiments

If you’ve ever wondered why two people with identical interests see completely different Discover feeds, the answer lies in experimentation. The research observed approximately 150 server-side experiments running simultaneously during a single session. On top of that, over 50 feature controls were active, affecting how cards were displayed and which UI elements were present.

This level of experimentation explains the extreme volatility many SEOs report. Google is constantly testing different pCTR models, layout configurations, and interest-matching algorithms. A sudden drop in traffic might not be a “penalty” or a technical error on your site; it could simply be that your site is part of a “control group” for an experiment that is testing a different content weighting.

Real-Time Feed Dynamics

Google Discover is not a static list of links. It is a live, dynamic environment. The system has the capability to add, remove, or reorder content cards while a user is actively browsing, without requiring a manual refresh of the app.

This real-time adjustment is driven by the feedback loop. If a story is suddenly trending across the network, it can be injected into a user’s feed mid-session. Conversely, if a story is receiving high dismissal rates across the board, Google can pull it from active feeds to preserve the quality of the user experience.

Practical Takeaways for Digital Publishers

The insights from this research provide a roadmap for optimizing for Google Discover. Success is less about “hacking” the algorithm and more about clearing the technical and qualitative hurdles that Google has put in place.

Prioritize High-Quality Visuals: Use images that are at least 1,200 pixels wide. Ensure they are compelling and relevant, as they are the primary driver of CTR.
Optimize Meta Tags: Don’t leave your OG tags to chance. Ensure your og:title and og:image are optimized specifically for a mobile feed. Avoid using the “nopagereadaloud” or “notranslate” tags unless absolutely necessary.
Focus on the First 72 Hours: Since freshness decay is so aggressive, the first three days are the most important. Ensure your content is promoted across other channels (like social media or newsletters) to kickstart the engagement signals that the pCTR model looks for.
Build Brand Affinity: Because publisher blocks are so destructive and permanent, avoid aggressive or misleading headlines that lead to user frustration. Long-term Discover success requires users to recognize and trust your brand when it appears in their feed.
Monitor the Publisher Center: While not a magic bullet, being properly set up in the Google Publisher Center helps Google classify your content and recognize your brand entities more effectively.

Google Discover remains one of the most powerful tools for audience acquisition in the modern web. By understanding the pipeline—from the initial crawl to the final pCTR prediction—publishers can better position their content to survive the filters and thrive in the feed.
