How Google Discover qualifies, ranks, and filters content: Research

The Mechanics Behind Google Discover: A Deep Dive into the Research

For many digital publishers and SEO professionals, Google Discover remains one of the most volatile and unpredictable sources of organic traffic. Unlike traditional Search, which relies on active queries, Discover is a proactive “queryless” feed that anticipates user interests. Because of its black-box nature, understanding why a story goes viral or why a site suddenly loses all visibility has largely been a matter of guesswork—until now.

Recent SDK-level research conducted by Metehan Yesilyurt has shed light on the internal architecture of Google Discover. By analyzing the observable signals within the Google app’s framework, the research reveals a complex, nine-stage pipeline that governs how content is crawled, qualified, ranked, and ultimately filtered. This research provides a roadmap for publishers looking to stabilize their Discover performance and understand the technical triggers that lead to success or exclusion.

The Nine-Stage Google Discover Pipeline

The journey from a published article to a user’s Discover feed is not a single leap. It is a structured process involving multiple checks and balances. According to the research, Google operates a sophisticated pipeline that evaluates content long before it ever reaches the ranking stage.

1. Crawling and Content Extraction

The process begins with Google’s standard crawling infrastructure. However, for Discover, the extraction focuses heavily on the “entity” of the article. Google isn’t just looking for keywords; it is trying to understand the core topic and how it relates to established user interests. This stage is where Google determines if the page is a standard article, a video, or another media type.

2. Meta Tag Analysis

Once crawled, the system specifically looks for Open Graph (OG) tags. These tags provide the “preview” information for the Discover card. The research highlights that Google prioritizes the og:title and og:image. If these are missing or poorly formatted, the content may fail to progress further in the pipeline.
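The tag check described above can be sketched with Python's standard-library HTML parser. This is an illustrative reconstruction of the behavior the research describes, not Google's actual extraction code; the function name discover_card_preview is invented for the example.

```python
from html.parser import HTMLParser

class OGTagParser(HTMLParser):
    """Collects Open Graph meta tags (og:title, og:image, ...) from a page."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.og[prop] = attrs["content"]

def discover_card_preview(html: str):
    """Return (title, image) for a Discover-style card, or None if the
    required tags are missing -- mirroring the pipeline drop-off the
    research describes."""
    parser = OGTagParser()
    parser.feed(html)
    title = parser.og.get("og:title")
    image = parser.og.get("og:image")
    if not title or not image:
        return None  # content may fail to progress in the pipeline
    return title, image
```

A page missing either tag yields None, which is the sketch's stand-in for falling out of the pipeline at this stage.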

3. Content Classification

Google classifies content into categories such as “Breaking News,” “Evergreen,” or “Niche Interest.” This classification dictates how the “freshness decay” model will be applied later. Breaking news is given an immediate, intense boost, while evergreen content is evaluated for its long-term relevance to specific user cohorts.

4. The Publisher Block Check

This is a critical “gatekeeper” stage. Before the system even considers if a user might like your content, it checks for publisher-level blocks. If a user has previously opted to “Don’t show content from this site,” the content is discarded immediately. This check happens server-side and is a binary filter that overrides all other ranking signals.
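Because the block is binary and runs before ranking, it behaves like a plain set-membership filter on the candidate pool. A minimal sketch, assuming a simple domain string per article (the real check presumably operates on richer publisher identifiers):

```python
def passes_block_check(domain: str, blocked_domains: set) -> bool:
    """Stage-4 gatekeeper: a binary server-side filter that discards
    blocked publishers before any ranking signal is considered."""
    return domain not in blocked_domains

def eligible_candidates(articles: list, blocked_domains: set) -> list:
    """Filter the candidate pool; nothing downstream can rescue a
    blocked domain, since ranking never sees it."""
    return [a for a in articles if passes_block_check(a["domain"], blocked_domains)]
```

The key property the research emphasizes is that no downstream signal is ever consulted for a filtered article: the function simply never passes it along.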

5. Interest Matching

In this stage, Google maps the content’s entities against the user’s Knowledge Graph profile. Google tracks a user’s search history, YouTube views, and previous Discover interactions to build a profile of interests. If the article’s topic doesn’t align with these interests, it is filtered out.
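As a deliberately flattened sketch, the matching step can be modeled as entity overlap between the article and the user profile. The real system matches against a structured Knowledge Graph profile rather than a flat set, and the threshold here is invented:

```python
def interest_match(article_entities: set, user_interests: set,
                   min_overlap: int = 1) -> bool:
    """Pass if the article shares at least `min_overlap` entities with the
    user's interest profile; otherwise it is filtered out of the feed.
    A simplification of the Knowledge Graph matching the research describes."""
    return len(article_entities & user_interests) >= min_overlap
```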

6. The pCTR (Predicted Click-Through Rate) Model

One of the most significant findings in Yesilyurt’s research is the existence of a server-side pCTR model. Google estimates the likelihood of a click before the content is even served. This model uses historical performance data for the URL, the domain’s reputation in Discover, and the visual appeal of the title and image to predict engagement.
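The three signal families named above can be combined in a toy model to show the shape of such a prediction. The weights, bias, and logistic form below are invented for illustration; the research identifies the inputs, not the model's internals:

```python
import math

def predicted_ctr(domain_reputation: float, url_history: float,
                  visual_appeal: float) -> float:
    """Toy pCTR: a weighted blend of the three signal families named in
    the research (domain reputation in Discover, historical URL
    performance, title/image appeal), squashed through a logistic
    function so the output is a probability."""
    z = (1.5 * domain_reputation
         + 2.0 * url_history
         + 1.0 * visual_appeal
         - 2.5)  # invented bias: low-signal content stays below 50%
    return 1.0 / (1.0 + math.exp(-z))
```

The point of the sketch is the ordering, not the numbers: a URL with a strong domain, a good click history, and appealing assets scores strictly higher than a weak one before anyone has clicked.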

7. Feed Layout Construction

Google Discover isn’t just a list; it’s a visual layout. In this stage, the system decides whether to show a large-image card, a small-thumbnail card, or a video carousel. The research notes that the layout is often determined by the quality and dimensions of the provided assets.

8. Content Delivery

The content is finally pushed to the user’s device. This happens dynamically, and the feed can be updated in real-time. The delivery stage is also where A/B testing and experimentation often take place, with different users seeing different variations of the same content.

9. User Feedback Loop

Once the content is delivered, the system enters a continuous loop of recording feedback. Did the user click? Did they dismiss it? Did they spend time reading the page? This data is fed back into the pCTR model for future content from that same publisher.
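One standard way to fold such feedback into a prediction is an exponential moving average, sketched below. This is a generic online-learning update, not Google's actual rule; the learning rate is arbitrary:

```python
def update_pctr(prior_pctr: float, clicked: bool,
                learning_rate: float = 0.1) -> float:
    """Blend a new observation (click = 1.0, no click = 0.0) into the
    running prediction. Repeated dismissals steadily drag the
    publisher's predicted CTR down; repeated clicks pull it up."""
    observed = 1.0 if clicked else 0.0
    return (1 - learning_rate) * prior_pctr + learning_rate * observed
```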

Hard Publisher Blocks: The Silent Traffic Killer

Perhaps the most sobering discovery in the research is the power of the publisher-level block. In the Google Discover interface, users have the option to hide individual stories or block an entire domain. According to the SDK analysis, these blocks are permanent and happen at the very beginning of the pipeline.

If a significant number of users block your site, your content is essentially “dead on arrival” for those segments. What makes this particularly challenging is that there is no equivalent “sitewide boost” mechanism. While a user can “Follow” a site, the negative signal of a block carries significantly more weight in the filtering process than a positive signal carries in the ranking process. This highlights the importance of maintaining high editorial standards; clickbait that leads to user frustration can result in long-term domain suppression through these hard blocks.

The Critical Role of Metadata and Image Quality

Technical SEO for Discover is often simplified to “having a good image,” but the research provides specific parameters that publishers must meet to remain competitive. Google Discover reads six key page-level tags, with the og:title and og:image being the most influential. If these are absent, Google will attempt to “fall back” to other tags, such as Twitter Cards or the standard HTML title tag, but these fallbacks are less reliable and may lead to poor card rendering.

Image dimensions are a major factor in qualification. To be eligible for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1200px wide. Smaller images are relegated to thumbnail status. In a feed that is almost entirely visual, a thumbnail card has a drastically lower pCTR, which in turn tells Google’s model that the content isn’t worth showing to more people.

Furthermore, two specific meta tags—nopagereadaloud and notranslate—can inadvertently cause a page to be excluded from Discover entirely. While these tags are intended for accessibility or localization control, the research suggests they act as signals that can stop the pipeline from processing the page for the Discover feed.
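The checks from this section can be collected into a pre-publication audit. The thresholds and tag names come from the research as summarized above; the checklist itself (function name, issue strings) is a hypothetical convenience, not part of any Google tooling:

```python
BLOCKING_META = {"nopagereadaloud", "notranslate"}

def discover_audit(og_tags: dict, meta_names: set, image_width: int) -> list:
    """Return a list of issues that could limit Discover eligibility,
    per the parameters cited in the research."""
    issues = []
    if "og:title" not in og_tags:
        issues.append("missing og:title")
    if "og:image" not in og_tags:
        issues.append("missing og:image")
    elif image_width < 1200:
        issues.append("image narrower than 1200px: thumbnail card only")
    issues += [f"blocking meta tag: {m}" for m in sorted(BLOCKING_META & meta_names)]
    return issues
```

An empty result means the page clears the metadata and image hurdles described here; it says nothing about the interest-matching or pCTR stages that follow.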

Understanding the Freshness Decay Curve

Discover is heavily weighted toward newness. The research mapped out a specific decay curve that helps explain why traffic often spikes and then disappears. The visibility windows are generally categorized as follows:

  • 1 to 7 Days: This is the peak window. Content receives the strongest freshness boost, and most Discover traffic occurs within the first 48 to 72 hours.
  • 8 to 14 Days: Visibility begins to taper. The content is only shown to users with very high interest in the specific topic.
  • 15 to 30 Days: Limited visibility. Only “pillar” content or stories with sustained high engagement remain in the feed.
  • 30+ Days: Gradual decline toward zero. Unless the content is specifically classified as “evergreen,” it will rarely appear in the feed after a month.
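The windows above can be sketched as a piecewise freshness multiplier. The research describes the shape of the curve, not exact coefficients, so the numeric weights below are illustrative only:

```python
def freshness_multiplier(age_days: int, evergreen: bool = False) -> float:
    """Piecewise freshness weight following the decay windows above.
    Evergreen content bypasses the curve via a separate long-term
    classifier, modeled here as a flat placeholder weight."""
    if evergreen:
        return 0.5  # illustrative stand-in for the evergreen classifier
    if age_days <= 7:
        return 1.0   # peak window: strongest boost
    if age_days <= 14:
        return 0.4   # tapering: high-interest users only
    if age_days <= 30:
        return 0.1   # pillar / sustained-engagement content only
    return 0.0       # effectively gone from the feed
```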

This decay curve underscores the need for a high-frequency publishing strategy for those relying on Discover. However, for evergreen content, Google uses a separate classifier that looks for long-term relevance, allowing some “how-to” guides or deep-dive features to resurface months later if a topic starts trending again.

The pCTR Model and Experimental Volatility

A major takeaway from the research is that Google Discover is essentially a giant experimentation lab. Over the course of the study, more than 150 server-side experiments were observed running simultaneously. These experiments affect everything from card size and font weight to the actual ranking algorithm itself.

This explains why two users with identical interests might see completely different feeds, or why a publisher might see a sudden 50% drop in traffic without changing anything on their site. You might simply be in a “control group” for an experiment that is testing a different content mix. Because these experiments happen on Google’s servers, they are invisible to publishers, making the “Discover roller coaster” a permanent feature of the platform.

The predicted Click-Through Rate (pCTR) model is the engine behind this. Google evaluates the visual “clickability” of your content before showing it. Signals included in this evaluation are:

  • The clarity and “punchiness” of the page title.
  • The aesthetic quality and relevance of the 1200px image.
  • The historical performance of the URL (early clicks are vital).
  • The technical reliability of the page (if images fail to load, the URL is suppressed).

Personalization and User Agency

Google Discover is deeply personalized, and the research highlights how individual user actions create a feedback loop that defines their future feed. Beyond the hard blocks mentioned earlier, more subtle actions like “saves,” “follows,” and even the time spent on a page are recorded.

If a user dismisses a story (using the “X” button), that action is stored permanently for that specific URL. It will not resurface for that user, even if the article is updated. Interestingly, the research suggests that “time spent reading” is an increasingly important signal. If users click but immediately bounce back to the feed, the pCTR model adjusts downward, assuming the content was clickbait or of low quality.

Registration with the Google Publisher Center is also noted as a trust signal. While it doesn’t guarantee a “boost,” it helps Google verify the identity and category of the publisher, making the classification stage of the pipeline more accurate.

Actionable Insights for Publishers

Based on this SDK-level research, publishers can move away from “magic” solutions and focus on the technical and editorial requirements that actually move the needle in the Discover pipeline. To optimize for Discover visibility, consider the following strategy:

  • Prioritize 1200px Images: Ensure every article has a high-quality, high-resolution hero image defined in the og:image tag. Anything less is a self-imposed ceiling on your traffic.
  • Audit Your Meta Tags: Use tools to verify that your Open Graph tags are rendering correctly. Specifically, check that you aren’t using nopagereadaloud or notranslate unless absolutely necessary.
  • Monitor Early Engagement: Since pCTR is calculated early, the first hour of an article’s life is crucial. Driving initial traffic via social media or newsletters can help “prime” the Discover model by providing early positive signals.
  • Focus on Entities, Not Just Keywords: Write clearly about specific topics, people, and brands. Google’s interest matching relies on identifying these entities within your content.
  • Respect the User: Avoid aggressive clickbait that leads to dismissals or blocks. A single “Don’t show content from this site” action is a permanent loss of a potential reader and, in aggregate, can signal to Google that your domain is low-quality.
  • Understand the Decay: Don’t expect Discover traffic to last forever. Build your content calendar around the 1-7 day peak window, while maintaining a secondary stream of evergreen content that can capitalize on long-term interest matching.

Google Discover remains a complex and at times frustrating platform, but it is not entirely random. By understanding the nine-stage pipeline—from the initial crawl to the final user feedback loop—publishers can better position their content to survive the filters and thrive in the ranking process. In a world of 150 simultaneous experiments, the only true defense is consistent quality, technical precision, and a deep understanding of user interest.
