How Google Discover Qualifies, Ranks, and Filters Content: Research

Understanding the Google Discover Pipeline

Google Discover has long been considered a “black box” for digital publishers and SEO professionals. Unlike traditional search, which relies on active queries, Discover is a push-based system that delivers content to users based on their interests, browsing history, and behavioral patterns. Because of this, traffic from Discover can be massive yet frustratingly volatile.

Recent SDK-level research conducted by Metehan Yesilyurt has shed new light on the inner workings of the Google Discover framework. By analyzing the observable signals within the app’s infrastructure, Yesilyurt mapped out a complex, nine-stage pipeline that dictates how content is qualified, ranked, and occasionally filtered out before it ever reaches a user’s screen. For publishers looking to stabilize their traffic, understanding this architecture is essential.

The Nine Stages of Google Discover Content Processing

The journey from a published article to a prominent spot in a user’s Discover feed involves several sophisticated layers of filtering and evaluation. The research identifies a structured flow that every piece of content must navigate:

1. Crawling and Understanding

The process begins with Google’s standard crawling mechanism. Googlebot must be able to access and index the page. During this stage, Google’s systems analyze the semantic meaning of the content, determining the primary topics, entities, and categories the article covers.

2. Meta Tag Extraction

Google Discover relies heavily on structured metadata. The system specifically scans for Open Graph tags and other key identifiers to determine how the content should be presented visually. If these tags are missing or malformed, the content may be disqualified from high-visibility formats.
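
To illustrate the kind of extraction this stage performs, the sketch below pulls Open Graph tags out of raw HTML using Python's standard-library parser. The tag names (og:title, og:image) are the real identifiers discussed in the research; the parsing logic itself is our illustration, not Google's implementation.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.og_tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.og_tags[prop] = attrs["content"]

def extract_og_tags(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og_tags

page = """<html><head>
<meta property="og:title" content="Example Headline">
<meta property="og:image" content="https://example.com/hero.jpg">
</head><body></body></html>"""

tags = extract_og_tags(page)
# A card candidate needs at least a title and an image to be presented visually.
eligible = "og:title" in tags and "og:image" in tags
```

Running an extraction like this against your own pages is a quick way to catch the missing or malformed tags that can disqualify content from high-visibility formats.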

3. Content Classification

Google classifies content into specific types, such as “Breaking News,” “Evergreen,” or “Special Interest.” This classification dictates the lifespan of the content within the feed and how aggressively it is pushed to various user segments.

4. The Publisher Block Check

One of the most critical findings of the research is the existence of a hard “block” stage. Before the system even considers if a piece of content is interesting to a user, it checks if that user has previously blocked the publisher. If a user has selected “Don’t show stories from [Publisher],” the content is instantly discarded from the candidate pool.
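
Mechanically, this hard block amounts to an early filter applied over the candidate pool before any scoring happens. A minimal sketch of that behavior (the set-based filtering is our assumption about the mechanics, not observed code):

```python
def filter_blocked(candidates: list[dict], blocked_domains: set[str]) -> list[dict]:
    """Drop every candidate whose publisher the user has blocked,
    before any interest matching or ranking is attempted."""
    return [c for c in candidates if c["domain"] not in blocked_domains]

candidates = [
    {"url": "https://a.example/post-1", "domain": "a.example"},
    {"url": "https://b.example/post-2", "domain": "b.example"},
]

# The blocked publisher's content never reaches the ranking stage.
survivors = filter_blocked(candidates, blocked_domains={"b.example"})
```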

5. Interest Matching

This is the personalization layer. Google matches the classified topics of the article against the user’s “Interest Graph.” This graph is built from search history, app usage, and explicit “follows” within the Google ecosystem.

6. Predicted Click-Through Rate (pCTR) Modeling

Before the feed is rendered, Google runs a server-side prediction model. This model estimates the likelihood of a specific user clicking on a specific card. High pCTR scores move content higher in the priority list, while low scores may result in the content being buried or omitted.

7. Feed Layout Construction

Google doesn’t just list articles; it builds a visual experience. This stage determines whether an article gets a large, high-impact card or a smaller thumbnail, based on the quality of the assets and the predicted engagement.

8. Content Delivery

The content is pushed to the user’s device. This happens dynamically: as the research shows, the feed can update in real time without the user manually refreshing the app.

9. Feedback Recording

The cycle closes with user feedback. Every click, scroll-past, dismissal, or “heart” is recorded and fed back into the ranking model to refine future recommendations.

The Power of the Publisher Block

For many publishers, the most startling revelation in the research is the mechanics of the publisher-level block. In Google Discover, a user has the option to stop seeing content from a specific domain entirely. According to the SDK analysis, this block occurs very early in the pipeline—before interest matching and before ranking.

Unlike Google Search, where a site might rank lower but still appear for specific queries, a block in Discover is a total suppression of the domain for that specific user. There is no equivalent positive mechanism: no “sitewide boost” carries the same permanence. This means that maintaining a positive reputation with your audience is vital; a few pieces of “clickbait” that annoy users into blocking your domain can permanently erode your Discover reach.

The Predicted Click-Through Rate (pCTR) Model

While SEOs often focus on keywords, Discover ranking is heavily influenced by a predicted click-through rate (pCTR) model. This model is housed on Google’s servers and acts as a gatekeeper for visibility. While the model’s exact weights are proprietary, the research highlights the signals sent to Google to inform these predictions:

  • Page Title: Primarily pulled from the og:title tag.
  • Image Quality: The system checks if the image is high-resolution and if it loads correctly.
  • Freshness: The time elapsed since publication.
  • Historical Performance: Past click and impression data for that specific URL and domain.
  • Asset Integrity: Whether the images and meta tags are technically sound.

This explains why two articles on the same topic might have vastly different performance metrics. If Google’s model predicts that Article A will garner a 10% CTR and Article B will garner 2%, Article A will receive the lion’s share of impressions.
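
The signals above can be combined into a toy scoring sketch. Only the signal names come from the research; the weights and the logistic form below are invented for illustration and do not reflect Google's proprietary model.

```python
import math

def toy_pctr(title_ok: bool, image_hires: bool, hours_since_pub: float,
             historical_ctr: float, assets_ok: bool) -> float:
    """Illustrative pCTR estimate combining the listed signals.
    Weights and functional form are assumptions, not Google's model."""
    score = (
        1.0 * title_ok
        + 1.2 * image_hires
        + 1.5 * historical_ctr      # past click performance for the URL/domain
        + 0.8 * assets_ok           # technically sound images and meta tags
        - 0.05 * hours_since_pub    # freshness decays the estimate over time
    )
    return 1 / (1 + math.exp(-score))  # squash to a 0..1 probability

# Identical assets, different ages: the fresher article scores higher.
fresh = toy_pctr(True, True, hours_since_pub=2, historical_ctr=0.8, assets_ok=True)
stale = toy_pctr(True, True, hours_since_pub=72, historical_ctr=0.8, assets_ok=True)
```

Even in this crude form, the model reproduces the dynamic described above: a small predicted-CTR gap between two articles compounds into a large impression gap once content is ordered by score.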

Freshness and the Lifecycle of a Discover Post

Timing is everything in Google Discover. The research confirms that Google utilizes “freshness decay” to ensure the feed stays relevant. The visibility of content typically follows a specific window of decay:

1 to 7 Days: The Peak Performance Window

New content receives the strongest boost. Most “viral” Discover traffic occurs within the first 48 to 72 hours of publication. During this time, the freshness signal is at its strongest, allowing the content to reach the widest possible audience.

8 to 14 Days: Moderate Visibility

After the first week, content begins to see a significant drop in impressions unless it is consistently achieving an exceptionally high CTR. At this stage, it is often relegated to “Suggested for You” sections rather than the primary “Top Stories” area.

15 to 30 Days: Limited Visibility

By the third week, content visibility becomes highly restricted. Only content that has been classified as high-value evergreen or that is seeing a resurgence in search interest tends to survive in the feed during this period.

30+ Days: Gradual Decline

Beyond a month, content rarely appears in Discover unless it is specifically relevant to a recurring seasonal event or a niche interest with very little new content available. However, the research notes that there is a separate classification for “Evergreen” content that allows certain high-performing pieces to bypass these standard decay rules.
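
The decay windows above can be summarized as a simple age-to-tier mapping. The thresholds come from the research; the function itself, including the evergreen bypass, is a sketch of how such a lifecycle rule might look.

```python
def visibility_tier(days_since_pub: int, evergreen: bool = False) -> str:
    """Map article age to the visibility windows described in the research.
    Content classified as Evergreen bypasses the standard decay rules."""
    if evergreen:
        return "evergreen"
    if days_since_pub <= 7:
        return "peak"        # strongest freshness boost, widest reach
    if days_since_pub <= 14:
        return "moderate"    # relegated to "Suggested for You" sections
    if days_since_pub <= 30:
        return "limited"     # survives only with high CTR or search resurgence
    return "declining"       # seasonal or niche relevance only
```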

The 1200px Rule: Image and Meta Tag Requirements

Technical SEO for Google Discover is largely centered around visual assets and Open Graph metadata. The research identifies six key page-level tags that Google Discover reads, with the og:image and og:title being the most critical. If an article lacks an image, it simply will not appear as a card in Discover.

The “1200px rule” is perhaps the most actionable technical requirement. To qualify for large, prominent cards—which typically see significantly higher click-through rates—images must be at least 1,200 pixels wide. If a publisher provides only smaller images, Google will default to a small thumbnail format. In a feed where visual appeal is the primary driver of engagement, being relegated to a thumbnail is a major competitive disadvantage.
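
The rule reduces to a simple threshold check. The 1200-pixel cutoff is from the research; the three-way tiering below is our reading of the behavior it describes.

```python
from typing import Optional

LARGE_CARD_MIN_WIDTH = 1200  # the "1200px rule" identified by the research

def card_format(image_width_px: Optional[int]) -> str:
    """Decide which Discover presentation an image qualifies for.
    The threshold is from the research; the tier names are illustrative."""
    if image_width_px is None:
        return "no-card"      # no image means no Discover card at all
    if image_width_px >= LARGE_CARD_MIN_WIDTH:
        return "large-card"   # high-impact format with stronger CTR
    return "thumbnail"        # small fallback format
```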

Furthermore, the research found that Google has built-in redundancy. If the og:title is missing, the system will attempt to use the twitter:title or the standard HTML <title> tag. However, relying on backups is risky, as it can lead to inconsistent presentation.
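
That fallback chain can be sketched as a simple ordered lookup. The order (og:title, then twitter:title, then the HTML title) is from the research; the dictionary representation, including the "html:title" key standing in for the parsed <title> element, is a convention of this sketch.

```python
from typing import Optional

def resolve_title(meta: dict) -> Optional[str]:
    """Walk the fallback chain described in the research:
    og:title first, then twitter:title, then the page <title>."""
    for key in ("og:title", "twitter:title", "html:title"):
        value = meta.get(key)
        if value:
            return value
    return None

# og:title is missing, so the system falls back to the Twitter card title.
title = resolve_title({"twitter:title": "Backup Headline",
                       "html:title": "Page Title"})
```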

Crucially, two specific meta tags can act as “kill switches” for Discover visibility:

  • nopagereadaloud: This tag can prevent the content from entering the Discover ecosystem in certain regions or formats.
  • notranslate: If Google’s system cannot apply automated translation or processing to the page, it may be excluded from feeds in non-native language markets.
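
A basic audit for these kill switches can be automated. The tag names are the ones identified in the research; the regex scan below is a quick illustrative check, and a production audit should parse the DOM rather than pattern-match raw HTML.

```python
import re

KILL_SWITCH_TAGS = ("nopagereadaloud", "notranslate")

def find_kill_switches(html: str) -> list:
    """Return any Discover-limiting meta directives present in the page.
    Simple regex scan for illustration; a real audit should parse the DOM."""
    found = []
    for name in KILL_SWITCH_TAGS:
        if re.search(rf'<meta[^>]*\b{name}\b', html, re.IGNORECASE):
            found.append(name)
    return found

flagged = find_kill_switches(
    '<head><meta name="google" content="notranslate"></head>'
)
```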

Personalization Layers and Real-Time Feedback

Google Discover is not a static list; it is a living feed that reacts to user behavior in real-time. The personalization layer draws from several data points:

  • Google Interest Data: Broader user behavior across Google Search, YouTube, and Maps.
  • Publisher Center: While not strictly required, registration in the Google Publisher Center helps Google understand the relationship between a site and its topical authority.
  • Direct User Actions: Following a topic, saving an article for later, or dismissing a card.
  • Dwell Time: Though often debated in SEO, the research suggests that engagement signals—such as how much time a user spends reading after clicking—are fed back into the ranking algorithm.

The SDK analysis revealed that user dismissals are permanent for specific URLs. If a user swipes a story away or clicks “Not interested in this,” that specific URL will never resurface for that user, and similar content from that publisher may be deprioritized in the short term.

A State of Constant Change: Experiments and Volatility

One of the most revealing aspects of the research is the sheer scale of experimentation occurring within the Discover app. During a single observed session, researchers found approximately 150 server-side experiments running simultaneously. Additionally, more than 50 feature controls were active, affecting how cards were displayed, how images were cropped, and how labels were applied.

This level of experimentation explains why Google Discover traffic is so notoriously volatile. Two users with identical interests and browsing histories might see completely different layouts or content mixes because they have been assigned to different experimental groups. For publishers, this means that a sudden drop in traffic might not be due to a penalty or a content quality issue, but rather a shift in the UI/UX experiments Google is conducting.

Actionable Takeaways for Publishers and SEOs

The findings from this research provide a roadmap for optimizing for Google Discover. To maximize visibility and minimize the risk of being filtered out, publishers should focus on several key areas:

Prioritize High-Resolution Visuals

Ensure every article has an og:image that is at least 1200px wide. Use compelling, high-quality photography rather than generic stock images. Since Discover is a visual medium, the image is often more important than the headline in securing the initial click.

Optimize Open Graph Tags

Don’t leave your Discover presentation to chance. Manually craft your og:title to be engaging without being misleading. Avoid clickbait that leads to high bounce rates, as the feedback loop will quickly punish content that fails to deliver on its promise.

Monitor Publisher Health

Since publisher-level blocks are so damaging, monitor your audience sentiment. If you notice a sharp, sustained decline in Discover traffic, it may be a sign that your content strategy is alienating users, leading to widespread “Don’t show stories from this site” actions.

Leverage the Freshness Window

Because visibility decays rapidly after seven days, a consistent publishing schedule is vital for maintaining Discover traffic. However, for older content that remains relevant, consider updating the article with new information and a new “last modified” date to potentially trigger a freshness re-evaluation.

Technical Auditing

Ensure your site is free of tags like nopagereadaloud or notranslate unless they are strictly necessary. Verify that your images load quickly and reliably; the SDK analysis suggests that image loading failures are a signal used to deprioritize content.

The Future of Discovery

The research into the Google Discover architecture highlights a shift away from traditional keyword-based SEO toward a more holistic, user-centric model. In this environment, technical excellence—specifically in metadata and asset delivery—must be paired with a deep understanding of audience interests.

Google Discover remains a powerful tool for driving massive scale, but it requires a different mindset than Search. Success in Discover is about being “eligible” (through technical tags), “trustworthy” (avoiding blocks), and “engaging” (optimizing pCTR). As Google continues to run hundreds of simultaneous experiments, the publishers who thrive will be those who focus on these fundamental pillars of the Discover pipeline.
