How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Mechanism

Google Discover has transformed from a supplementary feature into one of the most significant drivers of organic traffic for publishers, often surpassing traditional search results in sheer volume. However, for many digital marketers and SEO professionals, it remains a “black box.” Unlike traditional search, where queries provide a clear roadmap for intent, Discover is a proactive, query-less feed that pushes content to users based on their interests and past behavior.

Recent SDK-level research by Metehan Yesilyurt has shed light on the inner workings of this system. By analyzing the observable signals within Google’s Discover app framework, the research reveals a structured, multi-stage pipeline that governs how content is qualified, ranked, and occasionally filtered out. This deep dive explores the technical architecture behind the feed, providing a clear view of where content succeeds and where it breaks before it even reaches a user’s screen.

The Nine-Stage Google Discover Pipeline

The research maps out a sophisticated nine-stage flow that every piece of content must navigate to appear in a user’s feed. Understanding these stages is crucial because failure at an early stage—such as the block check—prevents the content from ever reaching the ranking or delivery phases.

1. Crawling and Content Extraction

The process begins with Google’s ability to crawl and understand the page. This isn’t significantly different from traditional Google Search crawling, but the speed at which it happens is vital for Discover. The system must quickly parse the content to identify its core subject matter, entities involved, and the overall quality of the page.

2. Meta Tag Analysis

Once crawled, Google Discover prioritizes specific meta tags. Unlike standard search, which draws primarily on on-page elements such as headings and body copy, Discover relies on visual and social meta tags. The system specifically scans for Open Graph (OG) tags to understand what the user will see in their feed. If these tags are missing or poorly configured, the content may be disqualified or appear as a low-quality text link.

3. Content Classification

Google then classifies the content type. Is it a breaking news story? Is it a “how-to” guide? Is it a deep-dive evergreen piece? This classification determines which “bucket” the content falls into, which later dictates how the freshness decay model will be applied to it.

4. The Publisher Block Check

This is one of the most critical stages revealed by the research. Before any interest matching occurs, Google checks for publisher-level blocks. If a user has previously selected “Don’t show content from this site,” that domain is filtered out immediately. This happens server-side and acts as a hard gatekeeper.
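This hard gate can be pictured as a simple server-side filter applied before any ranking. The function name and data shapes below are illustrative assumptions, not Google's actual implementation:

```python
# Illustrative sketch of the publisher-level block check (stage 4).
# All names and data shapes here are assumptions for illustration.

def filter_blocked_publishers(candidates, blocked_domains):
    """Drop any candidate card whose domain the user has blocked.

    candidates: list of dicts, each with a "domain" key
    blocked_domains: set of domains the user chose to hide
    """
    return [c for c in candidates if c["domain"] not in blocked_domains]

candidates = [
    {"url": "https://example-news.com/a", "domain": "example-news.com"},
    {"url": "https://example-blog.com/b", "domain": "example-blog.com"},
]
blocked = {"example-blog.com"}
survivors = filter_blocked_publishers(candidates, blocked)
# Only the non-blocked publisher's card continues down the pipeline.
```

Because the filter runs before interest matching or scoring, a blocked domain never even competes for a slot in that user's feed.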

5. Interest Matching

Discover is fundamentally a personalization engine. At this stage, Google matches the classified content with the user’s individual interest profile. This profile is built from search history, app usage, location history, and direct interactions with previous Discover cards.

6. Predicted Click-Through Rate (pCTR) Modeling

Once a match is found, the system applies a predictive model. Google calculates a “pCTR”—a prediction of how likely the specific user is to click on that specific card. This model takes into account the historical performance of the URL, the domain’s reputation, and the visual appeal of the metadata.
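The real model and its weights are proprietary, but the shape of the calculation can be sketched with the three signals the research names: URL history, domain reputation, and visual appeal. The weights, bias, and logistic form below are purely hypothetical:

```python
import math

# Hypothetical pCTR sketch; the real model and weights are proprietary.
# The three inputs mirror the signals named in the research: URL click
# history, domain reputation, and visual appeal of the card metadata.

def predicted_ctr(url_ctr_history, domain_reputation, visual_score,
                  weights=(2.0, 1.0, 1.5), bias=-2.5):
    """Combine normalized signals (each in [0, 1]) into a click probability."""
    z = bias + sum(w * x for w, x in
                   zip(weights, (url_ctr_history, domain_reputation, visual_score)))
    return 1.0 / (1.0 + math.exp(-z))  # logistic squash into (0, 1)

strong_card = predicted_ctr(0.8, 0.9, 1.0)  # proven URL, large image
weak_card = predicted_ctr(0.1, 0.3, 0.2)    # unproven URL, thumbnail
# strong_card > weak_card, so the strong card wins the feed slot.
```

The point of the sketch is the structure, not the numbers: every signal feeds one per-user, per-card probability, and that single number decides placement.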

7. Feed Layout Construction

Google doesn’t just list cards; it builds a visual experience. The layout engine decides whether to show a large, high-resolution image card or a smaller thumbnail card based on the available assets and the user’s device specifications.

8. Content Delivery

The content is pushed to the user’s app. This delivery is dynamic and can happen in real-time as the user scrolls, often refreshing or reordering the feed without the user needing to manually swipe down to refresh.

9. User Feedback Loop

The final stage is the recording of the user’s reaction. Did they click? Did they ignore it? Did they dismiss the card or report it? This feedback is fed back into the pCTR model and the interest matching engine to refine future delivery.
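Taken together, the stages can be condensed into a toy chain of gates and scorers. Every function, field name, and threshold below is an illustrative assumption, not Google's implementation:

```python
# Toy end-to-end sketch of the Discover pipeline stages described above.
# Field names and thresholds are illustrative assumptions.

def run_pipeline(card, user):
    # Stage 4: hard publisher-level gate, applied before any ranking.
    if card["domain"] in user["blocked_domains"]:
        return None
    # Stage 5: interest matching against the user's profile.
    if card["topic"] not in user["interests"]:
        return None
    # Stage 6: pCTR scoring (placeholder: large images score higher).
    card["pctr"] = 0.8 if card["image_width"] >= 1200 else 0.2
    # Stage 7: layout decision derived from the same image asset.
    card["layout"] = "large" if card["image_width"] >= 1200 else "thumbnail"
    return card  # stage 8: eligible for delivery

user = {"blocked_domains": {"spam.example"}, "interests": {"seo", "tech"}}
card = {"domain": "news.example", "topic": "seo", "image_width": 1600}
result = run_pipeline(card, user)
```

Note how the first two stages return nothing at all: a card that fails a gate is never scored, which is why early-stage failures are invisible in ranking-focused analyses.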

The Power of Publisher-Level Filters

One of the standout findings of the research is the hierarchy of filtering. The publisher-level block is a definitive, sitewide action. When a user tells Google they no longer want to see content from a specific domain, the suppression is absolute for that user. There is no comparable mechanism for a sitewide “boost.” While a user can “follow” a topic or a brand, the negative signal of a block carries significantly more weight in the pipeline than a positive signal.

This highlights the danger of “clickbait” or low-quality content. While it might drive short-term clicks, if it leads to users blocking the publisher, the long-term impact is a permanent loss of visibility within that user’s feed. User dismissals are stored permanently for specific URLs, ensuring that once a user rejects a story, it never reappears.

Ranking Factors and the pCTR Model

While the exact weights of Google’s internal ranking signals remain proprietary, the research identifies the specific data points sent to the server to inform the ranking decisions. These include:

Title Optimization (og:title)

The system looks specifically at the og:title tag. If it is missing, the system falls back to the twitter:title tag and then to the standard HTML title element. The research suggests that the title used for Discover should be compelling and descriptive, but publishers must avoid being overly sensational to prevent user blocks.
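The fallback order (og:title, then twitter:title, then the HTML title) comes from the research; the parsing code below is my own sketch using Python's standard-library HTML parser:

```python
from html.parser import HTMLParser

# Sketch of the title fallback chain described above:
# og:title -> twitter:title -> plain <title> element.

class TitleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.titles = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key in ("og:title", "twitter:title"):
                self.titles[key] = a.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.setdefault("html_title", data.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

def discover_title(html):
    """Return the first non-empty title in fallback order, else None."""
    p = TitleExtractor()
    p.feed(html)
    for key in ("og:title", "twitter:title", "html_title"):
        if p.titles.get(key):
            return p.titles[key]
    return None

page = ('<html><head><title>Plain Title</title>'
        '<meta property="og:title" content="OG Title"></head></html>')
# og:title wins because it sits first in the fallback chain.
```

Running discover_title on a page with both tags returns the og:title value; only when it is absent does the plain HTML title surface in the feed.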

Image Quality and Size

The research confirms a long-standing SEO suspicion: image size is a direct ranking factor for visibility. To qualify for the large, high-engagement cards, images must be at least 1200px wide. Smaller images are relegated to thumbnail status. Thumbnails historically receive significantly lower click-through rates, which in turn lowers the pCTR score, eventually leading to the content being phased out of the feed entirely.
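The phase-out mechanism described above can be sketched as a feedback loop: each skipped impression drags the pCTR estimate toward zero until the card falls below a delivery cutoff. The update rule, rates, and cutoff here are illustrative assumptions:

```python
# Illustrative sketch: how a thumbnail card's lower click-through rate
# drags its pCTR estimate down over repeated impressions. The update
# rule, learning rate, and cutoff are assumptions for illustration.

def update_pctr(pctr, clicked, learning_rate=0.2):
    """Move the estimate toward the observed outcome (1 = click, 0 = skip)."""
    return pctr + learning_rate * ((1.0 if clicked else 0.0) - pctr)

pctr = 0.5  # neutral starting estimate for a new card
for _ in range(10):                          # ten thumbnail impressions
    pctr = update_pctr(pctr, clicked=False)  # users rarely click thumbnails
# After repeated skips the estimate decays toward zero; a card whose
# pCTR drops below the feed's cutoff stops being delivered at all.
eligible = pctr >= 0.1
```

The same loop works in a publisher's favor: a large card that earns clicks pushes its own estimate up, compounding its visibility.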

Technical Health and Image Loading

The pipeline includes a check for whether images load successfully. If a page has technical issues, broken image links, or slow-loading visual assets, Google may filter the content to ensure a smooth user experience. This makes technical SEO and CDN performance vital for Discover success.

The Freshness Decay Model

Google Discover favors new content, but the research reveals the specific windows in which this “freshness boost” operates. The system groups content into four primary age categories:

  • 1 to 7 days old: This is the peak window. Most content receives its strongest visibility boost during the first week after publication.
  • 8 to 14 days: Visibility begins to taper off. Only highly engaging content remains in the main feed during this period.
  • 15 to 30 days: Visibility is limited. Content usually only appears if it is highly relevant to a niche interest.
  • 30+ days: A gradual and consistent decline. While evergreen content can resurface, the majority of standard articles will have exited the Discover ecosystem by this point.

There is a specialized classifier for “evergreen” content. If Google identifies a piece as having long-term utility (like a comprehensive guide or a historical reference), it may bypass the standard decay model and resurface the content months or even years later when a user shows a renewed interest in that specific topic.
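The four age buckets and the evergreen bypass can be summarized in one small function. The bucket boundaries come from the research; the labels and the boolean evergreen flag are my own simplification:

```python
# Sketch of the four freshness buckets reported in the research, plus the
# evergreen bypass. Boundaries are from the article; labels are mine.

def freshness_bucket(age_days, evergreen=False):
    if evergreen:
        return "evergreen"   # may bypass the standard decay model entirely
    if age_days <= 7:
        return "peak"        # strongest visibility boost, first week
    if age_days <= 14:
        return "tapering"    # only highly engaging content remains
    if age_days <= 30:
        return "limited"     # niche-interest appearances only
    return "declining"       # most articles have exited the feed
```

In practice this means a standard article's Discover traffic curve is front-loaded by design, while a piece classified as evergreen can resurface long after the 30-day mark.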

Meta Tags That Can Kill Your Visibility

While most SEOs focus on what to add, this research highlights two specific meta tags that can act as a “kill switch” for Discover visibility:

1. “nopagereadaloud”

This tag prevents Google from using text-to-speech features on your page. Interestingly, the research indicates that having this tag can actually stop your page from entering the Discover pipeline altogether. Google prioritizes content that is accessible and compatible with its suite of assistive technologies.

2. “notranslate”

Similarly, the notranslate tag tells Google not to offer translation services for the page. Because Discover is a global product designed to bridge language gaps, content that explicitly disables translation features is often filtered out of the qualified content pool.
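A quick audit for these two tags might look like the sketch below. The two tag names come from the research; the regex-based scan is a deliberate simplification (a production audit should use a proper HTML parser):

```python
import re

# Quick audit sketch: flag the two meta tag values the research identifies
# as Discover "kill switches". Regex scanning is a simplification; a real
# audit should parse the HTML properly.

KILL_SWITCHES = ("nopagereadaloud", "notranslate")

def discover_kill_switches(html):
    """Return the kill-switch values found anywhere in the page source."""
    found = []
    for name in KILL_SWITCHES:
        # Catches both <meta name="notranslate" ...> and the
        # <meta name="google" content="notranslate"> variant.
        if re.search(name, html, re.IGNORECASE):
            found.append(name)
    return found

page = '<head><meta name="google" content="notranslate"></head>'
```

Running the audit across a site's templates is a cheap way to confirm neither tag is being injected globally by a plugin or CMS setting.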

The Role of Personalization and User Actions

Personalization in Discover is not just about what you search for; it’s about a complex web of signals. The system integrates data from various sources:

  • Google’s Broader Interest Data: This includes search queries, YouTube watch history, and location data.
  • Publisher Center Registration: While not a direct ranking factor, being registered in the Publisher Center helps Google verify the authority and category of your site.
  • Direct User Interactions: Following a topic, saving a card for later, or hearting an article provides a localized boost for that user.
  • Engagement Metrics: The research notes that engagement signals, such as the time spent reading an article after clicking, are fed back into the system to determine if the content fulfilled the “promise” of the title and image.

Experimentation and Feed Volatility

One of the most enlightening aspects of the research is the sheer volume of experimentation happening within the Discover app. During a single observed session, the researchers noted approximately 150 server-side experiments running simultaneously. Additionally, over 50 different feature controls were active, affecting how cards were displayed and which UI elements were present.

This explains why Discover traffic can feel so volatile and unpredictable. Two users with identical interests may see completely different feeds because they are categorized into different experiment groups. For publishers, this means that a sudden drop in traffic might not always be the result of a “penalty” or a content issue; it could simply be a shift in Google’s experimental parameters.

Optimizing for the Discover Pipeline

Based on these research findings, publishers looking to stabilize and grow their Google Discover traffic should focus on a few key technical and creative pillars:

Prioritize High-Resolution Visuals

Ensure every article has a high-quality featured image that is at least 1200px wide. Use the max-image-preview:large directive in your robots meta tag to signal to Google that you want your content to be eligible for large-card display. High-resolution images are one of the most effective levers for improving pCTR.

Audit Your Meta Tags

Verify that your Open Graph tags (og:title and og:image) are correctly implemented. Remove any restrictive tags like nopagereadaloud or notranslate unless they are strictly necessary for your business model. Make sure your titles are engaging but accurately reflect the content to avoid negative user feedback.

Focus on Topical Authority

Because interest matching happens early in the pipeline, building a site that is a clear authority on specific topics helps Google classify your content more accurately. Avoid “scattershot” content strategies; instead, double down on the niches where your site already sees Discover success.

Monitor Technical Performance

Since Google checks for image loading and page health before delivery, ensure your mobile performance is top-tier. Use a robust CDN and optimize your image compression to ensure that the Discover app can reliably render your content cards.

Final Thoughts

Google Discover is not a random lottery; it is a highly engineered pipeline that prioritizes user experience, technical eligibility, and visual engagement. The research by Metehan Yesilyurt clarifies that while quality content is essential, it is only one part of the equation. Publishers must also navigate the technical requirements of the SDK, respect the freshness decay model, and avoid the pitfalls that lead to permanent user blocks.

By understanding that the system filters content before it even begins to rank it, SEOs and content creators can move away from “tricks” and focus on building a presence that the Google Discover architecture recognizes as high-quality, relevant, and technically sound. Success in Discover is about staying eligible long enough for the ranking algorithm to find your audience.
