How Google Discover qualifies, ranks, and filters content: Research
Google Discover has become one of the most significant yet enigmatic traffic drivers for modern publishers. Unlike traditional search, which relies on user queries, Discover is a highly personalized feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many digital media outlets, a single article “going viral” on Discover can result in hundreds of thousands of visits in a matter of hours. However, this traffic is notoriously volatile and unpredictable.

Recent SDK-level research conducted by Metehan Yesilyurt has pulled back the curtain on the internal architecture of Google Discover. By analyzing the observable signals within the Google app framework, this research maps out the multi-stage pipeline that dictates how content is qualified, filtered, and eventually ranked for individual users. Understanding this pipeline is essential for SEOs and content creators who want to move beyond guesswork and align their strategies with the technical realities of Google’s recommendation engine.

The Nine-Stage Google Discover Pipeline

The research identifies a structured, nine-stage flow that every piece of content must navigate before appearing in a user’s feed. This process is highly automated and relies on a combination of real-time classifiers and server-side models. The journey of a URL through Discover looks like this:

1. Crawling and Content Understanding
The process begins with Google’s ability to find and index your content. This is not fundamentally different from standard search indexing, but for Discover, the speed of crawling is paramount. Google must understand the core topic and entities within the article almost immediately after publication to determine if it meets the criteria for “fresh” content.

2. Meta Tag Extraction
Once the content is crawled, the system extracts critical metadata.
This stage focuses heavily on Open Graph tags (og:title and og:image). This information is used to build the visual “card” that the user sees. If these tags are missing or improperly formatted, the content may fail to move to the next stage.

3. Content Classification
Google classifies the content type. Is it a breaking news story, a “how-to” guide, or an evergreen piece of long-form journalism? These classifications help the system determine which “bucket” the content belongs in and how long its shelf life should be.

4. Block List Verification
This is one of the most critical stages for publishers. Before any interest matching or ranking occurs, the system checks for blocks. If a user has previously indicated they do not want to see content from your domain, your URL is filtered out immediately. There is no opportunity to “out-rank” a publisher-level block.

5. Interest Matching
The system attempts to align the content’s topic with the user’s established interests. This is based on the user’s Search history, YouTube activity, and previous interactions within the Discover feed itself.

6. Predicted Click-Through Rate (pCTR) Modeling
Google applies a sophisticated, server-side pCTR model. The system evaluates how likely a specific user is to click on your card compared to other available options. This model considers historical engagement data for your domain and the specific URL.

7. Feed Layout Construction
At this stage, the system decides how the feed will look. It selects which cards will be “large” (high-quality images) and which will be smaller thumbnails, ensuring a diverse and visually appealing mix of content.

8. Content Delivery
The content is finally pushed to the user’s device. This happens in real time, and the feed can be updated even while the user is actively scrolling.

9. Feedback Recording
The final stage closes the loop.
Every action the user takes (clicking, dismissing, saving, or ignoring) is recorded and fed back into the system to refine future ranking and filtering decisions.

The Silent Killers: Why Content Fails to Qualify

One of the most striking findings of the research is the existence of “hard blocks” that prevent content from even entering the ranking competition. Many publishers focus on keywords and engagement, but technical oversights can disqualify a page before it ever reaches a user.

Two specific meta tags can act as total suppressors: “nopagereadaloud” and “notranslate”. If these tags are detected, the system may interpret the content as restricted or unsuitable for the Discover environment, leading to an automatic exclusion. While these tags have legitimate uses for accessibility or technical reasons, their presence is a red flag for the Discover pipeline.

Furthermore, image requirements are non-negotiable. Google Discover is a visual-first medium. To qualify for the large, high-engagement cards that drive the most traffic, images must be at least 1200 pixels wide. The system also requires the robots meta setting max-image-preview:large (or the use of AMP) to display these high-resolution visuals. If your images are small or fail to load correctly during the delivery stage, your visibility will be severely limited, often resulting in small thumbnail displays that suffer from significantly lower click-through rates.

The Power of the Publisher-Level Block

The research highlights a sobering reality for publishers: the “Don’t show content from this site” action is incredibly powerful. Because this block happens at the fourth stage of the pipeline, long before ranking models are applied, it acts as a permanent barrier between your domain and that specific user. Currently, there is no equivalent “sitewide boost” mechanism.
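The ordering described above, where the block check runs before any scoring, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's implementation: the Candidate type, its pctr field, the domain_of and build_feed helpers, and the example URLs are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Candidate:
    url: str
    pctr: float  # hypothetical predicted click-through rate in [0, 1]


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_feed(candidates, blocked_domains):
    """Drop blocked domains first, then rank what is left by pCTR."""
    # Stage 4 (block list verification): a hard filter applied before scoring.
    eligible = [c for c in candidates if domain_of(c.url) not in blocked_domains]
    # Stage 6 (pCTR modeling): only eligible candidates compete on score.
    return sorted(eligible, key=lambda c: c.pctr, reverse=True)


# Hypothetical candidates; the blocked site has the highest score.
candidates = [
    Candidate("https://www.blocked.example/scoop", pctr=0.95),
    Candidate("https://news.example/report", pctr=0.40),
    Candidate("https://guides.example/how-to", pctr=0.55),
]
feed = build_feed(candidates, blocked_domains={"blocked.example"})
print([c.url for c in feed])
```

Even though the blocked domain carries the highest score in this toy list, it never reaches the sort step, which is the sense in which a block cannot be out-ranked.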
While a user can “follow” a publisher, the research suggests that a single negative action (a dismissal or a block) carries more weight in the filtering process than a single positive action. This creates a high-stakes environment where clickbait or misleading titles might drive short-term clicks but result in long-term domain suppression if users feel deceived and choose to block the source.

The Freshness Decay: Understanding the Visibility Windows

Time is the most influential factor in Discover visibility. Unlike traditional search, where a high-quality guide can remain at the top of the SERPs for years, Discover content has a distinct and rapid decay cycle. The research identifies four primary windows of visibility:

1 to 7 days: This is the “golden window.” Freshly published content receives the strongest boost and the highest likelihood of appearing in the top positions