How Google Discover qualifies, ranks, and filters content: Research
The Mechanics of Google Discover: A Deep Dive into Content Qualification and Ranking

Google Discover has become one of the most significant yet unpredictable drivers of organic traffic for modern publishers. Unlike traditional search, where users enter specific queries, Discover is a push-based system that anticipates user interests. For years, the inner workings of this feed have remained a “black box” for SEOs and digital marketers. However, recent SDK-level research by Metehan Yesilyurt has provided an unprecedented look at the structured, multi-stage pipeline Google uses to qualify, rank, and filter content before it ever reaches a user’s screen.

Understanding this architecture is vital for any digital publisher looking to stabilize their traffic. The research reveals that Discover is not just a simplified version of Google Search; it is a complex ecosystem governed by hard publisher blocks, strict technical requirements, rapid freshness decay, and a massive layer of ongoing experimentation. By dissecting the nine-stage flow of content through the Discover pipeline, we can better understand why some articles go viral while others never gain traction.

The Nine-Stage Content Pipeline

Google Discover operates through a sophisticated delivery framework that filters out noise and prioritizes high-quality, relevant content. According to the research, content passes through nine distinct stages, and failure at any single point can lead to a complete lack of visibility.

The process begins with crawling and understanding: Google’s bots must first find the content and parse its meaning. This is followed by reading meta tags, where the system identifies critical elements like the title and the lead image. The third stage involves classification, where the system determines whether the content is breaking news, a feature story, or evergreen material. Crucially, the fourth stage is a block check. Before any ranking or interest matching occurs, Google determines whether the publisher or the specific URL has been blocked by the user or flagged for policy violations. If the content passes this gate, it moves to interest matching, where it is paired with users based on their browsing history and preferences. The final stages involve a server-side predicted click-through rate (pCTR) model, feed layout construction, delivery, and finally the collection of user feedback. This feedback loop ensures that the system learns from every swipe, click, or dismissal, refining future recommendations in real time.

The Pre-Ranking Filter: Why Some Publishers Are Invisible

One of the most striking findings of the research is that publisher-level blocks happen remarkably early in the process. In traditional Search, a user might see a site they dislike and simply choose not to click. In Discover, a single user action can have a devastating impact on a domain’s visibility for that individual. When a user selects “Don’t show content from [Site Name],” that publisher is effectively removed from the user’s Discover ecosystem.

This block is processed before the ranking model even looks at the content. This means that even if you produce the most relevant, high-quality article for a specific user interest, you will never reach that user if you have been previously blocked. Unlike many social media algorithms, there is no equivalent “sitewide boost” mechanism to counter these blocks. The system is designed to be highly sensitive to negative user signals in order to maintain the quality of the personal feed.
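To make the ordering concrete, here is a minimal Python sketch of the pipeline as the research describes it. The class names, fields, and scores are illustrative assumptions (Google’s internal interfaces are not public); the one structural claim the sketch encodes is that the stage-four block check runs before interest matching and pCTR scoring.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    url: str
    publisher: str
    og_title: str | None = None
    og_image: str | None = None
    topics: set[str] = field(default_factory=set)

@dataclass
class UserContext:
    blocked_publishers: set[str] = field(default_factory=set)
    dismissed_urls: set[str] = field(default_factory=set)
    interests: set[str] = field(default_factory=set)

def passes_block_check(candidate: Candidate, user: UserContext) -> bool:
    """Stage 4: a hard gate evaluated before any ranking signal."""
    if candidate.publisher in user.blocked_publishers:
        return False
    if candidate.url in user.dismissed_urls:  # dismissals are effectively permanent
        return False
    return True

def predict_ctr(candidate: Candidate, user: UserContext) -> float:
    """Toy stand-in for the server-side pCTR model discussed later."""
    score = 0.4 if candidate.topics & user.interests else 0.05
    if candidate.og_title:
        score += 0.1
    return score

def build_feed(candidates: list[Candidate], user: UserContext) -> list[Candidate]:
    eligible = []
    for c in candidates:
        # Stages 1-3 (crawling, meta tags, classification) are assumed done.
        # A card without a valid og:image is typically dropped outright.
        if not c.og_image:
            continue
        # Stage 4: blocked publishers never reach interest matching or pCTR.
        if not passes_block_check(c, user):
            continue
        eligible.append(c)
    # Stages 5-6: interest matching and pCTR ranking, folded into one score here.
    return sorted(eligible, key=lambda c: predict_ctr(c, user), reverse=True)
```

Because the gate returns before any score is computed, no amount of relevance or quality can rescue a blocked domain in this framing, which mirrors the article’s point that there is no sitewide boost to offset a block.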
The Power of User Dismissals

User feedback in Discover is often permanent. If a user dismisses a specific story, that action is recorded on Google’s servers and linked to that URL. The research indicates that once a story is dismissed, it will not resurface for that user, regardless of how much engagement it receives from others. This puts immense pressure on publishers to ensure that their headlines and images are not just enticing but also highly accurate, to avoid the “click-and-dismiss” behavior that can signal low quality to the algorithm.

The pCTR Model: The Brain of Discover Ranking

Ranking in Google Discover is heavily influenced by a Predicted Click-Through Rate (pCTR) model. This is an AI-driven evaluation that happens on Google’s servers, estimating the likelihood of a specific user engaging with a specific piece of content. While the model itself is not visible to the public, the signals sent to it have been mapped. The pCTR model considers several key factors:

- The og:title: the primary headline used for the card.
- Image quality and dimensions: high-resolution visuals are prioritized.
- Freshness: how recently the content was published or updated.
- Historical performance: previous click and impression data for the URL and the domain.
- Technical stability: whether the images load correctly and the page is mobile-friendly.

If the pCTR model predicts a low engagement rate, the content will be pushed down in the feed or omitted entirely in favor of content with a higher predicted success rate. This creates a “rich get richer” scenario where high-performing content continues to gain momentum while underperforming content is quickly phased out.

The Crucial Role of Image Quality and Meta Tags

Technical SEO in Discover is heavily reliant on Open Graph (OG) tags. The research highlights six key page-level tags that Google Discover reads to build the visual cards. The most critical of these is the og:image: without a valid image, a page will typically not appear in the Discover feed at all.

The 1200px Requirement

To qualify for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1200 pixels wide. While smaller images may still be eligible for Discover, they are often relegated to small thumbnails, which have significantly lower click-through rates than large-format cards. For publishers, ensuring that every article has a high-quality featured image at least 1200px wide is perhaps the single most effective technical optimization for Discover; a quick way to audit this is sketched at the end of this section.

Backup Tags and Metadata

Google’s system is designed to be resilient. If the og:title tag is missing, the system will look for backups, such as the Twitter title tag or the standard HTML title tag. However, relying on backups is risky, as they may not present the content in the most optimized way. Furthermore, specific meta tags can act as “kill switches.”
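As a practical illustration, here is a minimal audit script a publisher might run against their own pages. It is a sketch, not part of the cited research: it assumes the requests, beautifulsoup4, and Pillow packages, and it checks only the two requirements discussed above, namely the title fallback chain (og:title, then twitter:title, then the HTML title tag) and the 1200px og:image width.

```python
import io

import requests
from bs4 import BeautifulSoup
from PIL import Image

MIN_CARD_WIDTH = 1200  # minimum width for large-format Discover cards

def resolve_title(soup: BeautifulSoup) -> tuple[str | None, str]:
    """Return (title, source) following the fallback chain described above."""
    og = soup.find("meta", property="og:title")
    if og and og.get("content"):
        return og["content"], "og:title"
    tw = soup.find("meta", attrs={"name": "twitter:title"})
    if tw and tw.get("content"):
        return tw["content"], "twitter:title (backup)"
    if soup.title and soup.title.string:
        return soup.title.string.strip(), "<title> (backup)"
    return None, "missing"

def audit_page(url: str) -> None:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    title, source = resolve_title(soup)
    print(f"title: {title!r} via {source}")

    og_image = soup.find("meta", property="og:image")
    if not (og_image and og_image.get("content")):
        print("FAIL: no og:image -- page is unlikely to surface in Discover")
        return

    # Download the image and measure its width with Pillow.
    img_bytes = requests.get(og_image["content"], timeout=10).content
    width, height = Image.open(io.BytesIO(img_bytes)).size
    verdict = "large card eligible" if width >= MIN_CARD_WIDTH else "thumbnail only"
    print(f"og:image: {width}x{height}px -- {verdict}")

if __name__ == "__main__":
    audit_page("https://example.com/article")  # hypothetical URL
```

A page that resolves its title via a backup tag or reports “thumbnail only” still technically qualifies for Discover, but, per the findings above, it is competing for the lowest-engagement placements in the feed.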