How Google Discover qualifies, ranks, and filters content: Research
Google Discover has long been one of the most mysterious and volatile sources of traffic for digital publishers. Unlike traditional Google Search, which relies on a user entering a specific query, Discover is a proactive, “query-less” feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many news organizations and tech blogs, a single hit in Discover can result in hundreds of thousands of visits in a matter of hours, yet the mechanisms behind how content is selected have largely remained a black box.

Recent SDK-level research by Metehan Yesilyurt has finally pulled back the curtain on this system. By analyzing the observable signals within Google’s Discover app framework and telemetry data, Yesilyurt mapped out the intricate pipeline that content must navigate before it ever reaches a user’s screen. The research reveals a structured, nine-stage flow governed by strict technical requirements, predictive modeling, and aggressive filtering. Understanding this pipeline is no longer optional for SEOs; it is the blueprint for survival in a push-based content economy.

The Nine-Stage Google Discover Pipeline

The journey from a published article to a Discover card is not a simple linear path. It is a high-speed filtering process designed to eliminate low-quality or irrelevant content as early as possible. According to the research, the process can be broken down into nine distinct phases:

1. Crawling and understanding. Google must first crawl and understand the content. This is the foundation of all Google products, but in Discover the emphasis is placed heavily on semantic understanding and classification.

2. Metadata extraction. The system specifically looks for key tags like the image and title.

3. Content classification. The article is sorted into categories such as “breaking news” or “evergreen,” which dictates how the system handles its “freshness” decay later on.

4. Block check. Perhaps the most critical stage for publishers. Before any ranking or interest matching occurs, the system checks whether the user or the platform has blocked the publisher. If a user has previously selected “Don’t show content from this site,” the content is discarded immediately.

5. Interest matching. Google’s vectors map the article’s topics to the user’s documented interests.

6. Server-side predictive modeling (pCTR).

7. Feed layout construction.

8. Content delivery.

9. Recording of user feedback.

This feedback loop is continuous: if a user engages with the content, it reinforces the publisher’s standing; if they dismiss it, the system learns to suppress similar content in the future. The filter-first ordering of these stages is sketched in the examples below.
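To make this filter-first ordering concrete, here is a minimal Python sketch of the pipeline stages described above. Everything in it is illustrative: the Article and UserProfile types, the function names, and the stage logic are reconstructions of the flow the research describes, not Google’s actual code.

```python
from dataclasses import dataclass

@dataclass
class Article:
    url: str
    domain: str
    og_title: str | None   # stage 2: extracted metadata
    category: str          # stage 3: e.g. "breaking_news" or "evergreen"
    topics: set[str]

@dataclass
class UserProfile:
    interests: set[str]
    blocked_domains: set[str]

def passes_block_check(article: Article, user: UserProfile) -> bool:
    # Stage 4: publisher-level blocks are evaluated before any ranking.
    # A prior "Don't show content from this site" action ends the journey here.
    return article.domain not in user.blocked_domains

def matches_interests(article: Article, user: UserProfile) -> bool:
    # Stage 5, heavily simplified: the real system compares embedding
    # vectors rather than literal topic labels (see the next sketch).
    return bool(article.topics & user.interests)

def select_candidates(articles: list[Article], user: UserProfile) -> list[Article]:
    candidates = []
    for article in articles:
        if article.og_title is None:               # stage 2: required metadata missing
            continue
        if not passes_block_check(article, user):  # discarded before ranking
            continue
        if not matches_interests(article, user):
            continue
        candidates.append(article)
    return candidates  # survivors proceed to pCTR ranking (stage 6)
```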
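Because interest matching reportedly works on vectors rather than literal labels, a toy cosine-similarity computation illustrates the general idea. The embeddings, dimensionality, and threshold below are invented for illustration only.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional topic embeddings.
article_vector = [0.8, 0.1, 0.0, 0.3]  # e.g. an article about consumer tech
user_vector = [0.7, 0.2, 0.1, 0.4]     # a user who reads tech and gadget news

THRESHOLD = 0.85  # illustrative cutoff, not a known Google value
score = cosine_similarity(article_vector, user_vector)
print(f"similarity = {score:.3f}, matched = {score >= THRESHOLD}")
```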
The Power of the Publisher Block

One of the most striking findings of the research is the hierarchy of filtering. Many SEOs believe that ranking factors like page speed or keyword density are the primary drivers of visibility. However, the research shows that publisher-level blocks happen long before the ranking engine even looks at your content. The “Don’t show content from this site” action is an incredibly powerful tool in the user’s hands. When a user blocks a domain, that content is suppressed across the board for that individual. There is currently no equivalent “sitewide boost” mechanism that rewards a publisher as aggressively as a block punishes them.

This creates a high-stakes environment where a single piece of misleading or “clickbaity” content can lead to the permanent loss of a potential reader’s entire lifetime value on the platform. This “hard block” logic underscores the importance of brand trust. In Discover, you aren’t just competing for a click; you are competing to remain in the user’s ecosystem. If your content consistently fails to deliver on the promise of its headline, users will eventually exercise their right to block your domain, effectively “de-indexing” you from their personal feed.

The Ranking Model: pCTR and Server-Side Logic

Once an article passes the initial eligibility and block filters, it enters the ranking phase. The research highlights the use of a predicted click-through rate (pCTR) model. This model resides on Google’s servers and estimates the likelihood of a user clicking on a specific card based on several variables. While the internal weights of the pCTR model are not visible to the public, the SDK telemetry shows which signals the app sends to Google’s servers to inform these decisions. These include:

- The page title: extracted primarily from the Open Graph title tag (og:title).
- Image quality and dimensions: the system checks whether the image is large enough and whether it has loaded successfully in the past.
- Content recency: a “freshness” score applied based on the publication timestamp.
- Historical engagement: previous click and impression data for that specific URL.
- Technical reliability: signals indicating whether images or snippets are failing to render properly.

The pCTR model is dynamic. If an article begins to perform well (that is, its actual CTR exceeds its predicted CTR), it can “trend,” causing the system to push it to a much wider audience of users with similar interest profiles. Conversely, if a story has a high impression count but very few clicks, the pCTR model will quickly deprioritize it, leading to the “traffic cliff” many publishers experience after a successful run. A sketch of this expand-or-suppress logic appears at the end of this article.

The Critical Role of Image Requirements

In Google Discover, visuals are not just an aesthetic choice; they are a technical requirement. The research confirms that Google Discover reads specific page-level tags, with a heavy reliance on Open Graph metadata. If a page lacks a high-quality image, it is often disqualified from appearing as a prominent card, or it may not appear at all.

To qualify for the high-engagement “large card” format, images must be at least 1,200 pixels wide. Content with smaller images is typically relegated to small thumbnail layouts, which have significantly lower click-through rates. Furthermore, the research indicates that Google monitors whether images load successfully. If your site has technical issues like slow-loading images or metadata image links that return 404 errors, the system may filter your content out entirely to preserve the user experience of the feed.

Publishers should also be aware of fallback mechanisms. If the og:title tag is missing, the system typically falls back to the page’s standard HTML title element, so that tag deserves the same care as the Open Graph fields.
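On the publisher side, these image and metadata requirements are straightforward to audit. The sketch below uses the third-party requests, BeautifulSoup, and Pillow libraries; the 1,200-pixel threshold comes from the research, while the function name and report structure are illustrative.

```python
from io import BytesIO

import requests
from bs4 import BeautifulSoup
from PIL import Image

LARGE_CARD_MIN_WIDTH = 1200  # minimum image width for the large-card format

def audit_discover_metadata(page_url: str) -> dict:
    """Check the Open Graph tags Discover is reported to rely on."""
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    og_title = soup.find("meta", property="og:title")
    og_image = soup.find("meta", property="og:image")
    report = {
        "og_title_present": og_title is not None,
        "og_image_present": og_image is not None,
        "image_loads": False,
        "large_card_eligible": False,
    }
    if og_image is not None:
        response = requests.get(og_image.get("content", ""), timeout=10)
        if response.ok:  # a 404 here is exactly the failure the feed filters on
            report["image_loads"] = True
            width, _height = Image.open(BytesIO(response.content)).size
            report["large_card_eligible"] = width >= LARGE_CARD_MIN_WIDTH
    return report

print(audit_discover_metadata("https://example.com/article"))  # hypothetical URL
```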
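Finally, the expand-or-suppress behavior around pCTR, referenced in the ranking section above, can be approximated as a comparison of observed CTR against the model’s prediction. The thresholds and labels below are invented; the research establishes the direction of this behavior, not Google’s actual values.

```python
def update_distribution(predicted_ctr: float, impressions: int, clicks: int,
                        min_impressions: int = 1000) -> str:
    """Decide whether to widen or narrow a card's distribution.

    All cutoffs are illustrative stand-ins for whatever Google uses internally.
    """
    if impressions < min_impressions:
        return "keep"      # not enough data to judge yet
    actual_ctr = clicks / impressions
    if actual_ctr > predicted_ctr * 1.2:
        return "expand"    # outperforming: push to similar interest profiles
    if actual_ctr < predicted_ctr * 0.5:
        return "suppress"  # underperforming: the "traffic cliff"
    return "keep"

# A card predicted at 3% CTR that actually earns 5% gets wider distribution.
print(update_distribution(predicted_ctr=0.03, impressions=20_000, clicks=1_000))
```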