

How Google Discover qualifies, ranks, and filters content: Research

The Mechanics of Google Discover: A Deep Dive into Content Qualification and Ranking

Google Discover has become one of the most significant yet unpredictable drivers of organic traffic for modern publishers. Unlike traditional search, where users enter specific queries, Discover is a push-based system that anticipates user interests. For years, the inner workings of this feed have remained a “black box” for SEOs and digital marketers. However, recent SDK-level research by Metehan Yesilyurt has provided an unprecedented look at the structured, multi-stage pipeline Google uses to qualify, rank, and filter content before it ever reaches a user’s screen.

Understanding this architecture is vital for any digital publisher looking to stabilize their traffic. The research reveals that Discover is not just a simplified version of Google Search; it is a complex ecosystem governed by hard publisher blocks, strict technical requirements, rapid freshness decay, and a massive layer of ongoing experimentation. By dissecting the nine-stage flow of content through the Discover pipeline, we can better understand why some articles go viral while others never gain traction.

The Nine-Stage Content Pipeline

Google Discover operates through a sophisticated delivery framework that filters out noise and prioritizes high-quality, relevant content. According to the research, content passes through nine distinct stages. Failure at any single point can lead to a complete lack of visibility.

The process begins with crawling and understanding. Google’s bots must first find the content and parse its meaning. This is followed by reading meta tags, where the system identifies critical elements like the title and the lead image. The third stage involves classification, where the system determines if the content is breaking news, a feature story, or evergreen content.

Crucially, the fourth stage is a block check. Before any ranking or interest matching occurs, Google determines if the publisher or the specific URL has been blocked by the user or flagged for policy violations. If the content passes this gate, it moves to interest matching, where it is paired with users based on their browsing history and preferences. The final stages involve a server-side click-through rate (pCTR) prediction model, feed layout construction, delivery, and finally, the collection of user feedback. This feedback loop ensures that the system learns from every swipe, click, or dismissal, refining future recommendations in real-time.

The Pre-Ranking Filter: Why Some Publishers Are Invisible

One of the most striking findings of the research is that publisher-level blocks happen remarkably early in the process. In traditional Search, a user might see a site they dislike and simply choose not to click. In Discover, a single user action can have a devastating impact on a domain’s visibility for that individual.

When a user selects “Don’t show content from [Site Name],” that publisher is effectively removed from the user’s Discover ecosystem. This block is processed before the ranking model even looks at the content. This means that even if you produce the most relevant, high-quality article for a specific user interest, you will never reach them if you have been previously blocked. Unlike many social media algorithms, there is no equivalent “sitewide boost” mechanism to counter these blocks. The system is designed to be highly sensitive to negative user signals to maintain the quality of the personal feed.
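To make the ordering concrete, here is a minimal Python sketch of the block check and interest matching as a filter chain. Every name in it (Candidate, UserProfile, the function names) is a hypothetical illustration of the flow the research describes, not Google’s actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    url: str
    domain: str
    topics: set[str] = field(default_factory=set)

@dataclass
class UserProfile:
    blocked_domains: set[str] = field(default_factory=set)
    interests: set[str] = field(default_factory=set)

def passes_block_check(c: Candidate, user: UserProfile) -> bool:
    # Stage 4: a hard gate. Blocked publishers never reach ranking at all.
    return c.domain not in user.blocked_domains

def matches_interests(c: Candidate, user: UserProfile) -> bool:
    # Stage 5: interest matching only runs for candidates that survive stage 4.
    return bool(c.topics & user.interests)

def eligible(c: Candidate, user: UserProfile) -> bool:
    # Short-circuit evaluation mirrors the pipeline ordering: a blocked
    # domain is rejected before any interest or quality signal is consulted.
    return passes_block_check(c, user) and matches_interests(c, user)
```

The property worth noticing is the short-circuit: no amount of topical relevance is ever evaluated for a domain the user has blocked.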
The Power of User Dismissals

User feedback in Discover is often permanent. If a user dismisses a specific story, that action is recorded on Google’s servers and linked to that URL. The research indicates that once a story is dismissed, it will not resurface for that user, regardless of how much engagement it receives from others. This puts immense pressure on publishers to ensure that their headlines and images are not just enticing, but also highly accurate, to avoid the “click-and-dismiss” behavior that can signal low quality to the algorithm.

The pCTR Model: The Brain of Discover Ranking

Ranking in Google Discover is heavily influenced by a Predicted Click-Through Rate (pCTR) model. This is an AI-driven evaluation that happens on Google’s servers, estimating the likelihood of a specific user engaging with a specific piece of content. While the model itself is not visible to the public, the signals sent to the model have been mapped. The pCTR model considers several key factors:

The og:title: the primary headline used for the card.
Image quality and dimensions: high-resolution visuals are prioritized.
Freshness: how recently the content was published or updated.
Historical performance: previous click and impression data for the URL and the domain.
Technical stability: whether the images load correctly and the page is mobile-friendly.

If the pCTR model predicts a low engagement rate, the content will be pushed down in the feed or omitted entirely in favor of content with a higher predicted success rate. This creates a “rich get richer” scenario where high-performing content continues to gain momentum while underperforming content is quickly phased out.

The Crucial Role of Image Quality and Meta Tags

Technical SEO in Discover is heavily reliant on Open Graph (OG) tags. The research highlights six key page-level tags that Google Discover reads to build the visual cards. The most critical of these is the og:image. Without a valid image, a page will typically not appear in the Discover feed at all.

The 1200px Requirement

To qualify for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1200 pixels wide. While smaller images may still be eligible for Discover, they are often relegated to small thumbnails. These thumbnails have significantly lower click-through rates compared to large-format cards. For publishers, ensuring that every article has a high-quality, 1200px-wide featured image is perhaps the single most effective technical optimization for Discover.

Backup Tags and Metadata

Google’s system is designed to be resilient. If the og:title tag is missing, the system will look for backups, such as the Twitter title tag or the standard HTML title tag. However, relying on backups is risky, as they may not present the content in the most optimized way. Furthermore, specific meta tags can act as “kill switches”: the tags nopagereadaloud and notranslate have been observed to stop pages from entering the Discover pipeline entirely.
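The title fallback chain described above is easy to express directly. A minimal sketch using BeautifulSoup, assuming the og:title, then Twitter title, then HTML title order:

```python
from bs4 import BeautifulSoup

def resolve_card_title(html: str) -> str | None:
    """Pick a card title the way the research describes: og:title first,
    then the Twitter title tag, then the plain HTML <title>."""
    soup = BeautifulSoup(html, "html.parser")
    og = soup.find("meta", property="og:title")
    if og and og.get("content"):
        return og["content"].strip()
    tw = soup.find("meta", attrs={"name": "twitter:title"})
    if tw and tw.get("content"):
        return tw["content"].strip()
    # Last resort: the standard HTML title, if one exists.
    if soup.title and soup.title.string:
        return soup.title.string.strip()
    return None
```

Running this against your own templates is a quick way to see which fallback tier Discover would actually land on for each page.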


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Pipeline

For many digital publishers, Google Discover represents a volatile yet indispensable source of organic traffic. Unlike traditional Search, which relies on active queries, Discover is an interest-based feed that pushes content to users before they even know they want it. However, the mechanics behind why one article goes viral while another fails to gain a single impression have long remained a “black box” for SEOs.

Recent SDK-level research by Metehan Yesilyurt has pulled back the curtain on this mysterious system. By analyzing the observable signals within Google’s Discover app framework, the research reveals a highly structured, multi-stage pipeline. This system isn’t just about what a user likes; it involves hard publisher blocks, strict technical requirements, freshness decay, and a massive layer of server-side experimentation. Understanding this architecture is critical for any site looking to stabilize its Discover traffic. If your content is failing to surface, the issue might not be your topic—it might be a failure at one of the early qualification stages that happens before ranking even begins.

The Nine-Stage Flow of Content Discovery

The journey from a published article to a user’s mobile feed involves nine distinct stages. Each stage acts as a filter or a processor, ensuring that only the most relevant and high-quality content reaches the end user.

1. Crawl and Analysis: Google must first discover and crawl the page. During this phase, the system builds an initial understanding of the content’s topic and entity relationships.
2. Metadata Extraction: The system reads key meta tags, specifically focusing on the Open Graph (OG) tags for titles and images.
3. Content Classification: Content is categorized by type. This determines if the piece is “Breaking News,” “Evergreen,” or “Niche Interest.”
4. Block Filtering: The system checks if the publisher or the specific URL has been blocked by the user or by Google’s internal safety filters.
5. Interest Matching: Google matches the content’s entities and topics against the user’s historical browsing data, search history, and app activity.
6. pCTR Prediction: A server-side model estimates the Predicted Click-Through Rate. This is a crucial gatekeeper for visibility.
7. Feed Construction: The layout is built, determining whether the content appears as a large hero card or a smaller thumbnail.
8. Delivery: The content is pushed to the user’s device.
9. Feedback Loop: The system records how the user interacts with the card—whether they click, dismiss, or ignore it.

The Barrier to Entry: Publisher Blocks and Filters

One of the most significant findings of the research is the placement of the publisher block in the pipeline. In Google Discover, a user has the option to select “Don’t show content from [Site Name].” According to the SDK analysis, this block happens at a foundational level, well before interest matching or ranking occurs. If a user has blocked your domain, your content is effectively dead to them. There is no amount of “SEO optimization” or high-quality reporting that can bypass this.

Furthermore, while a user can suppress an entire domain with one click, there is no equivalent “boost” mechanism. Following a site helps, but the negative signal of a block is far more powerful and permanent in the system’s logic. This highlights the importance of maintaining brand trust.
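As a rough mental model of this asymmetry, consider a per-user store of negative feedback that the pipeline consults before ranking. The class and method names below are illustrative assumptions, not Google internals:

```python
class FeedMemory:
    """Sketch of a server-side record of negative feedback, kept per user."""

    def __init__(self) -> None:
        self.blocked_domains: set[str] = set()  # "Don't show content from [Site]"
        self.dismissed_urls: set[str] = set()   # per-story dismissals

    def block_publisher(self, domain: str) -> None:
        # One click suppresses the whole domain for this user.
        self.blocked_domains.add(domain)

    def dismiss_story(self, url: str) -> None:
        # Dismissals are recorded per URL and, per the research, persist.
        self.dismissed_urls.add(url)

    def is_eligible(self, url: str, domain: str) -> bool:
        # Consulted at the block-filtering stage, before interest matching
        # or pCTR scoring ever run. There is no positive-boost counterpart.
        return domain not in self.blocked_domains and url not in self.dismissed_urls
```

Note the deliberate absence of any "boost" method: the model only accumulates suppression, which matches the one-way door the research describes.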
If a publisher consistently uses “clickbait” headlines that frustrate users, they risk being blocked at the domain level, which leads to a permanent decline in Discover reach for that specific user and potentially influences the broader algorithmic perception of the site.

The Technical Architecture: Meta Tags and Image Standards

Google Discover is a visual-first medium. The research confirms that the app framework looks for six specific page-level tags to generate a card. The most critical of these are og:image and og:title. If these tags are missing, the system searches for fallbacks like Twitter title tags or the standard HTML title. If no suitable image is found, the content is often disqualified from appearing entirely.

Image quality is a primary ranking factor for visibility. To qualify for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1,200 pixels wide. Small images or thumbnails are not only less visually appealing but are also deprioritized by the layout engine.

Additionally, certain technical meta tags can act as a “kill switch” for Discover eligibility. The tags “nopagereadaloud” and “notranslate” have been observed to stop pages from entering the Discover pipeline. While these tags are often used for specific accessibility or regional reasons, publishers should be aware that their inclusion may inadvertently choke off Discover traffic.

Ranking and the pCTR Model

Ranking in Discover is not a static score but a predictive calculation. Google utilizes a Predicted Click-Through Rate (pCTR) model on its servers. This model estimates the likelihood of a specific user clicking on a specific story based on several variables:

User Engagement History: Has this user clicked on similar topics or this specific publisher before?
Article Title: The og:title is heavily scrutinized for relevance and “clickability.”
Image Success: Does the image load quickly and meet quality standards?
URL Performance: The past click and impression data for that specific URL.

If a story starts strong with a high CTR, the model is more likely to broaden its distribution. The pCTR model is hidden from the user and the publisher, but the telemetry data shows that these signals are transmitted to Google’s servers before any ranking decision is finalized. This confirms that early engagement is vital; if the first group of users who see the content doesn’t interact with it, the pCTR score drops, and the content is phased out of the feed.

The Decay of Freshness: Timing Your Content

Freshness is perhaps the most aggressive filter in the Discover ecosystem. While Search can surface content from years ago, Discover favors the “now.” The research identifies specific time windows that dictate how content is treated:

1 to 7 Days: This is the “golden window.” Content in this age range receives the strongest visibility boost. This is where news and trending content lives.
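A simple way to reason about these windows is a banding function over article age. Only the 1-to-7-day “golden window” is named in the excerpt above; the other band labels in this sketch are illustrative assumptions:

```python
from datetime import datetime, timezone

def freshness_band(published: datetime, now: datetime | None = None) -> str:
    """Bucket an article by age. Expects timezone-aware datetimes.

    Only the 1-7 day golden window is documented in the research excerpt;
    the 'breaking' and 'decayed' labels here are assumptions for illustration.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - published).days
    if age_days < 1:
        return "breaking"        # assumption: fresher than the golden window
    if age_days <= 7:
        return "golden-window"   # strongest visibility boost per the research
    return "decayed"             # assumption: past the documented boost window
```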


Google to change budget pacing for campaigns using ad scheduling

Understanding the Shift in Google Ads Budget Pacing

Google has officially announced a significant structural change to how budget pacing functions for campaigns utilizing ad scheduling. This update, set to take effect on March 1, 2026, represents a fundamental shift in the relationship between daily budgets, monthly caps, and the hours an advertiser chooses to remain active. For digital marketers, PPC specialists, and business owners, this change is not merely a technical adjustment; it is a shift that could potentially double monthly ad spend for certain campaign types without any manual increase in daily budget settings.

Currently, many advertisers use ad scheduling as a secondary method of budget control. By restricting ads to only run during business hours or specific high-converting days, they have historically seen their total monthly spend stay well below the theoretical maximum of their daily budget multiplied by the average number of days in a month. Starting in early 2026, Google Ads will begin proactively pacing these budgets to reach the full monthly billing limit, regardless of how many days or hours the campaign is actually scheduled to run. This means the platform will prioritize exhausting the “authorized” monthly budget within the compressed timeframe provided by the advertiser.

The Mechanics of Google’s Monthly Billing Limit

To understand why this change is so impactful, one must first understand how Google Ads handles budgets on a monthly scale. Google uses a standard calculation to determine a campaign’s monthly spending limit: the average daily budget multiplied by 30.4. This number, 30.4, represents the average number of days in a month (365 days divided by 12 months).

Under existing rules, Google allows a campaign to spend up to two times its average daily budget on any given day to account for fluctuations in search traffic and high-value opportunities. However, the platform guarantees that over the course of a billing cycle, an advertiser will never be charged more than the 30.4x daily budget limit. Up until now, campaigns with limited schedules—such as those only running on weekends or during a 9-to-5 window—effectively bypassed this monthly cap because there simply wasn’t enough time in the “active” window for Google to spend the full theoretical monthly amount. The new update removes this “protection,” allowing the algorithm to push spend more aggressively during those active hours to hit the 30.4x ceiling.

How the New Pacing Logic Functions

Starting March 1, 2026, the logic behind budget distribution will become much more aggressive for flighted or scheduled campaigns. While the fundamental rules of Google Ads billing remain the same, the application of those rules is being recalibrated. Here is a breakdown of what will and will not change:

The 2x Daily Overspend Rule

The long-standing rule that allows Google to spend up to 200% of your daily budget on a single day remains in effect. If your budget is $100 per day, Google can still spend $200 on a Tuesday if the traffic quality is high. What changes is how often Google will aim for that 2x ceiling. Previously, the system might have been more conservative. Moving forward, if you have a limited schedule, the system will actively seek to spend that 2x maximum every day you are active until the monthly cap is reached.

The 30.4x Monthly Cap

The total amount you can be billed in a month (Daily Budget x 30.4) will not change. This is the “safety net” that prevents runaway spending.
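The cap and ceiling arithmetic is easy to verify in a few lines of Python. In this sketch, the “old” expected spend is a simplification that assumes roughly 1x spend per active day, which is the historical behavior described above; the scenario values match the weekend-only example discussed below:

```python
AVG_DAYS_PER_MONTH = 30.4  # Google's monthly multiplier (365 / 12, rounded)

def monthly_cap(daily_budget: float) -> float:
    """The most a campaign can be billed in a month: daily budget x 30.4."""
    return daily_budget * AVG_DAYS_PER_MONTH

def old_expected_spend(daily_budget: float, active_days: int) -> float:
    # Historical behavior: a limited schedule kept spend near 1x per active day.
    return daily_budget * active_days

def new_max_spend(daily_budget: float, active_days: int) -> float:
    # From March 1, 2026: up to the 2x daily ceiling on every active day,
    # but never beyond the 30.4x monthly cap.
    return min(2 * daily_budget * active_days, monthly_cap(daily_budget))

daily, weekend_days = 100.0, 8  # the weekend-only campaign discussed below
print(monthly_cap(daily))                       # 3040.0 authorized per month
print(old_expected_spend(daily, weekend_days))  # 800.0 under the old pacing
print(new_max_spend(daily, weekend_days))       # 1600.0 under the new pacing
```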
However, the gap between your *actual* spend and your *maximum* spend is likely to close significantly for scheduled campaigns.

Ad Scheduling Integrity

Crucially, Google has stated that campaigns will not run outside of their designated scheduled hours. If you have set your ads to only run on Saturdays and Sundays, they will stay dark from Monday through Friday. The change is not about *when* the ads run, but how *intensely* they spend during the hours they are live.

The Financial Impact: A Comparative Example

To visualize the impact of this change, consider a localized service business that only runs ads on weekends (Saturday and Sunday) to capture weekend demand. Let’s assume the campaign has an average daily budget of $100.

Under the current (old) system, there are roughly eight weekend days in a standard month. Even if Google occasionally spends a bit over the $100 mark, the campaign would likely spend around $800 for the month. Because the campaign is “off” for the other 22 days, the system doesn’t try to “make up” the missed spend from those inactive days.

Under the new system starting in 2026, Google views that $100 daily budget as an authorization to spend up to $3,040 per month ($100 x 30.4). Because the advertiser has restricted the schedule to only 8 days, Google will now attempt to spend as much of that $3,040 as possible during those 8 days. Since it can spend up to 2x the daily budget ($200) per day, the campaign could now spend $1,600 per month (8 days x $200). In this scenario, the advertiser’s monthly bill has effectively doubled without them ever touching their daily budget setting.

Why Google is Implementing This Change

According to Google Ads Liaison Ginny Marvin, the primary objective of this update is to align campaign pacing with advertiser expectations regarding monthly spend limits. From Google’s perspective, when an advertiser sets a daily budget, they are signaling a willingness to spend the corresponding monthly total. By pacing more aggressively within the permitted schedule, Google believes it is helping advertisers maximize their reach and conversion potential during the times they have deemed most important.

Furthermore, this update reflects Google’s broader move toward automated, objective-based bidding. Modern AI-driven bidding strategies, such as Target ROAS (Return on Ad Spend) or Maximize Conversions, perform best when they have the flexibility to capture high-intent traffic whenever it appears. By allowing for more aggressive pacing within a limited schedule, Google’s algorithms have more “room” to bid competitively for the best auctions during those active hours.

Strategic Implications for PPC Professionals

This change


What 13 months of data reveals about LLM traffic, growth, and conversions

The Shifting Paradigm of Digital Discovery

For over two decades, the blueprint for digital growth was relatively straightforward: optimize for traditional search engines, manage your social media presence, and run targeted paid campaigns. However, the emergence of Large Language Models (LLMs) has introduced a new variable into the equation. As users migrate from traditional search bars to conversational interfaces like ChatGPT, Claude, and Perplexity, the way traffic flows across the web is undergoing a fundamental transformation. The core question for digital marketers and brand owners is no longer whether LLMs will impact their traffic, but rather how that impact is manifesting in real-time.

To move beyond speculation and anecdotal evidence, it is essential to look at hard data. By examining a comprehensive dataset of LLM prompt referral traffic—spanning from January 1, 2025, to February 7, 2026—we can gain a clearer understanding of how these models are influencing the modern user journey. Over these 13 months, the data reveals a complex narrative. While LLM-driven traffic is currently a small piece of the overall pie, its trajectory, quality, and the nature of its citations suggest that we are in the early stages of a significant shift in how consumers discover brands and make purchasing decisions.

Understanding the Scale: LLM Referral Traffic is Still Small

Despite the massive cultural footprint of AI, the current volume of referral traffic from LLMs remains a relatively niche segment of overall website visitors. Across a broad dataset of brand sites, LLM referral traffic accounts for less than 2% of total referral traffic on average. To put this into perspective, for every 100 visitors arriving at a site via a referral link, fewer than two are coming directly from an LLM response. The data shows a range of 0.15% to 1.5% for most brands. This includes traffic from major players like ChatGPT, Perplexity, Gemini, and Claude.

For digital strategists, this finding is a crucial reality check. While the industry is dominated by discussions about AI Search Optimization (ASO) and the “death of SEO,” the reality is that traditional search engines and direct traffic still drive the vast majority of digital volume. However, dismissing this traffic due to its size would be a tactical error. In the early days of mobile browsing or social media marketing, the initial traffic volumes were similarly small. The significance lies not in the current volume, but in the behavior of these users and the speed at which the channel is expanding.

Velocity and Momentum: LLM Traffic is Growing Fast

While the volume is low, the growth rate is aggressive. When comparing the first half of 2025 to the second half, the dataset shows an average growth rate of 80% in LLM referral traffic. This acceleration indicates that as these models become more integrated into browsers and mobile devices, user habits are shifting toward conversational discovery.

The growth across different companies is not uniform. While some organizations saw a modest 10% increase, others experienced explosive growth of up to 300%. This disparity often depends on the industry, the type of content the brand produces, and how frequently that content is cited as an authoritative source by the LLMs. Looking at the aggregate data from January to December of 2025, referral traffic grew threefold. This steady, month-over-month increase suggests that LLM usage is not a fleeting trend but a growing component of the digital ecosystem.
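Measuring this share in your own analytics export comes down to classifying referrer hostnames. A minimal sketch follows; the hostname list is an assumption that would need to be kept current as these products rename their domains:

```python
from urllib.parse import urlparse

# Assumed referrer hostnames for the major LLMs; actual hostnames vary
# by product and change over time.
LLM_HOSTS = {"chatgpt.com", "chat.openai.com", "perplexity.ai",
             "gemini.google.com", "claude.ai"}

def llm_share(referrer_urls: list[str]) -> float:
    """Fraction of referral visits whose referrer is a known LLM host."""
    def is_llm(ref: str) -> bool:
        host = urlparse(ref).hostname or ""
        return host.removeprefix("www.") in LLM_HOSTS

    if not referrer_urls:
        return 0.0
    return sum(is_llm(r) for r in referrer_urls) / len(referrer_urls)

sample = [
    "https://www.perplexity.ai/search?q=best+crm",
    "https://news.example.com/roundup",
    "https://chatgpt.com/",
    "https://blog.example.org/post",
]
print(f"{llm_share(sample):.1%}")  # 50.0% for this toy sample
```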
For brands, this means that monitoring “velocity” is just as important as monitoring “volume.” A channel that grows by 80% in six months demands a different strategic approach than a stagnant or slow-growing channel. We are witnessing the early stages of an adoption curve that could eventually rival traditional search for specific types of high-intent queries.

The Volatility of Prompt Algorithms

Part of what drives this growth—and its inherent unpredictability—is the constant evolution of prompt algorithms. Companies like OpenAI, Google, and Anthropic are frequently updating how their models browse the web, select sources, and present links to users. A single update to how Gemini cites news sources or how ChatGPT Search prioritizes product reviews can lead to dramatic swings in referral traffic overnight. Monitoring this data through third-party tools is essential because LLMs themselves do not yet provide the granular referral data that marketers have come to expect from platforms like Google Search Console.

The Evolution of Citations: Shifting Source Preferences

One of the most fascinating aspects of the 13-month data set is the change in which sources LLMs choose to cite. Unlike traditional search, which relies on a relatively stable set of ranking factors, LLMs are dynamic in their source selection. Since September 2025, the monitoring of over 5,000 prompts and responses across Gemini, ChatGPT, and Perplexity has shown a distinct shift in the types of content being prioritized.

The Rise of Video and Social Proof

Recently, there has been a noticeable spike in YouTube links and citations within LLM responses. This suggests that models are increasingly leaning on video content to provide visual demonstrations, tutorials, and reviews. For brands, this indicates that a multi-channel content strategy—one that includes high-quality video—is becoming a prerequisite for being “found” by AI.

Similarly, Reddit saw a significant period of growth as a cited source. For a period, LLMs heavily favored the “human” and “conversational” nature of Reddit threads to answer subjective or experience-based questions. While this growth has recently leveled off, the initial surge highlights how much LLMs value community-driven data. These shifts in citations directly impact the traffic reaching your site. If an LLM cites a Reddit thread that mentions your product, the user might visit Reddit first, making the path to your website more circuitous.

Implications for Content Strategy

These shifting citations mean that brands cannot afford to focus solely on their owned domains. To capture LLM traffic, you must be present where the LLMs are looking. This includes participating in relevant industry forums, maintaining a robust YouTube presence, and ensuring that your brand is mentioned in the authoritative third-party sources that LLMs trust. The data proves that LLMs
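The citation monitoring described above can be reproduced at small scale by tallying the domains cited across tracked prompts. A minimal sketch, assuming each monitored response yields a list of cited URLs:

```python
from collections import Counter
from urllib.parse import urlparse

def citation_domains(responses: list[list[str]]) -> Counter:
    """Tally the domains cited across monitored LLM responses.

    Each inner list is the set of URLs one response cited.
    """
    tally: Counter = Counter()
    for cited_urls in responses:
        for url in cited_urls:
            host = (urlparse(url).hostname or "").removeprefix("www.")
            tally[host] += 1
    return tally

monitored = [
    ["https://www.youtube.com/watch?v=abc", "https://example.com/review"],
    ["https://www.reddit.com/r/widgets/comments/1", "https://www.youtube.com/watch?v=def"],
]
print(citation_domains(monitored).most_common(3))
# [('youtube.com', 2), ('example.com', 1), ('reddit.com', 1)]
```

Run weekly over the same prompt set, a tally like this is enough to surface shifts such as the YouTube and Reddit swings noted above.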


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Mechanism

For digital publishers and SEO professionals, Google Discover has long been something of a “black box.” Unlike traditional Google Search, which relies on explicit user queries, Discover is a highly personalized, query-less feed that anticipates what a user wants to see based on their past behavior, interests, and engagement patterns. Because of this proactive nature, Discover can drive massive surges in traffic—often referred to as “spikes”—that can dwarf traditional search traffic overnight. However, this traffic is notoriously volatile. One day a site may receive hundreds of thousands of clicks, and the next, it may see nothing.

New SDK-level research by Metehan Yesilyurt has finally pulled back the curtain on the technical architecture of Google Discover. By analyzing the signals within the Google app framework, this research reveals a structured, nine-stage pipeline that dictates how content is qualified, ranked, and ultimately filtered before it ever reaches a user’s screen.

The Nine-Stage Content Pipeline

The journey of an article from a publisher’s CMS to a user’s Google Discover feed is not instantaneous. It follows a rigorous technical workflow designed to ensure quality, relevance, and safety. Understanding these stages is essential for diagnosing why certain content fails to surface.

1. Crawling and Extraction

The process begins with Google’s standard crawling infrastructure. Before a piece of content can be considered for Discover, Google must first discover the URL and parse its content. This is why standard SEO practices, such as maintaining a healthy XML sitemap and ensuring fast crawlability, remain the foundation of Discover success.

2. Meta Tag Analysis

Once crawled, the system focuses on specific meta tags. Unlike Search, which relies heavily on the HTML title tag and meta description, Discover prioritizes Open Graph (OG) tags. The research indicates that Discover specifically looks for “og:title” and “og:image.” If these are missing, the system searches for fallbacks like Twitter cards or the standard HTML title, but the absence of high-quality metadata can lead to a failure in the next stages.

3. Content Classification

Google classifies the content into distinct buckets. Is it breaking news? Is it an evergreen “how-to” guide? Is it a product review? This classification determines which “freshness” rules apply and which user interest groups the content will be tested against.

4. The Eligibility Gate (Publisher Blocks)

This is one of the most critical findings of the research. Before the system even attempts to match content to a user’s interests, it checks for publisher-level blocks. If a user has previously selected “Don’t show content from this site,” that publisher is effectively dead to that user. This filter happens server-side and is a hard block that prevents the content from even entering the ranking pool.

5. Interest Matching

Google compares the classified topics of the article against the user’s “Interest Graph.” This graph is built from search history, YouTube views, and previous interactions within the Discover feed itself. If there is no topical alignment, the content proceeds no further.

6. The pCTR (Predicted Click-Through Rate) Model

The ranking isn’t just about what you like; it’s about what Google thinks you are most likely to click. Using a sophisticated server-side model, Google assigns a predicted CTR (pCTR) to each eligible article.
This model weighs the headline’s “clickability,” the visual appeal of the image, and the historical performance of similar topics.

7. Layout Construction

Once the top-ranking articles are selected, the system decides how to display them. This includes choosing between large hero cards or smaller thumbnail cards. The research notes that this stage is heavily influenced by image dimensions.

8. Delivery

The content is pushed to the user’s device. Interestingly, the feed is not static. It can update in real time, adding or reordering cards while the user is actively scrolling, without requiring a manual refresh.

9. The Feedback Loop

Every action the user takes—clicking, dismissing, reporting, or sharing—is recorded. This data is fed back into the pCTR model and the Interest Graph, refining the feed for the next session.

The Invisible Gatekeeper: Why Publisher Blocks Matter

One of the most sobering revelations from Yesilyurt’s research is the power of the publisher-level block. In the world of SEO, we are used to “penalties” or “ranking demotions” that result in lower positions. Google Discover is different. When a user interacts with a card in their feed, they have the option to tell Google they are not interested in the topic or the publisher. If they choose to block the publisher, it creates a permanent, domain-wide suppression for that specific user.

Critically, there is no inverse “boost” mechanism. While a user can “Follow” a publisher, the research suggests that a “Follow” does not guarantee visibility in the same way a “Block” guarantees invisibility. For publishers, this means that clickbait or misleading headlines are a dangerous game. While they might drive a short-term spike in clicks, they increase the probability of users blocking the domain, which permanently shrinks the potential audience size in Discover.

Technical Requirements: Images and Meta Tags

If you want your content to occupy the most valuable real estate in the Discover feed—the large, high-engagement image cards—you must meet specific technical thresholds.

The 1200px Rule

The research confirms that image size is a primary filter for card layout. To qualify for a large image display, the “og:image” or the image specified in the Article schema must be at least 1,200 pixels wide. If the image is smaller than this, Google will often default to a small thumbnail or may choose not to show the article at all. High-resolution images are not just a cosmetic preference; they are a technical requirement for eligibility in the highest-performing segments of the feed.

The Role of Open Graph Tags

Google Discover leans heavily on the Open Graph protocol. The research identified six key tags that the system prioritizes:

1. og:title
2. og:image
3. og:description
4. og:url
5. og:site_name
6. og:type

While Google is capable of finding fallbacks, relying on the system to “guess” your title or image is a risk. Publishers should declare these six tags explicitly on every article rather than leaving the system to fall back on weaker signals.
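Checking those six tags is straightforward to automate. A minimal sketch with BeautifulSoup that reports which of the prioritized tags a page is missing or has left empty:

```python
from bs4 import BeautifulSoup

REQUIRED_OG = ["og:title", "og:image", "og:description",
               "og:url", "og:site_name", "og:type"]

def missing_og_tags(html: str) -> list[str]:
    """Return which of the six prioritized OG tags are absent or empty."""
    soup = BeautifulSoup(html, "html.parser")
    missing = []
    for prop in REQUIRED_OG:
        tag = soup.find("meta", property=prop)
        if not tag or not tag.get("content", "").strip():
            missing.append(prop)
    return missing

page = '<meta property="og:title" content="Example"><meta property="og:image" content="">'
print(missing_og_tags(page))
# ['og:image', 'og:description', 'og:url', 'og:site_name', 'og:type']
```

Wiring a check like this into a CMS publish hook is a cheap way to catch templates that silently drop a tag.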


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Mechanism

Google Discover has transformed from a supplementary feature into one of the most significant drivers of organic traffic for publishers, often surpassing traditional search results in sheer volume. However, for many digital marketers and SEO professionals, it remains a “black box.” Unlike traditional search, where queries provide a clear roadmap for intent, Discover is a proactive, query-less feed that pushes content to users based on their interests and past behavior.

Recent SDK-level research by Metehan Yesilyurt has shed light on the inner workings of this system. By analyzing the observable signals within Google’s Discover app framework, the research reveals a structured, multi-stage pipeline that governs how content is qualified, ranked, and occasionally filtered out. This deep dive explores the technical architecture behind the feed, providing a clear view of where content succeeds and where it breaks before it even reaches a user’s screen.

The Nine-Stage Google Discover Pipeline

The research maps out a sophisticated nine-stage flow that every piece of content must navigate to appear in a user’s feed. Understanding these stages is crucial because failure at an early stage—such as the block check—prevents the content from ever reaching the ranking or delivery phases.

1. Crawling and Content Extraction

The process begins with Google’s ability to crawl and understand the page. This isn’t significantly different from traditional Google Search crawling, but the speed at which it happens is vital for Discover. The system must quickly parse the content to identify its core subject matter, entities involved, and the overall quality of the page.

2. Meta Tag Analysis

Once crawled, Google Discover prioritizes specific meta tags. Unlike standard search, which focuses heavily on header tags and keyword density, Discover relies on visual and social meta tags. The system specifically scans for Open Graph (OG) tags to understand what the user will see in their feed. If these tags are missing or poorly configured, the content may be disqualified or appear as a low-quality text link.

3. Content Classification

Google then classifies the content type. Is it a breaking news story? Is it a “how-to” guide? Is it a deep-dive evergreen piece? This classification determines which “bucket” the content falls into, which later dictates how the freshness decay model will be applied to it.

4. The Publisher Block Check

This is one of the most critical stages revealed by the research. Before any interest matching occurs, Google checks for publisher-level blocks. If a user has previously selected “Don’t show content from this site,” that domain is filtered out immediately. This happens server-side and acts as a hard gatekeeper.

5. Interest Matching

Discover is fundamentally a personalization engine. At this stage, Google matches the classified content with the user’s individual interest profile. This profile is built from search history, app usage, location history, and direct interactions with previous Discover cards.

6. Predicted Click-Through Rate (pCTR) Modeling

Once a match is found, the system applies a predictive model. Google calculates a “pCTR”—a prediction of how likely the specific user is to click on that specific card. This model takes into account the historical performance of the URL, the domain’s reputation, and the visual appeal of the metadata.

7. Feed Layout Construction

Google doesn’t just list cards; it builds a visual experience.
The layout engine decides whether to show a large, high-resolution image card or a smaller thumbnail card based on the available assets and the user’s device specifications.

8. Content Delivery

The content is pushed to the user’s app. This delivery is dynamic and can happen in real time as the user scrolls, often refreshing or reordering the feed without the user needing to manually swipe down to refresh.

9. User Feedback Loop

The final stage is the recording of the user’s reaction. Did they click? Did they ignore it? Did they dismiss the card or report it? This feedback is fed back into the pCTR model and the interest matching engine to refine future delivery.

The Power of Publisher-Level Filters

One of the standout findings of the research is the hierarchy of filtering. The publisher-level block is a definitive, sitewide action. When a user tells Google they no longer want to see content from a specific domain, the suppression is absolute for that user. There is no comparable mechanism for a sitewide “boost.” While a user can “follow” a topic or a brand, the negative signal of a block carries significantly more weight in the pipeline than a positive signal.

This highlights the danger of “clickbait” or low-quality content. While it might drive short-term clicks, if it leads to users blocking the publisher, the long-term impact is a permanent loss of visibility within that user’s feed. User dismissals are stored permanently for specific URLs, ensuring that once a user rejects a story, it never reappears.

Ranking Factors and the pCTR Model

While the exact weights of Google’s internal ranking signals remain proprietary, the research identifies the specific data points sent to the server to inform the ranking decisions. These include:

Title Optimization (og:title)

The system looks specifically at the og:title tag. If this is missing, it falls back to the Twitter title tag or the standard HTML title. The research suggests that the title used for Discover should be compelling and descriptive, but publishers must avoid being overly sensational to prevent user blocks.

Image Quality and Size

The research confirms a long-standing SEO suspicion: image size is a direct ranking factor for visibility. To qualify for the large, high-engagement cards, images must be at least 1200px wide. Smaller images are relegated to thumbnail status. Thumbnails historically receive significantly lower click-through rates, which in turn lowers the pCTR score, eventually leading to the content being phased out of the feed entirely.

Technical Health and Image Loading

The pipeline includes a check for whether images load successfully. If a page has technical issues, broken image links, or slow-loading visual assets, Google may filter the content to ensure a smooth user experience. This makes technical SEO and CDN performance vital for sustained Discover visibility.
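The 1200px threshold is easy to audit for any published URL. A minimal sketch using requests and Pillow that fetches an og:image URL and reports which card format its width would qualify for; the function and constant names are illustrative, not anything Google exposes:

```python
from io import BytesIO

import requests
from PIL import Image

LARGE_CARD_MIN_WIDTH = 1200  # minimum width for the large-card format

def card_format(og_image_url: str) -> str:
    """Fetch an og:image and report the card format its width qualifies for."""
    resp = requests.get(og_image_url, timeout=10)
    resp.raise_for_status()  # a broken image link fails loudly here
    width, _height = Image.open(BytesIO(resp.content)).size
    return "large-card" if width >= LARGE_CARD_MIN_WIDTH else "thumbnail-only"

print(card_format("https://example.com/featured.jpg"))  # hypothetical URL
```

Because the pipeline also checks that images actually load, the raise_for_status call doubles as a smoke test for the broken-asset problem described above.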


How Google Discover qualifies, ranks, and filters content: Research

The Mechanics Behind Google Discover: A Deep Dive into Research

For many digital publishers and SEO professionals, Google Discover remains one of the most volatile and unpredictable sources of organic traffic. Unlike traditional Search, which relies on active queries, Discover is a proactive “queryless” feed that anticipates user interests. Because of its black-box nature, understanding why a story goes viral or why a site suddenly loses all visibility has largely been a matter of guesswork—until now.

Recent SDK-level research conducted by Metehan Yesilyurt has shed light on the internal architecture of Google Discover. By analyzing the observable signals within the Google app’s framework, the research reveals a complex, nine-stage pipeline that governs how content is crawled, qualified, ranked, and ultimately filtered. This research provides a roadmap for publishers looking to stabilize their Discover performance and understand the technical triggers that lead to success or exclusion.

The Nine-Stage Google Discover Pipeline

The journey from a published article to a user’s Discover feed is not a single leap. It is a structured process involving multiple checks and balances. According to the research, Google operates a sophisticated pipeline that evaluates content long before it ever reaches the ranking stage.

1. Crawling and Content Extraction

The process begins with Google’s standard crawling infrastructure. However, for Discover, the extraction focuses heavily on the “entity” of the article. Google isn’t just looking for keywords; it is trying to understand the core topic and how it relates to established user interests. This stage is where Google determines if the page is a standard article, a video, or another media type.

2. Meta Tag Analysis

Once crawled, the system specifically looks for Open Graph (OG) tags. These tags provide the “preview” information for the Discover card. The research highlights that Google prioritizes the og:title and og:image. If these are missing or poorly formatted, the content may fail to progress further in the pipeline.

3. Content Classification

Google classifies content into categories such as “Breaking News,” “Evergreen,” or “Niche Interest.” This classification dictates how the “freshness decay” model will be applied later. Breaking news is given an immediate, intense boost, while evergreen content is evaluated for its long-term relevance to specific user cohorts.

4. The Publisher Block Check

This is a critical “gatekeeper” stage. Before the system even considers if a user might like your content, it checks for publisher-level blocks. If a user has previously opted to “Don’t show content from this site,” the content is discarded immediately. This check happens server-side and is a binary filter that overrides all other ranking signals.

5. Interest Matching

In this stage, Google maps the content’s entities against the user’s Knowledge Graph profile. Google tracks a user’s search history, YouTube views, and previous Discover interactions to build a profile of interests. If the article’s topic doesn’t align with these interests, it is filtered out.

6. The pCTR (Predicted Click-Through Rate) Model

One of the most significant findings in Yesilyurt’s research is the existence of a server-side pCTR model. Google estimates the likelihood of a click before the content is even served. This model uses historical performance data for the URL, the domain’s reputation in Discover, and the visual appeal of the title and image to predict engagement.
7. Feed Layout Construction

Google Discover isn’t just a list; it’s a visual layout. In this stage, the system decides whether to show a large-image card, a small-thumbnail card, or a video carousel. The research notes that the layout is often determined by the quality and dimensions of the provided assets.

8. Content Delivery

The content is finally pushed to the user’s device. This happens dynamically, and the feed can be updated in real time. The delivery stage is also where A/B testing and experimentation often take place, with different users seeing different variations of the same content.

9. User Feedback Loop

Once the content is delivered, the system enters a continuous loop of recording feedback. Did the user click? Did they dismiss it? Did they spend time reading the page? This data is fed back into the pCTR model for future content from that same publisher.

Hard Publisher Blocks: The Silent Traffic Killer

Perhaps the most sobering discovery in the research is the power of the publisher-level block. In the Google Discover interface, users have the option to hide individual stories or block an entire domain. According to the SDK analysis, these blocks are permanent and happen at the very beginning of the pipeline. If a significant number of users block your site, your content is essentially “dead on arrival” for those segments.

What makes this particularly challenging is that there is no equivalent “sitewide boost” mechanism. While a user can “Follow” a site, the negative signal of a block carries significantly more weight in the filtering process than a positive signal carries in the ranking process. This highlights the importance of maintaining high editorial standards; clickbait that leads to user frustration can result in long-term domain suppression through these hard blocks.

The Critical Role of Metadata and Image Quality

Technical SEO for Discover is often simplified to “having a good image,” but the research provides specific parameters that publishers must meet to remain competitive. Google Discover reads six key page-level tags, with the og:title and og:image being the most influential. If these are absent, Google will attempt to “fall back” to other tags, such as Twitter Cards or the standard HTML title tag, but these fallbacks are less reliable and may lead to poor card rendering.

Image dimensions are a major factor in qualification. To be eligible for the large, high-engagement cards that drive the majority of Discover traffic, images must be at least 1200px wide. Smaller images are relegated to thumbnail status. In a feed that is almost entirely visual, a thumbnail card has a drastically lower pCTR, which in turn tells Google’s model that the content isn’t worth showing to more people.

Furthermore, two specific meta tags—nopagereadaloud and notranslate—can inadvertently cause a page to be excluded from Discover entirely. While these tags exist for legitimate accessibility and localization reasons, publishers should audit their templates to make sure they are not unintentionally cutting themselves off from Discover traffic.
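Auditing templates for those kill switches is cheap insurance. A minimal sketch follows; note that in the wild these directives are conventionally declared in the form <meta name="google" content="notranslate">, so the scan checks both that form and bare meta names, since markup varies:

```python
from bs4 import BeautifulSoup

KILL_SWITCH_DIRECTIVES = {"nopagereadaloud", "notranslate"}

def discover_kill_switches(html: str) -> set[str]:
    """Report directives the research associates with Discover exclusion.

    Accepts both <meta name="google" content="notranslate"> style tags and
    bare <meta name="notranslate"> variants seen in the wild.
    """
    soup = BeautifulSoup(html, "html.parser")
    found = set()
    for meta in soup.find_all("meta"):
        name = (meta.get("name") or "").lower()
        content = (meta.get("content") or "").lower()
        for directive in KILL_SWITCH_DIRECTIVES:
            if directive == name or (name == "google" and directive in content):
                found.add(directive)
    return found

page = '<meta name="google" content="notranslate">'
print(discover_kill_switches(page))  # {'notranslate'}
```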


Google AI Mode Link Update, Click Share Data & ChatGPT Fan-Outs – SEO Pulse

The Evolution of Search: Understanding the Shift to AI-Driven Interfaces

The digital marketing landscape is currently undergoing its most significant transformation since the invention of the search engine itself. As Google integrates generative AI into the core of its search experience and OpenAI expands its footprint with sophisticated search capabilities, the traditional rules of SEO are being rewritten. The latest industry updates highlight a critical shift in how AI interfaces manage link visibility, the way we calculate click share data, and how “fan-out” mechanisms in tools like ChatGPT are changing information retrieval.

For years, SEO was defined by the “ten blue links.” Success was measured by ranking in the top three positions to capture the lion’s share of traffic. However, with the introduction of Google’s AI Overviews (formerly SGE) and the rise of conversational search agents, the path from a user’s query to a publisher’s website is no longer a straight line. We are moving into an era of “source filtering,” where AI models act as sophisticated gatekeepers, deciding which information is synthesized and which links are worthy of being cited.

The Google AI Mode Link Update: A New Era for Visibility

One of the most pressing updates for publishers is the way Google is refining link visibility within its AI-powered search results. Early iterations of Google’s AI Overviews were criticized for burying citations or failing to provide clear paths to the original content creators. The recent “AI Mode” link update aims to address this by integrating links more naturally into the generative response.

This update isn’t just about aesthetic changes; it represents a fundamental shift in how Google balances the needs of the user with the health of the web ecosystem. By placing links directly within the flow of the AI-generated text—or in prominent sidebars and dropdowns—Google is attempting to maintain its role as a traffic driver while providing the instant answers users now expect. For SEO professionals, this means that “ranking” now involves appearing as a cited source within an AI summary, which requires a different approach than traditional keyword optimization.

How AI Link Integration Works

The AI mode uses a process of retrieval-augmented generation (RAG). When a user asks a complex question, the AI doesn’t just pull from its internal training data; it searches the live web for the most relevant and authoritative sources. It then synthesizes that information into a coherent answer. The link update ensures that the specific documents used to generate that answer are visible and clickable.

From a technical standpoint, this means that being the “best” answer isn’t just about having the right keywords. It’s about having a clear, modular content structure that an AI can easily parse and credit. Content that is buried in long, unstructured paragraphs or behind heavy JavaScript may struggle to be picked up by these AI “reading” processes, leading to a loss in visibility even if the site ranks well in traditional search.

Redefining Click Share Data in an AI-First World

As AI summaries take up more real estate at the top of the Search Engine Results Page (SERP), traditional metrics like Click-Through Rate (CTR) are becoming harder to interpret. This has led to an increased focus on “click share data.” Click share is a metric that estimates the percentage of all achievable clicks that your website received. In an AI-dominated environment, click share becomes the primary barometer for success.
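As a metric, click share is simple division; the hard part in practice is estimating the achievable-click denominator from rank and SERP-feature data. A minimal sketch with illustrative numbers:

```python
def click_share(your_clicks: int, achievable_clicks: int) -> float:
    """Click share: the fraction of all achievable clicks your site captured."""
    if achievable_clicks <= 0:
        return 0.0
    return your_clicks / achievable_clicks

# Illustrative numbers: a query cluster where AI Overviews absorb many
# direct answers, leaving 10,000 achievable clicks, of which your pages
# captured 1,850.
print(f"{click_share(1_850, 10_000):.1%}")  # 18.5%
```

The useful property is that the denominator shrinks along with the clickable SERP, so click share stays comparable across periods even as zero-click searches grow.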
When an AI provides a comprehensive answer directly on the SERP, it often results in a “zero-click search.” While this might seem like a nightmare for publishers, the data suggests a more nuanced reality. Users who do click through from an AI overview tend to be further down the funnel and more deeply interested in the topic. They aren’t looking for a quick fact—they’ve already gotten that from the AI—they are looking for the depth, authority, and nuance that only a full article can provide.

Analyzing the Impact on CTR

Recent studies into click share data show that while total click volume for informational queries may dip, the value of each click is increasing. To adapt, SEOs must move away from chasing high-volume, low-intent keywords that are easily summarized by AI. Instead, the strategy should pivot toward content that requires human expertise, unique data, or personal experience—elements that AI cannot replicate and that encourage a user to click “read more.”

Furthermore, tracking click share now requires a more sophisticated tech stack. Traditional tools like Google Search Console are evolving to provide more data on how AI Overviews affect performance, but savvy marketers are also looking at brand mentions and “share of voice” within AI responses as new Key Performance Indicators (KPIs).

ChatGPT Fan-Outs and the Logic of AI Research

While Google dominates the search market, OpenAI’s ChatGPT is pioneering new ways for users to discover information through what researchers call “fan-outs.” A fan-out occurs when an AI agent, tasked with answering a complex query, breaks that query down into multiple sub-tasks and “fans out” across the internet to gather data from various sources simultaneously.

This is a departure from traditional search indexing. In a standard search, a crawler visits your site, indexes it, and then retrieves it later when a query matches. In a fan-out scenario, the AI is performing real-time research. It might look at a product review on one site, a pricing table on another, and a user discussion on a third, all within seconds, to provide a consolidated recommendation.

The Technical Implications of Fan-Outs

For webmasters, ChatGPT fan-outs mean that server load patterns may change. Instead of predictable crawls from the Googlebot, sites may see bursts of activity from AI agents performing real-time synthesis. This makes site speed and accessibility more important than ever. If an AI agent cannot quickly fetch the data it needs during a fan-out process, your site will simply be skipped in favor of a faster competitor.

Moreover, these fan-outs prioritize high-authority, “truth-dense” content. The AI is looking for facts it can cross-reference. If your site provides data that contradicts
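Mechanically, a fan-out is concurrent retrieval over decomposed sub-queries. A minimal sketch with asyncio and aiohttp, using hypothetical URLs, that mirrors the behavior described above, where slow or failing sources are simply dropped rather than waited on:

```python
import asyncio

import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    # A tight timeout stands in for the agent's impatience with slow sites.
    async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
        return await resp.text()

async def fan_out(sub_queries: dict[str, str]) -> dict[str, str]:
    """Fetch every sub-query's source concurrently, like an agent fanning out."""
    async with aiohttp.ClientSession() as session:
        tasks = {name: fetch(session, url) for name, url in sub_queries.items()}
        results = await asyncio.gather(*tasks.values(), return_exceptions=True)
        # Sources that errored or timed out are skipped, not retried.
        return {name: r for name, r in zip(tasks, results) if isinstance(r, str)}

pages = asyncio.run(fan_out({
    "reviews": "https://example.com/widget-reviews",
    "pricing": "https://example.org/widget-pricing",
    "discussion": "https://example.net/forum/widgets",
}))
```

Seen from the server side, this is exactly the bursty, simultaneous access pattern the section describes, which is why slow origins fall out of the consolidated answer.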


What 13 months of data reveals about LLM traffic, growth, and conversions

Understanding the Impact of Generative AI on Digital Ecosystems

The digital marketing landscape is currently undergoing its most significant transformation since the advent of the mobile internet. As Large Language Models (LLMs) like ChatGPT, Claude, Gemini, and Perplexity become integrated into the daily workflows of millions, the way users discover information—and brands—is fundamentally shifting. For over a year, digital strategists and SEO professionals have speculated about the “death of the click” and the potential for AI to cannibalize traditional search traffic.

To move beyond speculation and into the realm of data-driven strategy, we have analyzed an extensive dataset spanning 13 months, from January 1, 2025, to February 7, 2026. This period represents a critical era in AI maturity, moving from the initial hype of generative tools to their practical application in commerce, research, and lead generation. By examining LLM prompt referral traffic within Google Analytics across a diverse customer base, we can finally quantify the influence these models have on brand visibility and business outcomes.

The findings offer a nuanced picture. While the volume of traffic originating from LLMs remains a fraction of traditional search, the quality of that traffic and its rate of growth suggest that we are witnessing the birth of a powerhouse marketing channel. In this report, we break down four major findings that define the current state of LLM referral traffic and what they mean for your brand’s future.

The Current State: LLM Referral Traffic is Still a Small Fraction

One of the most persistent fears in the SEO industry is that AI search engines will immediately replace traditional search engines, leading to a total collapse of organic traffic. However, the data from the last 13 months suggests a much more gradual transition. According to our dataset, LLM referral traffic currently accounts for less than 2% of total referral traffic on average. To put this in perspective, for every 100 visitors arriving at a brand’s website via a referral source, fewer than two are coming directly from an LLM citation.

Across the various businesses studied, the range of LLM-driven traffic fluctuated between 0.15% and 1.5%. This includes traffic from major players such as OpenAI’s ChatGPT, Perplexity AI, Google’s Gemini, and Anthropic’s Claude. These figures indicate that while AI tools are ubiquitous in conversation, they have not yet become the primary gateway for web navigation for the general population. For most businesses, this means that LLM optimization should not yet supersede traditional SEO or paid search in terms of immediate budget allocation.

However, viewing this 2% figure in isolation would be a mistake. This small slice of the pie represents the “innovator” and “early adopter” phases of the technology’s lifecycle. Much like the early days of social media referral traffic, the current volume is less about the “now” and more about the “next.”

The Velocity of Growth: A Rapid Upward Trajectory

While the current volume of LLM traffic is modest, the rate at which it is expanding is nothing short of explosive. When comparing the first half of 2025 to the second half, our data shows an average growth rate of 80% in LLM referral traffic. This is not a linear increase; it is an acceleration. The data reveals significant variance across different industries.
Some companies experienced a steady 10% growth, likely in sectors where information changes slowly or where users still prefer traditional visual search. Conversely, other brands—particularly in tech, B2B services, and niche consumer research—saw traffic increases of up to 300%. Between January and December of 2025, aggregate referral traffic from LLMs grew 3x across the board.

This growth velocity is driven by two primary factors: consumer adoption and algorithmic evolution. Users are becoming more adept at using LLMs for complex queries that involve “shopping around” or “researching the best options.” Simultaneously, LLM developers are refining their citation engines, making it easier for users to click through to the original source.

For marketers, the takeaway is clear: do not focus solely on the current volume. Monitor the velocity of growth within your specific niche. If your LLM traffic is doubling every six months, it will become a dominant channel far sooner than your competitors might realize.

The S-Curve of Adoption

We are currently on the steep incline of the technology adoption S-curve. In the early months of 2025, LLM traffic was a statistical anomaly. By the start of 2026, it has become a measurable line item in Google Analytics. This trend suggests that by 2027, LLM referrals could realistically challenge traditional social media platforms as a primary source of high-intent traffic.

Shifting Sands: How LLMs Choose Their Sources

Perhaps the most actionable insight for content creators is the shifting nature of LLM citations. An LLM is only as good as its training data and its ability to access real-time information. Over the past several months, we have observed a distinct shift in the types of sources LLMs prioritize when answering user prompts. By monitoring over 5,000 prompts and responses across Gemini, ChatGPT, and Perplexity, we can see exactly where these models are looking.

Historically, LLMs relied heavily on high-authority news sites and encyclopedic entries. However, the data from late 2025 and early 2026 shows a surge in citations from “community-led” and “visual” platforms. Specifically, YouTube links and citations have seen a significant increase. Users are looking for demonstrations, reviews, and tutorials, and LLMs are responding by serving up video content as a primary source of truth. Reddit also saw a massive spike in citations throughout much of 2025, acting as a proxy for “real human experience.” While this growth leveled off toward the end of the year, it remains a pillar of the AI citation ecosystem.

These shifts suggest that your SEO strategy can no longer live in a vacuum on your website. To be cited by an LLM, your brand needs a presence where the LLMs are looking: in forums, on video platforms, and in trusted community hubs.

The Role of Third-Party Monitoring

A major challenge for brands is that LLMs do not provide the granular referral reporting that marketers are used to getting from tools like Google Search Console, which makes third-party monitoring essential for understanding visibility.
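Velocity is worth tracking as a first-class number alongside volume. A minimal sketch that computes half-over-half growth and the implied doubling time, using the 80% average from the dataset as an example input:

```python
import math

def half_over_half_growth(h1_visits: int, h2_visits: int) -> float:
    """Period-over-period growth rate, e.g. H1 2025 vs H2 2025."""
    return (h2_visits - h1_visits) / h1_visits

def doubling_periods(growth_rate: float) -> float:
    """How many periods until the channel doubles at this growth rate."""
    return math.log(2) / math.log(1 + growth_rate)

# Illustrative visit counts chosen to reproduce the 80% average growth.
rate = half_over_half_growth(10_000, 18_000)
print(f"{rate:.0%} growth; doubles every {doubling_periods(rate):.1f} half-years")
# 80% growth; doubles every 1.2 half-years
```

A channel doubling every 1.2 half-years, roughly every seven months, is the kind of compounding the S-curve discussion above is pointing at.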


How Google Discover qualifies, ranks, and filters content: Research

Google Discover has long been one of the most mysterious and volatile sources of traffic for digital publishers. Unlike traditional Google Search, which relies on a user entering a specific query, Discover is a proactive, “query-less” feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many news organizations and tech blogs, a single hit in Discover can result in hundreds of thousands of visits in a matter of hours, yet the mechanisms behind how content is selected have largely remained a black box.

Recent SDK-level research by Metehan Yesilyurt has finally pulled back the curtain on this system. By analyzing the observable signals within Google’s Discover app framework and telemetry data, Yesilyurt mapped out the intricate pipeline that content must navigate before it ever reaches a user’s screen. This research reveals a structured, nine-stage flow governed by strict technical requirements, predictive modeling, and aggressive filtering. Understanding this pipeline is no longer optional for SEOs; it is the blueprint for survival in a push-based content economy.

The Nine-Stage Google Discover Pipeline

The journey from a published article to a Discover card is not a simple linear path. It is a high-speed filtering process designed to eliminate low-quality or irrelevant content as early as possible. According to the research, the process can be broken down into nine distinct phases.

First, Google must crawl and understand the content. This is the foundation of all Google products, but in Discover the emphasis is heavily placed on semantic understanding and classification. Once crawled, the system moves to metadata extraction, where it specifically looks for key tags like the image and title. Following this, the content is classified into categories such as “breaking news” or “evergreen,” which dictates how the system handles its “freshness” decay later on.

The fourth stage is perhaps the most critical for publishers: the block check. Before any ranking or interest matching occurs, the system checks whether the user or the platform has blocked the publisher. If a user has previously selected “Don’t show content from this site,” the content is discarded immediately. If the content survives the block check, it moves to interest matching, where Google’s vectors map the article’s topics to the user’s documented interests. The final stages involve server-side predictive modeling (pCTR), feed layout construction, content delivery, and the recording of user feedback. This feedback loop is continuous; if a user engages with the content, it reinforces the publisher’s standing. If they dismiss it, the system learns to suppress similar content in the future.

The Power of the Publisher Block

One of the most striking findings of the research is the hierarchy of filtering. Many SEOs believe that ranking factors like page speed or keyword density are the primary drivers of visibility. However, the research shows that publisher-level blocks happen long before the ranking engine even looks at your content. The “Don’t show content from this site” action is an incredibly powerful tool in the user’s hands. When a user blocks a domain, that content is suppressed across the board for that individual. There is currently no equivalent “sitewide boost” mechanism that rewards a publisher as aggressively as a block punishes them.

This creates a high-stakes environment where a single piece of misleading or “clickbaity” content can lead to a permanent loss of a potential reader’s entire lifetime value on the platform. This “hard block” logic underscores the importance of brand trust. In Discover, you aren’t just competing for a click; you are competing to remain in the user’s ecosystem. If your content consistently fails to deliver on the promise of its headline, users will eventually exercise their right to block your domain, effectively “de-indexing” you from their personal feed.

The Ranking Model: pCTR and Server-Side Logic

Once an article passes the initial eligibility and block filters, it enters the ranking phase. The research highlights the use of a predicted click-through rate (pCTR) model. This model resides on Google’s servers and estimates the likelihood of a user clicking on a specific card based on several variables. While the internal weights of the pCTR model are not visible to the public, the SDK telemetry shows which signals the app sends to Google’s servers to inform these decisions. These include:

- The page title: extracted primarily from the Open Graph title tag (og:title).
- Image quality and dimensions: the system checks whether the image is large enough and whether it has loaded successfully in the past.
- Content recency: a “freshness” score applied based on the publication timestamp.
- Historical engagement: previous click and impression data for that specific URL.
- Technical reliability: signals indicating whether images or snippets are failing to render properly.

The pCTR model is dynamic. If an article begins to perform well (i.e., its actual CTR exceeds its predicted CTR), it can “trend,” causing the system to push it to a much wider audience of users with similar interest profiles. Conversely, if a story has a high impression count but very few clicks, the pCTR model will quickly deprioritize it, leading to the “traffic cliff” many publishers experience after a successful run.
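To make the ordering concrete, below is a minimal sketch of the stage sequence described above: a hard block check first, then interest matching, then pCTR scoring with a trending adjustment. All names, multipliers, and data structures are illustrative assumptions; the research exposes the ordering of these stages, not their implementation.

```python
from dataclasses import dataclass

@dataclass
class Card:
    url: str
    publisher: str
    topics: set
    predicted_ctr: float  # server-side pCTR estimate
    impressions: int = 0
    clicks: int = 0

@dataclass
class User:
    blocked_publishers: set
    interests: set

def rank_candidate(card: Card, user: User):
    # Stage 4: the block check runs before any ranking or interest matching.
    if card.publisher in user.blocked_publishers:
        return None  # hard discard; the pCTR model never sees this card

    # Stage 5: interest matching against the user's documented interests.
    if not card.topics & user.interests:
        return None

    # Stage 6: start from the server-side pCTR prediction.
    score = card.predicted_ctr

    # Trending adjustment (invented multipliers): actual CTR above prediction
    # widens distribution; many impressions with few clicks deprioritizes.
    if card.impressions > 0:
        actual_ctr = card.clicks / card.impressions
        score *= 1.5 if actual_ctr > card.predicted_ctr else 0.5

    return score

user = User(blocked_publishers={"spammy-news.example"}, interests={"seo", "ai"})
card = Card(url="https://pub.example/post", publisher="pub.example",
            topics={"seo"}, predicted_ctr=0.04, impressions=1000, clicks=55)
print(rank_candidate(card, user))  # roughly 0.06: actual CTR beats prediction
```

The important property is the early return: a blocked publisher never reaches the scoring stage at all, which mirrors why no amount of content quality can win back a user who has blocked the domain.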
The Critical Role of Image Requirements

In Google Discover, visuals are not just an aesthetic choice; they are a technical requirement. The research confirms that Google Discover reads specific page-level tags, with a heavy reliance on Open Graph metadata. If a page lacks a high-quality image, it is often disqualified from appearing as a prominent card, or it may not appear at all.

To qualify for the high-engagement “large card” format, images must be at least 1,200 pixels wide. Content with smaller images is typically relegated to small thumbnail layouts, which have significantly lower click-through rates. Furthermore, the research indicates that Google monitors whether images load successfully. If your site has technical issues such as slow-loading images or broken 404 links for metadata images, the system may filter your content out entirely to preserve the user experience of the feed.

Publishers should also be aware of fallback mechanisms. If the og:title
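Given the 1,200-pixel requirement, a simple self-audit is to check each article’s Open Graph tags before publishing. The sketch below uses only the standard library and assumes the page declares og:image:width; pages that omit that tag would need the actual image fetched and measured instead.

```python
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph properties from <meta property="og:..."> tags."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:"):
            self.og[prop] = attrs.get("content", "")

def audit(html: str) -> dict:
    parser = OGParser()
    parser.feed(html)
    og = parser.og
    width = int(og.get("og:image:width", 0) or 0)
    return {
        "title": og.get("og:title"),
        "image": og.get("og:image"),
        "large_card_eligible": width >= 1200,  # the width floor noted above
    }

page = """<html><head>
<meta property="og:title" content="Example headline">
<meta property="og:image" content="https://example.com/hero.jpg">
<meta property="og:image:width" content="1600">
</head><body></body></html>"""

print(audit(page))
# {'title': 'Example headline', 'image': 'https://example.com/hero.jpg',
#  'large_card_eligible': True}
```

Running a check like this in a publishing pipeline catches missing or undersized images before Google ever sees the page, rather than discovering the problem as a silent absence from the feed.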

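Finally, the “freshness” decay referenced in both the classification stage and the pCTR signal list can be pictured as a simple time-decay curve whose steepness depends on content class. The half-life values below are invented for illustration; the research establishes that classification drives decay, not what the actual constants are.

```python
# Hypothetical half-lives by content class; the research shows classification
# drives freshness decay but does not publish these constants.
HALF_LIFE_HOURS = {"breaking_news": 6, "feature": 72, "evergreen": 24 * 30}

def freshness_score(content_class: str, age_hours: float) -> float:
    """Exponential time decay: the score halves every half-life."""
    half_life = HALF_LIFE_HOURS[content_class]
    return 0.5 ** (age_hours / half_life)

# A breaking-news story loses most of its freshness within a day,
# while an evergreen piece barely moves over the same window.
print(round(freshness_score("breaking_news", 24), 3))  # 0.062
print(round(freshness_score("evergreen", 24), 3))      # 0.977
```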