

OpenAI COO says ChatGPT ad rollout will be “iterative”

The Strategic Pivot: OpenAI’s Move Toward an Ad-Supported Ecosystem For several years, ChatGPT has been the gold standard for conversational artificial intelligence, primarily operating on a subscription-based revenue model. However, the landscape of generative AI is shifting rapidly. OpenAI has officially begun the rollout of advertisements within ChatGPT, specifically targeting free-tier and Go-tier users in the United States. This move represents one of the most significant changes to the platform’s business strategy since its inception, signaling a transition from a pure software-as-a-service (SaaS) model to a hybrid monetization approach that mirrors traditional tech giants like Google and Meta. During the recent India AI Summit, OpenAI Chief Operating Officer Brad Lightcap addressed the industry’s curiosity regarding this shift. He described the rollout as “iterative,” a choice of words that suggests OpenAI is moving cautiously to avoid alienating its massive user base. The goal, according to Lightcap, is to build an advertising platform that feels “additive” rather than intrusive, ensuring that user trust and privacy remain at the forefront of the experience. What “Iterative” Means for the Future of ChatGPT The term “iterative” is common in the tech world, but in the context of ChatGPT ads, it carries specific weight. It implies that the current ad formats being seen by U.S. users are far from the final product. OpenAI is currently in a high-stakes testing phase, gathering data on user interaction, sentiment, and the overall impact on the conversational flow. Unlike traditional search engines where ads are clearly delineated in a sidebar or at the top of a results page, conversational AI presents a unique challenge. An ad that breaks the “human-like” flow of a conversation can feel jarring. By taking an iterative approach, Lightcap suggests that the company is experimenting with how to weave brand messaging into the natural language processing (NLP) experience without degrading the utility of the tool. This cautious rollout is also a response to the technical complexities of serving ads in real-time within a large language model (LLM). OpenAI must ensure that the ad-serving infrastructure does not increase latency or compromise the quality of the AI’s responses. As the platform refines its model, we can expect to see more sophisticated, contextually aware placements that go beyond simple banners. The Target Audience: Free and Go-Tier Users Currently, the advertisement rollout is limited to specific segments of the ChatGPT user base. Users on the “Plus,” “Team,” or “Enterprise” tiers remain unaffected, as their subscription fees continue to cover the high operational costs of the service. Instead, OpenAI is focusing on the Free tier and the “Go” tier—a newer, lower-cost bracket designed for casual users. The decision to monetize the free user base is an economic necessity. Running a world-class LLM requires an immense amount of compute power, primarily driven by NVIDIA’s high-end GPUs. As the user base scales into the hundreds of millions, the “burn rate” of maintaining free access becomes unsustainable without a secondary revenue stream. By introducing ads, OpenAI can continue to offer broad access to its technology while offsetting the massive overhead associated with inference costs. Premium Pricing and Market Positioning OpenAI is not entering the digital advertising market as a budget option. 
Early reports indicate that the company is positioning itself as a premium ad platform, leveraging the high intent and deep engagement levels of its users. The reported cost-per-thousand-impressions (CPM) is as high as $60, which is significantly higher than the industry average for standard display or social media ads. Furthermore, entry into the ChatGPT ad ecosystem is currently gated behind substantial financial commitments. Minimum spends are reportedly starting around $200,000, effectively limiting the initial testing phase to “Big Tech” and global household brands. Early partners include major names such as Target, Adobe, and Shopify. This high barrier to entry allows OpenAI to maintain a level of quality control, ensuring that the first ads users see are from reputable, high-value brands rather than low-quality programmatic “junk” ads that often plague early-stage ad networks. The Shopify Integration and Shop Campaigns One of the most interesting aspects of the rollout is the partnership with Shopify. Through “Shop Campaigns,” Shopify merchants can now reach potential customers directly inside the ChatGPT interface. This is a logical extension of ChatGPT’s utility as a shopping assistant. When a user asks for gift ideas or product recommendations, the AI can now surface specific products from Shopify’s vast merchant network. This integration points toward a future where ChatGPT serves as a powerful discovery engine. For brands, this is the “holy grail” of advertising: being present at the exact moment a consumer is seeking information or making a purchase decision. If executed correctly, these ads aren’t just interruptions—they are the answers to the user’s queries. Competitive Pressures: OpenAI vs. Anthropic The timing of this ad rollout is no coincidence. The AI sector is becoming increasingly crowded, with competitors like Anthropic, Google (Gemini), and Perplexity vying for market share. Anthropic, in particular, has ramped up its marketing efforts, even launching a high-profile Super Bowl campaign to promote its “Claude” AI. OpenAI CEO Sam Altman has been vocal about this rivalry. Altman recently defended OpenAI’s commitment to providing free access to AI, noting that OpenAI faces a “differently-shaped problem” compared to its smaller rivals. With a significantly larger user base, OpenAI cannot rely solely on venture capital or high-priced subscriptions to stay afloat. The scale of ChatGPT requires a robust, scalable monetization strategy that doesn’t restrict the technology to only those who can afford a monthly fee. Altman’s comments suggest that while competitors may focus on niche or high-end enterprise users, OpenAI is committed to being the “operating system” for the general public. Ads are the engine that will allow that vision to persist at scale. Walking the Tightrope: Privacy and User Trust Perhaps the biggest hurdle for OpenAI is the issue of privacy. For years, users have shared sensitive data, creative ideas, and personal queries with ChatGPT. The introduction of an ad model inevitably raises questions about data harvesting and behavioral targeting. Will


Google to change budget pacing for campaigns using ad scheduling

Google Ads is preparing to implement a major structural shift in how it handles campaign budgets for advertisers who utilize ad scheduling. Starting March 1, 2026, the platform will fundamentally alter its budget pacing logic, a move that could significantly increase monthly expenditures for campaigns that do not run 24/7. This change represents one of the most significant updates to the Google Ads billing and pacing infrastructure in recent years, specifically targeting how “average daily budgets” are interpreted across a billing cycle. For years, digital marketers have used ad scheduling—often referred to as dayparting—as a primary lever to control costs and ensure ads only appear during high-conversion windows. Whether it is a B2B company only showing ads during business hours or a local restaurant targeting potential customers during the dinner rush, ad scheduling has been a cornerstone of budget conservation. Under the upcoming changes, that conservation may become a thing of the past unless advertisers proactively adjust their settings. Understanding the Shift in Budget Pacing Logic To understand why this change is so impactful, one must first understand the current mechanics of the Google Ads “Monthly Spending Limit.” Google calculates a monthly cap by taking your average daily budget and multiplying it by 30.4 (the average number of days in a month). Traditionally, if a campaign was scheduled to run only on certain days of the week, Google would pace the budget based on those active days. If your schedule only allowed the campaign to run for 10 days out of the month, the total spend would generally stay close to the sum of those 10 daily budgets. Effective March 1, 2026, Google will change this behavior. The system will now proactively attempt to spend up to the full monthly limit—the 30.4x multiplier—regardless of how many days the campaign is actually scheduled to run. This means the algorithm will “push harder” to exhaust the monthly budget within the limited windows you have provided. The Mechanics of the 2x Daily Rule While the pacing logic is changing, Google’s existing “overdelivery” rule remains in place. This rule allows Google to spend up to two times your average daily budget on any given day if the algorithm identifies high-quality traffic. However, under the old system, campaigns with limited schedules rarely hit their theoretical monthly cap because they weren’t active long enough for the overdelivery to add up to the 30.4x limit. Under the new system, Google will leverage that 2x daily flexibility much more aggressively. If a campaign is only active for a few days a month, the system will aim to spend as close to 200% of the daily budget as possible on those active days until the full monthly cap (calculated on a 30.4-day basis) is reached. The Impact: A Closer Look at the Numbers The implications of this change are best illustrated through a practical example. Consider a local service provider that only runs ads on Saturdays and Sundays. They have set an average daily budget of $100. In the current environment, this campaign would run roughly eight days per month. The advertiser would expect to spend approximately $800 per month. While Google might occasionally spend $110 one day or $90 another, the pacing is anchored to the active days. Under the new logic arriving in 2026, the monthly spending limit for this campaign is calculated as $100 x 30.4, which equals $3,040. Google will now try to reach that $3,040 ceiling within the eight days the ads are allowed to run. 
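To make the before-and-after math concrete, here is a minimal Python sketch of the pacing limits described above, assuming a weekend-only schedule of roughly eight active days per month. The function names are illustrative, and real spend also depends on auction dynamics and your bidding strategy, so read this as a rough model of the spending ceilings rather than a forecast.

```python
# Back-of-the-envelope sketch of the pacing change described above.
# Real spend also depends on auction dynamics and Smart Bidding targets,
# so treat this as an illustration of the limits, not a forecast.

AVG_DAYS_PER_MONTH = 30.4   # multiplier Google uses for the monthly spending limit
OVERDELIVERY_CAP = 2.0      # up to 2x the average daily budget on any single day

def monthly_spending_limit(daily_budget: float) -> float:
    """Monthly cap implied by an average daily budget."""
    return daily_budget * AVG_DAYS_PER_MONTH

def expected_spend_old(daily_budget: float, active_days: int) -> float:
    """Pre-2026 behavior: pacing anchored to the days the schedule is active."""
    return daily_budget * active_days

def max_spend_new(daily_budget: float, active_days: int) -> float:
    """New behavior: push toward the monthly limit, bounded by the 2x daily rule."""
    daily_ceiling = daily_budget * OVERDELIVERY_CAP
    return min(monthly_spending_limit(daily_budget), daily_ceiling * active_days)

daily_budget = 100.0   # $100 average daily budget
active_days = 8        # weekend-only schedule, roughly 8 active days per month

print(f"monthly spending limit: ${monthly_spending_limit(daily_budget):,.2f}")           # $3,040.00
print(f"old pacing, expected:   ${expected_spend_old(daily_budget, active_days):,.2f}")  # $800.00
print(f"new pacing, maximum:    ${max_spend_new(daily_budget, active_days):,.2f}")       # $1,600.00
```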
Because of the 2x daily overspend cap, the maximum Google can spend in those eight days is $1,600 ($200 per day for 8 days). Instead of the traditional $800 spend, the advertiser will now see their bill double to $1,600 for the exact same schedule, simply because the algorithm is pacing more aggressively to hit the monthly limit. Why Is Google Making This Change? According to Google Ads Liaison Ginny Marvin, the primary goal of this update is to align pacing behavior with advertiser expectations regarding monthly spending limits. The official stance is that when an advertiser sets a daily budget, they are implicitly agreeing to a monthly cap of 30.4 times that amount. Google’s logic suggests that the algorithm should have the freedom to find the best opportunities to spend that total amount, even if the window of opportunity is restricted by a schedule. From a technical perspective, this change also gives Google’s Smart Bidding algorithms more “breathing room.” By allowing the system to spend more on limited days, the AI can enter more auctions and bid more competitively for high-value conversions that might have been missed under stricter pacing. However, for many advertisers, this “breathing room” looks more like an unexpected spike in media spend. The Role of Smart Bidding and AI It is important to note that spend will still be dictated by campaign objectives. If you are using Target CPA (tCPA) or Target ROAS (tROAS), the system is still theoretically bound by your performance goals. If the algorithm cannot find conversions at your target price, it shouldn’t—in theory—spend the extra budget just for the sake of spending it. However, in practice, a higher pacing ceiling often leads to the system testing more aggressively, which can lead to higher costs during the learning phase. Who Will Be Affected? This is not a universal rollout that will hit every account simultaneously. Google has clarified that the update will be rolled out gradually. For now, only advertisers who have received a specific notification via email or through their Google Ads dashboard will be affected by the March 2026 deadline. However, history suggests that once these features are tested with a subset of users, they eventually become the standard across the entire platform. Marketers should assume that this pacing logic will eventually apply to all campaigns using ad scheduling, even if they haven’t received a notice yet. Strategic Adjustments for Advertisers With the March 1, 2026, deadline on the horizon, advertisers need to audit their accounts and


What 13 months of data reveals about LLM traffic, growth, and conversions

What 13 months of data reveals about LLM traffic, growth, and conversions The digital marketing landscape is currently undergoing its most significant shift since the advent of the smartphone. Large Language Models (LLMs) like ChatGPT, Claude, and Gemini are no longer just experimental novelties; they are becoming primary interfaces for information retrieval. As these tools integrate more deeply into our daily workflows, brand owners and SEO professionals are asking the same critical questions: How much traffic are these AI models actually sending? Is that traffic valuable? And how should our strategy change to keep up? To provide concrete answers, we conducted a comprehensive analysis of LLM prompt referral traffic across a diverse customer base. This study spans 13 months, beginning January 1, 2025, and concluding February 7, 2026. By examining Google Analytics data and monitoring over 5,000 individual prompts and responses across major LLM APIs, we have identified the trends that will define the next era of digital discovery. The data reveals a complex picture. While LLMs are not yet replacing traditional search engines in terms of pure volume, their growth trajectory and the quality of the traffic they generate suggest a paradigm shift in how users move from “question” to “conversion.” The Current Scale: LLM Referral Traffic is Still Small One of the most grounding findings of this 13-month study is that, despite the overwhelming amount of industry discussion, LLM referral traffic remains a small fraction of the total web ecosystem. On average, LLM referral traffic accounts for less than 2% of total referral traffic. To put this into perspective, for every 100 visitors who arrive at a website via a referral link, fewer than two are coming directly from an AI interface. The data shows a specific range of 0.15% to 1.5% of total referral traffic across various models, including ChatGPT, Perplexity, Gemini, and Claude. This suggests that while users are spending a significant amount of time interacting with AI, they are not always clicking through to the source material. This phenomenon, often referred to as “zero-click” behavior in the context of traditional search, is even more pronounced in the LLM space, where the model’s goal is often to provide a comprehensive answer within the chat interface itself. For many businesses, this means that while LLM optimization is a vital forward-looking strategy, it shouldn’t necessarily cannibalize the resources currently dedicated to high-volume channels like traditional SEO or paid search. However, focusing solely on the volume ignores the explosive growth and the high intent of the users who do choose to click. Rapid Expansion: LLM Traffic Growth Velocity While the current volume is low, the growth rate is staggering. When comparing the first half of 2025 to the second half, our data showed an average growth rate of 80% in LLM referral traffic. This is not a linear increase; it is an acceleration. Some companies in our dataset experienced growth as high as 300% over the 13-month period, while others saw more modest gains of 10%. The aggregate monthly referral traffic throughout 2025 shows a steady climb. By December 2025, referral traffic from LLMs had tripled compared to January of the same year. This indicates that as consumers become more comfortable using AI for complex queries, they are increasingly looking to verify information or complete transactions by visiting the cited sources. This growth is driven by two primary factors. 
First, consumer adoption is expanding as AI becomes integrated into browsers, operating systems, and mobile devices. Second, the algorithms governing how LLMs cite their sources are constantly evolving. As these models become better at identifying authoritative, high-quality content, they are more likely to provide links that users find worth clicking. For marketers, the takeaway is clear: do not just monitor the volume of your traffic; monitor the velocity. A channel that is small today but growing at 80% every six months will become a dominant force much sooner than many realize. A Shifting Landscape: Where Citations Come From The sources that LLMs choose to cite are not static. Our analysis of over 5,000 prompts and responses reveals that the “authority” recognized by AI models is shifting. Over the last several months of the study, we observed significant volatility in which platforms were being referenced in AI responses. Two platforms, in particular, have seen notable shifts: YouTube and Reddit. Over the final 30 days of the study (leading into February 2026), citations for YouTube links increased significantly. This suggests that LLMs are increasingly relying on video transcripts and visual content to answer user queries, particularly for “how-to” or product-related questions. Reddit also saw a massive surge in citations throughout 2025, though that growth recently reached a plateau. These shifts are critical because they dictate where a brand’s content must live to be “discoverable” by an LLM. If an AI model prefers to cite a forum discussion or a video rather than a traditional blog post, your content strategy must adapt to include those formats. Without monitoring these shifts through third-party tools—since LLMs do not provide this granular data directly—brands are essentially flying blind. The Conversion Powerhouse: Why LLM Traffic Outperforms Perhaps the most vital finding in our 13-month data set is the conversion rate of LLM-referred traffic. While LLMs drive the lowest percentage of total traffic (roughly 25 times less than traditional SEO or direct traffic), they drive the highest-quality visitors. Across our customer base, LLM referrals boasted an approximate 18% conversion rate. This is significantly higher than any other digital marketing channel, including paid shopping, organic search, and PPC. In many cases, these users are converting at double or triple the rate of traditional search visitors. Why is this conversion rate so high? It comes down to intent and pre-qualification. By the time a user clicks a link within an LLM response, they have already gone through a rigorous filtering process. The AI has already answered their preliminary questions, validated their needs, and presented the cited website as the definitive solution. When the user finally arrives at the site,


How Google Discover qualifies, ranks, and filters content: Research

Google Discover has long been a mysterious driver of massive traffic for publishers, news organizations, and tech blogs. Unlike traditional Google Search, which relies on a user entering a specific query, Discover is a proactive, personalized feed that delivers content based on what it predicts a user will want to see. This predictive nature makes it incredibly lucrative but also notoriously volatile.

Recent SDK-level research by Metehan Yesilyurt has pulled back the curtain on how this system operates. By analyzing the internal signals and telemetry within the Google Discover app framework, Yesilyurt has mapped out a sophisticated, multi-stage pipeline that dictates which articles make it to the feed and which are filtered out before they even have a chance to rank. Understanding this architecture is essential for any digital publisher looking to stabilize their traffic in an era of ever-shifting algorithms.

The Nine-Stage Lifecycle of Google Discover Content

According to the research, content does not simply “appear” in a user’s feed. It undergoes a rigorous nine-stage process on the server side before a single pixel is rendered on a smartphone screen. This pipeline ensures that content is relevant, high-quality, and compliant with Google’s strict safety and interest-matching standards. The stages of the Discover pipeline are as follows:

1. Crawling and Semantic Understanding: Google’s bots crawl the page to index its content. Beyond simple keyword matching, the system attempts to understand the entities, topics, and overall sentiment of the piece.
2. Meta Tag Extraction: The system specifically looks for structured data and meta tags, primarily focusing on Open Graph (og:) tags for titles and images.
3. Content Classification: The article is categorized. Is it a breaking news story? Is it a “how-to” evergreen guide? This classification determines which “freshness” rules will apply later.
4. Publisher Block Screening: Before matching content to a user, the system checks if the user has previously blocked the publisher. If a block exists, the content is discarded immediately.
5. User Interest Matching: The system compares the article’s topics against the user’s documented interests, search history, and app usage.
6. Predicted Click-Through Rate (pCTR) Modeling: An AI model on Google’s servers estimates the likelihood of the user clicking the story based on its title, image, and the user’s past behavior.
7. Feed Layout Assembly: The system decides where the card will sit in the feed and whether it will be a large feature card or a smaller thumbnail.
8. Content Delivery: The content is pushed to the user’s device.
9. Feedback Recording: The system monitors whether the user clicks, dismisses, or ignores the content, using this data to refine future ranking decisions.

The Power of the Publisher Block: A Pre-Ranking Hurdle

One of the most significant findings in the research is the placement of the publisher block in the pipeline. Many SEOs believe that blocks are just one of many signals used in ranking. However, the data reveals that publisher-level blocks happen before interest matching and ranking even begin. When a user selects “Don’t show content from [Site Name],” that site is effectively dead to that specific user across the entire Discover ecosystem. This is a binary filter, not a weighted signal. There is no equivalent “sitewide boost” mechanism that a user can trigger to ensure they always see a specific site.
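To see why the placement of that block matters, the short Python sketch below mirrors the ordering the research describes, with the publisher filter applied before interest matching and pCTR ranking. The stage functions, field names, and data structures are assumptions made for this example, not Google's actual code.

```python
# Simplified illustration of the ordering described in the research: the
# publisher block is a hard pre-filter applied before interest matching and
# pCTR ranking. Names and data structures are assumptions for this example.

from dataclasses import dataclass

@dataclass
class Candidate:
    url: str
    publisher: str
    topics: set          # topic labels assigned during classification
    pctr: float          # predicted click-through rate from the ranking model

def assemble_feed(candidates, blocked_publishers, user_interests, feed_size=10):
    # Publisher block screening: binary, applied before any ranking
    pool = [c for c in candidates if c.publisher not in blocked_publishers]

    # User interest matching: keep only content overlapping the user's interests
    pool = [c for c in pool if c.topics & user_interests]

    # pCTR modeling: rank the survivors by predicted engagement
    pool.sort(key=lambda c: c.pctr, reverse=True)

    # Feed layout assembly (here reduced to truncating to the feed size)
    return pool[:feed_size]
```

Because a blocked publisher is removed before ranking ever runs, no amount of quality or relevance later in the pipeline can bring that content back for that user.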
While a user can “Follow” a topic or a publisher, the “Block” function is a far more powerful technical tool used by the system to prune the candidate pool of content before the heavy lifting of AI ranking begins. For publishers, this means that even a few days of “clickbait-y” or low-quality content can lead to a wave of user blocks that permanently shrink their potential audience size on Discover. Once a user blocks a domain, regaining that real estate is nearly impossible.

Technical Prerequisites: Meta Tags and Image Quality

Google Discover is a highly visual medium. The research highlights that the system relies heavily on specific page-level metadata to build its cards. If these elements are missing or poorly implemented, the content may be disqualified from the feed entirely.

The 1200px Image Rule

To qualify for large, high-engagement cards, Google requires images to be at least 1200 pixels wide. While smaller images might still appear, they are relegated to small thumbnails next to the headline. Data shows that large cards receive significantly higher click-through rates. If your CMS is serving low-resolution featured images or failing to specify them in the Open Graph tags, you are effectively capping your Discover potential.

The “Kill” Tags: Notranslate and Nopagereadaloud

The research identified two specific meta tags that act as total blockers for Discover eligibility: “notranslate” and “nopagereadaloud”. If Google detects these tags, it often excludes the page from the Discover pipeline. The logic is that Google wants Discover content to be as accessible and versatile as possible within its ecosystem. If a publisher restricts Google’s ability to translate the page or read it aloud via Assistant, the system views the content as “low utility” for the Discover platform.

Backup Metadata Logic

Google prioritizes the og:title tag for headlines. However, the research shows a clear fallback hierarchy. If the og:title is missing, the system will look for Twitter card titles, and failing that, the standard HTML <title> tag. While the system is flexible, relying on fallbacks can lead to truncated or poorly formatted headlines that hurt your pCTR.

The Ranking Model: Understanding pCTR

Once a piece of content passes the initial filters, it enters the ranking phase. The core of this phase is the Predicted Click-Through Rate (pCTR) model. This is a server-side calculation that attempts to guess the future. The model uses several signals to calculate this probability:

Historical Performance: How have previous URLs from this domain performed? If your site has a history of high engagement, your new content starts with a “trust” advantage.
Image Quality and Loading: The system checks if the image URL is valid and if the image has a history of loading successfully. Broken images are a fast track to being filtered out.
Title Sentiment and Relevance: The model analyzes


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Black Box of Google Discover For many digital publishers and SEO professionals, Google Discover remains one of the most mysterious and volatile sources of organic traffic. Unlike traditional search, where intent is driven by specific user queries, Discover is a proactive, “query-less” feed that anticipates what a user wants to see before they even ask for it. While it can drive millions of sessions in a matter of hours, it is also notorious for sudden traffic drops and unpredictable behavior. Recent SDK-level research by Metehan Yesilyurt has provided a rare look behind the curtain, revealing the architectural framework that governs how Google Discover qualifies, ranks, and filters content. By analyzing the observable signals within the Google Discover app framework and telemetry data, we can now map out a sophisticated nine-stage pipeline that determines the lifecycle of every piece of content on the platform. This research highlights that Discover is not just a simplified version of Google Search; it is a complex ecosystem driven by predictive modeling, real-time feedback loops, and strict technical eligibility requirements that can disqualify a publisher before the ranking process even begins. The Nine-Stage Content Pipeline The journey from a published article to a user’s Discover feed involves a structured, multi-stage pipeline. Understanding where your content sits in this flow is essential for diagnosing visibility issues. 1. Crawling and Semantic Understanding Before anything else, Google must discover the URL. Through its standard crawling mechanisms, Google parses the page to understand its core subject matter. This involves more than just reading keywords; the system uses advanced natural language processing to categorize the content into specific interest clusters. 2. Metadata Extraction The system looks for specific signals that define how the content will be presented. The research indicates that Discover relies heavily on Open Graph (OG) tags. It reads the title, the primary image, and the description to build the “card” that users see. 3. Content Classification Once understood, the content is categorized. Is this a piece of breaking news, a trending topic, or a “long-tail” evergreen guide? This classification is critical because it determines which “decay” model will be applied to the content’s visibility over time. 4. The Qualification and Block Check This is a “hard” filter stage. Before the system even considers whether a user might like your content, it checks for publisher-level blocks. If a user has previously selected “Don’t show content from [Site Name],” the content is instantly discarded from that user’s potential pool. 5. Interest Matching Google compares the content’s semantic clusters against the user’s documented interests. These interests are derived from Search history, YouTube activity, and previous interactions within the Discover feed itself. 6. Predicted Click-Through Rate (pCTR) Modeling This is the heart of the ranking engine. Using a server-side model, Google estimates the likelihood of a user clicking on a specific card. This isn’t just based on the user’s past behavior, but on how similar users have interacted with the same content and how the publisher has performed historically. 7. Feed Layout Construction The system decides how to arrange the cards. 
It considers variety, ensuring the user isn’t overwhelmed by a single topic, and determines whether to use a large image format or a smaller thumbnail layout based on the available assets. 8. Content Delivery The content is pushed to the user’s device. Interestingly, the research found that this feed is dynamic; it can be updated in real-time while a user is actively scrolling without requiring a manual refresh. 9. Feedback Recording Every interaction—or lack thereof—is recorded. If a user clicks, scrolls past, dismisses, or reports a card, that data is sent back to Google’s servers to refine the pCTR model for the next session. The Critical Role of Publisher-Level Blocks One of the most significant findings in recent research is the power of the publisher block. In the Discover interface, users have the option to suppress content from an entire domain. According to the SDK analysis, this block occurs very early in the pipeline. Unlike Google Search, where a site might rank lower but still appear for specific queries, a block in Discover is binary. If a user blocks a publisher, that domain effectively ceases to exist for that user’s feed. Furthermore, the research notes that there is no equivalent “sitewide boost” mechanism. While you can be suppressed instantly at a domain level, you must earn your way into every single user’s feed through individual interest matching and engagement. This asymmetry makes brand trust and user experience paramount. If a publisher relies on “clickbait” titles that lead to low-quality content, they risk a permanent exclusion from a user’s ecosystem that no amount of SEO optimization can fix. The Predicted Click-Through Rate (pCTR) Model Ranking in Discover is largely governed by a pCTR model. Because Discover is a visual medium, Google’s servers must predict engagement before the content is even shown. While the exact weights of the model are proprietary, the research identified several key signals that the app sends to Google to inform these decisions: Title and Meta Tag Integrity The system primarily looks for the “og:title” tag. If this is missing, it cascades to secondary options like the “twitter:title” or the standard HTML title tag. A clear, compelling title that accurately reflects the content is essential for a high pCTR. Image Quality and Dimensions Visuals are perhaps the most important factor for Discover success. To qualify for large, high-engagement cards, images must be at least 1200px wide. The research confirmed that smaller images are often relegated to thumbnail views, which naturally receive lower click-through rates and, consequently, lower priority in the ranking model. Historical Performance Data The model considers the past click and impression data for the specific URL. If a piece of content starts strong and maintains a high engagement rate, the system will continue to “push” it to wider audiences. Conversely, if initial engagement is low, the content is quickly cycled out. The Science of Content Freshness and Decay Google Discover is heavily biased toward


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Pipeline Google Discover has evolved into one of the most significant drivers of organic traffic for publishers, often rivaling or even surpassing traditional search results. However, for many digital strategists, it remains a “black box”—an unpredictable engine that grants massive traffic spikes one day and total silence the next. Recent SDK-level research by Metehan Yesilyurt has finally shed light on the internal mechanics of this system, revealing a structured, nine-stage pipeline that dictates how content is qualified, ranked, and occasionally filtered out entirely. Unlike Google Search, which relies on user queries to pull relevant information, Discover is a proactive “push” system. It anticipates what a user might want to see based on their interests, browsing history, and engagement patterns. The research indicates that this process is far more mechanical and filtered than previously thought, involving strict technical prerequisites and real-time feedback loops. The Nine Stages of Content Delivery The journey from a published article to a user’s Discover feed involves a complex series of checkpoints. If a piece of content fails at any of these stages, it is discarded before it even has a chance to compete for a spot in the feed. 1. Crawling and Semantic Understanding The process begins with Google’s standard crawling infrastructure. Before a story can appear in Discover, Google must first discover the URL and parse its content. During this phase, the system identifies the core topic, the entities mentioned (people, places, brands), and the overall sentiment of the piece. 2. Meta Tag Extraction Google Discover relies heavily on specific metadata to build its visual cards. The system looks for Open Graph (OG) tags, specifically og:title and og:image. This stage is critical because it determines the visual “packaging” of your content. If these tags are missing, the system looks for fallbacks, such as Twitter card tags or the standard HTML title tag. 3. Content Classification At this stage, the system assigns the content to specific categories. Is this a piece of breaking news? Is it evergreen lifestyle content? Is it a product review? This classification helps the algorithm match the content with the appropriate audience segments and determines which “freshness” rules apply to the article. 4. Filtering and Block Checks One of the most significant findings of the recent research is that publisher-level blocks happen very early in the pipeline. If a user has previously selected “Don’t show content from this site,” that publisher is effectively dead to that user. This filter is applied before any interest matching or ranking occurs, meaning no amount of “high-quality content” can overcome a manual block. 5. Interest Matching Google compares the classified content against the user’s “Interest Graph.” This graph is built from a user’s search history, YouTube watch patterns, and previous interactions within Discover. The goal is to find a conceptual overlap between what the publisher has written and what the user has historically enjoyed. 6. Predicted Click-Through Rate (pCTR) Modeling Before the feed is rendered, Google runs a server-side model to predict the likelihood of a user clicking on a specific card. This pCTR model factors in the historical performance of the URL, the domain’s reputation, and how similar users have interacted with the content. 7. Feed Layout Construction Google Discover doesn’t just list articles; it builds a visual experience. 
This stage determines whether an article gets a “large card” (with a full-width image) or a “small card” (with a thumbnail). The layout is influenced by the quality of the image provided and the predicted importance of the story. 8. Content Delivery The content is finally pushed to the user’s device. This happens through the Google App on iOS and Android, as well as the mobile home screen on many Android devices. 9. Feedback Recording The pipeline doesn’t end when the user sees the card. Every action—clicking, dismissing, sharing, or ignoring—is recorded and fed back into the system to refine future ranking and matching. The Power of the Publisher Block A critical takeaway from the research is the asymmetry between positive and negative signals. While there are many ways for a user to “suppress” a site, there are very few ways for them to “boost” it with equal force. When a user selects “Don’t show stories from [Site Name],” it creates a hard filter at the server level. This is not a temporary demotion; it is a permanent exclusion for that user. Interestingly, there is no sitewide “Always show stories from this site” button that carries the same weight. While a user can “Follow” a topic, the “Block” function remains the most powerful tool in the user’s arsenal, making it vital for publishers to avoid “clickbaity” or polarizing content that might trigger a manual block. Technical Gatekeepers: Large Images and Meta Tags The research confirms that Discover has strict technical requirements that act as gatekeepers. If your site does not meet these standards, it may be filtered out of the most lucrative “large card” positions. The 1200px Image Requirement To qualify for the high-engagement large cards, images must be at least 1200 pixels wide. This is not just a suggestion; it is a technical threshold. If an image is smaller, Google may still show the content, but it will likely be relegated to a small thumbnail card, which statistically receives significantly lower click-through rates. Furthermore, if an image fails to load or returns a 404 error, the entire card is usually pulled from the pipeline. The Danger of “No-Go” Tags Certain meta tags can act as an accidental “off switch” for Google Discover. Specifically, the tags “nopagereadaloud” and “notranslate” were found to interfere with the Discover pipeline. While these tags are often used for accessibility or technical reasons on specific page types, their presence can signal to the Discover algorithm that the page is not suitable for the standard feed experience, leading to exclusion. The Freshness Decay: A Race Against Time Freshness is perhaps the most influential factor in Discover ranking. The research identified four distinct windows of visibility that determine


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Google Discover Pipeline

Google Discover has long been considered a “black box” for digital publishers and SEO professionals. Unlike traditional search, which relies on active queries, Discover is a push-based system that delivers content to users based on their interests, browsing history, and behavioral patterns. Because of this, traffic from Discover can be massive yet frustratingly volatile.

Recent SDK-level research conducted by Metehan Yesilyurt has shed new light on the inner workings of the Google Discover framework. By analyzing the observable signals within the app’s infrastructure, Yesilyurt mapped out a complex, nine-stage pipeline that dictates how content is qualified, ranked, and occasionally filtered out before it ever reaches a user’s screen. For publishers looking to stabilize their traffic, understanding this architecture is essential.

The Nine Stages of Google Discover Content Processing

The journey from a published article to a prominent spot in a user’s Discover feed involves several sophisticated layers of filtering and evaluation. The research identifies a structured flow that every piece of content must navigate:

1. Crawling and Understanding: The process begins with Google’s standard crawling mechanism. Googlebot must be able to access and index the page. During this stage, Google’s systems analyze the semantic meaning of the content, determining the primary topics, entities, and categories the article covers.
2. Meta Tag Extraction: Google Discover relies heavily on structured metadata. The system specifically scans for Open Graph tags and other key identifiers to determine how the content should be presented visually. If these tags are missing or malformed, the content may be disqualified from high-visibility formats.
3. Content Classification: Google classifies content into specific types, such as “Breaking News,” “Evergreen,” or “Special Interest.” This classification dictates the lifespan of the content within the feed and how aggressively it is pushed to various user segments.
4. The Publisher Block Check: One of the most critical findings of the research is the existence of a hard “block” stage. Before the system even considers if a piece of content is interesting to a user, it checks if that user has previously blocked the publisher. If a user has selected “Don’t show stories from [Publisher],” the content is instantly discarded from the candidate pool.
5. Interest Matching: This is the personalization layer. Google matches the classified topics of the article against the user’s “Interest Graph.” This graph is built from search history, app usage, and explicit “follows” within the Google ecosystem.
6. Predicted Click-Through Rate (pCTR) Modeling: Before the feed is rendered, Google runs a server-side prediction model. This model estimates the likelihood of a specific user clicking on a specific card. High pCTR scores move content higher in the priority list, while low scores may result in the content being buried or omitted.
7. Feed Layout Construction: Google doesn’t just list articles; it builds a visual experience. This stage determines whether an article gets a large, high-impact card or a smaller thumbnail, based on the quality of the assets and the predicted engagement.
8. Content Delivery: The content is pushed to the user’s device. This happens dynamically, and as the research shows, the feed can update in real-time without the user needing to manually refresh the app.
9. Feedback Recording: The cycle closes with user feedback. Every click, scroll-past, dismissal, or “heart” is recorded and fed back into the ranking model to refine future recommendations.

The Power of the Publisher Block

For many publishers, the most startling revelation in the research is the mechanics of the publisher-level block. In Google Discover, a user has the option to stop seeing content from a specific domain entirely. According to the SDK analysis, this block occurs very early in the pipeline—before interest matching and before ranking. Unlike Google Search, where a site might rank lower but still appear for specific queries, a block in Discover is a total suppression of the domain for that specific user. There is currently no equivalent “sitewide boost” mechanism that functions with the same level of permanence. This means that maintaining a positive reputation with your audience is vital; a few pieces of “clickbait” that annoy users into blocking your domain can permanently erode your Discover reach.

The Predicted Click-Through Rate (pCTR) Model

While SEOs often focus on keywords, Discover ranking is heavily influenced by a predicted click-through rate (pCTR) model. This model is housed on Google’s servers and acts as a gatekeeper for visibility. While the model’s exact weights are proprietary, the research highlights the signals sent to Google to inform these predictions:

Page Title: Primarily pulled from the og:title tag.
Image Quality: The system checks if the image is high-resolution and if it loads correctly.
Freshness: The time elapsed since publication.
Historical Performance: Past click and impression data for that specific URL and domain.
Asset Integrity: Whether the images and meta tags are technically sound.

This explains why two articles on the same topic might have vastly different performance metrics. If Google’s model predicts that Article A will garner a 10% CTR and Article B will garner 2%, Article A will receive the lion’s share of impressions.

Freshness and the Lifecycle of a Discover Post

Timing is everything in Google Discover. The research confirms that Google utilizes “freshness decay” to ensure the feed stays relevant. The visibility of content typically follows a specific window of decay:

1 to 7 Days: The Peak Performance Window. New content receives the strongest boost. Most “viral” Discover traffic occurs within the first 48 to 72 hours of publication. During this time, the freshness signal is at its strongest, allowing the content to reach the widest possible audience.

8 to 14 Days: Moderate Visibility. After the first week, content begins to see a significant drop in impressions unless it is consistently achieving an exceptionally high CTR. At this stage, it is often relegated to “Suggested for You” sections rather than the primary “Top Stories” area.

15 to 30 Days: Limited Visibility. By the third week, content visibility becomes highly restricted. Only content that has been classified as high-value evergreen


What 13 months of data reveals about LLM traffic, growth, and conversions

Understanding the LLM Traffic Landscape: A 13-Month Deep Dive The digital marketing world has undergone a seismic shift over the last two years. As Large Language Models (LLMs) like ChatGPT, Claude, Gemini, and Perplexity move from experimental curiosities to daily utility tools, brands are increasingly concerned about how these platforms impact their web traffic. The central question has shifted from “Will AI change search?” to “How much traffic am I actually getting from AI, and does it matter for my bottom line?” To answer these questions with precision, we analyzed a comprehensive dataset of LLM prompt referral traffic captured through Google Analytics. This study spans 13 months, from January 1, 2025, to February 7, 2026. By tracking how users navigate from an AI-generated response to a brand’s website, we can finally move past speculation and look at the hard numbers regarding growth, source attribution, and conversion performance. The findings suggest a complex reality: while the volume of traffic coming from AI models is currently a small fraction of the total digital ecosystem, its growth trajectory and the quality of the leads it generates are unparalleled. For forward-thinking marketers, the data reveals four major trends that should dictate digital strategy throughout 2026. Finding 1: LLM Referral Traffic Remains a Small Piece of the Pie One of the most grounding realizations from our 13-month data set is that LLM referral traffic is still in its infancy regarding raw volume. Despite the massive media coverage surrounding “AI Search,” it currently accounts for less than 2% of total referral traffic on average. To put this into perspective, for every 100 visitors who arrive at a website via a link from another site, fewer than two are coming from an LLM. Across our customer base, the specific range of referral traffic coming from AI sources fluctuates between 0.15% and 1.5%. This includes traffic from established players like ChatGPT and Gemini, as well as search-centric models like Perplexity and research-heavy tools like Claude. For many enterprise businesses, this means that AI-driven traffic is not yet a primary driver of top-line volume when compared to traditional organic search (SEO), direct traffic, or paid social media. However, dismissing these numbers as insignificant would be a strategic error. In the early days of mobile search or social media marketing, the initial referral percentages were similarly low. The importance of this 2% lies not in its current scale, but in its role as a “canary in the coal mine” for shifting consumer behaviors. While it may not be the highest priority for immediate bottom-line impact today, it represents the fastest-evolving segment of the traffic landscape. Finding 2: The Velocity of Growth is Accelerating While the volume is currently low, the growth rate is staggering. Our data shows that LLM referral traffic is not just increasing; it is accelerating. When comparing the first half of 2025 to the second half, the average growth rate for LLM referrals reached 80%. The variance across different industries and brands is also notable. While some companies in traditional sectors saw a modest 10% increase, others—particularly those in tech, education, and research-heavy niches—experienced growth as high as 300%. This suggests that as LLMs become more capable of providing real-time information and citing sources, users are becoming more comfortable clicking through to verify information or complete a transaction. 
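To illustrate what that velocity implies, the sketch below simply compounds the reported 80% half-over-half growth rate forward from a hypothetical starting share. The 1.5% starting point and the assumption that the rate (and total referral volume) hold steady are placeholders, not findings from the study.

```python
# Illustrative projection only: compounding the reported half-over-half growth
# rate forward. The 1.5% starting share and the assumption that the growth rate
# and total referral volume hold steady are placeholders, not study findings.

def projected_share(current_share: float, growth_per_half: float, halves: int) -> float:
    """Share of referral traffic after a number of six-month periods."""
    return current_share * (1 + growth_per_half) ** halves

start_share = 0.015       # 1.5%, the top of the observed range
half_year_growth = 0.80   # 80% growth from H1 2025 to H2 2025

for h in range(1, 7):
    print(f"after {h} half-year period(s): {projected_share(start_share, half_year_growth, h):.1%}")
# 2.7%, 4.9%, 8.7%, 15.7%, 28.3%, 51.0% -- if the 80% rate were sustained
```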
Looking at the aggregate monthly data from 2025, we observed a steady, compounding increase month-over-month. By December 2025, referral traffic from AI models had tripled compared to January 2025. This growth is driven by two primary factors: Consumer Adoption and Habit Formation More users are starting their information journey inside an LLM interface rather than a traditional search engine. As these models become integrated into operating systems (like Apple Intelligence or Windows Copilot), the friction between asking a question and receiving a cited link continues to decrease. Algorithm Evolution The AI models themselves are changing. Throughout 2025, we saw significant updates to how ChatGPT and Gemini handle citations. Models are becoming better at “Retrieval-Augmented Generation” (RAG), which involves looking up live web data to answer a prompt. As these models get better at finding and citing the right pages, the likelihood of a user clicking a referral link increases. Marketers must look beyond the current volume and focus on velocity. A channel that triples in size over 12 months is a channel that will likely command a double-digit share of traffic within the next two to three years. Finding 3: The Shift in Cited Sources (The Rise of YouTube and Reddit) Perhaps the most actionable finding in our data is the shift in which sources LLMs choose to cite. An LLM’s “referral power” is entirely dependent on its citation engine. If your brand is not being cited, you cannot receive traffic. Our monitoring of over 5,000 prompts across Gemini, ChatGPT, and Perplexity shows that the “source of truth” for AI is moving toward community-driven and visual content. Over the last few months of the study, we noticed a significant spike in citations leading to YouTube and Reddit. In the final 30 days of our data set (early 2026), YouTube links in AI responses grew substantially. This is likely due to the models’ increasing ability to process video transcripts and the high authority of video content for “how-to” and “review” style queries. Reddit also saw a massive surge in visibility within AI responses throughout late 2025, though that growth has recently started to level off into a stable plateau. This indicates that LLMs are prioritizing “human-first” perspectives—real reviews, forum discussions, and experiential advice—over traditional, highly optimized SEO blog posts that may feel over-engineered. For brands, this shift means that an “AI visibility” strategy cannot rely on website content alone. To be cited by an LLM, your brand needs a presence where the LLM is looking. This includes: Optimizing video descriptions and transcripts on YouTube. Participating in relevant community discussions on platforms like Reddit. Ensuring that third-party review sites and authoritative news outlets are covering your products. Without monitoring these


How Google Discover qualifies, ranks, and filters content: Research

Understanding the Mechanics of Google Discover For many digital publishers and SEO professionals, Google Discover remains one of the most significant yet unpredictable sources of organic traffic. Unlike traditional search, which relies on active user queries, Discover is a highly personalized “query-less” feed that anticipates user needs based on their interests and past behavior. However, the exact mechanics of how content qualifies for this feed, how it is ranked, and why it is sometimes filtered out have largely been a matter of speculation—until now. Recent SDK-level research conducted by Metehan Yesilyurt has provided a rare look under the hood of the Google Discover architecture. By analyzing the observable signals within the Google app framework, the research reveals a complex, nine-stage pipeline that dictates the lifecycle of a piece of content within the feed. This research confirms that Discover is not just a simplified version of Google Search; it is a distinct ecosystem with its own set of rules, technical requirements, and “hard” filters that can make or break a publisher’s visibility. The Nine-Stage Pipeline: How Content Moves Through Discover The journey from a published article to a user’s mobile feed involves a structured sequence of events. Understanding these stages is critical for diagnosing why certain content performs well while other pieces fail to gain traction. According to the research, the pipeline follows these steps: 1. Crawling and Content Understanding The process begins with Google’s standard crawling mechanisms. Before content can even be considered for Discover, the automated systems must index the page and understand its topical relevance. This stage leverages Google’s large-scale language models to categorize the content into specific “interest clusters.” 2. Meta Tag Extraction Once crawled, the system specifically looks for structured data and meta tags. This is a technical checkpoint where Google identifies the primary visual and textual elements that will represent the article in the feed. This includes the Open Graph (og) tags that define titles and images. 3. Content Classification The system then classifies the content type. Is it a breaking news story, an evergreen guide, or a localized update? This classification determines which “bucket” the content falls into and influences the “freshness” decay model that will be applied later. 4. The Publisher Block Check In one of the most critical stages, Google checks for publisher-level blocks. If a user has previously indicated they do not want to see content from a specific domain, that content is immediately discarded from that user’s potential feed. This happens before any ranking or interest matching takes place. 5. Interest Matching Google compares the content’s topic against the user’s individual interest profile. This profile is built from a massive array of signals, including search history, YouTube activity, and previous interactions within the Discover feed itself. 6. Predicted Click-Through Rate (pCTR) Modeling Before the feed is rendered, Google runs a server-side prediction model. It estimates the likelihood of a user clicking on a specific card based on historical data from that URL, the publisher’s reputation, and the visual appeal of the card layout. This is where “ranking” truly begins. 7. Feed Layout Construction The system decides how to present the qualified content. 
It balances different types of media, such as standard articles, YouTube videos, and “Shorts,” to create a visually diverse and engaging feed layout. 8. Content Delivery The finalized cards are pushed to the user’s device. This delivery is often dynamic; the research indicates that the feed can update in real-time as a user scrolls, adding or reordering content without requiring a manual refresh. 9. Feedback Recording The cycle closes with the user’s reaction. Whether a user clicks, dismisses, or follows a topic, every action is fed back into the system to refine future ranking decisions for that user and the content itself. Technical Prerequisites: The Role of Meta Tags and Images One of the standout findings of Yesilyurt’s research is the absolute necessity of specific page-level meta tags. Google Discover heavily relies on Open Graph (OG) tags to build the visual “cards” that users see. If these tags are missing or improperly configured, the content may be disqualified entirely. The research identified six key tags that Google Discover prioritizes, with og:title and og:image being the most vital. If an article lacks an image tag, it simply will not appear; there is no such thing as a “text-only” card in the current Discover framework. Furthermore, Google has a hierarchy of backups. If the og:title is missing, it will attempt to use the Twitter title tag or the standard HTML title tag. However, relying on these backups can lead to suboptimal presentation in the feed. The 1200px Rule for Large Cards Visual prominence is a major factor in Discover success. For a piece of content to qualify for the high-engagement “large card” format, the featured image must be at least 1200 pixels wide. Images smaller than this are relegated to small thumbnail layouts. Data shows that large cards generally achieve significantly higher click-through rates, making the 1200px threshold a mandatory technical requirement for any publisher looking to maximize Discover traffic. Tags That Can Kill Your Traffic While some tags help you get in, others are designed to keep you out. The research highlighted two specific meta tags that act as “poison pills” for Discover: nopagereadaloud and notranslate. If these tags are detected, the system may exclude the page from the Discover pipeline entirely. Publishers should audit their CMS and SEO plugins to ensure these tags aren’t being added unintentionally, especially on mobile-optimized versions of their pages. Ranking Factors: The pCTR Model and Historical Performance Unlike traditional search, where “backlinks” and “keyword density” are king, Discover ranking is driven by a predicted Click-Through Rate (pCTR) model. This model is housed on Google’s servers and evaluates content based on its potential to engage the user before the user even sees it. The pCTR model analyzes several observable signals: Title Clarity: Is the title engaging without being “clickbaity” in a way that violates Google’s policies? Image Quality: Does the image load correctly,

Uncategorized

What 13 months of data reveals about LLM traffic, growth, and conversions

Understanding the Shift in Digital Referrals

The digital marketing landscape is undergoing its most significant transformation since the advent of mobile search. As Large Language Models (LLMs) like ChatGPT, Claude, Gemini, and Perplexity become integrated into the daily workflows of millions of users, the question for brands and SEO professionals has shifted from "Will AI impact my traffic?" to "How is AI already impacting my traffic?"

To provide a definitive answer, a comprehensive analysis was conducted on a dataset spanning 13 months, from January 1, 2025, to February 7, 2026. By examining Google Analytics referral data across a diverse customer base, we can now see the tangible effects of LLM prompt referrals on brand visibility and business outcomes. This data offers a rare glimpse into the early stages of what many call the "AI Search Era," revealing a landscape defined by low volume but exceptionally high value.

The findings provide a roadmap for digital strategists. While the total volume of traffic arriving via LLMs remains a fraction of traditional search, the growth trajectory and the quality of that traffic suggest that we are witnessing the birth of a powerhouse referral channel. Below, we break down the four major findings from this 13-month study and explore what they mean for the future of digital publishing and lead generation.

Finding 1: LLM Referral Traffic is Still Small

Despite the immense hype surrounding AI and the perceived threat to traditional search engines, the data shows that LLM referral traffic is still in its infancy. Currently, LLM referrals account for less than 2% of total referral traffic on average; for most brands, that means fewer than two out of every 100 referral visitors arrive from an AI-driven source. The study found a narrow range of 0.15% to 1.5% across various platforms, including OpenAI's ChatGPT, Perplexity AI, Google's Gemini, and Anthropic's Claude.

This suggests that while consumers are using these tools to find information, they are not always clicking through to the source material. This phenomenon is often referred to as "zero-click" behavior, where the LLM provides a sufficient answer within the chat interface, satisfying the user's intent without requiring a visit to an external website. A simple way to measure this share for your own analytics data is sketched below.

The Context of Small Volume

For marketing departments, this low volume provides much-needed perspective. While AI search optimization (often called GEO, or Generative Engine Optimization) is a critical long-term strategy, it should not yet cannibalize the budgets reserved for high-volume channels like organic search (SEO) or paid search (PPC). Traditional SEO still drives the vast majority of web traffic, and maintaining visibility in standard SERPs remains the highest priority for near-term bottom-line impact.

However, small volume does not equate to insignificance. In the early 2010s, mobile traffic was also a "small" percentage of total web visits, and those who ignored it were eventually left behind. The current data suggests we are in a similar "quiet before the storm" phase for LLM traffic.
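As a rough illustration of how that share can be computed from exported referral data, here is a minimal sketch. The referrer domain list and the (source, sessions) input format are assumptions made for this example, not the study's exact methodology.

```python
# Hypothetical set of referrer domains treated as LLM sources; adjust for your own data.
LLM_REFERRERS = {
    "chatgpt.com", "chat.openai.com",   # OpenAI's ChatGPT
    "perplexity.ai",                    # Perplexity AI
    "gemini.google.com",                # Google's Gemini
    "claude.ai",                        # Anthropic's Claude
}

def llm_referral_share(referrals: list[tuple[str, int]]) -> float:
    """Share of total referral sessions coming from LLM sources.

    `referrals` is a list of (source_domain, sessions) pairs, e.g. exported
    from a Google Analytics traffic-acquisition report.
    """
    total = sum(sessions for _, sessions in referrals)
    llm = sum(sessions for source, sessions in referrals
              if source.lower() in LLM_REFERRERS)
    return llm / total if total else 0.0

# Example: 1,000 referral sessions, 12 of which come from LLM sources -> 1.2%,
# which sits inside the 0.15%-1.5% range reported above.
sample = [("news.ycombinator.com", 400), ("facebook.com", 588),
          ("chatgpt.com", 9), ("perplexity.ai", 3)]
print(f"{llm_referral_share(sample):.2%}")
```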
Finding 2: LLM Traffic is Growing Fast

While the current volume is small, the rate of growth is staggering. The data reveals that between the first half of 2025 and the second half of the year, LLM referral traffic grew by an average of 80%. Looking at the aggregate data from January 2025 to December 2025, referral traffic from these sources tripled.

This growth is not uniform across all industries or brands. Some companies in the dataset saw modest growth of 10%, while others experienced explosive 300% increases in AI-referred visits. This variance often depends on the type of content a brand produces and how "referenceable" that content is for an AI model looking for authoritative answers.

The Velocity Factor

The most important metric for brands to track right now isn't total volume; it's velocity. The steady month-over-month increase indicates that consumer habits are shifting. As LLMs become more integrated into browsers (such as SearchGPT features or Gemini in Chrome) and mobile operating systems, the friction between asking a question and visiting a cited source is decreasing.

Marketers need to monitor how quickly their specific niche is being adopted by AI users. If your LLM referral traffic is doubling every quarter, it signals that your target audience is moving away from traditional keyword-based searching and toward conversational discovery. This velocity is a leading indicator of where your future customers will be found; a small sketch of how to track it appears at the end of this article.

Finding 3: Sources Referenced in Responses are Shifting

One of the most dynamic aspects of the last 13 months has been the change in which sources LLMs choose to cite. The AI models are not static; their training data, retrieval-augmented generation (RAG) processes, and real-time search algorithms are constantly being tweaked by developers at OpenAI, Google, and Meta.

According to data monitoring over 5,000 prompts across various LLM APIs since September 2025, there has been a notable shift in the "authority" landscape. Two platforms in particular have seen significant movement: YouTube and Reddit.

The Rise of Video and Community Citations

Over the last 30 days of the study, YouTube links and citations within LLM responses saw a marked increase. This is likely due to the improved multimodal capabilities of models like Gemini and GPT-4o, which can now "watch" or transcribe video content to find specific answers. If an LLM can cite a specific timestamp in a video that answers a user's question, it is increasingly likely to do so.

Similarly, Reddit saw massive growth in citations throughout 2025, though this traffic leveled off toward the beginning of 2026. This reflects the AI companies' efforts to tap into "human-verified" information and community discussions to provide more nuanced, less clinical answers. For brands, this means that an LLM-friendly strategy must extend beyond their own website: your presence on third-party platforms like Reddit and YouTube now directly influences your visibility in AI chat responses.

The Need for Third-Party Monitoring

Unlike traditional search engines, LLMs do not provide a "Search Console" that shows which queries you ranked for or which responses you were cited in. This information is currently only accessible through
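The study's own tracking setup is not detailed here, so the following is only a minimal sketch of the velocity idea described above: given monthly LLM-referral session counts, compute month-over-month growth and check whether traffic is doubling quarter over quarter. The function names, sample numbers, and the doubling threshold are illustrative assumptions.

```python
def mom_growth(monthly_sessions: list[int]) -> list[float]:
    """Month-over-month growth rates for a series of LLM-referral session counts."""
    return [
        (curr - prev) / prev if prev else 0.0
        for prev, curr in zip(monthly_sessions, monthly_sessions[1:])
    ]

def doubling_each_quarter(monthly_sessions: list[int]) -> bool:
    """True if the latest quarter's sessions are at least double the previous quarter's."""
    if len(monthly_sessions) < 6:
        return False
    prev_q = sum(monthly_sessions[-6:-3])
    last_q = sum(monthly_sessions[-3:])
    return prev_q > 0 and last_q >= 2 * prev_q

# Example: a small but fast-growing LLM referral channel.
sessions = [40, 52, 65, 90, 118, 160]
print([f"{g:.0%}" for g in mom_growth(sessions)])   # ['30%', '25%', '38%', '31%', '36%']
print(doubling_each_quarter(sessions))              # True: 368 sessions vs 157 the quarter before
```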
