Uncategorized – Page 94 – bestseoserviceinusa.com

How Google Discover qualifies, ranks, and filters content: Research

aftabkhannewemail@gmail.com / February 25, 2026

Google Discover has long been one of the most mysterious and volatile sources of traffic for digital publishers. Unlike traditional Google Search, which relies on a user entering a specific query, Discover is a proactive, “query-less” feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many news organizations and tech blogs, a single hit in Discover can result in hundreds of thousands of visits in a matter of hours, yet the mechanisms behind how content is selected have largely remained a black box.

Recent SDK-level research by Metehan Yesilyurt has finally pulled back the curtain on this system. By analyzing the observable signals within Google’s Discover app framework and telemetry data, Yesilyurt mapped out the intricate pipeline that content must navigate before it ever reaches a user’s screen. This research reveals a structured, nine-stage flow governed by strict technical requirements, predictive modeling, and aggressive filtering. Understanding this pipeline is no longer optional for SEOs; it is the blueprint for survival in a push-based content economy.

The Nine-Stage Google Discover Pipeline

The journey from a published article to a Discover card is not a simple linear path. Instead, it is a high-speed filtering process designed to eliminate low-quality or irrelevant content as early as possible. According to the research, the process can be broken down into nine distinct phases.

First, Google must crawl and understand the content. This is the foundation of all Google products, but in Discover, the emphasis is heavily placed on semantic understanding and classification. Once crawled, the system moves to metadata extraction, where it specifically looks for key tags like the image and title. Following this, the content is classified into categories, such as “breaking news” or “evergreen,” which dictates how the system handles its “freshness” decay later on.
The fourth stage is perhaps the most critical for publishers: the block check. Before any ranking or interest matching occurs, the system checks if the user or the platform has blocked the publisher. If a user has previously selected “Don’t show content from this site,” the content is discarded immediately. If the content survives the block check, it moves to interest matching, where Google’s vectors map the article’s topics to the user’s documented interests.

The final stages involve server-side predictive modeling (pCTR), feed layout construction, content delivery, and the recording of user feedback. This feedback loop is continuous; if a user engages with the content, it reinforces the publisher’s standing. If they dismiss it, the system learns to suppress similar content in the future.

The Power of the Publisher Block

One of the most striking findings of the research is the hierarchy of filtering. Many SEOs believe that ranking factors like page speed or keyword density are the primary drivers of visibility. However, the research shows that publisher-level blocks happen long before the ranking engine even looks at your content.

The “Don’t show content from this site” action is an incredibly powerful tool in the user’s hands. When a user blocks a domain, that content is suppressed across the board for that individual. There is currently no equivalent “sitewide boost” mechanism that rewards a publisher as aggressively as a block punishes them. This creates a high-stakes environment where a single piece of misleading or “clickbaity” content can lead to a permanent loss of a potential reader’s entire lifetime value on the platform.

This “hard block” logic underscores the importance of brand trust. In Discover, you aren’t just competing for a click; you are competing to remain in the user’s ecosystem.
If your content consistently fails to deliver on the promise of its headline, users will eventually exercise their right to block your domain, effectively “de-indexing” you from their personal feed.

The Ranking Model: pCTR and Server-Side Logic

Once an article passes the initial eligibility and block filters, it enters the ranking phase. The research highlights the use of a predicted click-through rate (pCTR) model. This model resides on Google’s servers and estimates the likelihood of a user clicking on a specific card based on several variables.

While the internal weights of the pCTR model are not visible to the public, the SDK telemetry shows which signals the app sends to Google’s servers to inform these decisions. These include:

- The Page Title: Extracted primarily from the Open Graph title tag (og:title).
- Image Quality and Dimensions: The system checks if the image is large enough and if it has loaded successfully in the past.
- Content Recency: A “freshness” score is applied based on the publication timestamp.
- Historical Engagement: Previous click and impression data for that specific URL.
- Technical Reliability: Signals indicating whether images or snippets are failing to render properly.

The pCTR model is dynamic. If an article begins to perform well (i.e., its actual CTR exceeds its predicted CTR), it can “trend,” causing the system to push it to a much wider audience of users with similar interest profiles. Conversely, if a story has a high impression count but very few clicks, the pCTR model will quickly deprioritize it, leading to the “traffic cliff” many publishers experience after a successful run.

The Critical Role of Image Requirements

In Google Discover, visuals are not just an aesthetic choice; they are a technical requirement. The research confirms that Google Discover reads specific page-level tags, with a heavy reliance on Open Graph metadata.
If a page lacks a high-quality image, it is often disqualified from appearing as a prominent card, or it may not appear at all.

To qualify for the high-engagement “large card” format, images must be at least 1,200 pixels wide. Content with smaller images is typically relegated to small thumbnail layouts, which have significantly lower click-through rates.

Furthermore, the research indicates that Google monitors whether images load successfully. If your site has technical issues like slow-loading images or broken 404 links for metadata images, the system may filter your content out entirely to preserve the user experience of the feed.

Publishers should also be aware of fallback mechanisms. If the og:title

How Google Discover qualifies, ranks, and filters content: Research

aftabkhannewemail@gmail.com / February 25, 2026

Google Discover has become one of the most significant yet enigmatic traffic drivers for modern publishers. Unlike traditional search, which relies on user queries, Discover is a highly personalized feed that pushes content to users based on their interests, browsing history, and behavioral patterns. For many digital media outlets, a single article “going viral” on Discover can result in hundreds of thousands of visits in a matter of hours. However, this traffic is notoriously volatile and unpredictable.

Recent SDK-level research conducted by Metehan Yesilyurt has pulled back the curtain on the internal architecture of Google Discover. By analyzing the observable signals within the Google app framework, this research maps out the multi-stage pipeline that dictates how content is qualified, filtered, and eventually ranked for individual users. Understanding this pipeline is essential for SEOs and content creators who want to move beyond guesswork and align their strategies with the technical realities of Google’s recommendation engine.

The Nine-Stage Google Discover Pipeline

The research identifies a structured, nine-stage flow that every piece of content must navigate before appearing in a user’s feed. This process is highly automated and relies on a combination of real-time classifiers and server-side models. The journey of a URL through Discover looks like this:

1. Crawling and Content Understanding: The process begins with Google’s ability to find and index your content. This is not fundamentally different from standard search indexing, but for Discover, the speed of crawling is paramount. Google must understand the core topic and entities within the article almost immediately after publication to determine if it meets the criteria for “fresh” content.

2. Meta Tag Extraction: Once the content is crawled, the system extracts critical metadata.
This stage focuses heavily on Open Graph tags (og:title and og:image). This information is used to build the visual “card” that the user sees. If these tags are missing or improperly formatted, the content may fail to move to the next stage.

3. Content Classification: Google classifies the content type. Is it a breaking news story, a “how-to” guide, or an evergreen piece of long-form journalism? These classifications help the system determine which “bucket” the content belongs in and how long its shelf life should be.

4. Block List Verification: This is one of the most critical stages for publishers. Before any interest matching or ranking occurs, the system checks for blocks. If a user has previously indicated they do not want to see content from your domain, your URL is filtered out immediately. There is no opportunity to “out-rank” a publisher-level block.

5. Interest Matching: The system attempts to align the content’s topic with the user’s established interests. This is based on the user’s Search history, YouTube activity, and previous interactions within the Discover feed itself.

6. Predicted Click-Through Rate (pCTR) Modeling: Google applies a sophisticated, server-side pCTR model. The system evaluates how likely a specific user is to click on your card compared to other available options. This model considers historical engagement data for your domain and the specific URL.

7. Feed Layout Construction: At this stage, the system decides how the feed will look. It selects which cards will be “large” (high-quality images) and which will be smaller thumbnails, ensuring a diverse and visually appealing mix of content.

8. Content Delivery: The content is finally pushed to the user’s device. This happens in real-time, and the feed can be updated even while the user is actively scrolling.

9. Feedback Recording: The final stage is the loop.
Every action the user takes (clicking, dismissing, saving, or ignoring) is recorded and fed back into the system to refine future ranking and filtering decisions.

The Silent Killers: Why Content Fails to Qualify

One of the most striking findings of the research is the existence of “hard blocks” that prevent content from even entering the ranking competition. Many publishers focus on keywords and engagement, but technical oversights can disqualify a page before it ever reaches a user.

Two specific meta tags can act as total suppressors: “nopagereadaloud” and “notranslate”. If these tags are detected, the system may interpret the content as restricted or unsuitable for the Discover environment, leading to an automatic exclusion. While these tags have legitimate uses for accessibility or technical reasons, their presence is a red flag for the Discover pipeline.

Furthermore, image requirements are non-negotiable. Google Discover is a visual-first medium. To qualify for the large, high-engagement cards that drive the most traffic, images must be at least 1200 pixels wide. The system also requires the setting max-image-preview:large (or the use of AMP) to display these high-resolution visuals. If your images are small or fail to load correctly during the delivery stage, your visibility will be severely limited, often resulting in small thumbnail displays that suffer from significantly lower click-through rates.

The Power of the Publisher-Level Block

The research highlights a sobering reality for publishers: the “Don’t show content from this site” action is incredibly powerful. Because this block happens at the fourth stage of the pipeline, long before ranking models are applied, it acts as a permanent barrier between your domain and that specific user. Currently, there is no equivalent “sitewide boost” mechanism.
While a user can “follow” a publisher, the research suggests that a single negative action (a dismissal or a block) carries more weight in the filtering process than a single positive action. This creates a high-stakes environment where clickbait or misleading titles might drive short-term clicks but result in long-term domain suppression if users feel deceived and choose to block the source.

The Freshness Decay: Understanding the Visibility Windows

Time is the most influential factor in Discover visibility. Unlike traditional search, where a high-quality guide can remain at the top of the SERPs for years, Discover content has a distinct and rapid decay cycle. The research identifies four primary windows of visibility:

1 to 7 days: This is the “golden window.” Freshly published content receives the strongest boost and the highest likelihood of appearing in the top positions
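The ordering described in this excerpt, where the publisher block is checked before interest matching and before any pCTR ranking, can be illustrated with a toy model. This is only a sketch: the `Article` and `User` classes, the topic-overlap rule, and the `predicted_ctr` field are hypothetical stand-ins, since the research summary does not expose Google's actual data structures or scoring.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    domain: str
    topics: set[str]
    predicted_ctr: float  # hypothetical stand-in for the server-side pCTR score


@dataclass
class User:
    interests: set[str]
    blocked_domains: set[str] = field(default_factory=set)


def build_feed(candidates: list[Article], user: User) -> list[Article]:
    # Stage 4 (block check): blocked domains are dropped before anything else.
    unblocked = [a for a in candidates if a.domain not in user.blocked_domains]
    # Stage 5 (interest matching): keep articles sharing at least one topic.
    matched = [a for a in unblocked if a.topics & user.interests]
    # Stage 6 (ranking): order what is left by predicted CTR, highest first.
    return sorted(matched, key=lambda a: a.predicted_ctr, reverse=True)


candidates = [
    Article("https://blocked.example/a", "blocked.example", {"tech"}, 0.9),
    Article("https://news.example/b", "news.example", {"tech"}, 0.2),
    Article("https://blog.example/c", "blog.example", {"cooking"}, 0.5),
    Article("https://mag.example/d", "mag.example", {"tech", "ai"}, 0.4),
]
user = User(interests={"tech"}, blocked_domains={"blocked.example"})
feed_urls = [a.url for a in build_feed(candidates, user)]
print(feed_urls)
```

Note that the article with the highest score never reaches the ranking step, because its domain was filtered out first. That is the point the excerpt makes about there being no way to “out-rank” a block.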

What 13 months of data reveals about LLM traffic, growth, and conversions

aftabkhannewemail@gmail.com / February 25, 2026

The digital landscape is currently navigating one of the most significant shifts since the inception of the commercial internet. For over two decades, search engine optimization (SEO) has been the primary vehicle for organic growth, centered almost entirely on Google’s ranking algorithms. However, the rise of Large Language Models (LLMs) like ChatGPT, Claude, Gemini, and Perplexity has introduced a new variable into the equation: AI-driven referral traffic.

As marketing teams and business owners look toward the future, the primary question has shifted from “Will AI affect my traffic?” to “How is AI currently affecting my traffic, and how do I optimize for it?” To answer this, we analyzed a comprehensive dataset covering 13 months of LLM prompt referral traffic across a diverse customer base. This study, spanning from January 1, 2025, to February 7, 2026, provides a data-driven look at how users are transitioning from traditional search to AI-guided discovery.

The findings suggest that while we are still in the early stages of this transition, the characteristics of LLM traffic are fundamentally different from traditional search traffic. Understanding these nuances, specifically regarding growth velocity and conversion intent, is critical for any brand looking to maintain a competitive edge in 2026 and beyond.

The Current State of LLM Referral Traffic: Small but Significant

One of the most grounding realizations from the last 13 months of data is that LLM referral traffic remains a relatively small portion of the overall traffic mix. According to our dataset, LLM referrals account for less than 2% of total referral traffic on average. To put that into perspective, for every 100 visitors who reach a website via a referring link, fewer than two are coming directly from an LLM interface. The data shows a range of 0.15% to 1.5% across different industries and brand sizes.
This includes traffic from major players such as OpenAI’s ChatGPT, Perplexity AI, Google’s Gemini, and Anthropic’s Claude. For many businesses, this suggests that while LLMs are a hot topic in the boardroom, they may not yet be the highest priority for immediate, bottom-line impact compared to established channels like organic search, social media, or paid advertising.

However, focusing solely on the “small” percentage misses the broader strategic point. Traditional referral traffic often comes from stagnant links on blogs or directories. LLM traffic, by contrast, is dynamic. It represents a user who is actively engaged in a conversation and has been directed to a specific brand as a solution to a complex problem. The “small” volume is the tip of an iceberg that is growing at an unprecedented rate.

Analyzing the Rapid Growth and Velocity of LLM Referrals

While the current volume may be low, the growth trajectory is staggering. When we compared the first half of 2025 (H1) to the second half of the year (H2), the data revealed an average growth rate of 80% in LLM referral traffic. This isn’t just linear growth; it is an acceleration of adoption.

Within the dataset, the variance was notable. Some companies saw modest growth of around 10%, likely due to being in “low-intent” or highly regulated industries where LLMs are more cautious with citations. On the other end of the spectrum, some brands experienced a 300% increase in traffic from AI sources. By the end of December 2025, aggregate referral traffic had tripled compared to the numbers seen in January 2025.

This tells us that marketers must look beyond volume and start measuring “velocity.” Velocity is the rate at which LLMs are beginning to favor your brand over others. Because LLM algorithms and their “grounding” (the process by which they search the live web) are updated frequently, a brand can see dramatic swings in visibility overnight.
Monitoring this velocity allows brands to identify when a specific content strategy has successfully “broken through” into the LLM’s knowledge base.

Why is LLM Traffic Growing So Fast?

Several factors contribute to this 80% average growth rate. First, consumer behavior is shifting; users are increasingly using LLMs for “pre-purchase” research: comparing products, summarizing reviews, and looking for recommendations. Second, the LLMs themselves have become much better at citing sources. Early iterations of ChatGPT were criticized for “hallucinating” or failing to provide links. The 2025-2026 models are much more focused on transparency, providing clear citations that encourage users to click through to the primary source.

The Shifting Landscape of AI Citations: YouTube and Reddit

One of the most fascinating revelations from the data involves where LLMs are getting their information. LLMs do not exist in a vacuum; they pull from a massive index of the live web. By monitoring over 5,000 prompts and their subsequent responses across various APIs (Gemini, ChatGPT, Perplexity), we can see exactly which platforms are gaining influence within the AI ecosystem.

Over the last several months, there has been a significant shift in citation sources. Two platforms in particular have stood out: YouTube and Reddit.

The Rise of Video Citations

YouTube links and citations in LLM responses have seen a marked increase. As LLMs become more multimodal, meaning they can “watch” videos and “listen” to audio, they are increasingly referencing video content as a primary source of authority. This is a crucial takeaway for content creators: if you are not producing video content, you may be missing out on a significant portion of the LLM citation market. AI models find video transcripts and visual demonstrations to be highly valuable for answering “how-to” queries and product reviews.

The Reddit Plateau

Reddit also saw a period of explosive growth as a citation source.
Because Reddit is a repository of human-first, experiential data (“What is the best laptop for a student?”), LLMs prioritized it to provide “authentic” answers. However, our data shows that this traffic has recently leveled off. This could be due to changes in how LLMs weigh forum data versus “expert” editorial content, or it could be a result of the platforms themselves tightening their API access.

For brands, these shifts are a signal that content strategy cannot be one-dimensional. To be cited by an LLM, your brand needs
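For readers who want to reproduce the two headline measurements from this excerpt on their own logs (LLM share of referral traffic, and period-over-period growth), here is a minimal sketch. The referrer hostnames in `LLM_REFERRER_HOSTS` are an assumption made for the example, not a list taken from the study, and the sample referrers and visit counts are invented.

```python
from urllib.parse import urlparse

# Assumed hostnames for LLM interfaces; adjust to whatever appears in your logs.
LLM_REFERRER_HOSTS = {
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "www.perplexity.ai",
    "gemini.google.com",
    "claude.ai",
}


def is_llm_referrer(referrer_url: str) -> bool:
    """True when the referrer's hostname is one of the assumed LLM hosts."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return host in LLM_REFERRER_HOSTS


def llm_share(referrers: list[str]) -> float:
    """Fraction of referral visits that came from an LLM interface."""
    if not referrers:
        return 0.0
    return sum(is_llm_referrer(r) for r in referrers) / len(referrers)


def growth_rate(before: float, after: float) -> float:
    """Relative change between two periods, e.g. 0.8 for 80% growth."""
    return (after - before) / before


sample_referrers = [
    "https://chatgpt.com/",
    "https://www.google.com/search?q=x",
    "https://news.ycombinator.com/item?id=1",
    "https://claude.ai/chat/abc",
]
share = llm_share(sample_referrers)
h1_to_h2 = growth_rate(1000, 1800)   # invented counts giving 80% growth
jan_to_dec = growth_rate(100, 300)   # traffic that tripled
print(share, h1_to_h2, jan_to_dec)
```

One detail worth noticing in the arithmetic: traffic that has tripled corresponds to a 200% increase, not 300%, so the "tripled" aggregate and the "300% increase" seen by some brands are different magnitudes.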

How Google Discover qualifies, ranks, and filters content: Research

aftabkhannewemail@gmail.com / February 25, 2026

Understanding the Google Discover Ecosystem

For many digital publishers and SEO professionals, Google Discover remains one of the most significant yet unpredictable drivers of organic traffic. Unlike traditional search, which relies on active queries, Discover is a highly personalized feed that anticipates user needs based on their interests and past behavior. However, the exact mechanics of how content surfaces in this feed have often been shrouded in mystery.

Recent SDK-level research by Metehan Yesilyurt has provided a rare, behind-the-curtain look at the internal architecture of Google Discover. By analyzing the data signals and telemetry within the Google app framework, this research maps out a sophisticated nine-stage pipeline that dictates how content is qualified, filtered, and eventually ranked. This discovery is pivotal for anyone looking to stabilize their traffic from a platform known for its extreme volatility.

The Nine-Stage Pipeline of Google Discover

Google Discover does not simply pick a random article and show it to a user. It follows a structured, multi-stage process that filters out millions of pages before a single card is rendered on a mobile device. Understanding these stages is essential for diagnosing why content may be failing to gain traction.

1. Crawling and Content Understanding: The process begins with Google’s standard crawling infrastructure. Before a page can even be considered for Discover, the system must parse the text, structure, and intent of the content. This is where Google determines what the article is actually about.

2. Meta Tag Extraction: The system specifically looks for Open Graph (OG) tags and other structured data. It prioritizes the og:title and og:image to understand how the content should be presented visually.

3. Content Classification: Content is categorized into specific buckets, such as breaking news, evergreen guides, or niche interest topics.
This classification helps the system decide which “decay” model to apply to the content’s visibility.

4. The Block Check: This is a critical gatekeeper. The system checks if the publisher has been blocked by the user or if the domain has been flagged for policy violations. If a block exists, the process stops here.

5. Interest Matching: Google compares the content’s topic clusters against the user’s individual interest profile, which is built from search history, app usage, and location data.

6. pCTR Prediction Model: A server-side model predicts the Click-Through Rate (pCTR). It estimates the likelihood of a specific user clicking on a specific card based on historical performance and visual cues.

7. Feed Layout Construction: The system determines where the card will sit in the feed and whether it will be a large, high-engagement card or a smaller thumbnail.

8. Delivery: The content is pushed to the user’s device.

9. Feedback Loop: The system monitors user interaction. Did they click? Did they dismiss the card? Did they spend time reading? This data is fed back into the model for future ranking decisions.

The Pre-Ranking Filter: The Power of Publisher Blocks

One of the most significant findings in the recent research is the placement of the publisher-level block in the pipeline. In Google Discover, the decision to filter out a publisher happens before the ranking engine even considers the content’s quality or relevance. This creates a “hard wall” for certain domains.

When a user selects “Don’t show content from this site,” it isn’t just a temporary preference; it is a powerful suppression signal. Unlike traditional search, where a user might still see a site they dislike if it is the most relevant result, Discover treats a block as an absolute exclusion. There is currently no equivalent “sitewide boost” mechanism that publishers can trigger to counter these blocks. This makes brand reputation and user trust a literal prerequisite for ranking.
If your domain has high “dismissal” rates or has been frequently blocked, your content may be technically eligible and high-quality, but it will never reach the ranking stage because the filter triggers first.

Technical Requirements: Images and Meta Tags

The visual nature of Google Discover means that technical SEO for this platform is heavily focused on assets rather than just text. The research highlights six key page-level tags that Discover reads, with the Open Graph (OG) tags being the most vital. If a page lacks a valid og:image, it is effectively disqualified from appearing as a card.

The 1200px Threshold

To qualify for the large, high-performing cards that drive the vast majority of Discover traffic, images must be at least 1200 pixels wide. The system is designed to favor high-resolution visuals. While smaller images may still allow a page to appear as a small thumbnail, these smaller cards consistently earn lower click-through rates and are often deprioritized by the pCTR model.

Fallback Mechanisms

Interestingly, the Discover architecture is built with several redundancies. If the og:title tag is missing, the system will attempt to pull information from the Twitter title tag or the standard HTML title tag. However, relying on fallbacks is risky, as the system may choose a less-optimized title that fails to entice clicks.

The “Kill Switch” Meta Tags

The research identified two specific meta tags that act as an accidental “kill switch” for Discover visibility: “nopagereadaloud” and “notranslate.” If these tags are present, they can prevent the page from entering the Discover pipeline entirely. While these tags have legitimate uses in web development, publishers should use them with extreme caution if they rely on Discover for traffic.

The Freshness Decay Model

Google Discover is fundamentally a “freshness” engine. While evergreen content can and does surface, the system is biased toward newness.
The research reveals a specific decay schedule that governs how long a piece of content remains viable in the feed.

1. The Golden Window (1 to 7 days): This is when content receives its strongest boost. If an article is going to “go viral” on Discover, it usually happens within this first week.

2. The Moderate Phase (8 to 14 days): Visibility begins to taper off. Only content with exceptionally high engagement signals stays prominent during this window.

3. The Limited Visibility Phase (15 to 30 days):
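The page-level checks described under “Technical Requirements” in this excerpt (the og:title fallback chain, the og:image requirement, the 1200-pixel threshold, and the two “kill switch” values) lend themselves to a quick self-audit. The sketch below uses only Python's standard library and makes two assumptions that the research summary does not confirm: that the kill-switch values appear in a `<meta name="google">` tag, and that the optional `og:image:width` property reflects the real image width. The sample HTML is invented.

```python
from html.parser import HTMLParser

KILL_SWITCH_VALUES = {"nopagereadaloud", "notranslate"}
MIN_LARGE_CARD_WIDTH = 1200  # pixels, per the threshold quoted in the article


class MetaCollector(HTMLParser):
    """Collects <meta> name/property values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}  # lowercased name or property -> list of content values
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = (attrs.get("property") or attrs.get("name") or "").lower()
            if key:
                self.meta.setdefault(key, []).append(attrs.get("content") or "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html: str) -> dict:
    collector = MetaCollector()
    collector.feed(html)
    meta = collector.meta

    def first(key: str) -> str:
        values = meta.get(key, [])
        return values[0].strip() if values else ""

    # Fallback chain from the article: og:title, then Twitter title, then <title>.
    title, source = "", None
    for key, value in (
        ("og:title", first("og:title")),
        ("twitter:title", first("twitter:title")),
        ("<title>", collector.title.strip()),
    ):
        if value:
            title, source = value, key
            break

    # Assumption: kill-switch values live in <meta name="google" content="...">.
    google_values = set()
    for content in meta.get("google", []):
        google_values.update(part.strip().lower() for part in content.split(","))

    width = first("og:image:width")
    return {
        "title": title,
        "title_source": source,
        "has_og_image": bool(first("og:image")),
        "kill_switch_tags": sorted(google_values & KILL_SWITCH_VALUES),
        # Declared width only; the real file is not downloaded or measured.
        "declared_width_ok": width.isdigit() and int(width) >= MIN_LARGE_CARD_WIDTH,
    }


SAMPLE = """
<html><head>
<title>Fallback HTML title</title>
<meta name="twitter:title" content="Twitter title">
<meta property="og:image" content="https://example.com/hero.jpg">
<meta property="og:image:width" content="1600">
<meta name="google" content="notranslate">
</head><body></body></html>
"""
report = audit(SAMPLE)
print(report)
```

In the sample page, og:title is missing, so the title resolves through the fallback to the Twitter title, and the `notranslate` value is flagged. A fuller audit would fetch the image and measure it rather than trusting the declared width.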

The AI writing tics that hurt engagement: A study

aftabkhannewemail@gmail.com / February 25, 2026

The rise of generative AI has transformed the landscape of digital publishing, but it has also birthed a new era of “vibe-based” editorial criticism. If you spend any time on social media platforms like LinkedIn or X, you have likely seen content marketers and SEOs confidently pointing out the “dead giveaways” of AI-generated text. From the over-reliance on em dashes to the predictable “In today’s fast-paced world” introductions, the consensus seems to be that readers hate AI writing because it feels robotic, repetitive, and uninspired.

However, much of this discourse relies on subjective taste rather than hard data. While a seasoned editor might cringe at a specific linguistic pattern, the real question for digital publishers is whether these patterns actually impact performance. Does a reader truly bounce because they saw an em dash, or is the industry over-correcting based on personal pet peeves? To move past the guesswork, a comprehensive study was conducted to analyze which AI writing “tics” actually hurt user engagement and which ones are being unfairly maligned.

The Methodology: Quantifying the AI Linguistic Footprint

To understand the relationship between stylistic patterns and reader behavior, the study gathered an extensive dataset. Analyzing content purely on a “feel” basis isn’t enough; the research required a standardized approach to identify correlations between specific phrases and Google Analytics 4 (GA4) metrics.

The research was built upon the following dataset parameters:

- 10 Diverse Domains: The study covered a wide spectrum of industries, including technology, e-commerce, healthcare, education, and analytics. This ensured that the findings weren’t limited to a single niche or audience type.
- Over 1,000 URLs: The URLs represented a mix of content workflows, ranging from fully human-written articles to hybrid human-AI collaborations and completely AI-generated posts.
- Minimum Word Count: Any page under 500 words was excluded.
Very short posts do not provide enough linguistic real estate for stylistic patterns to emerge reliably, and their engagement metrics are often skewed by quick-answer search intent.

To ensure a fair comparison, the researchers standardized the data by measuring “tics per 1,000 words.” Without this normalization, a 4,000-word deep dive would look significantly “worse” than a 600-word blog post simply because it contains more sentences.

The primary metric for success was the engagement rate. In the world of GA4, an “engaged session” is defined as a visit that lasts 10 seconds or longer, includes a conversion event, or involves at least two page views. While 10 seconds sounds brief, it is the critical window where a reader decides if the content is worth their time or if it’s just another piece of generic digital filler.

The Shakespeare Curveball: Why Some “AI Tells” Are Actually Human

Before diving into the engagement data, the study uncovered a fascinating paradox. Many of the linguistic patterns we associate with Large Language Models (LLMs) are deeply rooted in high-quality human prose. To test the validity of AI “tic” counters, the researchers ran the analysis against two control samples that were guaranteed to be 100% human-written.

The first control was a novel published in 2021, written before the widespread availability of tools like ChatGPT. This text scored 6.9 tics per 1,000 words, a score that would trigger many modern AI detectors. Even more surprising was the second control: William Shakespeare’s Hamlet. The play scored approximately 11.4 tics per 1,000 words, making the Bard of Avon more “AI-coded” than many modern AI-generated blog posts.

This anomaly was largely driven by the em dash. Shakespeare and literary novelists use complex sentence structures that rely on punctuation to manage parenthetical thoughts. Because AI is trained on vast troves of human literature and professional writing, it mimics these structures.
This suggests that some of the features we call “AI tics” are actually just hallmarks of formal or complex English. Distinguishing between “bad AI writing” and “sophisticated human writing” requires looking at which specific patterns actually drive users away.

The Tics That Kill Engagement: What the Data Reveals

The study found that most AI tells have a negligible impact on performance. As a rule of thumb, a correlation coefficient with an absolute value below 0.1 is generally considered negligible. However, a few specific habits showed a clear negative relationship with how long readers stayed on the page.

The “Conclusion” Header Kiss of Death

The single strongest negative correlation in the entire dataset was the use of the word “Conclusion” as a section header. This tic had a correlation of approximately -0.118 with engagement rate.

When readers see a header that explicitly says “Conclusion,” they often perceive it as a signal that the value has ended. They scroll past the final paragraphs to find a call to action or simply exit the page immediately. In AI-generated content, LLMs have a habitual need to wrap things up neatly with a summary. These sections often fail to add new information, instead simply restating what was said in the previous 800 words. Readers are savvy; they recognize the “throat-clearing” nature of these sections and bounce before the session can be counted as engaged.

The Overuse of “Not Only… But Also”

Another significant performance killer was the repetitive use of “Not only [X], but also [Y]” constructions. While this is a grammatically correct way to add emphasis, LLMs tend to use it as a default sentence structure to sound authoritative. The study found that frequent use of this construction correlated with higher bounce rates.

In one extreme example found during the study, a single blog post used this phrase 12 times. This level of repetition creates a rhythmic monotony that makes the reader’s eyes glaze over.
It signals a lack of original thought and suggests that the content is merely shuffling keywords around rather than providing nuanced insights.

Introductory Filler and “The Fast-Paced Landscape”

Phrases like “In today’s fast-paced digital landscape,” “Let’s take a look,” or “In this article, we will explore” were also flagged as engagement drains. These are known as “transitional filler.” They take up space without providing immediate value. In a world where users scan content to find answers quickly,
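The per-1,000-words normalization described in the study can be sketched roughly as follows. This is a minimal illustration, not the researchers' detector: the `TIC_PATTERNS` list and the function name `tics_per_1000_words` are assumptions made here, and the regular expressions cover only a few of the tics named in the article.

```python
import re

# Hypothetical pattern list for illustration only; the study's real
# detector and its full list of tics are not published in this article.
TIC_PATTERNS = {
    "conclusion_header": re.compile(
        r"^\s*#*\s*conclusion\s*:?\s*$", re.IGNORECASE | re.MULTILINE
    ),
    "not_only_but_also": re.compile(
        r"\bnot only\b.{0,120}?\bbut also\b", re.IGNORECASE | re.DOTALL
    ),
    "intro_filler": re.compile(
        r"in today[\u2019']s fast-paced"
        r"|let[\u2019']s take a look"
        r"|in this article, we will explore",
        re.IGNORECASE,
    ),
    "em_dash": re.compile("\u2014"),
}


def tics_per_1000_words(text):
    """Count pattern hits and scale them to a per-1,000-words rate."""
    word_count = len(text.split())
    if word_count == 0:
        return 0.0
    total_hits = sum(len(p.findall(text)) for p in TIC_PATTERNS.values())
    return total_hits * 1000 / word_count


# A short post and a post four times as long with the same tic density
# get the same score, which is the point of the normalization.
short_post = "filler " * 994 + "not only x but also y"
long_post = " ".join([short_post] * 4)
print(tics_per_1000_words(short_post), tics_per_1000_words(long_post))
```

Dividing by word count is what keeps a 4,000-word deep dive from looking worse than a 600-word post simply because it has more sentences.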


Anthropic clarifies how Claude bots crawl sites and how to block them

aftabkhannewemail@gmail.com / February 25, 2026

Understanding Anthropic’s Crawler Ecosystem As the landscape of the internet shifts from traditional search engines to AI-driven discovery, the way artificial intelligence companies interact with web content has become a focal point for publishers, SEO professionals, and site owners. Anthropic, the developer behind the Claude AI family, recently released updated documentation to provide much-needed clarity on how its various bots crawl the web. For years, the standard for web crawling was dominated by Googlebot and Bingbot. However, the rise of Large Language Models (LLMs) has introduced a new category of crawlers designed not just for indexing, but for training and real-time data retrieval. Anthropic’s latest update clarifies the distinction between three specific user agents, giving site owners the granular control necessary to decide how their content is used in the age of generative AI. Managing these bots is no longer a niche technical task; it is a fundamental part of a modern digital strategy. Whether you are looking to protect your intellectual property from being used in AI training or you want to ensure your brand remains visible in AI-generated search results, understanding these three distinct bots is essential. The Three Pillars of Claude’s Web Presence Anthropic does not use a one-size-fits-all approach to web crawling. Instead, it employs three separate user agents, each with a specific purpose. Understanding the difference between them is the first step in managing your site’s relationship with Claude. ClaudeBot: The Training Engine ClaudeBot is the primary crawler responsible for gathering public web content to train and improve Anthropic’s generative AI models. When this bot visits your site, it is looking for data that can help the model understand language, facts, and context more effectively. If your primary concern is the use of your copyrighted material or unique data to train future versions of Claude, this is the bot you need to monitor. 
Anthropic has stated that if you block ClaudeBot in your robots.txt file, the company will exclude your site’s future content from its AI training datasets. This provides a clear path for publishers who want to remain visible on the web but do not want their work contributing to the development of AI models without a formal agreement. Claude-User: The Real-Time Assistant Claude-User operates under a completely different logic than ClaudeBot. This bot is triggered directly by a user’s prompt. For example, if a user provides a specific URL to Claude and asks for a summary or a critique, Claude-User is the agent that fetches that specific page content. Because this bot is “on-demand,” blocking it has immediate consequences for the end user. If you block Claude-User, the AI will be unable to access your pages even when a user explicitly asks it to. This can negatively impact your visibility in user-directed queries and prevent your content from being shared or analyzed within the Claude interface. For many publishers, allowing Claude-User is beneficial as it facilitates direct engagement with their content via the AI assistant. Claude-SearchBot: The Indexer for AI Search As AI companies move further into the search space, indexing becomes a priority. Claude-SearchBot is designed to crawl content to improve the quality and relevance of search results within the Claude ecosystem. This bot functions similarly to a traditional search engine crawler but focuses on optimizing the “answers” Claude provides during search-oriented tasks. Blocking Claude-SearchBot may reduce the likelihood of your content appearing in Claude’s search-driven responses. If your goal is to maintain high visibility and ensure that Claude provides accurate, cited information from your site when answering general search queries, you should generally allow this bot to crawl your pages. 
Why Granular Control Matters for SEO and Content Strategy The decision to block or allow AI crawlers is not a binary choice. It involves weighing the risks of data scraping against the benefits of referral traffic and brand presence. Protecting Intellectual Property For high-value publishers—such as news organizations, scientific journals, or specialized technical blogs—the data used to train AI is their most valuable asset. By using ClaudeBot as a separate agent, Anthropic allows these publishers to opt out of the training pool while still potentially appearing in real-time search results via the other bots. This distinction is a major step toward a more transparent relationship between AI labs and content creators. Maintaining Visibility in the New Search Era Traditional SEO focuses on ranking in the top 10 blue links of a Google search. However, “AI SEO” or “Generative Engine Optimization” (GEO) focuses on being the cited source in an AI’s summarized answer. To be cited, the AI must be able to see and index your content. If you block all Claude agents, you effectively disappear from the Claude ecosystem, which currently serves millions of users. Technical Implementation: How to Block Claude Bots Anthropic has committed to respecting standard robots.txt directives. This means you do not need complex firewall rules to manage these bots; a simple update to your robots.txt file is usually sufficient. Blocking Specific Bots Across Your Entire Site To block one of the bots entirely, you can use the “Disallow” rule. It is important to remember that you must add a directive for each bot individually if you want to block more than one. 
To block the training bot:

User-agent: ClaudeBot
Disallow: /

To block the real-time user-request bot:

User-agent: Claude-User
Disallow: /

To block the search indexing bot:

User-agent: Claude-SearchBot
Disallow: /

Using the Crawl-delay Extension

If your concern is not the content being used, but rather the load the crawler puts on your server, Anthropic also supports the non-standard “Crawl-delay” directive. This allows you to slow down the frequency of the bot’s visits.

User-agent: ClaudeBot
Crawl-delay: 5

This is particularly useful for smaller sites or sites with limited hosting resources that might struggle with high-frequency crawling.

Applying Rules to Subdomains

It is a common technical oversight to apply robots.txt rules only to the main domain. Anthropic has clarified that these directives must be applied to each subdomain individually. If you have a main site at example.com
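Before deploying rules like these, it can help to check how a standards-following parser reads them. The sketch below uses Python's standard-library `urllib.robotparser`. The combination of rules (block ClaudeBot, slow down Claude-SearchBot) and the URL `https://example.com/article` are assumptions chosen for illustration, and the output shows only how Python's parser interprets the file, not how Anthropic's crawlers actually behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training bot, throttle the search bot.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-SearchBot
Crawl-delay: 5
"""


def load_rules(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
for agent in ("ClaudeBot", "Claude-User", "Claude-SearchBot"):
    allowed = rules.can_fetch(agent, "https://example.com/article")
    delay = rules.crawl_delay(agent)  # None when no Crawl-delay is set
    print(f"{agent}: {'allowed' if allowed else 'blocked'}, crawl-delay={delay}")
```

Because a robots.txt file applies only to the host that serves it, the same check would need to be repeated against each subdomain's own file.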


How ChatGPT uses SEO to drive growth and revenue

aftabkhannewemail@gmail.com / February 25, 2026

The tech industry is currently obsessed with a singular narrative: generative AI is the “Google killer,” and in the process of replacing traditional search, it is effectively dismantling Search Engine Optimization (SEO) as we know it. However, a closer look at the actual business strategies of the leading AI companies reveals a striking irony. While the world debates the end of the search engine, the architects of artificial intelligence are quietly investing millions of dollars into SEO to secure their own market dominance. OpenAI, the creator of ChatGPT, is not just relying on viral word-of-mouth or social media buzz. Instead, they have integrated sophisticated search marketing strategies into their core growth engine. By analyzing how ChatGPT, Claude, and Perplexity navigate the search landscape, we can uncover a blueprint for how modern digital brands must evolve to survive in an AI-driven world. Far from being dead, SEO has become the primary battlefield for the most advanced technology companies on the planet. The Massive ROI of SEO in the Generative AI Space To understand why OpenAI is doubling down on search, we have to look at the numbers. According to data from Semrush, ChatGPT currently commands a staggering 76.5 million organic monthly visits. To put that in perspective, Perplexity sees roughly 1.7 million, and Anthropic’s Claude attracts about 908,000. When you translate this traffic into potential revenue, the logic for investing in SEO becomes undeniable. If we apply a conservative conversion rate of 0.5%—the percentage of visitors who sign up for a $20 per month “Plus” subscription—the revenue generated from organic search is astronomical. For ChatGPT, this traffic model suggests an annual revenue return of approximately $92 million. Given that an elite SEO team and content strategy might cost the company around $600,000 annually, the return on investment (ROI) sits at a mind-blowing 15,200%. 
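The arithmetic behind these figures can be reproduced with a short sketch. It follows the article's simplified model: a fixed share of monthly organic visits converts to a $20 per month subscription held for twelve months. The function name `annual_roi` is hypothetical, and applying the same $600,000 annual cost to Perplexity and Claude is an assumption; the article states that cost only for OpenAI.

```python
def annual_roi(monthly_visits, conversion_rate=0.005,
               price_per_month=20.0, annual_cost=600_000.0):
    """Return (annual revenue, ROI as a ratio) under the article's model."""
    subscribers = monthly_visits * conversion_rate
    annual_revenue = subscribers * price_per_month * 12
    roi = (annual_revenue - annual_cost) / annual_cost
    return annual_revenue, roi


# Organic monthly visits as quoted from Semrush in the article.
traffic = {"ChatGPT": 76_500_000, "Perplexity": 1_700_000, "Claude": 908_000}

for name, visits in traffic.items():
    revenue, roi = annual_roi(visits)
    print(f"{name}: ${revenue:,.0f} per year, ROI {roi:.0%}")
```

For ChatGPT this works out to about $91.8 million a year, which the article rounds to $92 million, and an ROI of roughly 15,200%. The model is deliberately crude: it treats every monthly visit as a distinct potential subscriber and ignores churn.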
Even for smaller players like Perplexity and Claude, the ROI remains healthy, ranging from 82% to 240%. These figures prove that SEO is not just a “nice-to-have” marketing channel; it is a high-margin revenue driver that allows these companies to scale without solely relying on expensive paid advertising. OpenAI’s Strategic Investment in Human SEO Talent The most telling sign of a company’s strategy is where they put their capital. OpenAI has recently made headlines for its high-stakes hiring in the marketing sector. The company was recently seeking a content strategist with deep SEO experience, offering a salary range between $310,000 and $393,000. Shortly after, they opened another growth role focused specifically on the intersection of SEO, Conversion Rate Optimization (CRO), and overall web strategy. When you account for the high cost of living and benefits associated with tech roles in the United States, it is estimated that OpenAI is investing between $410,000 and $600,000 for just two senior SEO positions. This isn’t the behavior of a company that thinks search is dying. It is the behavior of a company that understands that as the search landscape becomes more fragmented, the ability to be discovered by users is more valuable than ever. This investment is a response to a shifting landscape. While ChatGPT is actually expanding search behavior—driving users to Google to fact-check or research topics discovered in a chat—there has been an overall 20% decline in Google search volume from 2024 to 2025. As Google’s AI Overviews take up more “above-the-fold” real estate, the competition for the remaining clicks is becoming fiercer. OpenAI knows that to stay on top, they must own the technical and content foundations that search engines reward. Evaluating the SEO Foundations: ChatGPT vs. Claude vs. Perplexity A competitive analysis of the “Big Three” AI platforms reveals a wide gap in their search maturity. 
Domain authority, which measures the “strength” of a website’s backlink profile, is the first major differentiator. ChatGPT currently sits at an Authority Score of 99—nearly a perfect score. Perplexity follows with an 81, and Claude trails at 75. Brand Authority and Demand The brand demand for ChatGPT is unparalleled. The term “ChatGPT” receives 45.5 million searches per month. This massive brand awareness creates a “flywheel effect”: high search volume leads to more news coverage, which leads to more backlinks, which further boosts SEO authority. Perplexity (1 million monthly searches) and Claude (500,000) are still in the early stages of building this kind of brand-driven organic momentum. Keyword Distribution and Rankings When we look at total keyword rankings, the scale of ChatGPT’s lead becomes even clearer: ChatGPT: Approximately 287,800 keywords. Perplexity: Approximately 184,800 keywords. Claude: Approximately 36,000 keywords. ChatGPT’s success here stems from its ability to create a massive “indexable surface area.” By allowing users to share conversations and create public GPTs, they have generated millions of pages of user-generated content (UGC) that search engines can crawl and index. Perplexity has taken a different route, focusing on financial and stock-driven content pages, while Claude utilizes professional-grade blog articles to target high-intent business users. The 3Cs Framework: Code, Content, and Conversions To understand how these companies manage their search visibility, we can apply the “3Cs” framework: Code (technical foundation), Content (strategy and optimization), and Conversions (turning traffic into revenue). 1. Code: The Technical Foundation Technical SEO is the often-ignored backbone of search growth. ChatGPT demonstrates a masterclass in indexability. Their robots.txt file is highly optimized, containing multiple sitemaps and specific instructions that allow major search engine crawlers while blocking smaller, less relevant bots. 
Interestingly, there is a “cold war” happening in the code; ChatGPT and Claude actually block each other from crawling their respective sites via their robots.txt files. Another critical element is URL structure. Despite some search engines downplaying the importance of keywords in URLs, ChatGPT uses them effectively. When a user shares a chat, the URL often includes descriptive terms that help the page rank. Claude, conversely, often uses cryptic or non-descriptive URLs for its public artifacts. As the saying goes, if you ask a waiter for a “burger,” you get a burger. If you ask for “2387d2e3,” you get a blank stare. AI crawlers and search engines feel the


How to read Meta Ads metrics like a system, not a scoreboard

aftabkhannewemail@gmail.com / February 25, 2026

Every Monday morning, thousands of media buyers and business owners perform a high-stakes ritual. They log into Meta Ads Manager, adjust the date range to the previous seven days, and scan the columns with bated breath. For most, the focus is singular: Return on Ad Spend (ROAS). If the number is green and above the break-even point, the mood is celebratory. If the number has dipped into the red, the reaction is often swift and clinical: the mouse darts toward the toggle button, and the campaign is deactivated.

This approach is what industry experts call the “scoreboard trap.” When you treat your advertising metrics like a scoreboard, you are focusing entirely on the final score of the game while ignoring the mechanics of the play. A scoreboard tells you that you lost, but it doesn’t tell you that your strikers failed to receive a single pass from the midfield, or that your defense was playing too high up the field. In the world of Meta advertising, looking only at the “win” or “loss” of a campaign prevents you from understanding the underlying “plumbing” of your marketing funnel.

To scale performance in an increasingly competitive digital landscape, advertisers must shift their perspective. You need to move from simple reporting to deep diagnosis. By viewing metrics not as isolated points of data, but as a system of interdependent signals, you can uncover the true story of your account performance and make optimizations that actually drive long-term growth.

The dashboard illusion and why it fails advertisers

Meta Ads Manager is designed as a linear grid. While this layout is clean and organized, it often creates a false sense of clarity. It implies that each metric exists in a vacuum. You might see a high Cost Per Mille (CPM) in one column and a low Click-Through Rate (CTR) in another, leading you to believe they are two separate problems to be solved independently.
In reality, these metrics are deeply intertwined through Meta’s complex auction algorithm. For example, a high CPM is frequently misinterpreted as a sign that an audience is “too expensive” or “too competitive.” While market conditions do play a role, a high CPM is often Meta’s way of taxing a poor user experience. If your creative is low quality, irrelevant, or receives negative feedback from users, Meta’s AI will charge you more to show that ad because it compromises the integrity of the platform’s user experience. Conversely, a high CTR might look like a massive win, but if your Conversion Rate (CVR) is non-existent, you are likely paying for “click-bait” traffic—users who are curious enough to click but have zero intent to purchase. The dashboard tells you what happened; the system tells you why it happened. To master Meta Ads, you must look past the grid and see the machinery behind the numbers. The team metrics framework: Identifying every player’s role One of the most effective ways to understand your Meta Ads account as a system is to think of it as a sports team. Every metric has a specific position and a specific job to do. If the team is losing, you don’t necessarily fire the coach and bench the entire roster. Instead, you analyze the film to see which player isn’t performing their role. This framework allows you to isolate friction points without destroying the parts of your campaign that are actually working. The scouts: CPM and reach In our team analogy, CPM (Cost Per Mille) and Reach are your scouts. Their job is market resonance and talent identification. CPM is the primary feedback mechanism from the Meta auction. It is determined by a combination of your bid, your estimated action rates, and the value you provide to the user. 
If your CPM spikes significantly above your historical average, your “scouts” are telling you one of two things: either the market has become incredibly crowded (common during Black Friday or election cycles), or your creative is failing to resonate with the audience. When the auction algorithm sees that users are scrolling past your ad without a second glance, it considers your ad “low value” and forces you to pay a premium to stay in the feed. High CPMs are often a creative problem disguised as a targeting problem. The midfielders: CTR and hook rate The midfielders are responsible for ball progression. In Meta Ads, their job is to move the user from the social media ecosystem onto your proprietary website. The primary metrics here are Click-Through Rate (CTR) and Hook Rate (the percentage of people who watched the first three seconds of a video). This is where many “technical leaks” occur. For instance, if you have a high Hook Rate but a very low CTR, your ad is great at grabbing attention (the “hook”) but terrible at “passing the ball.” You’ve stopped the scroll, but you haven’t given the user a compelling reason to take the next step. This suggests that while your visual hook is strong, your value proposition or your Call to Action (CTA) is weak. You are getting the attention, but you aren’t doing anything productive with it. The strikers: CVR and AOV The strikers are your “closers.” Conversion Rate (CVR) and Average Order Value (AOV) represent the final step of the journey. These metrics are heavily dependent on your website, landing page, and offer. If your midfielders (CTR) are doing an amazing job and driving traffic at a low Cost Per Click (CPC), but your ROAS is still abysmal, your strikers are failing to find the back of the net. In this scenario, the problem usually isn’t the ad; it’s the destination. If people are clicking but not buying, there is a disconnect between the promise made in the ad and the reality of the landing page. 
Perhaps the page loads too slowly, the checkout process is cumbersome, or the price point is too high for the value demonstrated in the creative.

Diagnosing system
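One way to see how the scouts, midfielders, and strikers connect is to chain the metrics arithmetically. The sketch below is a simplified funnel model, not Meta's auction logic. The function name `funnel` and every number in the two scenarios are hypothetical, and it assumes CTR means link clicks divided by impressions and CVR means purchases divided by clicks.

```python
def funnel(spend, cpm, ctr, cvr, aov):
    """Chain CPM -> CTR -> CVR -> AOV into clicks, orders, and ROAS."""
    impressions = spend / cpm * 1000   # CPM is cost per 1,000 impressions
    clicks = impressions * ctr
    orders = clicks * cvr
    revenue = orders * aov
    return {
        "impressions": impressions,
        "clicks": clicks,
        "cpc": spend / clicks if clicks else None,
        "orders": orders,
        "revenue": revenue,
        "roas": revenue / spend,
    }


# Hypothetical scenarios with identical spend, CPM, and AOV.
balanced = funnel(spend=1000, cpm=20, ctr=0.02, cvr=0.03, aov=60)
clickbait = funnel(spend=1000, cpm=20, ctr=0.04, cvr=0.01, aov=60)

for label, result in (("balanced", balanced), ("clickbait", clickbait)):
    print(f"{label}: CPC ${result['cpc']:.2f}, ROAS {result['roas']:.2f}")
```

In this toy comparison the second ad doubles CTR and halves CPC, yet ends with a lower ROAS because the conversion rate fell by more than the click rate rose. That is the pattern the framework describes: a metric that looks like a win in its own column can hide a failure further down the funnel.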


Google fixed a serving issue with search results

aftabkhannewemail@gmail.com / February 25, 2026

Google Search Experiences a Brief Technical Disruption In the early hours of Wednesday, February 25th, digital marketers and night-owl webmasters noticed something unusual within the Google Search ecosystem. Reports began to surface of a serving issue affecting search results globally. Google confirmed the incident shortly after, acknowledging that a technical glitch had interfered with the way search engine results pages (SERPs) were delivered to users. The issue, which was detected around 1:30 AM ET, was resolved with uncharacteristic speed, but its brief window of activity serves as a critical reminder of the complexities inherent in modern search infrastructure. While the disruption lasted only about 15 minutes, the ripple effects of any Google Search downtime can be felt across the entire digital landscape. For businesses that rely on organic traffic for leads and sales, even a quarter-hour of “darkness” on the SERPs can lead to measurable dips in real-time analytics. Google’s rapid response and the subsequent update to the Google Search Status Dashboard provided clarity, though many questions remain regarding what exactly happens during a “serving issue” and how site owners should react when the world’s most powerful search engine experiences a hiccup. Understanding the Nature of a Search Serving Issue To understand the significance of this event, it is essential to distinguish between the different phases of Google Search. Typically, Google operates through a three-stage process: crawling, indexing, and serving. A “serving issue” is distinct from an indexing or crawling problem. When crawling fails, Google cannot find new or updated pages. When indexing fails, Google cannot store those pages in its database. However, a serving issue means that while the data exists and is properly indexed, the mechanism that delivers those results to the user’s browser is broken. 
During the incident on February 25th, users may have encountered empty search results, error messages, or delayed loading times. Because the issue was categorized specifically as a serving error, it implies that Google’s vast network of data centers encountered a bottleneck or a software bug that prevented the retrieval of indexed content. For those 15 minutes, the bridge between Google’s index and the end user was effectively closed. The Timeline of the Event According to the Google Search Status Dashboard, the issue was flagged and addressed in the very early morning hours. Specifically, at approximately 1:30 AM ET, the disruption was at its peak. Google’s engineering teams were quick to identify the root cause, and by the time most of the Western world was waking up, the fix had already been deployed. Google’s official notice stated, “We fixed the issue with serving search results. There will be no more updates.” It is worth noting that while the official logs might show a very tight window between the announcement and the resolution, the actual impact often spans a slightly longer period. In this case, Google confirmed the serving issue lasted approximately 15 minutes. In the world of high-frequency trading, global news cycles, and e-commerce, 15 minutes is an eternity. Millions of queries are processed every minute, and a 15-minute outage represents a staggering amount of missed connections between users and information. Why Webmasters and SEOs Should Care For the average internet user, a 15-minute glitch is a minor inconvenience—perhaps a reason to refresh the page or check their internet connection. However, for SEO professionals and website owners, these incidents are much more significant. If your website noticed a sudden, unexplained drop in organic traffic around midnight or 1:30 AM ET on February 25th, it was likely not a problem with your site’s health or a sudden algorithmic penalty. Instead, it was a direct result of this global serving issue. 
Data integrity is a cornerstone of professional SEO. When looking at Google Search Console or Google Analytics, a 15-minute gap in data can look like a technical error on the website’s end. Knowing that Google had a confirmed serving issue allows marketers to annotate their reports and explain the variance to stakeholders. It prevents unnecessary troubleshooting of server configurations or site code when the problem was actually external. The Discrepancy Between Dashboard Notices and Real-Time Experience One common point of confusion during Google outages is the timing of the Status Dashboard updates. Often, the dashboard is updated after the engineers have already begun working on the fix, or even after the fix has been implemented. This was observed during the February 25th event, where the notice and the “resolved” status appeared almost simultaneously. This does not mean the issue only existed for one minute. Rather, it reflects the internal protocol Google follows for public communication. Google typically only confirms issues once they have a clear understanding of the scope and a path to resolution. For site owners, this means that real-time monitoring tools (like Rank Ranger, Mozcast, or internal server logs) are often the first line of defense in identifying Google-side errors before they are officially acknowledged. Potential Impact on Search Rankings and Data A frequent concern among site owners is whether a serving issue can have long-term effects on their search rankings. The short answer is generally no. Because a serving issue is a delivery problem on Google’s side, it does not reflect the quality, relevance, or authority of your website. Once the serving pipes are cleared and the SERPs return to normal, your rankings should remain exactly where they were prior to the disruption. However, there are short-term data anomalies to be aware of: 1. 
Google Search Console Reporting

Google Search Console (GSC) data is not real-time; it usually has a lag of several hours to a couple of days. When the data for February 25th finally populates, you may see a slight dip in total impressions and clicks for that day. This dip will be most noticeable for sites that receive heavy traffic during the early morning hours ET or for international sites where 1:30 AM ET correlates with peak daytime hours.

2. Paid Search Implications

While this specific issue was focused on organic search results serving, technical glitches in


Anthropic clarifies how Claude bots crawl sites and how to block them

aftabkhannewemail@gmail.com / February 25, 2026

The relationship between web publishers and artificial intelligence companies has reached a critical turning point. As large language models (LLMs) like Claude become more integrated into daily search and productivity workflows, the demand for high-quality web data has never been higher. Recognizing the need for transparency and creator control, Anthropic has recently updated its official documentation to clarify exactly how its bots interact with the web. This move provides webmasters, SEO professionals, and site owners with the specific tools they need to manage how their content is used—or not used—by Claude. For years, the industry standard for controlling web crawlers was focused primarily on search engines like Google and Bing. However, the rise of generative AI has introduced a new layer of complexity. It is no longer just about appearing in search results; it is about whether your data should be used to train future models or retrieved in real-time to answer a specific user query. Anthropic’s latest update breaks down these functions into three distinct user agents, allowing for granular control that was previously unavailable. The Evolution of the AI-Publisher Relationship Historically, the “deal” between publishers and crawlers was simple: you let a bot crawl your site, and in exchange, that bot indexed your content and sent you traffic. Generative AI has complicated this exchange. When an AI model “learns” from a website, it may provide the information to a user without the user ever needing to click through to the original source. This has led to a significant debate regarding the fair use of data and the future of the open web. Anthropic’s decision to clarify its crawler documentation is a response to these concerns. By identifying different bots for different purposes—training, user-directed retrieval, and search optimization—the company is attempting to give site owners the ability to opt-out of one without necessarily losing visibility in another. 
This nuance is vital for digital strategy in 2024 and beyond. Understanding Anthropic’s Three Specific Bots Anthropic utilizes three separate user agents to interact with web content. Understanding the distinction between these three is the first step in managing your site’s digital footprint within the Claude ecosystem. 1. ClaudeBot: The Training Engine ClaudeBot is perhaps the most significant agent for those concerned about intellectual property. This bot is responsible for collecting public web content that may be used to train and improve Anthropic’s generative AI models. When ClaudeBot crawls a site, it is looking for data that will help future versions of Claude understand language, facts, and context more effectively. If you are a publisher who believes that your content should not be used to build a commercial AI model without compensation or explicit consent, ClaudeBot is the agent you will likely want to restrict. Anthropic has stated that if you block ClaudeBot in your robots.txt file, the company will exclude your site’s future content from its AI training datasets. It is important to note that this generally applies to future crawls; content already ingested into existing models may not be retroactively removed, but the “opt-out” ensures that your new material remains off-limits for the next generation of LLMs. 2. Claude-User: The Real-Time Assistant Claude-User operates very differently from a traditional crawler. Instead of gathering data for a massive database, this agent is triggered by a specific action from a human user. When a user asks Claude a question that requires current information—such as “What are the latest reviews for the newest smartphone?” or “Summarize the latest post from this specific blog”—Claude-User fetches the content on the fly. Blocking Claude-User has immediate consequences for how Claude interacts with your brand. If this bot is blocked, Claude will be unable to access your pages in response to user requests. 
While this protects your server from being accessed by the AI, it also means your content cannot be summarized, analyzed, or cited in real-time conversations. For many news sites and informational blogs, blocking Claude-User can lead to a significant drop in “AI-driven visibility,” as the bot acts as the eyes of the user within the chat interface. 3. Claude-SearchBot: The Indexer for Claude Search The newest addition to the lineup is Claude-SearchBot. As Anthropic continues to evolve its search capabilities—positioning Claude as a direct competitor to AI-powered search engines like Perplexity or Google’s AI Overviews—it requires a dedicated crawler to maintain a high-quality index. Claude-SearchBot crawls content specifically to improve the relevance and accuracy of Claude’s search results. The trade-off here is purely SEO-driven. By allowing Claude-SearchBot, you ensure that your content is indexed and prioritized when users perform searches within the Claude environment. Conversely, if you block this agent, your content may not appear in search-related responses, or if it does, the information may be outdated or less accurate because the bot was unable to verify the latest version of your page. For sites that rely on organic traffic, this bot is often viewed as “friendly,” much like Googlebot. The Technical Guide to Blocking Anthropic Bots Anthropic has confirmed that all of its bots respect standard robots.txt directives. This is the most effective and universally recognized method for controlling their access. To manage these bots, you must edit the robots.txt file located in your site’s root directory (e.g., yoursite.com/robots.txt). How to Block All Anthropic Crawling If you want to completely opt-out of the Claude ecosystem, you must address each bot individually. A single “Disallow” command for one will not stop the others. 
To block all three, your robots.txt should include the following:

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

Partial Blocking and Granular Control

Many site owners prefer a hybrid approach. For example, you might want Claude to be able to search and cite your content (Claude-SearchBot and Claude-User) but refuse to let Anthropic use your data for model training (ClaudeBot). In that case, you would only include the directive for ClaudeBot. Furthermore, you can restrict access to specific directories. If you have a “premium” or “archive” section of your site that you want to keep away from AI training,
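A hybrid policy of this kind can be generated and sanity-checked in a few lines. The sketch below is illustrative only: the helper `build_robots_txt`, the directory paths `/premium/` and `/archive/`, and the example URLs are all hypothetical, and the check reflects how Python's standard-library parser reads the file rather than any guarantee about crawler behavior.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(policy):
    """Render one robots.txt group per user agent.

    `policy` maps a user-agent token to the path prefixes to disallow.
    """
    groups = []
    for agent, paths in policy.items():
        lines = [f"User-agent: {agent}"]
        lines.extend(f"Disallow: {path}" for path in paths)
        groups.append("\n".join(lines))
    # A blank line between groups keeps each agent's rules separate.
    return "\n\n".join(groups) + "\n"


# Hypothetical policy: keep two directories away from the training bot
# while leaving Claude-User and Claude-SearchBot unrestricted.
policy = {"ClaudeBot": ["/premium/", "/archive/"]}
robots_txt = build_robots_txt(policy)
print(robots_txt)

checker = RobotFileParser()
checker.parse(robots_txt.splitlines())

checks = [
    ("ClaudeBot", "https://example.com/premium/report"),
    ("ClaudeBot", "https://example.com/blog/post"),
    ("Claude-SearchBot", "https://example.com/premium/report"),
]
for agent, url in checks:
    print(agent, url, checker.can_fetch(agent, url))
```

Generating the file from a single policy mapping makes it harder to forget an agent, since each bot needs its own group.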