7 hard truths about measuring AI visibility and GEO performance

The rise of generative AI has fundamentally shifted the digital landscape, leading to a scramble among brands to establish a presence within Large Language Model (LLM) outputs. This emerging discipline, known as Generative Engine Optimization (GEO), is complex, and the tools designed to measure it are still immature. In this highly commoditized and often exaggerated market, professional integrity demands a clear-eyed look at what is truly measurable versus what is simply marketing hype.

For those deeply invested in the search industry, whether as providers of GEO services or as developers of AI visibility tools, misconceptions often lead to inflated claims. It is essential to peel back the layers and confront the uncomfortable realities of how AI performance is assessed. Over the past few months, numerous claims lacking grounding in rigorous data have circulated as established fact. It is time to clear the air and discuss the seven hard truths about measuring AI visibility and GEO performance.

1. AI search didn’t kill Google search

Despite the pervasive narrative pushed by clickbait headlines, venture capitalists eager to promote their portfolio companies, and pitch decks from AI visibility tools, the reality is that AI search has not diminished the traditional search engine market. In fact, current data suggests the opposite: the overall search pie is expanding.

To cut through the noise, we must rely on hard data rather than anecdotes or hype cycles. In a recent study analyzing over 260 billion rows of clickstream data, Semrush found no evidence that the widespread adoption of platforms like ChatGPT has reduced Google searches; if anything, adoption correlated with an increase. This finding carries particular weight given that Semrush sells its own AI search tracking capabilities, so the data is not biased toward defending Google's longevity.

Further reinforcing this position is the State of Search Q2 2025 report published by Datos, in collaboration with industry veteran Rand Fishkin, CEO of SparkToro. This comprehensive analysis shows that Google continues to maintain a dominant market share, holding firm at around 95% across traditional search engines. The data, collected across millions of U.S. devices, confirms that the vast majority of users remain reliant on the conventional search paradigm.

Understanding Complementary Search Behaviors

The question remains: How can ChatGPT’s user base double, reportedly surpassing 800 million users, while Google’s search volume remains stable or grows slightly? The answer lies in user intent. People are not necessarily replacing Google with ChatGPT; they are using LLMs for different tasks.

A September report published by OpenAI illuminates this distinction, detailing how people actually use ChatGPT. The critical finding is that only 21.3% of conversations were focused on seeking information. Within that informational slice, a minuscule 2.1% focused on purchasable products, while the bulk (18.3%) was dedicated to seeking specific facts or details.

For brands trying to reach potential buyers, that 2.1% is the only truly relevant segment. Even then, many of those interactions are navigationally driven, meaning the user already knows the brand they want and is seeking confirmation or contact information, rather than initiating a true discovery moment.

The Search Journey Remains Vital

Moreover, the user journey often loops back to traditional search. If a user asks ChatGPT, “What are the best CRM platforms for small businesses?” and the LLM names three brands, the user’s subsequent logical step is usually to conduct a Google search for one of those specific brands to visit the official website, explore features, and evaluate pricing. For commercially driven queries, the website remains the crucial final destination.

While the emergence of LLM-integrated browsers might shift this dynamic in the future, the current reality is that AI has expanded the market for information-seeking, positioning itself as a complementary research and drafting tool, not a replacement for the reliable, deterministic index of the web that Google provides.

2. No AI visibility tool can actually get you into AI answers

A significant portion of the hype surrounding AI visibility tools echoes the earliest days of the SEO industry. Back then, foundational SEO monitoring tools often promised to “get you to the top of Google,” an impossible feat for software alone. Today, this promise has been recycled: “Our tool will ensure your brand is mentioned by the LLM.”

The core principle remains unchanged: Optimization is an executive function, not an automated one. Just as no tool can execute a comprehensive SEO strategy without human oversight, no tool can fully execute Generative Engine Optimization (GEO).

The Limits of Automation in GEO

A tool can deliver data, surface insights, and offer recommendations, but the actions that fundamentally move the needle—the strategic decisions and high-quality content execution that result in an AI model mentioning a brand—require human judgment.

Consider the necessary actions for effective GEO:

  • External Credibility Building: Is the software capable of planting organic, authoritative brand mentions on external, high-ranking sites? This is fundamentally impossible without unethical practices like hacking or spamming. Earning credibility requires human-to-human interaction and content contribution.
  • Content Alignment: While a tool can suggest text edits, are brands truly prepared to grant writing permissions to a SaaS platform for their Content Management System (CMS)? Furthermore, blindly implementing LLM-friendly changes without a holistic SEO review can be disastrous. Content that is easily parsable by an LLM is not automatically guaranteed to be SEO-friendly, and potential conflicts require expert reconciliation.

When AI visibility software publishes case studies titled, “How we increased brand mentions in LLMs by X%,” this framing is a deliberate marketing tactic claiming ownership over the final business outcome. The software may have provided the initial intelligence, but the actual, painstaking work—the strategy, the content creation, the authoritative outreach—was executed by the human GEO team or an external agency. The success stems from human execution informed by tool data, not from the tool itself.

3. No one really knows the real search volume of prompts

In traditional SEO, keyword research rests on the assumption that search volume data is a quantifiable metric, even if it is an estimate. However, in the realm of LLMs, this foundational data point is a black box. OpenAI, Google (for internal Gemini/AI queries), and other LLM companies do not currently share live, public usage data comparable to what is available through Google Analytics or Search Console.

This critical lack of transparency means that no third-party tool or service provider can access the true, absolute search volume of prompts. Even a provider granted private access to user logs would be working from an unrepresentative sample.

The Extrapolation Problem

Instead, most platforms that claim to provide “prompt volumes” must rely on highly complex and imperfect estimation methods. These methods typically involve:

  • Aggregating data from third-party clickstream panels.
  • Utilizing browser extensions that track aggregated user activity.
  • Applying sophisticated extrapolation models to forecast relative popularity based on known samples.

While this approach can offer a rough directional view of prompt interest and relative trends—showing that Prompt A is likely twice as popular as Prompt B—it is crucial to recognize this as a forecast, not a fact. Unlike traditional search data, which is based on deterministic queries, AI prompt data is inferred from behavioral data points that may not fully capture the complexity of conversational AI use.
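The extrapolation step behind these "prompt volume" charts can be sketched in a few lines. Every number here is hypothetical: the panel size, the population, and the prompt counts are invented purely to show why the output is a forecast rather than a measurement.

```python
# Illustrative sketch (all numbers hypothetical): how a tool might extrapolate
# prompt counts observed in a clickstream panel to a population-level estimate.
PANEL_SIZE = 2_000_000        # devices in the panel (assumed)
POPULATION = 250_000_000      # total addressable users (assumed)
SCALE = POPULATION / PANEL_SIZE

panel_counts = {              # prompts observed inside the panel (toy data)
    "best crm for small business": 420,
    "best accounting software": 210,
}

# Absolute "volumes" are just the panel counts multiplied up: a forecast.
estimates = {prompt: round(n * SCALE) for prompt, n in panel_counts.items()}

# Relative popularity is the more defensible read-out:
ratio = panel_counts["best crm for small business"] / panel_counts["best accounting software"]
```

Note that the scaling factor dominates the result: pick a different panel or population assumption and every "volume" shifts, while the 2:1 ratio between the prompts stays put. That is why relative trends are the only part worth trusting.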

Therefore, any chart displaying specific “prompt volumes” for conversational platforms like ChatGPT or Gemini should be treated with healthy skepticism. It represents an educated guess derived from third-party data, not an absolute measurement of user demand within the LLM ecosystem.


4. AI visibility can’t be measured like search rankings

This is arguably the most fundamental technical hurdle in the GEO space. Traditional search engine results are largely deterministic: given a query and a location, the ranking algorithms produce a relatively fixed list of indexed pages. While there is minor personalization, the core results remain consistent across users.

LLMs, however, operate based on probability, making their output non-deterministic. They generate answers on the fly based on the statistical likelihood of what should sound correct and relevant to the user’s immediate context. This results in significant variance.

The Dual Challenges of Probability and Context

LLMs are designed to generate the most statistically probable answer, often exhibiting a bias toward guessing rather than admitting limitations (a primary cause of hallucinations). If one user asks, “Which car insurance company do you recommend?” there is no single, objectively correct answer. The model generates a response statistically likely to satisfy that specific user, factoring in everything it knows about them, including previous chat history, location, and inferred intent.

Consequently, two different users asking the exact same question may receive entirely different sets of recommendations or cited sources. If every response is dynamically generated and heavily shaped by individual context, how can monitoring tools claim to provide an accurate, universal measure of brand appearance?

The monitoring solutions currently available fall into two primary categories, each with inherent limitations:

The Traditional SEO Monitoring Model: Context-Blind Averaging

This approach attempts to apply traditional SEO principles to a non-traditional system. It collects massive volumes of random data points, often aggregated from clickstream panels, third-party browser extensions, and broad usage datasets. This data is then averaged to create a generalized view of brand visibility—the “wisdom of the crowd.”

While this provides a broad snapshot, it suffers from being context-blind. It assumes uniformity among users, effectively smoothing out the nuances and personalization that define LLM interactions. The average visibility score it produces might not accurately reflect what any specific, commercially valuable audience persona is actually seeing.

The AI Visibility Monitoring Model: Persona Sampling

A more sophisticated approach recognizes the non-deterministic nature of AI. This method, often employed by specialized GEO platforms, focuses on defining a specific user persona and then running repeated inferences (queries) for that exact profile. By analyzing the frequency of results across these controlled, repeated tests, the tool attempts to identify the “stable mode”—the most consistent, high-probability answer for that targeted user segment.

This approach offers a more precise reflection of what a specific target user is likely to see, mitigating the distortion caused by averaging across a random, disparate population. However, it still relies on a projected profile and repeated sampling, underscoring the reality that no existing method offers a complete, 1:1 reflection of every user’s true interaction with an LLM.
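The persona-sampling loop can be sketched as follows. The `query_llm` function is a hypothetical stand-in for a real chat-API call; here it is stubbed with canned answers so the sampling logic itself is visible.

```python
# Sketch of persona sampling: fix one persona, repeat the same prompt many
# times, and measure how often each brand surfaces across the runs.
import random
from collections import Counter

def query_llm(prompt: str, persona: dict) -> list[str]:
    # Stub: a real implementation would send prompt + persona context to an
    # LLM API and parse brand names out of the generated answer.
    canned = [["HubSpot", "Zoho"], ["HubSpot", "Salesforce"], ["Zoho", "Pipedrive"]]
    return random.choice(canned)

def mention_rates(prompt: str, persona: dict, runs: int = 50) -> dict:
    counts = Counter()
    for _ in range(runs):
        counts.update(query_llm(prompt, persona))
    # Brands that appear in most runs form the "stable mode" for this persona.
    return {brand: n / runs for brand, n in counts.most_common()}

random.seed(0)  # fixed seed so the stubbed sampling is reproducible
rates = mention_rates("Which CRM fits a 10-person agency?", {"role": "agency owner"})
```

The output is a frequency per brand for one persona, not a universal ranking: change the persona and the distribution can change with it, which is exactly the non-determinism described above.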

5. What’s outside your site matters more for GEO than what’s inside it

Most Generative Engine Optimization (GEO) efforts, and the tools supporting them, concentrate on on-page technical optimization and content adjustments. While these factors are important for traditional search ranking, they often have the least direct impact on the most valuable GEO KPI: the organic mention of a brand name in an AI summary.

The logic mirrors traditional SEO: external authority signals are paramount. Just as backlinks signal credibility to Google, independent external brand mentions signal relevance and authority to Large Language Models.

External Credibility and LLM Training

An LLM is trained to assess authority not by what a brand says about itself, but by what others say about it. If a brand touts its own expertise, that’s self-promotion. If dozens of respected, high-authority sources independently validate that brand’s expertise, that constitutes a signal of relevance the model can trust.

Empirical evidence supports this focus on the external layer. An analysis by Ahrefs found a strong correlation between brand web mentions and visibility within Google’s AI Overviews, citing a correlation coefficient of 0.664. This high correlation suggests that LLMs heavily favor off-site context—signals of external validation—over content residing on the brand’s owned website.
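For readers unfamiliar with the statistic, a correlation coefficient like 0.664 comes from the Pearson formula, shown here on toy data. The brand figures below are invented purely to illustrate the calculation; they are not from the Ahrefs analysis.

```python
# Pearson correlation: covariance of two series divided by the product of
# their standard deviations; ranges from -1 (inverse) to +1 (perfect).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

mentions = [120, 540, 80, 900, 300]   # web mentions per brand (toy data)
overviews = [4, 14, 5, 20, 9]         # AI Overview appearances (toy data)
r = pearson(mentions, overviews)      # close to 1: strong positive association
```

A value of 0.664 sits well below the near-perfect toy example above, but for noisy web-scale data it still indicates a strong relationship; it does not, of course, prove that mentions cause visibility.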

Furthermore, not all external mentions are weighted equally. Specific domains are known to carry immense authority in model training and inference. A recent Semrush study identified Reddit and LinkedIn as ranking among the top five most-cited domains across major AI platforms, including ChatGPT, Perplexity, and Google’s AI Overviews. These platforms are valued because they host genuine, unstructured user conversations, providing models with real-world context and social proof.

The Optimization Paradox

The key takeaway for GEO practitioners is simple: ignoring the off-site reputation layer severely limits visibility success. The real challenge is not identifying the sources—most tools can pinpoint authoritative domains—but earning high-quality, natural mentions on those sources, which is a human effort, as previously discussed.

So, why do so many AI visibility tools disproportionately emphasize on-page optimization? Because it is the path of least resistance. It is measurable, scalable, and entirely within the user’s control. Tweaking one’s own website is infinitely easier than influencing a complex ecosystem of third-party publishers. While these on-page optimizations can improve the likelihood of a page being *cited* by an LLM, they often fall short of generating the core business value—a direct, unprompted brand recommendation within the answer summary.

6. The most important KPI in GEO is your brand being mentioned within LLM answers

The ultimate measure of success in Generative Engine Optimization is not simply appearing as a citation link at the bottom of an AI summary. The true value lies in the hard-won placement of the brand name directly within the generative text itself.

While being a cited source looks impressive on a GEO dashboard, its actual marketing and business value, particularly in terms of driving traffic, is consistently minimal.

The Citation Traffic Gap

Data consistently shows a massive disparity between the attention LLMs give to external web sources and the subsequent traffic they send. Matthew Prince, CEO of Cloudflare, shared stark figures illustrating this disconnect. Ten years ago, Google crawled approximately two pages for every visitor sent to a publisher. More recently, those ratios have ballooned:

  • Google: 18 crawls for every 1 click.
  • OpenAI (GPTBot): 1,500 crawls for every 1 click.
  • Anthropic: 60,000 crawls for every 1 click.

For ChatGPT, the ratio means that GPTBot must crawl 1,500 pages before a single visitor is generated through a citation click. The ROI on content creation solely for the purpose of being crawled and cited by LLMs is extremely low.
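Turning those ratios into expected traffic makes the gap concrete. Only the crawls-per-click figures come from the quoted Cloudflare data; the monthly crawl volume is an invented number for illustration.

```python
# Expected citation clicks per month under the quoted crawls-per-click ratios,
# for a hypothetical site crawled 90,000 times a month by each bot.
crawls_per_click = {"Google": 18, "OpenAI": 1_500, "Anthropic": 60_000}
monthly_crawls = 90_000  # assumed crawl volume (hypothetical)

expected_clicks = {bot: monthly_crawls / ratio
                   for bot, ratio in crawls_per_click.items()}
# Same crawl budget, wildly different returns: thousands of Google visits
# versus a few dozen from OpenAI and roughly one from Anthropic.
```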

Similarly, analysis of Google’s AI Overviews confirms this low click-through rate (CTR). Research examining performance data across over 20,000 queries demonstrated that even top-ranked citations within AI Overviews behave more like a traditional Position 6 organic result in terms of traffic generation. CTRs plummet for citations ranked lower than the second spot, indicating that visibility within these summaries does not equate to traffic.

This reality was echoed by Reddit CEO Steve Huffman in a TechCrunch interview, confirming that AI chatbots are not a meaningful traffic driver for Reddit, despite the platform being one of the most frequently cited sources across all major LLMs. Search and direct traffic remain dominant.

Therefore, while citations provide visibility, they do not constitute a reliable traffic channel. The primary goal of GEO must shift toward achieving the most impactful form of visibility: a direct, contextual mention of the brand name integrated seamlessly into the conversational answer—the equivalent of a strong brand recommendation by the LLM itself.

7. GEO practices without proper SEO alignment can backfire

The final hard truth—and perhaps the most financially crucial—is that optimizing exclusively for generative engines without regard for established SEO principles can actively destroy existing organic traffic and revenue streams.

Generative Engine Optimization and traditional Search Engine Optimization, while rooted in the same ecosystem, often have conflicting requirements. LLMs favor content structures, tones, and data organization that facilitate easy extraction and summary generation. These structures may contradict the ranking signals Google rewards for quality, authority, and user experience.

The Real Cost of Misaligned Optimization

Consider a company selling accounting software that has a blog article, “How to choose the best accounting software for a small business,” ranking high (Position 1) on Google, driving a stable 2,000 organic visits per month.

If the company adopts an AI visibility tool that recommends restructuring content to favor LLM extraction—perhaps shortening paragraphs, breaking up complex lists, or altering the internal linking structure to make the main content easier for the model to digest—it might successfully increase its appearance in AI chatbot answers, gaining 200 AI-driven visits per month.

However, if those changes inadvertently weaken the page’s traditional SEO signals (e.g., diluting E-E-A-T signals or negatively impacting Core Web Vitals), the article might plummet in Google’s rankings, dropping to Position 9. The resulting traffic breakdown could look like this:

                          Before the optimization   After the optimization
Traffic from Google       2,000                     200
Traffic from AI models    0                         200
Total monthly traffic     2,000                     400

In this scenario, the brand successfully “boosted AI visibility” and earned praise from the GEO tool, but the business suffered a massive net loss of 1,600 highly qualified organic visitors per month.
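The arithmetic is worth spelling out, because the tool's dashboard and the business ledger report opposite signs:

```python
# Channel totals from the scenario above (visits per month).
before = {"google": 2_000, "ai_models": 0}
after = {"google": 200, "ai_models": 200}

ai_gain = after["ai_models"] - before["ai_models"]        # +200: what the GEO tool celebrates
net_change = sum(after.values()) - sum(before.values())   # -1,600: what the business absorbs
```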

This is not a hypothetical risk. It is a documented consequence of systems that prioritize LLM metrics in isolation. Most AI visibility tools are designed to measure success only within generative engines. They fail to account for the overall search traffic mix, the health of the traditional search channel, or the correlation between visibility and actual conversions, brand lift, or revenue.

Without properly integrating GEO measurement into the broader SEO and marketing strategy, high-performing AI visibility charts may mask a severe and quiet decline in the marketing KPIs that genuinely matter to the bottom line.

When Search Evolves, Measurement Must Evolve With It

Generative Engine Optimization is not a discipline invented from nothing; it is fundamentally rooted in the search ecosystem we have known for decades. LLMs still rely on the open web, indexed data, and many of the same signals that have shaped SEO. However, the mechanism of visibility has changed: from being listed deterministically to being generated probabilistically, heavily dependent on user context and intent.

This distinction necessitates a fresh approach to measurement. Research has already demonstrated that strong Google rankings do not guarantee AI visibility, with only about a 62% overlap between the two channels. This significant gap confirms that GEO requires its own specialized measurement framework, distinct from a recycled SEO rank tracker.

The mistake is not in building on the past, but in assuming that established metrics can fully capture the complex, probabilistic nature of generative search. Progress in this rapidly moving field requires consistently questioning assumptions, rigorously testing the realities of LLM behavior, and adapting how business success is quantified. Brands looking to hire GEO agencies or invest in AI visibility tools must insist on transparency regarding data sources, methodology (deterministic vs. probabilistic sampling), and holistic measurement that accounts for the potential impact on existing SEO revenue.

Embracing GEO means embracing uncertainty, but only through honest measurement can marketers make strategic decisions that save time, money, and unnecessary frustration.
