The Shift from Deterministic Search to Probabilistic AI
For decades, search engine optimization (SEO) was built on a foundation of relative stability. While Google’s algorithms were—and still are—notoriously complex, a search query performed by two different users in the same location would generally yield very similar results. This deterministic nature allowed marketers to track rankings with a high degree of precision. If you were in the third position for “best accounting software” on Monday, you were likely there on Tuesday.
The rise of Large Language Models (LLMs) like ChatGPT has completely disrupted this paradigm. We are moving away from the era of the static index and into the era of the probabilistic response. When you ask an AI a question, it doesn’t “look up” an answer; it generates one, token by token, based on mathematical probabilities. This means that if you ask ChatGPT the same question ten times, you are likely to get ten different responses.
This inherent inconsistency raises a critical question for digital publishers and B2B marketers: If the AI is constantly changing its mind, how can we accurately measure brand visibility? New research into repeated ChatGPT runs provides a startling look at just how volatile these recommendations are and what it takes for a brand to achieve true dominance in the age of AI search.
Understanding the Research: Methodology and Scope
To understand the mechanics of AI brand visibility, it is essential to look at data derived from high-volume testing. Recent studies, including foundational work by Rand Fishkin at SparkToro, have highlighted that AIs are highly inconsistent when recommending products. Building upon that premise, a deeper dive into B2B-specific use cases was conducted to see if factors like category competitiveness or prompt complexity could stabilize these erratic responses.
The methodology for this specific research involved a rigorous testing environment:
- The Prompt Set: 12 distinct prompts were developed, split between highly competitive B2B categories (like general accounting software) and niche categories (such as User Entity Behavior Analytics, or UEBA).
- Complexity Levels: The prompts were further divided into “simple” queries (e.g., “What is the best accounting software?”) and “nuanced” queries that included specific personas and pain points (e.g., “For a Head of Finance focused on ensuring financial reporting accuracy and compliance, what is the best accounting software?”).
- The Execution: Each of the 12 prompts was run 100 times through the logged-out, free version of ChatGPT. To ensure the results weren’t skewed by session history or IP tracking, a different IP address was used for each of the 1,200 interactions, simulating 1,200 unique users.
The goal was to move past anecdotal evidence and determine the statistical likelihood of a brand appearing in a generative response. The findings reveal a landscape where visibility is much harder to maintain than many marketers realize.
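The core aggregation behind this kind of study is simple to reproduce. The sketch below uses hypothetical run data (in the actual research, each prompt was sent through 100 separate ChatGPT sessions) and tallies how often each brand appears across repeated runs of a single prompt:

```python
from collections import Counter

# Hypothetical responses: each run of the same prompt yields a list of
# recommended brands. In the real study these came from 100 logged-out
# ChatGPT sessions, each on a different IP address.
runs = [
    ["QuickBooks", "Xero", "Sage", "FreshBooks"],
    ["QuickBooks", "Xero", "Wave", "Zoho Books"],
    ["QuickBooks", "Sage", "FreshBooks", "NetSuite"],
    ["QuickBooks", "Xero", "Sage", "Wave"],
]

# Count the number of runs in which each brand appears at least once.
appearances = Counter(brand for run in runs for brand in set(run))
total_runs = len(runs)

for brand, count in appearances.most_common():
    print(f"{brand}: {count / total_runs:.0%} of responses")
```

Scaled to 100 runs per prompt, this per-brand appearance rate is exactly the “visibility” figure the findings below are built on.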
How Many Brands Does ChatGPT Actually Know?
One of the first revelations from the data is the sheer volume of brands ChatGPT draws from when generating recommendations. Across 100 runs of a single prompt, ChatGPT mentioned an average of 44 different brands. However, this number fluctuates wildly depending on the industry. In some highly fragmented categories, the AI mentioned as many as 95 different brands over the course of 100 sessions.
The Impact of Category Competitiveness
The data shows a clear correlation between the maturity of a software category and the “bench depth” of ChatGPT’s recommendations. For competitive categories, the AI mentioned nearly twice as many brands per 100 responses compared to niche categories. This suggests that in crowded markets, ChatGPT’s probabilistic engine has a much wider net of “likely” candidates to choose from, making it significantly harder for any single brand to stand out consistently.
The Nuance Paradox
Interestingly, adding complexity to a prompt—such as specifying a persona or a use case—did not drastically narrow the field of brands mentioned. One might assume that a more specific request would lead to a more curated list of experts. Instead, ChatGPT mentioned only slightly fewer brands in response to nuanced prompts. For some categories, the number of brands actually increased when the prompt became more complex.
This suggests that ChatGPT may not yet have a deep enough understanding of specific brand features to differentiate them based on sophisticated use cases. It knows a brand exists within a category, but it lacks the granular data to know if “Brand A” is truly better for a “Head of Finance” than “Brand B.” As a result, it falls back on its broader training data, leading to a similar rotation of names regardless of the persona provided.
The Return of the ‘10 Blue Links’
For years, the SEO industry joked about the “10 blue links” of the Google search results page. In a fascinating twist of digital evolution, ChatGPT seems to have adopted a similar constraint. On average, ChatGPT mentions approximately 10 brands in any single response. While the range can vary—from a minimum of 6 to a maximum of 15—the average remains remarkably consistent with traditional search formats.
However, the difference lies in the rotation. While Google’s 10 links remain relatively static for a given query, ChatGPT’s 10 links are in a state of constant flux. In competitive categories, the AI draws from its deep bench, swapping brands in and out with every new conversation. This creates a “lottery effect” for brand visibility. Even if your brand is in the top 44 names the AI knows, your chance of appearing in any specific user’s session is only a fraction of the total.
Why Rotation Matters for GEO
This rotation is the primary challenge for Generative Engine Optimization (GEO). In traditional SEO, if you rank #3, you receive #3-level traffic consistently. In the world of AI search, if you are a “visible but not dominant” brand, you might appear in 20% of responses. This means 80% of potential customers never see your name, even though the AI “knows” who you are. This inconsistency makes it incredibly difficult to forecast lead generation or brand lift from AI platforms.
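The arithmetic of the lottery effect is worth making explicit. If each session is treated as a roughly independent draw, a brand with 20% per-session visibility still has a decent chance of being seen by a prospect who asks the AI more than once:

```python
# Assumption: each ChatGPT session is an independent draw. For a brand that
# appears in 20% of responses, the chance of being seen at least once in
# k sessions is 1 - (1 - p)^k.
p = 0.20  # per-session visibility rate

for k in (1, 3, 5):
    at_least_once = 1 - (1 - p) ** k
    print(f"{k} session(s): {at_least_once:.0%} chance of being seen")
```

One query gives a 20% chance; three queries raise it to about 49%, and five to about 67%. The catch is that most buyers ask once, which is why the per-session rate, not the cumulative one, drives lead flow.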
The Winner’s Circle: Defining Dominant Brands
Perhaps the most significant finding of the repeated-run research is the existence of the “Winner’s Circle.” Across 100 responses, only a tiny fraction of brands achieved what could be called “Dominant Status”—defined as appearing in 80% or more of the AI’s responses.
On average, only about 5 brands per category reached this level of consistency. These are the brands that have effectively “broken” the probabilistic nature of the AI; the model’s confidence in these brands is so high that it includes them regardless of the specific generation path it takes. In the accounting software category, for instance, these dominant spots are held by industry titans like QuickBooks, Xero, and Sage.
The Stats of Dominance
- Competitive Categories: Only 7% of mentioned brands reach the 80%+ visibility threshold. The vast majority—72%—languish in the “long tail,” appearing in fewer than 20% of responses.
- Niche Categories: The landscape is much more favorable here. In niche software categories, 21% of brands reach dominant status. Because the AI has fewer options to choose from, it is more likely to stick with the ones it knows.
This data confirms that the “rich get richer” in AI search. Large, established brands with massive digital footprints and high volumes of historical mentions in training data (articles, reviews, forums) are the only ones that enjoy stable visibility. For everyone else, visibility is a gamble.
Why Nuanced Prompts Make Dominance Harder
When marketers talk about AI search, they often focus on “long-tail” queries, assuming that being the perfect solution for a specific niche will guarantee visibility. However, the data suggests that nuanced prompts actually make it harder to enter the “Winner’s Circle.”
When persona context is added to a prompt, the “cliff” to reach dominance becomes steeper. ChatGPT appears to commit more strongly to a very small handful of top picks when it perceives a specific need, but because its knowledge of specific brand strengths is shallow, it becomes more hesitant to recommend a wide variety of secondary options. If you aren’t already one of the top-tier recognized brands for that niche, adding more “nuance” to the user’s search might actually decrease your chances of being mentioned.
What This Means for B2B Marketing Strategy
The implications of this research for B2B marketers are profound. It requires a total rethinking of how we approach brand awareness and search visibility. If the AI is inconsistent, our strategies must be designed to overcome that inconsistency.
1. The Case for Niching Down
The data is clear: it is significantly easier to become a dominant brand in a niche category than in a broad one. For an upstart tech company or a mid-market player, trying to become “dominant” in a category like “CRM software” is a multi-million-dollar, multi-year uphill battle. The training data is already saturated with mentions of Salesforce and HubSpot.
However, if you differentiate your brand architecture to be the “best CRM for renewable energy startups,” you are playing in a niche category. In these spaces, the AI has fewer “slots” to fill and fewer brands to choose from. Your path to 80%+ visibility is much shorter. Differentiation isn’t just a marketing buzzword anymore; it is a technical requirement for AI visibility.
2. The Fallacy of Single-Point Tracking
The most immediate takeaway for SEO professionals is that current AI visibility tracking tools are often fundamentally flawed. Most tools perform a “spot check”—they run a prompt once and report the result. Given the 44-brand rotation and the probabilistic nature of LLMs, a single run is a sample size of one; it tells you almost nothing about your true visibility rate.
If a tracking tool shows your brand is “ranking” in ChatGPT today, that might just be a lucky roll of the dice. Conversely, if it shows you aren’t there, you might actually be in 40% of responses, but the tool caught one of the 60% where you were rotated out. To get a true “Visibility Score,” marketers must run prompts multiple times (at least 5 to 10) to determine if they are in the dominant tier, the visible middle, or the invisible long tail.
3. Focusing on Authority and Citations
To move from the long tail into the “Winner’s Circle,” a brand must increase the model’s confidence. This is achieved through a high volume of quality mentions across the web. AI models are trained on large-scale crawls of the internet. If your brand is frequently cited in reputable trade publications, discussed on Reddit, reviewed on G2 or Capterra, and mentioned in educational content, the probability of the AI selecting your brand name increases.
Conclusion: Navigating the Future of AI Visibility
The revelation that ChatGPT draws from an average of 44 brands but only consistently recommends five serves as a wake-up call for the industry. Brand visibility in the age of AI is not a binary “yes or no” proposition; it is a spectrum of probability. For B2B marketers, the goal is no longer just to “be indexed,” but to be so deeply embedded in the training data and the digital conversation that the AI cannot ignore you.
As we move forward, the focus of digital publishing and SEO will likely shift toward “Confidence Building.” Whether it’s through niche differentiation or aggressive digital PR to increase citations, the objective remains the same: reducing the AI’s uncertainty. Only by understanding the probabilistic nature of these models can marketers hope to achieve stable, predictable visibility in an increasingly inconsistent search landscape.
The “10 blue links” haven’t disappeared; they’ve just started moving. The brands that learn how to stop the rotation are the ones that will win the next era of search.