ChatGPT citations favor a small group of domains: Study

The Shift from Search Engines to Answer Engines

For over two decades, search engine optimization has been a game of visibility on a linear results page. We optimized for keywords, tracked our rankings on Google, and fought for a spot in the coveted “top three.” However, the rise of Large Language Models (LLMs) like ChatGPT has introduced a new paradigm: the “Answer Engine.” In this new landscape, the goal isn’t just to rank; it’s to be cited as a trusted source within an AI-generated response.

A study by SEO expert Kevin Indig, using data from Gauge, examines how ChatGPT selects its sources. The data suggests that AI citations are not evenly distributed across the web. Instead, they are highly concentrated, favoring a very small group of authoritative domains. For digital marketers, publishers, and SEO professionals, the findings offer a blueprint for the next era of organic visibility.

The Law of Concentration: 30 Domains Rule the Conversation

One of the most significant findings of Indig’s research is the extreme concentration of citation visibility. According to the data, roughly 30 domains capture 67% of all citations within a given topic. In other words, about two-thirds of the citations in a topic go to an “inner circle” of sources that ChatGPT relies on to provide information to users.

The skew is visible even within that top group. In product comparison topics, the top 10 domains alone accounted for 46% of all citations, and the top 30 together commanded 67% of the citation share. This creates a “winner-takes-most” environment that is even more restrictive than traditional search engine results pages (SERPs).
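To make the concentration figures concrete, here is a minimal Python sketch of how a top-N citation share could be computed from a citation log. The function name `top_n_share` and the sample domains are hypothetical, and this is not the method used in the study, only one plausible way to arrive at a figure like "the top 30 domains hold 67% of citations."

```python
from collections import Counter


def top_n_share(cited_domains, n):
    """Return the fraction of all citations captured by the n most-cited domains."""
    counts = Counter(cited_domains)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical citation log (one entry per citation), not data from the study.
sample = ["a.com"] * 5 + ["b.com"] * 3 + ["c.com"] * 1 + ["d.com"] * 1

print(top_n_share(sample, 2))  # 0.8
```

Run against your own citation exports, the same calculation with n=10 and n=30 would show how close a topic is to the 46% and 67% shares reported above.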

Indig notes that in the world of AI search, you are effectively shut out unless you build enough topical authority to win one of a limited number of citation “seats.” Unlike Google, which might show ten blue links and various features, ChatGPT provides a synthesized answer that only has room for a few carefully selected references. If your brand isn’t perceived as a primary authority, your chances of appearing in the citation footprint are slim.

The Gap Between Retrieval and Citation

To understand how to optimize for ChatGPT, it is essential to distinguish between “retrieval” and “citation.” Just because an AI “reads” your page doesn’t mean it will credit your page. A secondary study by AirOps, referenced in Indig’s findings, highlights a massive gap between these two actions.

The research found that ChatGPT retrieved approximately six times as many pages as it actually cited. Perhaps more concerning for publishers is the fact that 85% of the pages retrieved by the AI were never cited in the final response. This suggests that the AI uses a broad net to gather context but applies a much stricter filter when deciding which sources are worthy of being presented to the user.
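The two figures in this paragraph are roughly consistent with each other, which a quick calculation shows. The sketch below assumes that every cited page was also a retrieved page and that both counts refer to unique pages; the function names are illustrative, not taken from the AirOps research.

```python
def uncited_share(retrieved_per_cited):
    """Share of retrieved pages left uncited, given pages retrieved per cited page."""
    if retrieved_per_cited < 1:
        raise ValueError("cannot cite more pages than were retrieved")
    return 1 - 1 / retrieved_per_cited


def retrieved_per_cited_from_uncited(share):
    """Inverse: pages retrieved per cited page, given the uncited share."""
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    return 1 / (1 - share)


print(f"{uncited_share(6):.1%}")                         # 83.3%
print(f"{retrieved_per_cited_from_uncited(0.85):.2f}")   # 6.67
```

A six-to-one ratio implies about 83% of retrieved pages go uncited, and an 85% uncited share implies a ratio closer to 6.7, so "approximately six times" and "85%" describe the same gap.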

For SEOs, this means that merely being “crawlable” or “indexable” by an AI agent is only the first step. The content must possess a level of quality, structure, and authority that survives the AI’s internal vetting process. The AI is looking for the most definitive, well-structured, and comprehensive answer, and it discards most of the other pages it retrieves, even when they contain similar but less “authoritative” information.

Does Ranking #1 on Google Still Matter?

A common question in the SEO community is whether traditional rankings translate to AI citations. The study confirms that there is a strong correlation, but it is not a 1:1 relationship. Ranking #1 in Google remains a powerful signal of quality that ChatGPT respects.

Pages that rank in the top position on Google were cited by ChatGPT 43.2% of the time. This is a significant advantage, as #1 ranked pages are 3.5 times more likely to be cited than pages ranking outside the top 20. However, the flip side of this statistic is that nearly 57% of the time, the top-ranked page on Google is not cited by ChatGPT.
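The arithmetic behind these numbers can be checked in a few lines. The sketch below reads "3.5 times more likely" as a simple ratio of citation rates, which is an assumption; the implied rate for pages outside the top 20 is derived here and is not a figure reported by the study.

```python
TOP_RANK_CITATION_RATE = 0.432   # reported: #1 pages cited 43.2% of the time
RELATIVE_LIKELIHOOD = 3.5        # reported: 3.5x vs. pages outside the top 20


def implied_baseline_rate(top_rate, ratio):
    """Citation rate implied for the comparison group, assuming a plain ratio."""
    return top_rate / ratio


not_cited = 1 - TOP_RANK_CITATION_RATE
baseline = implied_baseline_rate(TOP_RANK_CITATION_RATE, RELATIVE_LIKELIHOOD)

print(f"{not_cited:.1%}")  # 56.8%
print(f"{baseline:.1%}")   # 12.3%
```

Under that reading, pages outside the top 20 would still be cited roughly one time in eight, so a top ranking raises the odds considerably but does not decide the outcome.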

This discrepancy highlights a shift in how value is measured. Google’s algorithms may prioritize certain backlink profiles or historical signals, while ChatGPT’s retrieval-augmented generation (RAG) process looks for content that best fits the specific nuances of a conversational prompt. While a high Google ranking is a strong advantage for AI visibility, it is no longer a guarantee of being the primary source for an AI’s answer.

The Death of “One Keyword, One Page”

For years, the standard SEO tactic was to create dedicated landing pages for specific, isolated keywords. Indig’s study suggests that this approach is largely ineffective for AI-driven search. ChatGPT rewards domains that demonstrate broad topical coverage and use cluster-based content models.

The AI tends to favor pages that answer a question from multiple angles. This “cluster-based” approach means that a single, comprehensive guide that covers a topic in depth is more likely to be cited across a variety of related prompts than a series of thin pages targeting individual keywords.

This shift is driven by how ChatGPT handles “fan-out queries”—follow-up or related questions generated by the AI to clarify a user’s intent. The study found that one-third of cited pages came from these fan-out queries. Interestingly, 95% of these queries had zero search volume in traditional SEO tools. Because these queries are generated dynamically by the AI, you cannot “research” them in the traditional sense. Instead, you must build content that is topically exhaustive, ensuring that no matter what direction the AI takes the conversation, your domain remains the most relevant source.

The Strategic Importance of Content Length

In the debate over short-form versus long-form content, the data leans heavily toward the latter when it comes to AI citations. Generally, longer pages earned more citations, though the effectiveness varied by industry vertical.

The study identified a significant “lift” in citation probability for pages between 5,000 and 10,000 characters. The contrast is sharpest when comparing the two extreme ends of the spectrum:

Pages under 500 characters averaged only 2.39 citations.
Pages exceeding 20,000 characters averaged 10.18 citations.

However, this isn’t a simple “more is better” rule. In the Finance sector, shorter and denser pages often outperformed long-winded guides. Financial users (and the AI serving them) appear to value precision and directness over word count. Conversely, in fields like Education, Crypto, and Product Analytics, the “length equals value” trend continued with almost no drop-off. For these topics, the AI seems to prefer “ultimate guides” that leave no stone unturned.
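For readers who want to run the same kind of comparison on their own pages, here is a rough Python sketch that groups pages by character count and averages citations per group. The thresholds 500, 5,000, 10,000, and 20,000 come from the figures above, but the exact bucket boundaries, the labels, and the sample pages are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Bucket edges in characters; the grouping itself is an assumption.
EDGES = [500, 5_000, 10_000, 20_000]
LABELS = ["<500", "500-4,999", "5,000-9,999", "10,000-19,999", "20,000+"]


def length_bucket(char_count):
    """Map a page's character count to a length bucket label."""
    return LABELS[bisect_right(EDGES, char_count)]


def mean_citations_by_bucket(pages):
    """pages: iterable of (char_count, citation_count) pairs."""
    totals = defaultdict(lambda: [0, 0])  # label -> [citation sum, page count]
    for char_count, citations in pages:
        entry = totals[length_bucket(char_count)]
        entry[0] += citations
        entry[1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


# Hypothetical pages, not data from the study.
pages = [(300, 2), (450, 3), (7_000, 6), (25_000, 10)]
print(mean_citations_by_bucket(pages))

# Ratio of the two averages reported above (10.18 vs. 2.39 citations).
print(round(10.18 / 2.39, 2))  # 4.26
```

The last line puts the reported gap in perspective: the longest pages averaged a little over four times as many citations as the shortest ones. Splitting the same calculation by vertical would surface exceptions like Finance.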

Positioning Matters: The 20% Rule

Where you place your information on a page is just as important as what you write. ChatGPT’s citation behavior shows a clear preference for the upper portions of a webpage. Across all industries studied, the section between the top 10% and 20% of the page performed the best.

The data shows a “steep ramp” in visibility at the beginning of an article:

Finance: 43.7% of all citations occurred within the first 30% of the page.
Healthcare and HR Tech: These showed a flatter distribution, suggesting the AI scans deeper for nuanced medical or technical details.
Education: Citations tended to peak slightly later, around the 30% to 40% mark.
Conclusions (the bottom of the page): Across the board, the bottom 10% of a page was largely ignored, earning only 2.4% to 4.4% of citations.

The takeaway for content creators is clear: Front-load your value. The “inverted pyramid” style of journalism is more relevant than ever. If you save your most insightful points for the conclusion, there is a high probability that the AI will never cite them. You must provide the “best answer” immediately, using the rest of the page to provide the supporting evidence and depth that justifies the citation.
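One way to measure this kind of positional distribution is to record where each cited passage starts and convert that to a tenth of the page. The study's actual measurement method is not described here, so the character-offset approach, the function names, and the sample data below are all assumptions.

```python
def position_decile(offset, page_length):
    """Return 0-9: the tenth of the page in which a character offset falls."""
    if page_length <= 0 or not 0 <= offset < page_length:
        raise ValueError("offset must lie inside the page")
    return offset * 10 // page_length


def citation_share_by_decile(citations):
    """citations: iterable of (offset, page_length). Returns ten shares."""
    counts = [0] * 10
    for offset, page_length in citations:
        counts[position_decile(offset, page_length)] += 1
    total = sum(counts)
    return [count / total if total else 0.0 for count in counts]


# Hypothetical cited passages on a 1,000-character page, not study data.
sample_citations = [(50, 1000), (150, 1000), (180, 1000), (950, 1000)]
shares = citation_share_by_decile(sample_citations)

print(shares[1])  # 0.5 -> half of these citations fall in the 10%-20% band
print(shares[9])  # 0.25 -> a quarter fall in the bottom 10%
```

Index 1 of the result corresponds to the 10% to 20% band highlighted above, and index 9 to the bottom 10% of the page.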

Vertical-Specific Insights: How AI Treats Different Industries

The study analyzed approximately 98,000 citation rows from 1.2 million ChatGPT responses across seven distinct verticals. This large-scale analysis revealed that AI does not treat all subjects equally.

Finance and Crypto

In Finance, the AI prioritizes density and immediate accuracy. The window for earning a citation is narrow, focused heavily on the introduction and the first few headers. In Crypto, however, the AI favors much longer, technical explanations, likely because the subject matter requires more context to be accurately synthesized.

Education and Healthcare

In Education, the AI is willing to “read” further down the page, often citing sections that appear midway through an article. Healthcare follows a similar pattern, where the AI looks for detailed, authoritative explanations rather than quick summaries. This may suggest that for sensitive topics such as health, which fall under YMYL (Your Money or Your Life), the AI draws on more of the page before it cites a source.

Product Analytics and HR Tech

These B2B-heavy sectors show that ChatGPT favors category roundups and comparison pages. Of the URLs cited, 58% were cited only once, but the pages that recurred across multiple prompts were almost always broad guides or “best of” lists. If you want recurring traffic from AI search, being the definitive source for a product category is the most viable path.

Actionable Strategies for the AI Era

Based on the findings of the Indig study, businesses need to evolve their SEO strategies to remain competitive. Here are the core pillars of a modern AI-optimization strategy:

1. Target Topical Authority, Not Keywords

Stop focusing on individual keyword rankings. Instead, aim to “own” a topic. This involves creating pillar pages that link to several related sub-topics. By covering a subject from every possible angle, you increase the likelihood that ChatGPT will find your domain relevant during its “fan-out” query process.

2. Optimize for the “Top 20%”

Ensure that your most important data points, definitions, and answers are located in the first fifth of your content. Use clear, concise language in your opening paragraphs and use H2 headers to signal the core topics immediately. Don’t make the AI (or the user) dig for the answer.

3. Invest in Long-Form, Structured Content

While density is important, the study shows that longer pages generally earn more citations. Aim for comprehensive guides that exceed 5,000 characters. Use structured data, bullet points, and clear formatting to help the AI parse the information efficiently. Remember, the AI retrieves roughly six times as many pages as it cites, so your structure might be the deciding factor in whether you get the link.

4. Build a “Moat” of Authority

Since 67% of citations go to just 30 domains, brand authority is more important than ever. This isn’t just about backlinks; it’s about being a recognized entity in your space. Engage in PR, social proof, and high-quality guest posting to ensure your brand is mentioned across the web in association with your primary topics.

Conclusion: The Future of Discovery

The way users discover information is undergoing a fundamental transformation. While traditional Google search isn’t going away, the “Answer Engine” is capturing a growing share of user intent. As Kevin Indig’s study suggests, the barrier to entry in this new world is high. AI citations are concentrated, selective, and biased toward deep topical authority.

To succeed, publishers must move beyond the “one keyword, one page” mentality and embrace a model that prioritizes comprehensive, well-structured, and authoritative content. By understanding the patterns of how ChatGPT retrieves and cites information—from the importance of page length to the critical nature of on-page positioning—marketers can secure their “seat at the table” in the AI-driven future of the web.