Only 15% of pages retrieved by ChatGPT appear in final answers: Report

The landscape of search engine optimization is undergoing a seismic shift. For decades, the goal for digital publishers and SEO professionals was simple: rank on the first page of Google. However, with the rise of AI-driven search tools like ChatGPT, the metrics for success are changing. It is no longer enough to simply be “found” by an algorithm; your content must now survive a rigorous selection process internal to the AI itself.

A comprehensive new study by AirOps has revealed a startling reality for content creators: ChatGPT retrieves far more information than it actually shares with the user. According to the report, a staggering 85% of the webpages that ChatGPT crawls and “reads” during the research phase of a query never make it into the final response. Only 15% of retrieved pages earn a coveted citation.

This finding suggests that we are entering an era where “discovery” is merely the first hurdle. The real challenge lies in “selection”—the process by which an AI decides which specific sources are authoritative, relevant, and concise enough to be presented as a reference. For those in tech and gaming publishing, where accuracy and up-to-the-minute data are paramount, understanding this 15% threshold is critical to maintaining visibility.

The Gap Between Retrieval and Citation

To understand why so much content is being left on the cutting room floor, we must first understand how ChatGPT handles a user prompt. Unlike a traditional search engine that presents a list of links and leaves the filtering to the human user, ChatGPT acts as a synthesis engine. It performs what is known as Retrieval-Augmented Generation (RAG).

In the RAG process, the AI identifies a broad set of potential sources that might contain the answer to a user’s question. This is the retrieval phase. However, once the information is gathered, the AI’s internal logic filters these sources. It looks for the most direct answers, the most reputable data, and the pages that best align with the specific intent of the prompt. The AirOps analysis, which looked at 548,534 pages across 15,000 prompts, shows that this filter is incredibly narrow.
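The two-stage pattern the report describes can be sketched in a few lines. This is a toy illustration, not OpenAI’s actual pipeline: real systems rank with embedding models and learned relevance signals, whereas the keyword-overlap `score` function and the `0.5` threshold below are stand-in assumptions chosen only to show how a wide retrieval pool shrinks to a narrow cited set.

```python
# Toy sketch of retrieve-then-select (RAG-style). The scoring here is
# simple keyword overlap, standing in for real embedding/ranking models.

def score(query: str, page: str) -> float:
    """Fraction of query terms that appear in the page text."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(page.lower().split())) / len(q_terms)

def retrieve(query: str, corpus: list[str], k: int = 10) -> list[str]:
    """Retrieval phase: cast a wide net over candidate pages."""
    ranked = sorted(corpus, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]

def select(query: str, candidates: list[str], threshold: float = 0.5) -> list[str]:
    """Selection phase: keep only pages direct enough to cite."""
    return [p for p in candidates if score(query, p) >= threshold]

corpus = [
    "best mechanical keyboards for gaming in 2025 reviewed",
    "mechanical keyboards history and typewriter origins",
    "gaming chairs buying guide",
    "top gaming keyboards 2025 with hot-swappable switches",
    "how to clean a keyboard",
]
query = "best mechanical keyboards gaming 2025"

retrieved = retrieve(query, corpus, k=5)   # broad pool: all 5 pages
cited = select(query, retrieved)           # narrow final set: 2 pages
print(len(retrieved), len(cited))          # prints: 5 2
```

Even in this toy version, most of the retrieved pool is discarded at the selection step; the report’s finding is that, at production scale, the discard rate is around 85%.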

The fact that 85% of pages are discarded means that many websites are successfully optimized for discovery but are failing at the synthesis stage. They are visible to the AI’s “spider,” but they aren’t providing the level of utility required to be cited as a primary source. This shifts the focus of SEO from keyword density and backlink profiles toward deep relevance and information density.

Analysis by Query Type: Where Do Citations Land?

Not all searches are created equal. The AirOps report highlights that the likelihood of being cited fluctuates significantly based on the intent of the user’s query. This suggests that the AI’s “threshold for quality” changes depending on what the user is trying to accomplish.

Product Discovery Queries: 18.3% Citation Rate

Product discovery searches—such as “What are the best mechanical keyboards for gaming in 2025?”—saw the highest citation rate at 18.3%. This is likely because product recommendations require a diverse set of viewpoints and specifications. When ChatGPT provides a list of recommendations, it often pulls from multiple review sites to ensure a balanced perspective, giving more creators a chance to be featured.

How-To and Informational Queries: 16.9% Citation Rate

How-to queries, such as “How to optimize Windows 11 for high FPS,” yielded a 16.9% citation rate. In these instances, the AI prioritizes clarity and step-by-step accuracy. Pages that are structured with clear headings, lists, and direct instructions are more likely to be selected from the retrieved pool.

Validation Searches: 11.3% Citation Rate

The lowest citation rate occurred during “validation” searches, where users are looking for a specific fact or seeking to confirm a piece of information (e.g., “Does the RTX 4090 support DisplayPort 2.1?”). At just 11.3%, this category is the most difficult to break into. For these queries, ChatGPT often finds the answer in a few highly authoritative sources and discards the rest. If five sites say the same thing, the AI will likely only cite the one it deems most “trusted” or the one it crawled first.

The Phenomenon of “Fan-Out” Queries

One of the most enlightening aspects of the AirOps report is the concept of “Fan-out” searches. Most users assume that when they type a prompt into ChatGPT, the AI performs a single search. In reality, ChatGPT frequently expands a single user prompt into multiple internal searches to gather a more comprehensive data set. This creates what researchers call a “second citation surface.”

The data shows that 89.6% of prompts triggered two or more follow-up searches. In the study’s dataset, the 15,000 initial prompts expanded into 43,233 total queries—roughly three internal searches per prompt. This is a significant opportunity for SEOs who understand how to target long-tail, specific information.

Crucially, 32.9% of all cited pages appeared only in these fan-out results. They were not found during the initial, broad search but were discovered when the AI dug deeper into specific sub-topics. For example, a prompt about “upcoming RPG games” might fan out into a specific search for “Avowed release date rumors.”

Perhaps most importantly, 95% of these fan-out queries had zero traditional search volume on platforms like Google. This means that the AI is searching for information that humans aren’t necessarily typing into a search bar—it is looking for the “connective tissue” of a topic. To win in this environment, content creators must cover niche details and secondary questions that surround a main topic, rather than just targeting high-volume keywords.

The Correlation Between Google Rankings and AI Citations

For those wondering if traditional SEO is dead, the AirOps report provides a definitive answer: No. In fact, ranking well on Google is one of the strongest predictors of being cited by ChatGPT. The study found that 55.8% of cited pages were ranked within the top 20 of Google’s search results.

The advantage of being in the top spot is even more pronounced. Pages holding the Number 1 position on Google were cited 3.5 times more often than pages that fell outside the top 20. This suggests that OpenAI’s retrieval system relies heavily on the same signals of authority and quality that Google uses. If your site is trusted by Google’s E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) standards, it is much more likely to be selected during ChatGPT’s synthesis phase.

However, the fact that the remaining 44.2% of citations come from pages outside the top 20 is also significant. It shows that the AI is willing to look past the “giants” of the search results if a smaller, more specialized site provides a better, more direct answer to a specific sub-query. This levels the playing field for niche tech blogs and specialized gaming sites that may not have the domain authority of a major publication but possess superior, detailed content.

Optimizing for Selection: Beyond the Retrieval Phase

If being retrieved is easy but being cited is hard, how should digital publishers adjust their strategy? The shift from SEO (Search Engine Optimization) to GEO (Generative Engine Optimization) requires a change in how content is structured.

First, information density is paramount. ChatGPT and other LLMs (Large Language Models) are looking for “nuggets” of information. Long, rambling introductions and “fluff” content designed to keep a user on a page for dwell time are detrimental in the AI era. The AI wants the answer quickly so it can move on to the next part of the synthesis. Use clear, declarative sentences and front-load your most important data.

Second, structure your content for machine readability. While AI models are getting better at understanding natural language, clear formatting—such as H2 and H3 tags, bulleted lists, and tables—helps the AI parse the relationship between different pieces of data. If the AI can easily “scrape” a table of specs for a new graphics card, it is much more likely to cite that page than a paragraph of text containing the same information.
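To see why structured data is easier for a machine to lift, consider how little code it takes to extract facts from a clean spec table versus the same facts buried in prose. The table below is a hypothetical example (the GPU specs are illustrative, not sourced from the report):

```python
# A hypothetical spec table in the kind of clean, row-per-fact structure
# the article recommends. A few lines of parsing turn it into data;
# recovering the same facts from flowing prose would require an LLM.

table = """\
| Spec            | Value                     |
| Memory          | 24 GB GDDR6X              |
| Display outputs | DisplayPort 2.1, HDMI 2.1 |
"""

specs = {}
for line in table.strip().splitlines()[1:]:      # skip the header row
    cells = [c.strip() for c in line.strip("|").split("|")]
    specs[cells[0]] = cells[1]

print(specs["Memory"])   # prints: 24 GB GDDR6X
```

A retrieval system that can pull a key/value pair this cheaply has a strong reason to cite the page that provided it; the same information dissolved into a paragraph is far more expensive to verify and attribute.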

Third, focus on the “Fan-out” opportunity. Since a third of citations come from these secondary searches, publishers should aim to be the definitive source for “hidden” details. Don’t just write about the big news; write about the technical nuances, the edge cases, and the “how” and “why” behind the headlines. These are the details ChatGPT looks for when it expands a prompt.

The Future of Digital Publishing in an AI World

The AirOps report serves as a wake-up call for the industry. The 15% citation rate indicates that the competition for eyeballs is becoming more intense, not less. As AI tools become the primary interface for information gathering, the “middle class” of content—pages that are okay but not great—will likely see a massive drop in traffic. These are the pages that are being retrieved but never cited.

However, for high-quality publishers, this is an opportunity. By ranking well on traditional search engines and providing high-utility, well-structured content, you can ensure that your site remains part of the 15%. The goal is to become an “indispensable source”—the kind of page that an AI cannot ignore when it is building an answer for a user.

As we move forward, the relationship between search engines and AI will continue to blur. But the core principle of the web remains the same: the most helpful, accurate, and accessible information wins. The only difference now is that your first “reader” isn’t a human—it’s an AI looking for a reason to keep or discard your work. Make sure your content gives it every reason to keep it.
