The Siren Song of Machine-Only Content: Why LLM-First Pages Miss the Mark
As the digital landscape evolves under the influence of generative AI and large language models (LLMs), content teams and SEO professionals worldwide are grappling with a singular challenge: how do we optimize our digital assets for machines designed to read, synthesize, and cite information autonomously?
The pace of change, particularly with major search updates stacking up in 2026, has led many content strategists down a path that, on the surface, seems highly logical: if search engines and AI chatbots like ChatGPT, Perplexity, and Google’s AI Overviews (AIO) rely on LLMs, why not build content specifically tailored for them?
This line of thinking has sparked a significant, though increasingly scrutinized, trend: the creation of ‘LLM-only’ pages. These are digital assets that humans are never meant to see—think stripped-down markdown files, raw JSON feeds, and entire shadow versions of content libraries living under dedicated directories like /ai/ or /llm/.
The core logic behind this strategy is straightforward: eliminate the noise. Strip out advertisements, navigation menus, complex styling, and interactive elements. Serve the bots pure, clean, easily parsable text, thereby ensuring maximum clarity and improving the likelihood of citation in AI-generated search results. But is this emerging tactic a smart optimization strategy, or merely the latest SEO myth destined for the scrap heap alongside obsolete meta tags?
The Rise of Bot-First Content Formats
The trend of designing content solely for machine consumption is undeniably real. Sites spanning high-tech, Software as a Service (SaaS), and extensive documentation libraries have begun implementing LLM-specific content formats. Industry experts, including Malte Landwehr, CPO and CMO at Peec AI, have documented numerous sites creating .md copies of every article or adding dedicated LLM guidance files.
However, the crucial question remains: is adoption correlating with performance? To understand why this strategy has gained traction, we must first examine the specific implementations content teams are deploying.
The Four Flavors of LLM-Specific Optimization
1. llms.txt Files: The AI’s Robots.txt?
One of the most widely discussed—and contested—implementations is the llms.txt file. Positioned at the domain root (e.g., yourdomain.com/llms.txt), this file is a plain text or markdown document designed to help AI systems discover and prioritize important content.
The format was proposed in September 2024 by Jeremy Howard, co-founder of Answer.AI. It typically includes an H1 project name, a brief description, and organized sections linking to key documentation or critical pages. It acts as a curated sitemap specifically for AI ingestion, intended to guide crawlers toward the most authoritative or helpful resources and potentially boost citation frequency.
A prime example of this approach is seen in developer documentation. Stripe’s implementation at docs.stripe.com/llms.txt demonstrates a clear, structural organization:
```markdown
# Stripe Documentation
> Build payment integrations with Stripe APIs

## Testing
- [Test mode](https://docs.stripe.com/testing): Simulate payments

## API Reference
- [API docs](https://docs.stripe.com/api): Complete API reference
```
The bet is that by providing this clean map, developers asking LLMs “how to implement Stripe” will receive answers sourced directly and cleanly from the documentation. Major adopters of this format include Cloudflare, Anthropic, Zapier, Perplexity, Coinbase, Supabase, and Vercel.
2. Markdown (.md) Page Copies
The pursuit of textual purity has led some organizations to create stripped-down markdown versions of their standard HTML pages. By appending .md to a URL, such as transforming docs.stripe.com/testing into docs.stripe.com/testing.md, teams serve up content devoid of styling, CSS, JavaScript, interactive elements, navigation, and footers.
The underlying theory is that large, resource-intensive HTML pages are difficult for LLMs to parse efficiently. By offering a raw text alternative, the thinking goes, AI systems are more likely to successfully ingest and cite the information without having to render or interpret complex code.
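To make the mechanics concrete, here is a minimal sketch of the kind of stripping such a pipeline performs, using Python's standard-library HTML parser. The `strip_to_text` helper and the sample page are purely illustrative, not any vendor's actual implementation:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text while skipping script/style and page chrome
    (nav, header, footer) -- the elements .md copies typically drop."""

    SKIP = {"script", "style", "nav", "header", "footer"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_to_text(html: str) -> str:
    """Return only the content text of a page, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


# Hypothetical page: content plus the navigation/footer noise to be removed.
page = """
<html><body>
  <nav>Home | Products | Docs</nav>
  <h1>Test mode</h1>
  <p>Simulate payments without moving real money.</p>
  <footer>© Example Corp</footer>
</body></html>
"""

print(strip_to_text(page))
```

The output keeps only the heading and body copy; the navigation and footer are gone, which is exactly the "textual purity" these .md variants aim for.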
3. /ai and Similar Shadow Paths
A more extreme version of this segregation involves creating entirely separate content libraries under directories like /ai/, /llm/, or /bot/. A site might host a regular /about page for human visitors and a parallel /ai/about page built specifically for machine parsing.
These shadow pages sometimes contain simplified text, sometimes they consolidate data that is too spread out on the main site, or occasionally they hold even more technical detail than the originals. If a human user happens upon one of these directories, the experience is often jarring—resembling a text-heavy, unstyled website from the early 2000s. The explicit goal is machine consumption, not human engagement.
4. JSON Metadata Files for Structured Data
For large organizations dealing with catalog data or complex specifications, the approach often centers on structured data feeds. Dell Technologies, for instance, built structured data feeds that live alongside its main e-commerce site, often referenced from its llms.txt.
These files contain clean JSON housing product specifications, current pricing, and availability. This format provides everything an AI needs to answer precise, data-driven queries—such as, “What is the best Dell laptop under $1,000?”—without the AI having to scrape marketing copy or complex user interfaces. This technique makes strong conceptual sense for companies that already manage extensive product data in internal databases, as it merely exposes that data in a machine-friendly format.
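To illustrate, here is a hedged sketch of what one entry in such a feed might look like, built and serialized with Python's `json` module. The field names, SKU, and prices are invented for illustration and do not reflect Dell's actual schema:

```python
import json

# Hypothetical feed entry -- every field name and value here is illustrative.
product_feed = {
    "products": [
        {
            "sku": "XPS-13-EXAMPLE",
            "name": "Example 13-inch Laptop",
            "price_usd": 949.99,
            "availability": "in_stock",
            "specs": {"cpu": "8-core", "ram_gb": 16, "storage_gb": 512},
        }
    ],
    "last_updated": "2025-01-01",
}

# Serialize to the clean JSON document that would be published as the feed.
feed_json = json.dumps(product_feed, indent=2)
print(feed_json)

# An AI answering "best laptop under $1,000" can filter on structured
# fields instead of scraping marketing copy:
under_1000 = [p for p in product_feed["products"] if p["price_usd"] < 1000]
```

The point is not the format itself but the queryable fields: price, availability, and specs are unambiguous, which is precisely what makes such feeds useful when the same data is buried in marketing pages.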
The Official Verdict: Google’s Disdain for Bot-Only Content
Despite the widespread implementation of these strategies by content teams seeking an edge, the official consensus from leading search and AI authorities is overwhelmingly negative.
Google’s John Mueller, a senior Search Advocate, has been the most vocal critic of the LLM-only content trend. In a recent discussion on Bluesky, Mueller delivered a blunt comparison that should serve as a wake-up call to publishers engaging in this practice.
“LLMs have trained on – read and parsed – normal web pages since the beginning,” Mueller stated. “Why would they want to see a page that no user sees?”
His comparison was powerful: LLM-only pages are akin to the old, obsolete keywords meta tag. While available for anyone to implement, they are systematically ignored by the sophisticated systems they are intended to influence.
Mueller’s assertion is rooted in the core principle of modern search: authority and relevance are intrinsically tied to user experience and perceived utility. If a page is designed to bypass human users entirely, it signals to the search ecosystem—whether the traditional ranking algorithm or a generative AI model—that the content lacks verifiable human-level context or validation.
No Support, No Plans
Google has officially and repeatedly reinforced this stance. Gary Illyes, an analyst on Google’s Search Relations team, stated explicitly at the July 2025 Search Central Deep Dive in Bangkok that Google “doesn’t support LLMs.txt and isn’t planning to.”
Furthermore, Google Search Central’s official documentation remains clear: “The best practices for SEO remain relevant for AI features in Google Search. There are no additional requirements to appear in AI Overviews or AI Mode, nor other special optimizations necessary.”
The message is unambiguous: the investment in creating and maintaining entirely separate content libraries designed exclusively for bots is a wasted effort, fundamentally ignored by the target systems.
Data Doesn’t Lie: Analyzing Real-World AI Citation Rates
The skepticism from search experts is strongly supported by real-world data and analysis. Two major studies—one involving targeted citation testing and another examining adoption at scale—confirm the inefficiency of LLM-only pages.
The Individual Analysis: Malte Landwehr’s Targeted Tests
Malte Landwehr ran targeted tests on five websites actively using these LLM-optimization tactics. His methodology was rigorous: he crafted prompts specifically engineered to surface their LLM-friendly content, with some queries even containing explicit 20+ word quotes designed to trigger citations from specific sources.
Across nearly 18,000 total citations, the results were stark: the bot-first formats almost entirely failed to earn citations compared to standard web pages.
llms.txt Citation Results
Out of nearly 18,000 citations, only six pointed to llms.txt files—a marginal rate of roughly 0.03%. The handful of instances where the file was cited shared a critical characteristic: each contained genuinely useful, unique information about API usage or additional technical documentation that was difficult to find elsewhere.
Crucially, the “search-optimized” llms.txt files—those stuffed with content and keywords in a poor attempt to manipulate the system—received zero citations. This suggests that LLMs are not treating the file as a special ranking signal, but merely as another source of data. If that data is redundant, it is ignored.
Markdown (.md) Page Copy Failures
Sites using .md copies of their content were cited over 3,500 times in the experiment. However, none of those citations pointed to the specialized markdown versions. LLMs consistently preferred the standard, human-readable HTML pages. The only exception was GitHub, where .md files are the standard, user-facing URLs, meaning they are not “LLM-only” but regular, linked web pages.
/ai and Shadow Path Variability
The results for dedicated /ai/ and similar paths showed the greatest variation, ranging from 0.5% to 16% citation rates on tested sites. The site achieving the 16% citation rate had implemented a significant difference: they had placed substantially more unique and detailed information in the /ai/ pages than existed on the corresponding human-facing pages. Even with prompts explicitly crafted to target this data, most queries still ignored the shadow versions.
This finding is the most damning indictment of the strategy. Even when prompts are explicitly designed to surface the machine-only content, LLMs prefer the user-facing version unless the machine-only version provides novel, unique value.
JSON Metadata Success (with a Caveat)
The only truly compelling result came from JSON metadata. One large brand saw 5% of its citations (85 out of 1,800) generated from their structured data JSON file. The key detail, however, mirrored the other findings: the file contained product information that did not exist in an easily parsable format anywhere else on the website. This success reinforces the principle that value lies in unique data, not the format itself.
The Large-Scale Analysis: SE Ranking’s Domain Study
SE Ranking conducted a different, but equally revealing, analysis, studying 300,000 domains to determine if the adoption of llms.txt correlated with citation frequency at scale.
The study found that only 10.13% of domains had implemented llms.txt, far from the universal adoption seen in foundational SEO standards like robots.txt. Furthermore, larger, more established sites (100,001+ monthly visits) were actually slightly less likely (8.27% adoption) to use the file than mid-tier sites.
To test impact, SE Ranking built a machine learning model using XGBoost to predict citation frequency based on various factors, including the presence of llms.txt. The result was a crucial finding: removing the llms.txt variable from the model actually *improved* the model’s accuracy. The file was not helping predict citation behavior; it was simply adding statistical noise.
The data from both micro and macro perspectives leads to one unavoidable conclusion: there is no special ranking or citation boost granted by using LLM-only formats.
The Pattern: Utility Over Format
Both comprehensive analyses point toward the same clear conclusion: LLM-optimized pages are cited only when they contain unique, necessary, and useful information that is unavailable, or difficult to extract, from the corresponding standard web page.
The format—whether .md, JSON, or a custom directory—doesn’t matter. As Landwehr concluded, “You could create a 12345.txt file and it would be cited if it contains useful and unique information.” This principle confirms that LLMs value content quality, data completeness, and utility above any specific file extension or architectural trick.
The files themselves receive no special treatment from AI systems because, fundamentally, LLMs are trained on the vast corpus of the standard, human-facing web. They are designed to extract meaning from complex HTML, interpret layout, and understand contextual cues like navigation and heading structure. Creating an intentionally sterile version removes the environmental context that LLMs were taught to process.
Mueller’s core question—”Why would they want to see a page that no user sees?”—is the ultimate technical and philosophical barrier to the success of this strategy. If content isn’t validated by human interaction or linked meaningfully within the traditional web structure, it lacks the authority signals that AI models are designed to interpret.
A Smarter Path Forward: True LLM Optimization Strategies
The evidence is overwhelming: SEO and content teams must pivot away from building content that only machines will consume. Instead, optimization energy should be focused on improving the core digital experience for both users and advanced crawlers.
1. Focus on Technical SEO Fundamentals
The real technical barrier to AI parsing is not the lack of an .md file; it’s poor technical execution on the main site. John Mueller specifically pointed out JavaScript dependencies as a major obstacle: “Excluding JS, which still seems hard for many of these systems.” Heavy client-side rendering burdens Googlebot and other AI crawlers, making it genuinely difficult to parse critical content.
True LLM optimization means ensuring that critical information is rendered quickly using clean, semantic HTML and minimal JavaScript dependencies. When bots don’t have to struggle to render the page, they can easily ingest and interpret the content structure, regardless of LLM optimization files.
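A toy demonstration of why this matters: a crawler that does not execute JavaScript sees only what is in the raw markup. The sketch below uses Python's standard-library `HTMLParser` as a rough stand-in for a non-rendering bot; both sample pages are hypothetical:

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Rough stand-in for a non-rendering crawler: it collects text nodes
    but never executes scripts, so it ignores their contents entirely."""

    def __init__(self):
        super().__init__()
        self.text = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if not self._in_script and data.strip():
            self.text.append(data.strip())


def crawler_view(html: str) -> str:
    """What a bot that skips JavaScript would actually 'read'."""
    p = VisibleText()
    p.feed(html)
    return " ".join(p.text)


# Client-rendered shell: the content only exists after JS runs.
js_page = (
    '<div id="app"></div>'
    '<script>document.getElementById("app").innerText = "Pricing: $49/mo";</script>'
)

# Server-rendered semantic HTML: the content is in the markup itself.
ssr_page = "<main><h1>Pricing</h1><p>$49/mo</p></main>"

print(crawler_view(js_page))   # -> "" (nothing to cite)
print(crawler_view(ssr_page))  # -> "Pricing $49/mo"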
2. Enhance Structured Data and Schema Markup
While proprietary LLM-only formats are ignored, officially supported structured data remains vital. Platforms like Google have published specifications for using Schema.org markup (such as Product Schema, FAQ Schema, or HowTo Schema) to explicitly tag data elements. This is the search ecosystem’s established method for communicating clean, structured information to all crawlers.
Furthermore, third-party platforms are now publishing specifications for structured data feeds. For example, OpenAI has provided guidelines for ecommerce product feeds for use in its shopping integrations. Using these official schemas provides the clean, structured data LLMs crave without requiring the creation of redundant content.
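As a concrete, hedged illustration, here is how a page might emit schema.org `Product` markup as a JSON-LD script block, generated with Python's `json` module. The product details are invented, and a real implementation should follow the platform's structured data guidelines for required and recommended properties:

```python
import json

# Minimal schema.org Product markup -- the product itself is illustrative.
product_ld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example 13-inch Laptop",
    "description": "Lightweight laptop with a 16 GB RAM configuration.",
    "offers": {
        "@type": "Offer",
        "price": "949.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

# Embedded in the page's HTML as a JSON-LD script block, which crawlers
# read as structured data without any rendering:
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(product_ld, indent=2)
    + "\n</script>"
)
print(snippet)
```

Unlike a shadow JSON file, this markup lives inside the same human-facing page, so the structured data and the visible content stay in sync by construction.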
3. Prioritize Information Architecture and Content Quality
LLMs are fundamentally powerful language processors that respond best to well-organized, authoritative content. The focus should shift to improving the site’s information architecture so that key documentation, product attributes, and authoritative articles are discoverable, linked internally, and organized logically.
If unique data exists—such as specialized API parameters or complex regulatory details—ensure this information is prioritized, organized with clear headings (H2s and H3s), and presented in a way that minimizes ambiguity. Whether this data lives in an API reference guide or a detailed product page, clarity and uniqueness drive citation.
In conclusion, the best page for AI citation remains the same page that works optimally for the user: one that is well-structured, clearly written, technically sound, and fully accessible. Until major AI platforms publish formal requirements stating otherwise, the creation of redundant, machine-only pages is a costly distraction that diverts resources from the optimization efforts that truly move the needle.
The future of SEO and AI visibility hinges on utility and authenticity—not on creating digital echo chambers that exclude the human audience.