What 107,000 pages reveal about Core Web Vitals and AI search
The Evolving Relationship Between User Experience and Algorithmic Trust

As the digital landscape undergoes a dramatic transformation fueled by generative artificial intelligence, the rules governing search visibility are changing rapidly. Google’s integration of AI-led features such as AI Overviews and AI Mode has shifted how users discover information, raising critical questions about how search engines and AI systems select the sources they trust and cite.

For years, the SEO community has relied heavily on Core Web Vitals (CWV) as the clearest public proxy for measuring user experience (UX). The logic seems irrefutable: faster pages lead to better engagement signals, and AI systems, which prioritize quality and trustworthiness, should naturally favor content from websites with superior CWV scores. This underlying assumption, that technical perfection translates directly into a visibility boost, is what many SEO strategies are currently built on.

However, logic must always yield to empirical evidence. To test this widely held hypothesis, a large-scale analysis was undertaken, spanning the performance metrics of 107,352 unique webpages that have demonstrated prominence in Google’s AI-driven search results. The goal was not simply to confirm whether CWV “matters,” but to dissect precisely *how* it influences AI visibility and whether it functions as a primary competitive differentiator.

The findings offer a nuanced conclusion that challenges prevailing wisdom: Core Web Vitals are crucial, but their role in the age of AI search is not what most technical SEO teams currently assume. They act less as a growth lever and more as a gatekeeper.

The Scope of the Investigation: 107,000 AI-Visible Pages

To accurately assess the correlation between page experience and AI performance, the analysis focused exclusively on content already demonstrating a high degree of AI visibility. The dataset of 107,352 webpages comprised documents that were frequently cited, summarized, or included in Google’s AI Overviews and dedicated AI Mode search environments. By focusing on pages that had already passed the initial quality filters of AI systems, the research aimed to determine whether subtle or significant differences in page speed and stability, measured by Largest Contentful Paint (LCP) and Cumulative Layout Shift (CLS), could predict variations in AI performance rankings.

This approach moves beyond generalized site audits. It treats the problem at the page level, which is critical because AI models do not evaluate a website’s mean performance; they evaluate the quality and experience delivered by the specific document they are considering for retrieval or summarization.

Understanding Core Web Vitals in the AI Context

Before diving into the correlations, it is essential to recall what the primary CWV metrics represent:

- Largest Contentful Paint (LCP): Measures perceived loading speed. It marks the point at which the largest content element in the viewport (an image or block of text) has rendered and is visible to the user.
- Cumulative Layout Shift (CLS): Measures visual stability. It quantifies unexpected shifts in the layout during and after page load, which significantly degrade user experience.

In the traditional SEO environment, achieving ‘Good’ status across these metrics was associated with ranking boosts (or penalty avoidance). The hypothesis being tested here is whether that association holds when the search results are mediated by advanced language models.
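To make the ‘Good’ thresholds above concrete, here is a minimal sketch that buckets a page against Google’s published CWV thresholds (LCP: good ≤ 2.5 s, poor > 4 s; CLS: good ≤ 0.1, poor > 0.25). The example page and its values are hypothetical, standing in for per-page field data such as that exposed by the Chrome UX Report:

```python
def classify_lcp(lcp_seconds: float) -> str:
    """Bucket an LCP value using Google's published thresholds."""
    if lcp_seconds <= 2.5:
        return "good"
    if lcp_seconds <= 4.0:
        return "needs improvement"
    return "poor"

def classify_cls(cls_score: float) -> str:
    """Bucket a CLS value using Google's published thresholds."""
    if cls_score <= 0.1:
        return "good"
    if cls_score <= 0.25:
        return "needs improvement"
    return "poor"

# Hypothetical per-page measurements (illustrative, not from the study).
page = {"url": "https://example.com/article", "lcp": 3.1, "cls": 0.04}
print(classify_lcp(page["lcp"]), classify_cls(page["cls"]))
# -> needs improvement good
```

Note that the classification is per page, which matters for everything that follows: the study's unit of analysis is the individual document, not the site-wide aggregate.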
Why Distributions Matter More Than Scores

A fundamental challenge in CWV analysis is the tendency to rely on averages and simple pass/fail thresholds. Most SEO reporting tools consolidate thousands of URL metrics into a single summary mean. This approach, however, masks the reality of user experience across a large site.

The first crucial step in analyzing the 107,000 pages was therefore to visualize the performance metrics as a distribution rather than a mean. This immediately exposed the limitations of averaged reporting.

The Skewed Reality of Largest Contentful Paint (LCP)

When LCP values for the dataset were plotted, the distribution revealed a pronounced right skew. The majority of pages clustered comfortably within an acceptable performance range, often around or slightly above the recommended ‘Good’ threshold of 2.5 seconds, and the median performance was broadly satisfactory.

However, the “long tail” of the distribution extended dramatically to the right, exposing a small but significant proportion of extreme outliers: pages with horrendously slow load times, perhaps exceeding 5 or even 10 seconds. While these pages represented a minority of the total population, their extremely poor performance exerted a disproportionate influence, pulling the overall site average (the mean) toward an undesirable score.

For an SEO strategist, this distinction is vital. A poor site average may suggest a systemic problem when, in reality, it may be caused by a small number of broken templates or highly complex, unoptimized pages. The vast majority of users, visiting the median-performing pages, are having an adequate experience.

Cumulative Layout Shift (CLS) Reflects Similar Extremes

Cumulative Layout Shift exhibited a related pattern. The overwhelming majority of pages recorded CLS scores near zero, indicating high visual stability; for most content, major layout shifts are simply not an issue. Yet, as with LCP, a small minority of pages displayed severe instability, producing high CLS scores. This minority pulls the mean up, creating the false impression of a site-wide instability problem. Again, the mean failed to reflect the lived experience of the majority of users.

This distributional analysis clarifies a crucial point for AI systems: AI does not reason over aggregated means. It processes individual documents. Before even discussing correlation, it is clear that Core Web Vitals are not a single, monolithic signal; they are a varied distribution of behaviors across a mixed population of documents.

Analyzing the Correlation: Rank vs. Linear Relationships

Because the CWV data was non-normally distributed, traditional statistical measures such as the Pearson correlation coefficient were inappropriate: Pearson correlation assumes a roughly linear relationship and is distorted by the heavy skew and extreme outliers described above, so it would have misrepresented the findings.

Instead, the analysis used the Spearman rank correlation. This method tests for a monotonic relationship between the variables: whether pages that rank higher on CWV performance also tend to rank consistently higher (or lower) on AI visibility, regardless of whether that relationship is perfectly linear. The sketch below illustrates the difference.
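As a minimal illustration of why the analysis favored medians and rank correlation, the following sketch generates a hypothetical right-skewed LCP sample and an invented, noisy AI-visibility score (both purely illustrative; none of these numbers come from the study), then contrasts mean with median and Pearson with Spearman:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(42)

# Hypothetical right-skewed LCP sample (seconds): most pages are fast,
# but a long tail of extreme outliers drags the mean upward.
lcp = rng.lognormal(mean=0.9, sigma=0.5, size=10_000)

print(f"median LCP: {np.median(lcp):.2f}s   mean LCP: {lcp.mean():.2f}s")
# The mean lands well above the median, misrepresenting the typical page.

# Invented AI-visibility score (higher = more visible), monotonically but
# non-linearly and noisily related to speed, purely for illustration.
visibility = -np.log(lcp) + rng.normal(scale=1.0, size=lcp.size)

# Pearson measures linear association on the raw values; Spearman only
# asks whether the *ranks* move together, so it is robust to skew.
r_pearson, _ = pearsonr(lcp, visibility)
rho_spearman, _ = spearmanr(lcp, visibility)
print(f"Pearson r = {r_pearson:.3f}   Spearman rho = {rho_spearman:.3f}")
```

On data shaped like this, Spearman recovers a noticeably stronger (more negative) association than Pearson, because the underlying relationship is monotonic but not linear. That robustness to outliers and skew is precisely why a rank-based method was the appropriate choice for the 107,352-page dataset.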