Hidden HTTP Page Can Cause Site Name Problems In Google via @sejournal, @MattGSouthern
The Importance of Site Names and Brand Identity in Modern Search

In the evolving landscape of Google Search, brand identity has taken center stage. It is no longer enough to simply rank for keywords; a brand must present a professional, recognizable identity within the Search Engine Results Pages (SERPs). One of the most visible ways Google facilitates this is through the display of site names and favicons alongside search snippets. These elements provide immediate visual cues to users, helping them distinguish between established brands and generic results.

However, many webmasters and SEO professionals have recently encountered a frustrating issue: despite implementing the correct structured data and meta tags, their site names appear incorrectly or revert to a simple URL format. Google’s John Mueller recently shed light on a subtle technical oversight that could be the culprit. This issue involves a “hidden” or leftover HTTP version of a homepage that remains accessible to Googlebot, even if it is invisible to standard users browsing via Chrome or other modern browsers.

The Discovery: John Mueller on Ghost HTTP Pages

The revelation came during a recent interaction where a site owner questioned why Google was failing to display the correct site name and favicon despite the site having transitioned to HTTPS years ago. The site owner noted that their site appeared correctly in a browser, yet the SERPs reflected outdated or generic information.

John Mueller, Search Advocate at Google, pointed out a critical technical nuance. While modern browsers like Google Chrome often automatically upgrade requests to HTTPS or use cached versions of a site, Google’s indexing systems are much more literal. If an old HTTP version of a homepage still exists and returns a “200 OK” status code (meaning the page is live and accessible) rather than a “301 Moved Permanently” redirect, Googlebot may still crawl and index that version.
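The distinction between a live “200 OK” page and a “301 Moved Permanently” redirect can be checked directly, without a browser in the way. The following is a minimal sketch, not a definitive tool: the function names `fetch_http_homepage` and `classify`, the label strings, and the `User-Agent` value are all illustrative assumptions, and grouping 308 with 301 as a permanent redirect is a choice made here rather than something stated above. It relies on Python's `http.client`, which does not follow redirects, so the raw server response on port 80 is what gets inspected.

```python
import http.client
from typing import Optional, Tuple


def fetch_http_homepage(host: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request http://<host>/ on port 80 and return (status, Location header).

    http.client does not follow redirects, so this shows the server's
    literal answer rather than a browser-upgraded one.
    """
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        # Hypothetical User-Agent string, used only for illustration.
        conn.request("GET", "/", headers={"User-Agent": "http-homepage-check"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status: int, location: Optional[str]) -> str:
    """Label the HTTP homepage response (labels are illustrative)."""
    to_https = bool(location) and location.lower().startswith("https://")
    if status == 200:
        # A live page on plain HTTP: the "hidden" version described above.
        return "live-http-page"
    if status in (301, 308) and to_https:
        return "permanent-redirect-to-https"
    if status in (302, 303, 307) and to_https:
        return "temporary-redirect-to-https"
    return "other"
```

A call such as `classify(*fetch_http_homepage("example.com"))` (with your own hostname in place of the placeholder) returning `"live-http-page"` would suggest that the HTTP homepage is still being served instead of redirected.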
If this “hidden” HTTP page lacks the updated structured data (WebSite schema) or the correct title tags required for Google’s site name system, it can cause a conflict. Google may prioritize the information found on the HTTP version or become “confused” by the conflicting data between the HTTP and HTTPS versions, leading to a failure in displaying the site name and favicon.

How Google Determines Site Names

To understand why a leftover HTTP page is so disruptive, it is essential to understand how Google identifies and displays site names. Google uses several sources to determine the most accurate name for a website:

1. WebSite Structured Data

The most influential method is the use of `WebSite` structured data on the homepage. By using the `name` and `alternateName` properties within a JSON-LD script, webmasters explicitly tell Google what the site should be called. This is the primary signal Google looks for when generating the site name in the SERPs.

2. The Title Tag

Google also looks at the `<title>` tag of the homepage. If the structured data is missing or inconsistent, Google relies on the title tag to infer the brand name. If a site’s HTTP version has an old title tag like “Home” instead of “Brand Name – Home,” it creates a discrepancy.

3. Heading Elements (H1)

Like title tags, H1 elements are used as secondary signals. Google’s algorithms analyze the most prominent text on the homepage to verify the identity of the site.

4. Open Graph and Meta Information

Data from Open Graph tags (often used for social media sharing) and other meta tags can also serve as supporting evidence for Google’s site name algorithms.

When an old HTTP version of a page exists, it often lacks the modern optimizations applied to the HTTPS version. If Googlebot happens to prioritize the HTTP version during its site-level crawl, it may pull the “Site Name” data from a page that hasn’t been updated in years.
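To see how the two versions of a homepage can disagree, it can help to pull out the first two signals listed above (the `WebSite` structured data name and the title tag) from each page's HTML and compare them. The sketch below is illustrative only: the class `SiteNameSignals`, the function `extract_signals`, and the two sample HTML strings are assumptions invented for this example, and it does not reproduce how Google actually weighs these signals. It uses only Python's standard `html.parser` and `json` modules.

```python
import json
from html.parser import HTMLParser
from typing import List


class SiteNameSignals(HTMLParser):
    """Collects the <title> text and WebSite JSON-LD names from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.website_names: List[str] = []
        self._in_title = False
        self._in_jsonld = False
        self._jsonld_buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._jsonld_buf = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_jsonld:
            self._jsonld_buf.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self._collect("".join(self._jsonld_buf))

    def _collect(self, raw: str) -> None:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return  # Ignore malformed JSON-LD in this sketch.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "WebSite" and item.get("name"):
                self.website_names.append(item["name"])


def extract_signals(html: str) -> dict:
    parser = SiteNameSignals()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "website_names": parser.website_names}


# Hypothetical sample pages: an updated HTTPS homepage and a stale HTTP one.
HTTPS_HTML = (
    '<html><head><title>Example Brand - Home</title>'
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "WebSite", '
    '"name": "Example Brand", "url": "https://example.com/"}'
    '</script></head><body></body></html>'
)
HTTP_HTML = "<html><head><title>Home</title></head><body></body></html>"
```

Comparing `extract_signals(HTTPS_HTML)` with `extract_signals(HTTP_HTML)` shows the kind of mismatch described above: the stale page has no `WebSite` name at all and only a generic title.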
The Browser Illusion: Why You Might Miss the Problem

The reason this issue is described as “hidden” is due to how modern web browsers handle security. Most users, including developers and SEOs, browse the web using HTTPS. Google Chrome, in particular, is aggressive about upgrading connections to HTTPS. If you type a URL into your browser, it might automatically upgrade the request to the secure version or warn you if you attempt to access an insecure page.

Because of this seamless user experience, a webmaster might assume that their HTTP-to-HTTPS redirects are working perfectly. However, there is a difference between a browser-side upgrade and a server-side redirect. If the server is still configured to serve a live page on port 80 (HTTP) without redirecting to port 443 (HTTPS), Googlebot will see a valid page. While your browser hides the flaw, Google’s crawler sees it as a separate, competing version of your homepage.

Technical Deep Dive: The Role of 301 Redirects

The solution to this problem lies in the implementation of server-side 301 redirects. A 301 redirect is a “permanent” redirect that tells search engines (and browsers) that a resource has moved to a new location. Crucially, a 301 redirect passes “link equity” and consolidation signals to the new URL.

If your HTTP homepage is still returning a 200 status code, Google considers it a unique entity. To fix this, you must ensure that every request to an HTTP URL is met with a 301 redirect to the HTTPS equivalent. This consolidation ensures that Googlebot only “sees” one version of the site (the secure one) and applies all site-level metadata accordingly.

Common Misconfigurations

There are several reasons why an HTTP version might remain active:

Partial Redirects: The redirect might be set up for inner pages but missed for the root homepage.

Load Balancer Issues: Sometimes, the load balancer handles HTTPS, but the origin server still responds to HTTP requests without redirecting.
CDN Caching: A Content Delivery Network might be serving a cached HTTP version of the site even after server-side changes are made.

CMS Defaults: Some Content Management Systems might recreate a default index.html file on the HTTP path during updates.

How to Identify a Hidden HTTP Page

Since you cannot rely on your standard browser