SerpApi asks court to throw out Reddit scraping complaint
The legal landscape surrounding data scraping, intellectual property, and search engine accessibility is undergoing a major transformation. At the heart of this shift is a high-stakes legal battle between Reddit, the self-proclaimed “front page of the internet,” and SerpApi, a company that provides tools for scraping search engine results pages (SERPs). SerpApi has formally moved to have Reddit’s lawsuit dismissed, a move that could set an important precedent for how data is handled in the age of generative AI and automated data collection.

The motion to dismiss follows an amended complaint Reddit filed in February, which sought to tighten the legal noose around SerpApi and several other defendants. SerpApi argues that Reddit’s claims are not only factually thin but also represent a dangerous attempt to expand platform power over content that Reddit does not own and over data that is fundamentally public.

The Core of the Dispute: Ownership and the User Agreement

One of the main pillars of SerpApi’s defense is the question of who actually owns the content posted on Reddit. In a blog post addressing the legal action, SerpApi CEO Julien Khaleghy pointed out a significant irony in Reddit’s legal strategy. According to Reddit’s own User Agreement, the individuals who post content (the users) retain ownership of their contributions. Reddit holds a non-exclusive license to host, display, and distribute that content, but it does not hold the copyright ownership needed to sue third parties for infringement in the way it is attempting. SerpApi argues that Reddit is using copyright law as a blunt instrument to control information it does not own.

If the court agrees with SerpApi, Reddit’s standing to bring its copyright claims could be undermined. Under U.S. copyright law, a plaintiff bringing an infringement claim generally must show that it owns the copyright, or an exclusive right under it, in the material at issue; a non-exclusive licensee does not qualify. By stating in its terms of service that users retain ownership, Reddit may have created a legal barrier for itself that is difficult to get around.

The Nature of Search Snippets

Another critical part of the defense concerns the nature of the data being “scraped.” Reddit’s complaint highlights snippets: short fragments of text, dates, addresses, and usernames that appear in search results. SerpApi contends that these fragments are not copyrightable. Short phrases, facts, and metadata generally lack the original creative expression required for copyright protection, and the de minimis doctrine can excuse the copying of trivially small amounts of material.

Furthermore, SerpApi emphasizes that it does not scrape Reddit directly. Instead, it accesses Google Search pages, a distinction that is central to its legal strategy. When a user searches Google, Google displays snippets from various websites, including Reddit. SerpApi provides a service that returns what Google is showing for a given query. SerpApi therefore argues that it acts as an intermediary for public search data rather than as a pirate of Reddit’s private database.

The DMCA Controversy: What Constitutes Circumvention?

Reddit’s legal team has invoked the Digital Millennium Copyright Act (DMCA), alleging that SerpApi violated the law by circumventing technical protections put in place to prevent scraping. The DMCA’s anti-circumvention provisions (Section 1201) were originally aimed at stopping people from bypassing digital rights management (DRM), such as the encryption on a DVD or a streaming service.

Khaleghy and the SerpApi legal team dispute this application of the DMCA. They argue that accessing a public webpage that is freely available to anyone with a web browser does not constitute “circumvention.” SerpApi does not break encryption, bypass login credentials, or hack into secure servers.
It simply retrieves the same search results that anyone sees after entering a query into Google.

SerpApi’s motion suggests that Reddit is trying to redefine “technical protections” to include any measure, such as bot detection or IP blocking, that is intended to stop automated access. If the court sides with Reddit, simply finding a way around a basic bot blocker could be treated as a violation of federal law under the DMCA, a prospect that deeply concerns the broader tech community and the SEO industry.

Contextualizing the Conflict: A Timeline of Legal Escalation

The battle between Reddit and SerpApi did not happen in a vacuum. It is part of a broader series of legal actions Reddit has taken as it seeks to monetize its data in the wake of the AI boom. Because large language models (LLMs) such as GPT-4 and Gemini require massive amounts of human conversation data for training, Reddit’s archives have become extremely valuable. This has led to a flurry of litigation and public disputes.

In October 2025, Reddit filed its initial lawsuit against SerpApi, alongside other entities including Perplexity AI, Oxylabs, and AWMProxy. Reddit alleged that these companies were scraping its content through Google Search and reusing it at scale, often to power AI responses that compete with Reddit’s own platform for traffic. A key piece of evidence cited by Reddit was a “trap” post: a piece of content visible only to Google’s crawler and not to human users. When this trap post appeared in responses generated by Perplexity, Reddit claimed it was “smoking gun” evidence of unauthorized scraping.

Shortly after the initial filing, in late October, SerpApi fired back, calling Reddit’s allegations inflammatory. It defended its right to access public search data, framing the issue as one of information freedom versus corporate gatekeeping.

The situation became even more complex in December 2025, when Google itself sued SerpApi. Google’s lawsuit alleged that SerpApi was bypassing its bot protections and scraping licensed search features, such as “People Also Ask” and “Knowledge Graph” boxes. This put SerpApi in the crosshairs of two of the largest data-driven companies in the world at the same time.

By February 2026, SerpApi had asked the court to dismiss Google’s lawsuit, using an argument similar to the one it now makes against Reddit: that Google is misusing the DMCA to restrict access to what is essentially public information.

The current motion against Reddit is the latest move in this