SerpApi asks court to throw out Reddit scraping complaint
The law governing data scraping, intellectual property, and access to search results is being tested in court. At the center is a high-stakes dispute between Reddit, the self-proclaimed front page of the internet, and SerpApi, a company that provides tools to scrape search engine results pages (SERPs). SerpApi has formally asked the court to dismiss Reddit’s lawsuit, and the outcome could set a major precedent for how data is handled in the age of generative AI and automated data collection.
The motion to dismiss follows an amended complaint filed by Reddit in February, which sought to tighten the legal noose around SerpApi and several other defendants. However, SerpApi argues that Reddit’s claims are not only factually thin but represent a dangerous attempt to expand platform power over content that Reddit does not technically own and data that is fundamentally public.
The Core of the Dispute: Ownership and the User Agreement
One of the primary pillars of SerpApi’s defense is the question of who actually owns the content posted on Reddit. In a blog post addressing the legal action, SerpApi CEO Julien Khaleghy pointed out a significant irony in Reddit’s legal strategy. According to Reddit’s own User Agreement, the users who post content retain ownership of their contributions. Reddit holds a non-exclusive license to host, display, and distribute that content, but SerpApi argues that such a license does not give Reddit the copyright ownership it would need to sue third parties for infringement.
SerpApi argues that Reddit is attempting to use copyright law as a blunt instrument to control information it does not own. If the court agrees with SerpApi, it could undermine Reddit’s standing to bring its copyright-based claims. Under U.S. copyright law, a plaintiff bringing an infringement claim must generally show that it owns a valid copyright in the material in question, or holds an exclusive right to it. By acknowledging in its terms of service that users retain ownership, Reddit may have created a legal barrier for itself that is difficult to bypass.
The Nature of Search Snippets
Another critical aspect of the defense involves the nature of the data being “scraped.” Reddit’s complaint highlights the use of snippets (short fragments of text, dates, addresses, and usernames) that appear in search results. SerpApi contends that these fragments are not copyrightable. Because such data is largely factual, and under the “de minimis” doctrine, short phrases and metadata generally do not meet the threshold of original creative work required for copyright protection.
Furthermore, SerpApi emphasizes that it does not scrape Reddit directly. Instead, it accesses Google Search pages. This distinction is vital to its legal strategy. When a user searches Google, Google displays snippets of various websites, including Reddit. SerpApi provides a service that lets its customers see what Google is showing. SerpApi therefore argues that it acts as a middleman for public search data rather than a pirate of Reddit’s private database.
The DMCA Controversy: What Constitutes Circumvention?
Reddit’s legal team has invoked the Digital Millennium Copyright Act (DMCA), alleging that SerpApi violated the law by circumventing technical protections Reddit put in place to prevent scraping. The DMCA’s anti-circumvention provisions were originally aimed at the hacking of digital rights management (DRM) systems, such as the encryption on a DVD or a streaming service.
Khaleghy and the SerpApi legal team dispute this application of the DMCA. They argue that accessing a public webpage that is freely available to any human with a web browser does not constitute “circumvention.” SerpApi does not break encryption, bypass login credentials, or hack into secure servers. It simply retrieves the same search results that are visible to anyone who enters a query into Google.
SerpApi’s motion suggests that Reddit is trying to redefine “technical protections” to include any measure intended to stop automated access, such as bot detection or IP blocking. If the court sides with Reddit, simply finding a way around a basic bot-blocker could be treated as a violation of federal law under the DMCA, a prospect that has the broader tech community and the SEO industry deeply concerned.
Contextualizing the Conflict: A Timeline of Legal Escalation
The battle between Reddit and SerpApi did not happen in a vacuum. It is part of a broader series of legal actions Reddit has taken as it seeks to monetize its data in the wake of the AI boom. As large language models (LLMs) like GPT-4 and Gemini require massive amounts of human conversation data for training, Reddit’s archives have become incredibly valuable. This has led to a flurry of litigation and public disputes:
In October 2025, Reddit filed its initial lawsuit against SerpApi, alongside other entities like Perplexity AI, Oxylabs, and AWMProxy. Reddit alleged that these companies were scraping its content through Google Search and reusing it at scale, often to power AI responses that compete with Reddit’s own platform traffic. A key piece of evidence cited by Reddit was a “trap” post, a piece of content visible only to Google’s crawler and not to human users. When this trap post appeared in responses generated by Perplexity, Reddit claimed it was “smoking gun” evidence of unauthorized scraping.
Shortly after the initial filing, SerpApi fired back in late October, calling Reddit’s allegations inflammatory. It defended its right to access public search data, framing the issue as one of information freedom versus corporate gatekeeping.
The situation became even more complex in December 2025, when Google itself sued SerpApi. Google’s lawsuit alleged that SerpApi was bypassing its bot protections and scraping licensed search features, such as “People Also Ask” and “Knowledge Graph” boxes. This put SerpApi in the crosshairs of two of the largest data-driven companies in the world simultaneously.
In February 2026, SerpApi asked the court to dismiss Google’s lawsuit, using an argument similar to the one it is now using against Reddit: that Google is misusing the DMCA to restrict access to what is essentially public information. The current motion against Reddit is the latest move in this multi-front legal war.
The “Trap Post” and the Ethics of Web Crawling
Reddit’s use of a “trap” post is one of the more unusual elements of this case. By inserting hidden data that only a search engine crawler would find, Reddit was able to trace where that data ended up. When that same specific string of text appeared in the output of AI tools that use SerpApi to gather data, Reddit concluded that it had definitive proof of a copyright violation.
However, SerpApi’s defense targets the logic of this trap. Even if the data was “trapped,” the question remains: does Reddit own the rights to the search result itself? If the data is being surfaced via Google, SerpApi argues that the responsibility for how that data is indexed and presented lies with the search engine and the public nature of the web, not with the source site’s attempts to hide or track specific snippets.
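The general idea behind a trap post can be illustrated with a minimal Python sketch. This is not Reddit’s actual implementation, which the article does not describe. The function names (make_canary, render_page, contains_canary), the marker format, and the User-Agent check are all assumptions made purely for illustration.

```python
import secrets

# Assumption: the crawler is recognized by a substring of its User-Agent header.
CRAWLER_MARKER = "Googlebot"


def make_canary(prefix: str = "canary") -> str:
    """Return a unique, hard-to-guess marker string."""
    return f"{prefix}-{secrets.token_hex(8)}"


def render_page(user_agent: str, body: str, canary: str) -> str:
    """Append the canary only when the request appears to come from the crawler."""
    if CRAWLER_MARKER in user_agent:
        return f"{body}\n{canary}"
    return body


def contains_canary(text: str, canary: str) -> bool:
    """Report whether some downstream text reproduces the canary."""
    return canary in text


canary = make_canary()
body = "Ordinary post text."

# A human visitor never sees the marker; the crawler does.
human_view = render_page("Mozilla/5.0 (Windows NT 10.0)", body, canary)
crawler_view = render_page("Mozilla/5.0 (compatible; Googlebot/2.1)", body, canary)

print(contains_canary(human_view, canary))
print(contains_canary(crawler_view, canary))
```

If the marker later turns up in a third party’s output, the only route it could have taken is through the crawler’s copy of the page. A User-Agent header is easy to spoof, so a real deployment would likely identify crawlers by stronger means. The sketch shows only the tracing logic.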
Broader Implications for the SEO and AI Industries
The outcome of the SerpApi vs. Reddit case will have ripples far beyond the two companies involved. For the SEO industry, tools like Ahrefs, Semrush, and Moz rely on the ability to analyze search data to help businesses improve their visibility. If the court rules that scraping SERPs is a violation of the DMCA or copyright law, the entire ecosystem of search analytics could be in jeopardy.
Furthermore, the AI industry is watching closely. Companies like Perplexity and OpenAI use various methods to gather real-time data from the web to ensure their answers are current. If platforms like Reddit can successfully sue third-party data providers for “indirectly” scraping their content via search engines, it would create a massive legal hurdle for AI development. It could force AI companies to pay licensing fees to every major platform on the internet, effectively creating a “pay-to-play” model for the web that favors established tech giants over smaller innovators.
Privacy Policies and Public Accessibility
SerpApi also points to Reddit’s own privacy policy as a defense. The policy explicitly states that public posts on the platform may appear in search results. SerpApi argues that by allowing Google to index its content, Reddit has effectively consented to that content being part of the public search index. Once content is in the index, it becomes part of the public record of the internet.
Reddit’s attempt to distinguish between “human” searchers and “automated” scrapers is, in SerpApi’s view, a distinction without a legal difference when it comes to publicly accessible data. If a person can see it, a machine should be allowed to record what the person sees, provided it doesn’t violate specific, enforceable contracts—contracts that SerpApi claims they never signed with Reddit.
The Argument Against Platform Power
In his blog post titled “Reddit’s Lawsuit is a Dangerous Attempt to Expand Platform Power,” Julien Khaleghy argues that the internet is moving toward a dangerous fragmentation. He suggests that if companies like Reddit are allowed to control how their data appears in search engines and who is allowed to look at those search results, the “open web” will effectively cease to exist.
This “walled garden” approach is exactly what many early internet pioneers feared. If every platform can sue anyone who observes its public-facing data through a third-party lens, the transparency of the internet is lost. SerpApi frames itself as a defender of this transparency, providing a “neutral” view of what search engines are showing the world.
What Happens Next?
The federal court must now decide whether Reddit’s amended complaint has enough merit to proceed to discovery and trial. If the judge grants SerpApi’s motion to dismiss with prejudice, the case against SerpApi will effectively end, subject to any appeal, and Reddit will be barred from refiling the same claims against SerpApi. Such a ruling would be a massive victory for data scrapers and the SEO tool industry.
However, if the motion is denied, the case will move forward, likely leading to a long and expensive discovery process where both companies’ internal communications and technical methods will be scrutinized. This could involve deep dives into how SerpApi’s “proxies” work and how Reddit’s “bot-traps” are implemented.
Regardless of the immediate outcome, the tension between content creators, platform owners, and data aggregators is only going to intensify. As a publicly traded company seeking to maximize the value of its data for AI licensing deals, Reddit has every incentive to keep fighting. Conversely, SerpApi is fighting for its very business model, which depends on the legal right to provide search data to its clients.
Conclusion: A Landmark Moment for Digital Rights
The SerpApi vs. Reddit case is more than just a dispute over a few snippets of text. It is a battle over the fundamental rules of the internet. Does a platform’s control over its content end when it allows that content to be indexed by a search engine? Does the DMCA protect “public” data from being accessed by automated tools? And who really owns the conversations we have on the web?
As we wait for the court’s decision, the tech world remains on edge. The ruling will help define the boundaries of the “fair use” of data in an era where data is the most valuable commodity on earth. Whether the court sees SerpApi as a legitimate tool for information transparency or Reddit as a rightful protector of its ecosystem will determine the future of web scraping for years to come.