Cloudflare vs. Perplexity: a web scraping war with big implications for AI

When the web was established several decades ago, it was built on a number of principles. Among them was a key, overarching standard dubbed “netiquette”: Do unto others as you’d want done unto you. It’s a principle that lived on through other companies, including Google, whose motto for a period was “Don’t be evil.”

The fundamental idea was simple: Act ethically and morally. If someone asked you to stop doing something, you stopped—or at least considered it. But Cloudflare, an IT company that protects millions of websites from hostile internet attacks, has published an eye-opening exposé suggesting that one of the leading AI tools today isn’t following that principle.

Cloudflare claims Perplexity, an AI-powered “answer engine,” is overriding website requests not to crawl their content by spoofing its identity to hide that the requests are coming from an AI company. Cloudflare launched its investigation after receiving complaints from customers that Perplexity was ignoring directives in robots.txt files, which are used by websites to signal whether they want their content indexed by search engines or AI crawlers.
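To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult a robots.txt file, using Python's standard `urllib.robotparser` module. The robots.txt contents, the `ExampleAIBot` name, and the `is_allowed` helper are hypothetical illustrations, not Perplexity's or Cloudflare's actual configuration. The sketch also shows why spoofing matters: the rules are keyed to whatever name the crawler chooses to announce.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    # The crawler that identifies itself honestly is told to stay out.
    print(is_allowed("ExampleAIBot", url))
    # The same request under a generic browser-style name is not blocked.
    print(is_allowed("Mozilla/5.0", url))
```

Nothing in the file enforces the answer. Compliance depends on the crawler both checking the rules and declaring its real identity, which is why the alleged spoofing undermines the whole arrangement.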

Perplexity’s alleged behavior highlights what happens when the web shifts from being rooted in voluntary agreements to a more hard-nosed business environment, where commercial goals overrule moral considerations.

“The code of honor around crawling and robots.txt files is a charming remnant from when the web was collaborative and based on community standards,” says Eerke Boiten, a cybersecurity researcher at De Montfort University in the U.K. Cloudflare’s position as a market leader in web protection means that, for now at least, it’s still possible to preserve some remnants of that morality, Boiten says.

Boiten believes the sense of ethical cooperation online is fading fast, noting that many large AI companies show little regard for where or how they obtain their training data, often operating in murky ethical territory. While he sees OpenAI as generally respectful of the established norms, he’s far less optimistic about others. “Perplexity trying to scrape their way around any defenses feels like it will be the norm rather than the exception,” he says.

Perplexity’s alleged conduct stands out as particularly bold, especially given that the company is already facing a lawsuit over unauthorized content scraping.

Dow Jones & Company, the parent of the Wall Street Journal, and the New York Post filed a lawsuit in October 2024, alleging that Perplexity copies their content “on a massive scale.” (The case is ongoing.) The BBC also sent a letter in June to Perplexity CEO Aravind Srinivas, threatening legal action for scraping its content without permission unless the company stops and either compensates for the data already accessed or deletes it entirely. Perplexity told the Financial Times that the BBC’s case was “manipulative and opportunistic” and reflected a “fundamental misunderstanding” of copyright law.

Perplexity did not respond to Fast Company’s request for comment on this story. But Boiten, for his part, anticipates an escalating arms race between those trying to protect online content from AI-driven web scraping and the companies attempting to do just that to improve their models. “Cloudflare applying machine learning to spot Perplexity’s patterns, and acknowledging that publication of all this likely means Perplexity will come up with new decoys,” he says.

Cornell Law professor James Grimmelmann says the legal limits of scraping content without permission—or bypassing robots.txt files—remain unclear, but Cloudflare’s findings could expose Perplexity to more lawsuits.

“There is a loose judicial consensus that it is okay to scrape sites when their robots.txt files allow it,” says Grimmelmann, “but Perplexity seems determined to fuck around and find out whether the reverse is true.”

https://www.fastcompany.com/91380448/cloudflare-vs-perplexity-a-web-scraping-war-with-big-implications-for-ai?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Published August 5, 2025, 17:50:09

