Cloudflare vs. Perplexity: a web scraping war with big implications for AI

When the web was established several decades ago, it was built on a number of principles. Among them was a key, overarching standard dubbed “netiquette”: Do unto others as you’d want done unto you. It’s a principle that lived on in companies such as Google, whose motto for a period was “Don’t be evil.”

The fundamental idea was simple: Act ethically and morally. If someone asked you to stop doing something, you stopped—or at least considered it. But Cloudflare, an IT company that protects millions of websites from hostile internet attacks, has published an eye-opening exposé suggesting that one of the leading AI tools today isn’t following that principle.

Cloudflare claims that Perplexity, an AI-powered “answer engine,” is ignoring websites’ requests not to crawl their content by spoofing its identity, hiding the fact that the requests come from an AI company. Cloudflare launched its investigation after customers complained that Perplexity was ignoring directives in robots.txt files, which websites use to tell crawlers, such as those run by search engines or AI companies, which parts of a site they may access.
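For readers curious how the mechanism works: robots.txt is a plain-text file of rules grouped by the name a crawler reports in its User-Agent header, and honoring it is voluntary. The minimal sketch below uses Python’s standard urllib.robotparser module and assumes a hypothetical site (news.example.com) whose robots.txt blocks an equally hypothetical crawler called ExampleAIBot while allowing everyone else. It illustrates why spoofing an identity matters: the same URL is off-limits to the declared bot name but permitted when the request claims to be an ordinary browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a made-up AI crawler, allow all other agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines; a live crawler would instead call
    # set_url("https://<site>/robots.txt") followed by read().
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://news.example.com/articles/story"  # hypothetical page
declared_bot = "ExampleAIBot/1.0"  # the crawler announcing its real name
# A generic browser-style string (illustrative only, not any real crawler's UA).
browser_like = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0"

print(is_allowed(declared_bot, url))  # False: matched by the ExampleAIBot group
print(is_allowed(browser_like, url))  # True: falls through to the "*" group
```

Because the file cannot enforce anything, blocking a crawler that reports a false identity requires recognizing it by other signals.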

Perplexity’s alleged behavior highlights what happens when the web shifts from being rooted in voluntary agreements to a more hard-nosed business environment, where commercial goals overrule moral considerations.

“The code of honor around crawling and robots.txt files is a charming remnant from when the web was collaborative and based on community standards,” says Eerke Boiten, a cybersecurity researcher at De Montfort University in the U.K. Cloudflare’s position as a market leader in web protection means that, for now at least, it’s still possible to preserve some remnants of that morality, Boiten says.

Boiten believes the sense of ethical cooperation online is fading fast, noting that many large AI companies show little regard for where or how they obtain their training data, often operating in murky ethical territory. While he sees OpenAI as generally respectful of the established norms, he’s far less optimistic about others. “Perplexity trying to scrape their way around any defenses feels like it will be the norm rather than the exception,” he says.

Perplexity’s alleged conduct is particularly bold given that the company is already facing a lawsuit over unauthorized content scraping.

Dow Jones & Company, publisher of the Wall Street Journal, and the New York Post filed a lawsuit in October 2024, alleging that Perplexity “copies on a massive scale” their content. (The case is ongoing.) The BBC also sent a letter in June to Perplexity CEO Aravind Srinivas, threatening legal action over the scraping of its content without permission unless the company stopped and either paid for the data it had already accessed or deleted it entirely. Perplexity told the Financial Times that the BBC’s case was “manipulative and opportunistic” and reflected a “fundamental misunderstanding” of copyright law.

Perplexity did not respond to Fast Company’s request for comment on this story. But Boiten, for his part, anticipates an escalating arms race between those trying to protect online content from AI-driven web scraping and the companies scraping it to improve their models. “Cloudflare applying machine learning to spot Perplexity’s patterns, and acknowledging that publication of all this likely means Perplexity will come up with new decoys,” he says.

Cornell Law professor James Grimmelmann says the legal limits of scraping content without permission—or bypassing robots.txt files—remain unclear, but Cloudflare’s findings could expose Perplexity to more lawsuits.

“There is a loose judicial consensus that it is okay to scrape sites when their robots.txt files allow it,” says Grimmelmann, “but Perplexity seems determined to fuck around and find out whether the reverse is true.”

https://www.fastcompany.com/91380448/cloudflare-vs-perplexity-a-web-scraping-war-with-big-implications-for-ai?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Aug 5, 2025, 5:50:09 PM

