An AI watchdog accused OpenAI of using copyrighted books without permission

An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission.

In a new paper published this week, the AI Disclosures Project alleges that OpenAI likely trained its GPT-4o model using nonpublic material from O’Reilly Media. The researchers used a legally obtained dataset of 34 copyrighted O’Reilly books and found that GPT-4o showed “strong recognition” of the company’s paywalled content. By contrast, GPT-3.5 Turbo appeared more familiar with publicly accessible O’Reilly book samples.

“These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training,” the authors wrote in the paper. Tim O’Reilly, one of the paper’s authors, is a cofounder and CEO of O’Reilly Media.

An OpenAI spokesperson didn’t immediately respond to Fast Company’s request for comment.

Training data lies at the heart of all artificial intelligence models. Large language models (LLMs) require enormous amounts of information, which they draw on when they generate text or images for users.

OpenAI has struck some licensing deals that allow it to train its models on certain content. But the company, which recently raised new funding and is valued at $300 billion, has also come under fire over how it sources certain content. The New York Times, for example, is leading a charge against OpenAI and minority owner Microsoft over alleged copyright infringement.

The researchers acknowledged limitations in their study but argued that the issue is likely part of a broader systemic problem in how large language models are developed.

“Sustainable ecosystems need to be designed so that both creators and developers can benefit from generative AI,” the authors wrote. “Otherwise, model developers are likely to rapidly plateau in their progress, especially as newer content becomes produced less and less by humans.”

https://www.fastcompany.com/91310223/an-ai-watchdog-accused-openai-of-using-copyrighted-books-without-permission?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 4mo ago | 02.04.2025, 20:30:07
