An AI watchdog accused OpenAI of using copyrighted books without permission

An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission.

In a new paper published this week, the AI Disclosures Project alleges that OpenAI likely trained its GPT-4o model using nonpublic material from O’Reilly Media. The researchers used a legally obtained dataset of 34 copyrighted O’Reilly books and found that GPT-4o showed “strong recognition” of the company’s paywalled content. By contrast, GPT-3.5 Turbo appeared more familiar with publicly accessible O’Reilly book samples.

“These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training,” the authors wrote in the paper. Tim O’Reilly, one of the paper’s authors, is a cofounder and CEO of O’Reilly Media.

An OpenAI spokesperson didn’t immediately respond to Fast Company’s request for comment.

Training data lies at the heart of all artificial intelligence models. Large language models (LLMs) require enormous amounts of information, which they draw on when they generate text or images for users.

OpenAI has struck some licensing deals that allow it to train its models on certain content. But the company, which recently raised funding and is valued at $300 billion, has also come under fire for how it sources certain content. The New York Times, for example, is leading a charge against OpenAI and minority owner Microsoft over alleged copyright infringement.

The researchers acknowledged limitations in their study but argued that the issue is likely part of a broader systemic problem in how large language models are developed.

“Sustainable ecosystems need to be designed so that both creators and developers can benefit from generative AI,” the authors wrote. “Otherwise, model developers are likely to rapidly plateau in their progress, especially as newer content becomes produced less and less by humans.”


https://www.fastcompany.com/91310223/an-ai-watchdog-accused-openai-of-using-copyrighted-books-without-permission?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Posted Apr. 2, 2025, 20:30:07


