An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission.
In a new paper published this week, the AI Disclosures Project alleges that OpenAI likely trained its GPT-4o model using nonpublic material from O’Reilly Media. The researchers used a legally obtained dataset of 34 copyrighted O’Reilly books and found that GPT-4o showed “strong recognition” of the company’s paywalled content. By contrast, GPT-3.5 Turbo appeared more familiar with publicly accessible O’Reilly book samples.
“These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training,” the authors wrote in the paper. Tim O’Reilly, one of the paper’s authors, is a cofounder and CEO of O’Reilly Media.
An OpenAI spokesperson didn’t immediately respond to Fast Company’s request for comment.
Training data lies at the heart of all artificial intelligence models. Large language models (LLMs) require enormous amounts of information, which they fall back on when they generate text or images for users.
OpenAI has struck some licensing deals that allow it to train its models on certain content. But the company, which recently raised funding and is now valued at $300 billion, has also come under fire for how it sources certain content. The New York Times, for example, is leading a charge against OpenAI and minority owner Microsoft over alleged copyright infringement.
The researchers acknowledged limitations in their study but argued that the issue is likely part of a broader systemic problem in how large language models are developed.
“Sustainable ecosystems need to be designed so that both creators and developers can benefit from generative AI,” the authors wrote. “Otherwise, model developers are likely to rapidly plateau in their progress, especially as newer content becomes produced less and less by humans.”