Why data will always be a precious commodity in the AI world

The New York Times lawsuit against OpenAI late last year over the tech company’s use of the newspaper’s journalism to train its large language model (LLM) represented a major move in unprecedented times. It also could portend a shift in the Big Tech/content creator relationship—one that was fraught to begin with and might now turn increasingly litigious. At the heart of the suit is the question of data, and whether the companies behind LLMs can claim “fair use” in gobbling up that data.

When we think about the amount of data needed to train LLMs, it stands to reason that organizations will be protective of how their proprietary data is used and credited. LLMs require vast amounts of data, and despite OpenAI CEO Sam Altman’s recent claims, OpenAI and ChatGPT need access to a wide range of data to strengthen the model, and this may include both proprietary and non-copyrighted work. The high quality and reliability of New York Times content is precisely what strengthens ChatGPT’s outputs.

That was the company’s own position three weeks ago, according to The Telegraph, which shared a submission from OpenAI to the House of Lords communications and digital select committee. In the submission, the company admitted that it could not train LLMs like ChatGPT without access to copyrighted work. In fact, it would be “impossible.”

Data is the backbone of AI, and all models rely on patterns and correlations established by vast amounts of training data. Generative AI tools need high-quality training data, like copyrighted content from the New York Times and other notable publishers, to provide high-quality responses. A sufficient quantity of training data also reduces hallucinations, making responses more relevant.

While the New York Times’ case against OpenAI and Microsoft is probably the most visible challenge involving the intellectual property implications of AI, it is hardly the only one. Plaintiffs have filed multiple lawsuits claiming the training process for AI programs infringed upon their copyrights in written and visual works. These include lawsuits by the Authors Guild and authors Paul Tremblay, Michael Chabon, Sarah Silverman, and others against OpenAI. Michael Chabon, Sarah Silverman, and other content creators have also initiated suits against Meta. There are proposed class action lawsuits against Alphabet Inc., Stability AI, and Midjourney, as well as a lawsuit by Getty Images against Stability AI.

As AI use continues to proliferate, there will be increasing pressure to resolve these copyright issues. And litigation involving intellectual property rights is just the tip of the iceberg. The number of cases centered on AI-related accuracy, safety, and discrimination is likely to rise.

Given the complexity and sheer volume of all of these cases, it will likely take years before these matters are resolved. For now, all we can say for sure is that ordinary companies rolling out AI tools would be wise to exercise care to track and monitor their use of the tech. Should a particular AI tool come under regulatory or judicial scrutiny and thus come off the market, companies will want to be able to adapt quickly and smoothly.

https://www.fastcompany.com/91021586/why-data-will-always-be-a-precious-and-protected-commodity-in-ai?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 2y ago | 2. 2. 2024 21:20:04

