Why data will always be a precious commodity in the AI world

The New York Times lawsuit against OpenAI late last year over the tech company’s use of the newspaper’s journalism to train its large language model (LLM) represented a major move in unprecedented times. It also could portend a shift in the Big Tech/content creator relationship—one that was fraught to begin with and might now turn increasingly litigious. At the heart of the suit is the question of data, and whether the companies behind LLMs can claim “fair use” in gobbling up that data.

Given the amount of data needed to train LLMs, it stands to reason that organizations will be protective of how their proprietary data is used and credited. LLMs require vast amounts of data, and despite OpenAI CEO Sam Altman’s recent claims, OpenAI and ChatGPT need access to a wide range of data to strengthen the model, which may include both proprietary and non-copyrighted work. The high quality and reliability of New York Times content is precisely what strengthens ChatGPT’s outputs.

That was the company’s own position three weeks ago, according to The Telegraph, which shared a submission from OpenAI to the House of Lords communications and digital select committee. In the submission, the company admitted that it could not train LLMs like ChatGPT without access to copyrighted work. In fact, it would be “impossible.”

Data is the backbone of AI, and all models rely on patterns and correlations established by vast amounts of training data. Generative AI tools need high-quality training data, like copyrighted content from the New York Times and other notable publishers, to provide high-quality outputs. A sufficient quantity of training data also reduces hallucinations, making responses more relevant.

While the New York Times’ case against OpenAI and Microsoft is probably the most visible challenge involving the intellectual property implications of AI, it is hardly the only one. Plaintiffs have filed multiple lawsuits claiming the training process for AI programs infringed upon their copyrights in written and visual works. These include lawsuits by the Authors Guild and authors Paul Tremblay, Michael Chabon, Sarah Silverman, and others against OpenAI. Michael Chabon, Sarah Silverman, and other content creators have also initiated suits against Meta. There are proposed class action lawsuits against Alphabet Inc., Stability AI, and Midjourney, as well as a lawsuit by Getty Images against Stability AI.

As AI use continues to proliferate, there will be increasing pressure to resolve these copyright issues. And litigation involving intellectual property rights is just the tip of the iceberg. The number of cases centered on AI-related accuracy, safety, and discrimination is likely to rise.

Given the complexity and sheer volume of all of these cases, it will likely take years before these matters are resolved. For now, all we can say for sure is that ordinary companies rolling out AI tools would be wise to exercise care to track and monitor their use of the tech. Should a particular AI tool come under regulatory or judicial scrutiny and thus come off the market, companies will want to be able to adapt quickly and smoothly.

https://www.fastcompany.com/91021586/why-data-will-always-be-a-precious-and-protected-commodity-in-ai?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created: Feb. 2, 2024, 21:20:04

