Why data will always be a precious commodity in the AI world

The New York Times lawsuit against OpenAI late last year over the tech company’s use of the newspaper’s journalism to train its large language model (LLM) represented a major move in unprecedented times. It also could portend a shift in the Big Tech/content creator relationship—one that was fraught to begin with and might now turn increasingly litigious. At the heart of the suit is the question of data, and whether the companies behind LLMs can claim “fair use” in gobbling up that data.

Given the amount of data needed to train LLMs, it stands to reason that organizations will be protective of how their proprietary data is used and credited. LLMs require vast amounts of data, and despite OpenAI CEO Sam Altman’s recent claims, OpenAI and ChatGPT need access to a wide range of data to strengthen the model, which may include both proprietary and non-copyrighted work. The high quality and reliability of New York Times content is precisely what strengthens ChatGPT’s outputs.

That was the company’s own position three weeks ago, according to The Telegraph, which shared a submission from OpenAI to the House of Lords communications and digital select committee. In the submission, the company admitted that it could not train LLMs like ChatGPT without access to copyrighted work. In fact, it would be “impossible.”

Data is the backbone of AI, and all models rely on patterns and correlations established by vast amounts of training data. Generative AI tools need high-quality training data, like copyrighted content from the New York Times and other notable publishers, to produce high-quality responses. A sufficient quantity of training data also reduces hallucinations, making responses more relevant.

While the New York Times’ case against OpenAI and Microsoft is probably the most visible challenge involving the intellectual property implications of AI, it is hardly the only one. Plaintiffs have filed multiple lawsuits claiming the training process for AI programs infringed upon their copyrights in written and visual works. These include lawsuits by the Authors Guild and authors Paul Tremblay, Michael Chabon, Sarah Silverman, and others against OpenAI. Michael Chabon, Sarah Silverman, and other content creators have also initiated suits against Meta. There are proposed class action lawsuits against Alphabet Inc., Stability AI, and Midjourney, as well as a lawsuit by Getty Images against Stability AI.

As AI use continues to proliferate, there will be increasing pressure to resolve these copyright issues. And litigation involving intellectual property rights is just the tip of the iceberg. The number of cases centered on AI-related accuracy, safety, and discrimination is likely to rise.

Given the complexity and sheer volume of all of these cases, it will likely take years before these matters are resolved. For now, all we can say for sure is that ordinary companies rolling out AI tools would be wise to exercise care to track and monitor their use of the tech. Should a particular AI tool come under regulatory or judicial scrutiny and thus come off the market, companies will want to be able to adapt quickly and smoothly.

https://www.fastcompany.com/91021586/why-data-will-always-be-a-precious-and-protected-commodity-in-ai?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created Feb. 2, 2024, 21:20:04
