OpenAI and Google reportedly used transcriptions of YouTube videos to train their AI models

OpenAI and Google trained their AI models on text transcribed from YouTube videos, potentially violating creators’ copyrights, according to The New York Times. The report, which describes the lengths OpenAI, Google and Meta have gone to in order to maximize the amount of data they can feed to their AIs, cites numerous people with knowledge of the companies’ practices. It comes just days after YouTube CEO Neal Mohan said in an interview with Bloomberg Originals that OpenAI’s alleged use of YouTube videos to train its new text-to-video generator, Sora, would go against the platform’s policies.

According to the NYT, OpenAI used its Whisper speech recognition tool to transcribe more than one million hours of YouTube videos, which were then used to train GPT-4. The Information previously reported that OpenAI had used YouTube videos and podcasts to train the two AI systems. OpenAI president Greg Brockman was reportedly among the people on the team that carried out the transcription. Per Google’s rules, “unauthorized scraping or downloading of YouTube content” is not allowed, Matt Bryant, a spokesperson for Google, told NYT, also saying that the company was unaware of any such use by OpenAI.

The report, however, claims there were people at Google who knew but did not take action against OpenAI because Google was using YouTube videos to train its own AI models. Google told NYT it only does so with videos from creators who have agreed to take part in an experimental program. Engadget has reached out to Google and OpenAI for comment.

The NYT report also claims Google tweaked its privacy policy in June 2022 to more broadly cover its use of publicly available content, including Google Docs and Google Sheets, to train its AI models and products. Bryant told NYT that this is only done with the permission of users who opt into Google’s experimental features, and that the company “did not start training on additional types of data based on this language change.”

This article originally appeared on Engadget at https://www.engadget.com/openai-and-google-reportedly-used-transcriptions-of-youtube-videos-to-train-their-ai-models-163531073.html?src=rss
Created 1y ago | 6 Apr 2024, 16:40:13