How AI for lip dubbing could change the film industry

In Hollywood, big money is getting lost in translation.

Sure, the global entertainment business is synced up like never before. Marvel blockbusters captivate audiences in China. Korean directors score one coup after another in the U.S. Streaming development executives now scour foreign markets to bring home the next Squid Game, Lupin, and Money Heist. And Western entertainment companies are pouring money into so-called localization efforts to ensure the sun never sets on Spider-Man. Disney upped its localization spending to $33 billion in 2022, according to Variety, a 32% increase. Streamers now include options for subtitles and audio in multiple languages, even in old and niche entertainment.

But even as companies invest in quality script translations and better performances by voice actors, dubbed entertainment often still looks as cheesy as old kung fu films and Mr. Ed, turning audiences off. No matter how good the sound is, it seems wrong. Lips don’t lie.

“The lips are always, always the last piece that nobody’s solved for,” says Jonathan Bronfman, cofounder and CEO of the visual effects company Monsters Aliens Robots Zombies (MARZ).

Earlier this year, Bronfman’s company unveiled a technology called LipDub AI, which digitally manipulates actors’ facial expressions to match spoken words in foreign languages. The technology promises to achieve an extraordinary level of realism and fluency, learning to make actors’ lips match the language and the performers. Marlon Brando will mumble in Mandarin, Jim Carrey will gesticulate in German, and Arnold Schwarzenegger’s English . . . well. AI is making more progress every day.

In the beginning, lip-dubbing technology was a crude joke: Schwarzenegger screaming at a late-night TV host through the superimposed lips of another man (“I AM HEE-AH TO SAVE CALIFORNIA!”). But the promise of new AI-driven software means that global audiences may be laughing with such technology, not at it, as well as crying, cheering, and loving performances where actors deftly deliver lines in any of hundreds of languages, whether or not the performers have ever uttered a word in those tongues.

LipDub’s technology is an evolution of an open-source AI model known as Wav2Lip, first released in 2020 by researchers at Hyderabad’s International Institute of Information Technology. Designed initially to synchronize lip movements in videos with specific audio tracks, it analyzes the input audio’s phonetic elements to identify different speech sounds. In parallel, it processes the video, focusing on the speaker’s face, especially the lip area. Wav2Lip uses deep learning models to understand the facial structure and predict corresponding lip movements. The technology combines audio analysis with video data to generate accurate lip synchronization. This results in a video where the lip movements match the spoken words in the audio track, enhancing realism for applications like movie dubbing, video conferencing, or animated characters.

Adapting such technology into a valuable product for the film and advertising industries presented MARZ researchers with intricate challenges. The varying elements of movie production, such as changes in lighting and camera angles, along with scenes featuring multiple actors or several faces, demanded careful consideration. The presence of beards or the appearance of lips from different angles added to the complexity. A significant hurdle emerged when the AI initially failed to differentiate between speakers and non-speakers. This resulted in scenes where every character’s lips moved in sync with a single spoken line.

“Early on, we had to put black boxes over the faces we didn’t want speaking,” says Matt Panousis, MARZ’s cofounder and chief operating officer. “It’s one thing to do this in a simple video clip. It’s another to upload a whole movie.”

While Hollywood clients demand hyperrealism from lip-dubbing software, amateur users are happy to experiment with less sophisticated tech. Plenty of other software companies (HeyGen, ElevenLabs) offer apps for translating short video and audio clips that are fast, free to use, and still mind-bogglingly real.

MARZ, an AI-enabled visual effects (VFX) studio, was founded in 2018 and remains focused on professional users. The Toronto-based company has developed a reputation for delivering high-quality VFX for television, contributing to notable projects like Marvel’s WandaVision, HBO’s Watchmen, and Netflix’s The Umbrella Academy. The company has grown from 45 employees in 2019 to 80. More than 50 employees are dedicated to machine learning, says Bronfman, work that resulted in both LipDub and a product called Vanity, an AI-enabled “digital makeup tool” that “air-brushes” away wrinkles and other age-related imperfections from actors’ mugs.

So far, the company is using the LipDub AI technology in-house for its existing visual effects clients, including Apple TV. In the months to come, MARZ plans to release a fully automated software tool aimed at video professionals who are already accustomed to software like Adobe Premiere and Final Cut.

The future of AI in Hollywood is still being determined, of course. The SAG-AFTRA union and Hollywood studios are hashing out the nascent technology’s role in productions, and the need for actors’ explicit consent will complicate the deals necessary for LipDub and other tech to be of use. And President Biden recently issued an executive order seeking to curb misuse of deepfakes, even pshawing at an ersatz semblance of himself at the event. Lips do lie, after all.

If LipDub AI and similar technologies thrive, they could expand the reach of both foreign and domestic films, benefiting creators worldwide. That could represent a pivotal shift in the business of pop culture: Studios and streamers will need to act less like importer/exporters, simply slapping new audio over the original actors’ words and shipping it off to foreign consumers, and more like collectors and curators of authentic global culture, finding talent and stories with universal human appeal.

https://www.fastcompany.com/90981017/ai-dubbing-film-television-marz?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created: 19. 11. 2023, 11:20:12

