Meta’s AI researchers have released a new model that’s trained in much the same way as today’s state-of-the-art large language models, but instead of learning from words, it learns from video.
Yann LeCun, who leads Meta’s FAIR (Fundamental AI Research) group, has been arguing over the past year that children learn about the world so quickly because they take in huge amounts of information through their optic nerves and their ears. They learn what things in the world are called and how they work together. Current large language models (LLMs), such as OpenAI’s GPT-4 or Meta’s own Llama models, learn mainly by processing language: they try to learn about the world as it’s described on the internet. That, LeCun argues, is why current LLMs aren’t moving very quickly toward artificial general intelligence, the point at which AI is generally smarter than humans.
LLMs are normally trained on enormous amounts of text in which some of the words are masked, forcing the model to find the best words to fill in the blanks. In doing so the model learns which words are statistically most likely to come next in a sequence, and it gradually picks up a rudimentary sense of how the world works. It learns, for example, that when a car drives off a cliff it doesn’t just hang in the air; it drops very quickly to the rocks below.
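To make the fill-in-the-blank idea concrete, here is a toy sketch, not anything Meta or OpenAI actually ships, that scores candidate words for a masked slot using simple bigram counts. Real LLMs learn these statistics with neural networks over vastly larger corpora; the corpus and function names below are made up for illustration.

```python
# Toy illustration of the masked-word objective described above.
# Real LLMs use neural networks and enormous corpora; this sketch only
# shows the statistical idea with bigram counts.
from collections import Counter

corpus = [
    "the car drives off the cliff",
    "the car drops to the rocks below",
    "the ball drops to the floor",
]

# Count how often each word follows each other word.
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[(prev, nxt)] += 1

def fill_blank(prev_word, next_word, candidates):
    """Score each candidate by how well it fits between its neighbors."""
    return max(
        candidates,
        key=lambda w: bigrams[(prev_word, w)] + bigrams[(w, next_word)],
    )

# "the car ____ off the cliff": "drives" fits the surrounding context best.
print(fill_blank("car", "off", ["drives", "drops", "ball"]))
```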
LeCun believes that if LLMs and other AI models could use the same masking technique, but on video footage, they could learn more like babies do. LeCun’s new baby, and the embodiment of his theory, is a research model called Video Joint Embedding Predictive Architecture (V-JEPA). It learns by processing unlabeled video and figuring out what probably happened in a certain part of the screen during the few seconds it was blacked out.
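Applied to pixels, the recipe looks roughly like the sketch below. This is only an illustration of masked video prediction under stand-in assumptions (toy tensor shapes, a placeholder feature extractor), not Meta’s V-JEPA code, which learns an encoder and predictor that operate in an abstract representation space.

```python
# Illustrative sketch of masked video prediction, not Meta's V-JEPA code.
# The general recipe: black out a spatiotemporal region of a clip and
# train a model so that features predicted from the visible context match
# features of the original, unmasked clip.
import numpy as np

rng = np.random.default_rng(0)

# A toy video clip: (frames, height, width, channels).
clip = rng.random((16, 32, 32, 3)).astype(np.float32)

# Mask a spatiotemporal block (a patch of the screen over several frames).
mask = np.zeros(clip.shape[:3], dtype=bool)
mask[4:12, 8:24, 8:24] = True      # frames 4-11, a central region
visible = clip.copy()
visible[mask] = 0.0                # "black out" the masked region

def encode(x):
    """Stand-in feature extractor: mean color per frame. A real system
    would use a learned video encoder here."""
    return x.reshape(x.shape[0], -1, 3).mean(axis=1)

# Training signal: features computed from the masked clip should match
# features of the original clip (here, a simple squared-error gap).
target_features = encode(clip)
context_features = encode(visible)
loss = float(np.mean((context_features - target_features) ** 2))
print(f"prediction gap to minimize: {loss:.4f}")
```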
“V-JEPA is a step toward a more grounded understanding of the world so machines can achieve more generalized reasoning and planning,” said LeCun in a statement.
Note that V-JEPA isn’t a generative model. It doesn’t answer questions by generating video, but rather by describing concepts, like the relationship between two real-world objects. The Meta researchers say that V-JEPA, after pretraining using video masking, “excels at detecting and understanding highly detailed interactions between objects.”
Meta’s next step after V-JEPA is to add audio to the video, which would give the model a whole new dimension of data to learn from—just like a child watching a muted TV then turning the sound up. The child would not only see how objects move, but also hear people talking about them, for example. A model pretrained this way might learn that after a car speeds off a cliff it not only rushes toward the ground but makes a big sound upon landing.
“Our goal is to build advanced machine intelligence that can learn more like humans do,” LeCun said, “forming internal models of the world around them to learn, adapt, and forge plans efficiently in the service of completing complex tasks.”
The research could have big implications for both Meta and the broader AI ecosystem.
Meta has talked before about a “world model” in the context of its work on augmented reality glasses. The glasses would use such a model as the brain of an AI assistant that would, among other things, anticipate what digital content to show the user to help them get things done and have more fun. The model would, out of the box, have an audio-visual understanding of the world outside the glasses, but could then learn very quickly about the unique features of a user’s world through the device’s cameras and microphones.
V-JEPA might also lead to a change in the way AI models are trained, full stop. Current pretraining methods for foundation models require massive amounts of time and compute power (which has ecological implications); at the moment, in other words, developing foundation models is reserved for the rich. With more efficient training methods, that could change: smaller developers might be able to train larger and more capable models if training costs came down. That would be in line with Meta’s strategy of releasing much of its research as open source rather than protecting it as valuable IP, as OpenAI and others do.
Meta says it’s releasing the V-JEPA model under a Creative Commons noncommercial license so that researchers can experiment with it and perhaps expand its capabilities.