Meta’s new AI model can check other models’ work

Facebook owner Meta said on Friday it was releasing a batch of new AI models from its research division, including a “Self-Taught Evaluator” that may offer a path toward less human involvement in the AI development process.

The release follows Meta’s introduction of the tool in an August paper, which detailed how it relies on the same “chain of thought” technique used by OpenAI’s recently released o1 models to make reliable judgments about other models’ responses.

That technique involves breaking down complex problems into smaller logical steps and appears to improve the accuracy of responses on challenging problems in subjects like science, coding and math.
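The article does not show the paper’s actual prompts or training setup, but the underlying “LLM-as-judge” pattern it describes can be sketched as follows. `call_model`, the prompt text, and the verdict format are all illustrative assumptions, not Meta’s code; the stub exists only so the example runs.

```python
# Minimal sketch of a chain-of-thought "LLM-as-judge": the judge model is
# asked to reason step by step before emitting a final verdict line, and
# the verdict is parsed from its reasoning trace.

JUDGE_PROMPT = """You are comparing two responses to the same question.
Think step by step: restate the question, check each response for factual
and logical errors, then end with exactly one line: 'Verdict: A' or 'Verdict: B'.

Question: {question}
Response A: {a}
Response B: {b}
"""

def call_model(prompt: str) -> str:
    # Stub standing in for a real LLM API call; an actual judge would
    # return genuine step-by-step reasoning ending in a verdict line.
    return "Step 1: ...\nStep 2: ...\nVerdict: A"

def judge(question: str, a: str, b: str) -> str:
    """Return 'A' or 'B', parsed from the judge model's reasoning trace."""
    trace = call_model(JUDGE_PROMPT.format(question=question, a=a, b=b))
    return trace.strip().splitlines()[-1].removeprefix("Verdict: ")
```

Forcing the model to write out its reasoning before the verdict is what distinguishes this from a bare preference classifier, and is the step the article credits with improving judgment accuracy on hard problems.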

Meta’s researchers used entirely AI-generated data to train the evaluator model, eliminating human input at that stage as well.
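The iterative loop this implies can be sketched roughly as below. Every name (`generator`, `evaluator`, `finetune`) is a hypothetical stand-in, not Meta’s implementation: the idea is that synthetic response pairs with a known-better answer let the evaluator’s own chain-of-thought judgments become its next round of training data, with no human labels.

```python
# Hedged sketch of a self-training loop for an evaluator model: generate
# contrasting response pairs where the better answer is known by
# construction, keep only the evaluator's correct judgment traces, and
# fine-tune on them.

def self_train_evaluator(evaluator, generator, finetune, prompts, rounds=3):
    """Run `rounds` of synthetic-data self-training; return the new evaluator."""
    for _ in range(rounds):
        kept = []
        for p in prompts:
            # Build a contrasting pair: a normal response ("A") and a
            # deliberately degraded one ("B"), so "A" is better by construction.
            good = generator(p)
            bad = generator(p + " (include a subtle factual error)")
            # The current evaluator writes a chain-of-thought judgment.
            trace = evaluator(p, good, bad)
            # Keep only traces whose verdict matches the known-better
            # response; these become the next round's training data.
            if trace.strip().endswith("Verdict: A"):
                kept.append((p, good, bad, trace))
        evaluator = finetune(evaluator, kept)
    return evaluator
```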

The ability to use AI to evaluate AI reliably offers a glimpse at a possible pathway toward building autonomous AI agents that can learn from their own mistakes, two of the Meta researchers behind the project told Reuters.

Many in the AI field envision such agents as digital assistants intelligent enough to carry out a vast array of tasks without human intervention.

Self-improving models could eliminate the need for today’s often expensive and inefficient process of Reinforcement Learning from Human Feedback, which depends on human annotators with specialized expertise to label data accurately and verify that answers to complex math and writing queries are correct.

“We hope, as AI becomes more and more super-human, that it will get better and better at checking its work, so that it will actually be better than the average human,” said Jason Weston, one of the researchers.

“The idea of being self-taught and able to self-evaluate is basically crucial to the idea of getting to this sort of super-human level of AI,” he said.

Other companies including Google and Anthropic have also published research on the concept of RLAIF, or Reinforcement Learning from AI Feedback. Unlike Meta, however, those companies tend not to release their models for public use.

Other AI tools released by Meta on Friday included an update to the company’s image-segmentation Segment Anything model, a tool that speeds up LLM response generation, and datasets that can be used to aid the discovery of new inorganic materials.

—Katie Paul, Reuters

https://www.fastcompany.com/91213556/meta-ai-model-self-taught-evaluator?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 8mo | Oct 21, 2024, 23:10:17

