Here’s the real reason AI companies are slimming down their models

OpenAI on Thursday announced GPT-4o mini, a smaller and less expensive version of its GPT-4o AI model. OpenAI is one of several AI companies to develop a version of its best “foundation” model that trades some intelligence for speed and affordability. That trade-off could let more developers power their apps with AI, and may open the door to more complex apps, such as autonomous agents, in the future. 

The largest large language models (LLMs) use billions or even trillions of parameters (the synapse-like connections whose strengths a neural network tunes during training) to perform a wide array of reasoning and query-related tasks. They’re also trained on massive amounts of data covering a wide variety of topics. “Small language models,” or SLMs, by contrast, use far fewer parameters—ranging from millions to a few billion—to perform a narrower set of tasks, and they require less computing power and a smaller, more focused set of training data. 

For developers with simpler (and perhaps less profitable) apps, an SLM may be their only viable option. OpenAI says GPT-4o mini is 60% cheaper than GPT-3.5 Turbo, formerly the most economical OpenAI model for developers. 

Or, it may be a question of speed. Many applications of AI don’t require the vast general knowledge of a large AI model. They may need faster answers to easier questions. “If my kid’s writing his term paper [with the help of an AI tool], the latency isn’t a huge issue,” says Mike Intrator, CEO of CoreWeave, which hosts AI models in its cloud. Latency refers to the time needed for an AI app to get an answer from a model in the cloud. “But if you were to use it for surgery or for automated driving or something like that, the latency begins to make much more of an impact on the experience.” The models used in self-driving cars, Intrator points out, have to be small enough to run on a computer chip in the car, not up in a cloud server. 
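As a rough illustration of the latency Intrator describes, the round-trip time of a single query can be measured directly; the `send_query` callable below is a hypothetical stand-in for any call to a cloud-hosted model, not a real API:

```python
import time

def timed_query(send_query):
    """Return a model's answer along with its round-trip latency in seconds."""
    start = time.perf_counter()
    answer = send_query()  # placeholder for a real request to a cloud-hosted model
    return answer, time.perf_counter() - start

# Usage with a stand-in "model" that answers instantly:
answer, latency = timed_query(lambda: "42")
```

For a real app, `send_query` would wrap a network request, so the measured latency includes both network round-trip time and the model's inference time—the two costs smaller models and on-device models reduce.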

GPT-4o mini is smaller than other models, but still not small enough to run on a device like a phone or game console. So it must run on a server in the cloud like all of OpenAI’s other models. The company isn’t saying whether it’s working on on-device models (though Apple has confirmed it is). 

Faster and cheaper models could be the key to the next generation of AI-powered apps

Today most AI-powered applications involve a single query, or a few queries, to a model running in the cloud. But cutting-edge apps require many queries to many different models, says Robert Nishihara, cofounder and CEO of Anyscale, which provides a platform for putting AI models and workloads into production. For example, an app that helps you select a vacation rental might use one model to generate the selection criteria, another model to select some rental options, and still another model to score each of those options against the criteria, and so on. And directing and orchestrating all these queries is a complex business.

“When so many model invocations are composed together, cost and latency explode,” Nishihara says. “Finding ways to reduce cost and latency is an essential step in bringing these applications to production.” 
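The multi-model flow Nishihara describes might be sketched as follows. Everything here is illustrative: `call_model` stands in for whatever hosted-model API an app would actually use, and the model names are invented:

```python
def call_model(model: str, prompt: str) -> str:
    # Placeholder for a real API call (e.g., an HTTP request to a hosted model).
    return f"[{model} response to: {prompt[:30]}...]"

def plan_vacation_rental(user_request: str) -> list:
    # Step 1: one model generates the selection criteria.
    criteria = call_model("criteria-model", f"List criteria for: {user_request}")
    # Step 2: another model proposes candidate rentals.
    options = [
        call_model("search-model", f"Suggest a rental matching: {criteria}")
        for _ in range(3)
    ]
    # Step 3: a third model scores each option against the criteria.
    return [
        call_model("scoring-model", f"Score {opt} against {criteria}")
        for opt in options
    ]
```

Even this toy pipeline makes seven model calls for one user request, which is why per-call cost and latency compound so quickly in production apps.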

The performance of the models is important, but their speed and cost are equally important. OpenAI knows this, as do companies like Meta and Google, both of which are creating smaller and faster open-source models. The model downsizing efforts of these companies are crucial to using AI models for more complex applications, such as personal assistants that do end-to-end tasks on behalf of a user, Nishihara says.

OpenAI doesn’t divulge the parameter counts of its models, but GPT-4o mini is likely comparable in size to Anthropic’s Claude 3 Haiku and Google’s Gemini 1.5 Flash. OpenAI says the mini outperforms those comparable models in benchmark tests. 

OpenAI says app developers—the biggest beneficiaries of the speed and cost improvements—will be able to access mini through an API starting today, and that the new models will begin to support queries from its ChatGPT app today as well. 

The “o” in GPT-4o stands for “omni,” a nod to the model’s multimodal ability to process and reason over imagery and sound, not just text. The mini model supports text and vision in the API, and OpenAI says it will support video and audio capabilities in the future.

https://www.fastcompany.com/91159169/openai-gpt-4o-mini-developers

Published 22 Jul 2024, 20:30:40

