Here’s the real reason AI companies are slimming down their models

OpenAI on Thursday announced GPT-4o mini, a smaller, less expensive version of its GPT-4o AI model. OpenAI is one of several AI companies to develop a version of its best “foundation” model that trades away some intelligence for speed and affordability. Such a trade-off could let more developers power their apps with AI, and may open the door to more complex apps, like autonomous agents, in the future. 

The largest large language models (LLMs) use billions or even trillions of parameters (the synapse-like numerical weights a neural network tunes during training) to perform a wide array of reasoning and query-related tasks. They’re also trained on massive amounts of data covering a wide variety of topics. “Small language models,” or SLMs, by contrast, typically use hundreds of millions to a few billion parameters to perform a narrower set of tasks, and they require less computing power and a smaller, more focused set of training data. 

For developers with simpler (and perhaps less profitable) apps, an SLM may be their only viable option. OpenAI says GPT-4o mini is 60% cheaper than GPT-3.5 Turbo, formerly the most economical OpenAI model for developers. 
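To make the price gap concrete, here is a minimal cost sketch in Python. The per-million-token rates are the ones OpenAI published at launch ($0.15 input / $0.60 output for GPT-4o mini, $0.50 / $1.50 for GPT-3.5 Turbo); treat them as a snapshot, since pricing changes over time.

```python
# Rough per-request cost comparison. Rates are USD per million tokens,
# as published by OpenAI at the GPT-4o mini launch; they may have changed.
PRICES = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given token counts in each direction."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical chat turn: 1,000 prompt tokens in, 500 tokens out.
mini = request_cost("gpt-4o-mini", 1_000, 500)
turbo = request_cost("gpt-3.5-turbo", 1_000, 500)
print(f"gpt-4o-mini:   ${mini:.6f}")
print(f"gpt-3.5-turbo: ${turbo:.6f}")
print(f"savings: {1 - mini / turbo:.0%}")
```

At these example rates, the per-request saving works out to roughly the 60% figure OpenAI cites; the absolute numbers look tiny, but they compound quickly across millions of requests.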

Or, it may be a question of speed. Many applications of AI don’t require the vast general knowledge of a large AI model. They may need faster answers to easier questions. “If my kid’s writing his term paper [with the help of an AI tool], the latency isn’t a huge issue,” says Mike Intrator, CEO of CoreWeave, which hosts AI models in its cloud. Latency refers to the time needed for an AI app to get an answer from a model in the cloud. “But if you were to use it for surgery or for automated driving or something like that, the latency begins to make much more of an impact on the experience.” The models used in self-driving cars, Intrator points out, have to be small enough to run on a computer chip in the car, not up in a cloud server. 

GPT-4o mini is smaller than other models, but still not small enough to run on a device like a phone or game console. So it must run on a server in the cloud like all of OpenAI’s other models. The company isn’t saying whether it’s working on on-device models (though Apple has confirmed it is). 

Faster and cheaper models could be the key to the next generation of AI-powered apps

Today most AI-powered applications involve a single query, or a few queries, to a model running in the cloud. But cutting-edge apps require many queries to many different models, says Robert Nishihara, cofounder and CEO of Anyscale, which provides a platform for putting AI models and workloads into production. For example, an app that helps you select a vacation rental might use one model to generate the selection criteria, another model to select some rental options, and still another model to score each of those options against the criteria, and so on. And directing and orchestrating all these queries is a complex business.

“When so many model invocations are composed together, cost and latency explode,” Nishihara says. “Finding ways to reduce cost and latency is an essential step in bringing these applications to production.” 
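The kind of pipeline Nishihara describes can be sketched with stand-in model calls. The stage names, latencies, and per-call costs below are invented for illustration, but the point holds: when calls are composed sequentially, total latency and cost grow with every stage.

```python
import time

# A toy stand-in for a cloud model call: each invocation adds latency
# and cost to a shared ledger. A real app would call an LLM API here.
def fake_model_call(name: str, prompt: str, latency_s: float,
                    cost_usd: float, ledger: dict) -> str:
    time.sleep(latency_s)              # simulate network round-trip
    ledger["latency_s"] += latency_s
    ledger["cost_usd"] += cost_usd
    return f"{name}({prompt})"

ledger = {"latency_s": 0.0, "cost_usd": 0.0}

# Three chained stages, mirroring the vacation-rental example:
criteria = fake_model_call("criteria", "family trip", 0.05, 0.0004, ledger)
options = fake_model_call("options", criteria, 0.05, 0.0004, ledger)
scores = fake_model_call("scores", options, 0.05, 0.0004, ledger)

# Sequential composition: the totals are the sum over all stages.
print(ledger)
```

Halving per-call latency or cost halves the totals for the whole chain, which is why smaller, faster models matter most for these multi-call applications.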

The performance of the models is important, but their speed and cost are equally important. OpenAI knows this, as do companies like Meta and Google, both of which are creating smaller and faster open-source models. The model downsizing efforts of these companies are crucial to using AI models for more complex applications, such as personal assistants that do end-to-end tasks on behalf of a user, Nishihara says.

OpenAI doesn’t divulge the parameter counts of its models, but GPT-4o mini is likely comparable in size to Anthropic’s Claude 3 Haiku and Google’s Gemini 1.5 Flash. OpenAI says mini outperforms both of those models on benchmark tests. 

OpenAI says app developers—the biggest beneficiaries of the speed and cost improvements—will be able to access mini through an API starting today, and that the new models will begin to support queries from its ChatGPT app today as well. 
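A minimal sketch of what that API access looks like, using OpenAI’s Chat Completions endpoint with the `gpt-4o-mini` model identifier. Only the Python standard library is used; the network call runs only if an `OPENAI_API_KEY` environment variable is set, and the prompt is a placeholder.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Assemble a Chat Completions request body targeting the mini model."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(prompt: str) -> str:
    """POST the prompt to the API and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if os.environ.get("OPENAI_API_KEY"):
    print(ask("Summarize GPT-4o mini in one sentence."))
else:
    # Without a key, just show the request body that would be sent.
    print(json.dumps(build_request("Summarize GPT-4o mini in one sentence."),
                     indent=2))
```

Switching an existing app from a larger model is, in the simplest case, a one-line change to the `model` field, which is part of why drop-in cheaper models spread quickly among developers.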

The “o” in GPT-4o stands for “omni,” a nod to the model’s multimodality: the ability to process and reason over imagery and sound, not just text. The mini model supports text and vision in the API, and OpenAI says it will support video and audio capabilities in the future.

https://www.fastcompany.com/91159169/openai-gpt-4o-mini-developers?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 1y ago | July 22, 2024, 20:30:40

