OpenAI on Monday announced GPT-4o, a brand new AI model that the company says is one step closer to “much more natural human-computer interaction.” The new model accepts any combination of text, audio and images as input and can generate output in all three formats. It’s also capable of recognizing emotion, lets you interrupt it mid-speech, and responds nearly as fast as a human being during conversations.
“The special thing about GPT-4o is it brings GPT-4 level intelligence to everyone, including our free users,” said OpenAI CTO Mira Murati during a live-streamed presentation. “This is the first time we’re making a huge step forward when it comes to ease of use.”
During the presentation, OpenAI showed off GPT-4o translating live between English and Italian, helping a researcher solve a linear equation in real time on paper, and providing guidance on deep breathing to another OpenAI executive simply by listening to his breaths.
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
— OpenAI (@OpenAI) May 13, 2024
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
The “o” in GPT-4o stands for “omni,” a reference to the model’s multimodal capabilities. OpenAI said that GPT-4o was trained across text, vision and audio, which means that all inputs and outputs are processed by the same neural network. This differs from the company’s previous models, GPT-3.5 and GPT-4, which let users ask questions by speaking but then transcribed the speech into text before processing it. That approach stripped out tone and emotion and made interactions slower.
OpenAI is making the new model available to everyone, including free ChatGPT users, over the next few weeks. The company is also releasing a desktop version of ChatGPT, initially for the Mac, which paid users will have access to starting today.
OpenAI’s announcement comes a day before Google I/O, the company’s annual developer conference. Shortly after OpenAI revealed GPT-4o, Google teased a version of Gemini, its own AI chatbot, with similar capabilities.
This article originally appeared on Engadget at https://www.engadget.com/openai-claims-that-its-free-gpt-4o-model-can-talk-laugh-sing-and-see-like-a-human-184249780.html?src=rss