OpenAI’s new o1 models push AI to PhD-level intelligence

On Thursday, OpenAI introduced OpenAI o1, a new series of large language models that the company says are designed for solving difficult problems and working through complex tasks.

The models were trained to take longer to perform tasks than other AI models, thinking through problems in ways a human might. They can “refine their thinking process, try different strategies, and recognize their mistakes,” OpenAI says in a press release. The models perform similarly to PhD students when working on physics, chemistry, and biology problems.

The o1 models scored 83% on a qualifying exam for the International Mathematical Olympiad, OpenAI says, while its earlier GPT-4o model correctly solved only 13% of the problems.

OpenAI provided some specific use case examples. The o1 models could be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers to build and execute multi-step workflows. They also perform well in math and coding. 

Within OpenAI, the o1 models were first codenamed “Q*” (pronounced “Q-star”), then “Strawberry.”

OpenAI says it’s taking a slow and cautious approach to releasing the new models. It’s releasing “early previews” of two of the models in the series. People with ChatGPT Plus or Teams accounts can access “o1-preview” by choosing it in a drop-down menu within the chatbot. They can also choose “o1-mini,” which is faster and good at STEM questions, OpenAI says.

Developers and researchers can access the models within ChatGPT and via an application programming interface. 

OpenAI says the new models won’t initially be able to access the internet. Users won’t be able to upload images or files to the models. OpenAI says it’s beefed up the safety features around the models, and has informed federal authorities about the more capable models.

https://www.fastcompany.com/91189817/openais-new-o1-models-push-ai-to-phd-level-intelligence?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created: Sept. 12, 2024, 20:30:04

