An OpenAI ‘open’ model shows how much the company—and AI—has changed in two years

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

OpenAI says it will release an open-source model, but why now?

OpenAI CEO Sam Altman said Monday that his company intends to release a “powerful new open-weight language model with reasoning” in the next few months. That would mark a major shift for a company that has kept its models proprietary and secret since 2019. The announcement wasn’t a total surprise: After the groundbreaking Chinese open-source model DeepSeek-R1 showed up in January, Altman said during a Reddit AMA that he realized his company was “on the wrong side of history” and suggested an OpenAI open-source model was a real possibility.

Open models typically come with a permissive license that requires little or no payment to the model developer. Open-weight models can be more cost-effective for corporations trying to leverage AI since they allow businesses to host (and secure) the models themselves—avoiding the often risky prospect of sending proprietary data through an API to a third-party provider and paying fees to do it. More businesses are moving in this direction—especially those holding sensitive user data in regulated industries.

The catch: A corporate user doesn’t have to pay to use the open model. Some AI labs release open models to gain credibility in the market—potentially paving the way to eventually sell API access to their more powerful closed models. By releasing open models early on, the French AI company Mistral established itself as a top-tier AI lab and a legitimate alternative to U.S. players. Some AI labs release open-source models, then earn consulting fees by helping large enterprises deploy and optimize the models over time. 

Meta’s Llama models are the most widely deployed “open” models—though the company restricts reuse and redistribution and keeps the training data and code secret, meaning they are not by definition open source. Meta had different reasons for giving away its models. Unlike Mistral and others, it makes money by surveilling users and targeting ads—not by renting out AI models. Zuckerberg continues funding Llama research because the models are a disruptive force in the industry and earn Meta the right to be called an “AI company.” 

OpenAI now has its own reasons for releasing an open-weight model. Eighteen months ago, OpenAI was the undisputed champion of state-of-the-art AI models. But in the time since, the release of LLMs like Google’s formidable Gemini 2.0 and DeepSeek’s open-source R1 has cracked the competition wide open.

The market has changed, and OpenAI itself has evolved. Like Meta, OpenAI doesn’t depend directly and solely on its models for its revenue. Selling access to its models via an API is no longer the company’s main source of revenue. Now, most of its revenue, not to mention its staggering $300 billion valuation, comes from selling subscriptions to ChatGPT (most of them to individual consumers). OpenAI’s real superpower is being a household-name consumer AI brand.

OpenAI will surely keep pouring massive resources into developing ever-better models. Its main reason for doing so, however, isn’t to collect rent from developers for direct access to them; it’s to keep making ChatGPT smarter for consumers.

AI video generation is getting scary good

AI-video-generation tools are rapidly leaping over the uncanny valley, making it increasingly difficult for everyday internet users to distinguish between real and generated video. This could bode well for smaller companies looking to produce glossy, creative, or ambitious ads at a fraction of the normal cost. But it could spell bad news if bad actors use the technology in phishing scams or to spread disinformation. It’s also yet another threat to the film sector’s livelihood. 

The issue is back in the spotlight following several announcements, starting with Runway’s release of its new Gen-4 video-generation system, which the company says produces “production ready” video.

AI startup Runway says the new system of models understands “much of the world’s physics” (a claim supported by this video of a man being overtaken by an ocean wave). The company also touts improvements in video consistency and realism, as well as user control during the generation process. Runway posted a demo video of Gen-4’s control tools, which makes the production process look pretty easy, even for nontechnical users. Some of the samples of finished videos posted on X look somehow more real than real (see Jean Baudrillard, Simulacra and Simulation).

Runway faces some stiff competition in the AI video space in the form of perennial contenders including Google’s Veo 2 model, OpenAI’s Sora, Adobe Firefly, Pika, and Kling.

A new math benchmark aims to beat test question “contamination”

People in the AI community have been debating for some time whether our current methods of testing models’ math skills are broken. The concern is that while existing math benchmarks contain some very hard problems, those problems (and their solutions) tend to get published online pretty quickly. This of course makes the problem-solution sets fair game for AI companies sweeping up training data for their next models. The worry is that, come evaluation time, the models may have already encountered the test problems and answers in their training data. 
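To make the contamination worry concrete, here is a minimal sketch of how one might check whether a test problem already appears in a pile of scraped text. It is an illustration only, not a description of how any AI lab or MathArena screens data: the function names, the 8-word window, and the 0.5 threshold are all assumptions chosen for the example.

```python
import re


def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into word tokens, and return its set of word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(problem: str, document: str, n: int = 8) -> float:
    """Fraction of the problem's n-grams that also occur in the document."""
    problem_grams = ngrams(problem, n)
    if not problem_grams:
        # Problem is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(problem_grams & ngrams(document, n)) / len(problem_grams)


def is_contaminated(problem: str, corpus: list[str], n: int = 8,
                    threshold: float = 0.5) -> bool:
    """Flag the problem if any document shares at least `threshold` of its n-grams.

    The threshold is a hypothetical value picked for illustration.
    """
    return any(overlap_ratio(problem, doc, n) >= threshold for doc in corpus)


if __name__ == "__main__":
    problem = ("Let H be the orthocenter of the acute triangle ABC, "
               "let F be the foot of the altitude from C to AB")
    corpus = [
        "A short unrelated post about parking apps in big cities.",
        "Forum thread: " + problem + " Here is my full solution.",
    ]
    print(is_contaminated(problem, corpus))
```

A verbatim match like this is the easy case. Paraphrased or translated copies of a problem would slip past a simple word-overlap check, which is part of why a benchmark built from freshly released competition problems is appealing.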

A new benchmark called MathArena was designed to eliminate those issues. MathArena takes its math problems from very recent math competitions and Olympiads, which have obvious incentives to keep their problems secret. The researchers from MathArena also created their own standard method of administering the evaluation, meaning the AI model developers can’t give their own models an edge via changes to the evaluation setup. 

MathArena has just released the results of the most recent benchmark, which includes questions from the 2025 USA Math Olympiad. Here’s one of the questions: “Let H be the orthocenter of the acute triangle ABC, let F be the foot of the altitude from C to AB, and let P be the reflection of H across BC. Suppose that the circumcircle of triangle AFP intersects line BC at two distinct points, X and Y. Prove that C is the midpoint of XY.” Ouch. And to make matters worse, the test requires not only the correct answer but a description of each reasoning step the model took along the way.

The results are, well, ugly. Some of the most powerful and celebrated models in the world took the test, and none scored above 5%. The top score went to DeepSeek’s R1 model, which earned 4.76%. Google’s Gemini 2.0 Flash Thinking model scored 4.17%. Anthropic’s Claude 3.7 Sonnet (Thinking) scored 3.65%. OpenAI’s most recent thinking model, o3-mini, scored 2.08%.

The results suggest one of several possibilities: Maybe MathArena contains far harder questions than other benchmarks, or LLMs aren’t great at explaining their reasoning steps, or earlier math benchmark scores are questionable because the LLMs had already seen the answers. Looks like LLMs still have some homework to do.


https://www.fastcompany.com/91310415/an-openai-open-model-shows-how-much-the-company-and-ai-has-changed-in-two-years?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss

Created 2mo | Apr 3, 2025, 5:20:11 PM

