An OpenAI ‘open’ model shows how much the company—and AI

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI.

OpenAI says it will release an open-source model. But why now?

OpenAI CEO Sam Altman said Monday that his company intends to release a “powerful new open-weight language model with reasoning” in the next few months. That would mark a major shift for a company that has kept its models proprietary and secret since 2019. The announcement wasn’t a total surprise: After the groundbreaking Chinese open-source model DeepSeek-R1 showed up in January, Altman said during a Reddit AMA that he realized his company was “on the wrong side of history” and suggested an OpenAI open-source model was a real possibility.

Open models typically come with a permissive license that requires little or no payment to the model developer. Open-weight models can be more cost-effective for corporations trying to leverage AI since they allow businesses to host (and secure) the models themselves—avoiding the often risky prospect of sending proprietary data through an API to a third-party provider and paying fees to do it. More businesses are moving in this direction—especially those holding sensitive user data in regulated industries.

The catch for developers: corporate users don’t have to pay to use an open model, so the lab that built it needs other ways to profit. Some AI labs release open models to gain credibility in the market, potentially paving the way to eventually sell API access to their more powerful closed models. By releasing open models early on, the French AI company Mistral established itself as a top-tier AI lab and a legitimate alternative to U.S. players. Other labs release open-source models, then earn consulting fees by helping large enterprises deploy and optimize the models over time.

Meta’s Llama models are the most widely deployed “open” models, though the company restricts reuse and redistribution and keeps the training data and code secret, meaning they don’t meet the definition of open source. Meta had different reasons for giving away its models. Unlike Mistral and others, it makes money by surveilling users and targeting ads, not by renting out AI models. Meta CEO Mark Zuckerberg continues funding Llama research because the models are a disruptive force in the industry and earn Meta the right to be called an “AI company.”

OpenAI now has its own reasons for releasing an open-weight model. Eighteen months ago, OpenAI was the undisputed champion of state-of-the-art AI models. But in the time since, the releases of LLMs such as Google’s formidable Gemini 2.0 and DeepSeek’s open-source R1 have cracked the competition wide open.

The market has changed, and OpenAI itself has evolved. Like Meta, OpenAI no longer depends directly on its models for revenue: selling API access to them is no longer the company’s main source of income. Most of its revenue now comes from ChatGPT subscriptions (most of them sold to individual consumers), and that business underpins its staggering $300 billion valuation. OpenAI’s real superpower is being a household-name consumer AI brand.

OpenAI will definitely continue pouring massive resources into developing ever-better models, but its main reason for doing so isn’t to collect rent from developers for direct access to them, but rather to continue making ChatGPT smarter for consumers.

AI video generation is getting scary good

AI-video-generation tools are rapidly leaping over the uncanny valley, making it increasingly difficult for everyday internet users to distinguish between real and generated video. This could bode well for smaller companies looking to produce glossy, creative, or ambitious ads at a fraction of the normal cost. But it could spell bad news if bad actors use the technology in phishing scams or to spread disinformation. It’s also yet another threat to the film sector’s livelihood.

The issue is back in the spotlight following several announcements, starting with Runway’s release of its new Gen-4 video-generation system, which the company says produces “production ready” video.

AI startup Runway says the new system of models understands “much of the world’s physics” (a claim supported by this video of a man being overtaken by an ocean wave). The company also touts improvements in video consistency and realism, as well as in user control during the generation process. Runway posted a demo video of Gen-4’s control tools, which makes the production process look pretty easy, even for nontechnical users. Some of the finished sample videos posted on X look somehow more real than real (see Jean Baudrillard, Simulacra and Simulation).

Runway faces stiff competition in the AI video space from established contenders, including Google’s Veo 2 model, OpenAI’s Sora, Adobe Firefly, Pika, and Kling.

A new math benchmark aims to beat test question “contamination”

People in the AI community have been debating for some time whether our current methods of testing models’ math skills are broken. The concern is that while existing math benchmarks contain some very hard problems, those problems (and their solutions) tend to get published online pretty quickly. This of course makes the problem-solution sets fair game for AI companies sweeping up training data for their next models. The worry is that, come evaluation time, the models may have already encountered the test problems and answers in their training data.

A new benchmark called MathArena was designed to eliminate those issues. MathArena draws its problems from very recent math competitions and Olympiads, which keep their problems secret until the contest is held, so the problems are unlikely to have appeared in any model’s training data. The MathArena researchers also standardized how the evaluation is administered, so model developers can’t give their own models an edge by tweaking the evaluation setup.

MathArena has just released the results of the most recent benchmark, which includes questions from the 2025 USA Math Olympiad. Here’s one of the questions: “Let H be the orthocenter of the acute triangle ABC, let F be the foot of the altitude from C to AB, and let P be the reflection of H across BC. Suppose that the circumcircle of triangle AFP intersects line BC at two distinct points, X and Y. Prove that C is the midpoint of XY.” Ouch. And to make matters worse, the test grades not just the final answer but a full proof laying out each reasoning step the model took along the way.

The results are, well, ugly. Some of the most powerful and celebrated models in the world took the test, and none scored above 5%. The top score went to DeepSeek’s R1 model, which earned 4.76%. Google’s Gemini 2.0 Flash Thinking model scored 4.17%. Anthropic’s Claude 3.7 Sonnet (Thinking) scored 3.65%. OpenAI’s most recent thinking model, o3-mini, scored 2.08%.

The results suggest one of several possibilities: Maybe MathArena contains far harder questions than other benchmarks, or LLMs aren’t great at explaining their reasoning steps, or earlier math benchmark scores are questionable because the LLMs had already seen the answers. Looks like LLMs still have some homework to do.