This startup wants to reprogram the mind of AI—and just got $50 million to do it

Anthropic, Menlo Ventures, and other AI industry players are betting $50 million on a company called Goodfire, which aims to understand how AI models think and steer them toward better, safer answers.

Even as AI becomes more embedded in business systems and personal lives, researchers still lack a clear understanding of how AI models generate their output. So far, the go-to methods for improving AI behavior have been shaping training data and refining prompts rather than addressing the models’ internal “thought” processes. Goodfire is tackling those internal processes directly, and showing real promise.

The company boasts a kind of dream team of mechanistic interpretability pioneers. Cofounder Tom McGrath helped create the interpretability team at DeepMind. Cofounder Lee Sharkey pioneered the use of sparse autoencoders in language models. Nick Cammarata started the interpretability team at OpenAI alongside Chris Olah, who later cofounded Anthropic. Collectively, these researchers have delivered some of the field’s biggest breakthroughs.

Goodfire founder and CEO Eric Ho, who left a successful AI app company in 2022 to focus on interpretability, tells Fast Company that the new funding will be used to expand the research team and enhance its “Ember” interpretability platform. In addition to its core research efforts, Goodfire also generates revenue by deploying field teams to help client organizations understand and control the outputs of their AI models.

Goodfire is developing the knowledge and tools needed to perform “brain surgery” on AI models. Its researchers have found ways to isolate modules within neural networks to reveal the AI’s “thoughts.” Using a technique they call neural programming, they can intervene and redirect a model’s cognition toward higher-quality, more aligned outputs. “We envision a future where you can bring a little bit of the engineering back to neural networks,” Ho says.
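
Goodfire hasn’t published the details of its neural programming technique, but the broader idea of intervening on a model’s internal activations can be sketched with a simple activation-steering example. Everything below is illustrative: the model, the layer index, and the steering vector are hypothetical stand-ins, not Goodfire’s Ember tooling.

```python
# Illustrative sketch only: nudging a language model's internal activations
# with a forward hook. The layer choice and steering vector are hypothetical;
# real interpretability work derives such directions from analyzing the model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small open model used purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

LAYER = 6                          # hypothetical transformer block to intervene on
hidden_size = model.config.n_embd
steer = torch.randn(hidden_size)   # placeholder "concept" direction
steer = steer / steer.norm()

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    # Adding a scaled direction to every token's activation shifts what the
    # model "thinks about" downstream of this layer.
    hidden_states = output[0] + 4.0 * steer.to(output[0].dtype)
    return (hidden_states,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)

inputs = tok("The model's answer is", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(generated[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore the unmodified model
```

In practice, researchers would derive such a direction from the model itself, for example with the sparse-autoencoder methods Sharkey helped pioneer, rather than from random noise as in this toy sketch.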

The company has also been collaborating with other AI labs to solve interpretability challenges. For example, Goodfire has helped the Arc Institute interpret the inner workings of its Evo 2 DNA foundation model, which analyzes nucleotide sequences and predicts what comes next. By understanding how the model makes its predictions, researchers have uncovered unique biological concepts—potentially valuable for new scientific discoveries.

Anthropic, too, may benefit from Goodfire’s insights. “Our investment in Goodfire reflects our belief that mechanistic interpretability is among the best bets to help us transform black-box neural networks into understandable, steerable systems—a critical foundation for the responsible development of powerful AI,” Anthropic CEO Dario Amodei said in a statement.

According to Ho, Goodfire has also been fielding requests from Fortune 500 companies that want to better understand how the large language models they use for business are “thinking”—and how to change faulty reasoning into sound decision-making. He notes that many within businesses still see AI models as another kind of software, something that can be reprogrammed when it produces incorrect outputs. But AI works differently: It generates responses based on probabilities and a degree of randomness. Improving those outputs requires intervention within the models’ cognitive processes, steering them in more productive directions.

This kind of intervention is still a new and imprecise science. “It remains crude and at a high level and not precise,” Ho says. Still, Goodfire offers an initial tool kit that gives enterprises a degree of control closer to what they expect from traditional, deterministic software.

As companies increasingly rely on AI for decisions that affect real lives, Ho believes the ability to understand and redirect AI models will become essential. For instance, if a developer equips a model with ethical or safety guardrails, an organization should be able to locate the layer or parameter in the neural network where the model chose to bypass the rules—or tried to appear compliant while it wasn’t. This would mean turning the AI black box into a glass box, with tools to reach inside and make necessary adjustments.

Ho is optimistic that interpretability research can rise to the challenge. “This is a solvable, tractable, technical problem, but it’s going to take our smartest researchers and engineers to solve the really hard problem of understanding and aligning models to human goals and morals.”

As AI systems begin to surpass human intelligence, concerns are growing about their alignment with human values and interests. A major part of the challenge lies in simply understanding what’s happening inside AI models, which often “think” in alien, opaque ways. Whether the big AI labs are investing enough in interpretability remains an open question—one with serious implications for our readiness for an AI-driven future. That’s why it’s encouraging to see major industry players putting real funding behind an interpretability research lab. Lightspeed Venture Partners, B Capital, Work-Bench, Wing, and South Park Commons also participated in the funding round. Menlo Ventures partner Deedy Das will join Goodfire’s board of directors.

While most of the tech world now rushes ahead with the development and application of generative AI models, concerns about the inscrutable nature of the models often get brushed aside as afterthoughts. But that wasn’t always the case. Google hesitated to put generative models into production because it feared being sued over unexpected and unexplainable model outputs.

In some industries, however, such concerns remain very relevant, Das points out. “There are extremely sensitive use cases in law, finance, and so on, where trying to deploy AI models as we know them today is just not feasible because you’re relying on a black box to make decisions that you don’t understand why it’s making those decisions,” Das says. “A good part of [Goodfire’s] mission is just to be able to do that.”


https://www.fastcompany.com/91320043/this-startup-wants-to-reprogram-the-mind-of-ai-and-just-got-50-million-to-do-it?partner=rss&utm_source=rss&utm_medium=feed&utm_campaign=rss+fastcompany&utm_content=rss
