Anthropic takes a look into the ‘black box’ of AI models

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

Anthropic researchers announce progress in understanding how large models “think”

Today’s AI models are so big and so complex (their neural networks are loosely fashioned after the human brain) that even the PhDs who design them know relatively little about how they actually “think.” Until recently, the study of “mechanistic interpretability” had been mostly theoretical and small-scale. But Anthropic published new research this week showing some real progress. During its training, an LLM processes a huge amount of text and eventually forms a many-dimensional map of words and phrases, based on their meanings and the contexts within which they’re used. After the model goes into use, it draws on this “map” to calculate the most statistically likely next word in a response to a user prompt. Researchers can see all the calculations that lead to an output, says Anthropic interpretability researcher Josh Batson, but the numbers don’t say much about “how the model is thinking.”
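To make that “next word” calculation concrete, here’s a minimal sketch using the small open-source GPT-2 model and the Hugging Face transformers library; these are illustrative stand-ins, not Claude or Anthropic’s tooling.

```python
# Minimal sketch of next-token prediction (GPT-2 standing in for a large LLM).
# Requires: pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "On a beautiful spring day, I was driving from San Francisco to"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: [batch, seq_len, vocab_size]

# The final position's logits score every vocabulary entry as a candidate next word.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for prob, token_id in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {prob:.3f}")
```

Every one of those probabilities is visible to researchers; Batson’s point is that the raw numbers alone don’t explain the concepts behind them.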

The Anthropic researchers, in other words, wanted to learn about the higher-order concepts that large AI models use to organize words into relevant responses. Batson says his team has learned how to interrupt the model halfway through its processing of a prompt and take a snapshot of its internal state. They can see which neurons in the network are firing at the same time, and they know that certain sets of these neurons fire together in response to the same types of words in a prompt. For example, Batson says they gave the model a prompt that said, “On a beautiful spring day, I was driving from San Francisco to Marin across the great span of the . . .” then interrupted the network. They saw a set of neurons firing that they had determined represents the concept of the Golden Gate Bridge. And they soon saw that the same set of neurons fired whenever the model was prompted by a similar set of words (or images) suggesting the Golden Gate Bridge.
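Mechanically, that “snapshot” is a capture of the activations flowing through the network at some layer, partway through the forward pass. Here’s a hedged sketch of the idea using PyTorch forward hooks on GPT-2; the layer choice is arbitrary for illustration, and Anthropic’s internal tooling for Claude is not public.

```python
# Sketch: interrupt a model mid-prompt and snapshot its internal state.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

snapshots = {}

def save_activation(name):
    def hook(module, inputs, output):
        snapshots[name] = output[0].detach()  # output[0]: the block's hidden states
    return hook

# Register a hook halfway through the network (layer 6 of GPT-2's 12 blocks).
handle = model.transformer.h[6].register_forward_hook(save_activation("layer6"))

prompt = ("On a beautiful spring day, I was driving from San Francisco "
          "to Marin across the great span of the")
with torch.no_grad():
    model(**tokenizer(prompt, return_tensors="pt"))
handle.remove()

# The mid-network state: one activation vector per token position.
print(snapshots["layer6"].shape)  # [batch, seq_len, hidden_size]
```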

Using this same method, they began to identify other concepts. “We learned to recognize millions of different concepts from inside the model, and we can tell when it’s using each of these,” Batson tells me. His team first tried its methods on a small and simple model, then spent the past eight months working to make those methods work on a big LLM, in this case Anthropic’s Claude 3 Sonnet.
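Anthropic’s published research describes doing this concept-finding with “dictionary learning” via sparse autoencoders, which decompose each activation snapshot into a sparse combination of feature directions, each tending to align with one recognizable concept. The sketch below is a simplified illustration of that idea; the dimensions and loss weighting are assumptions, and the real runs used feature dictionaries orders of magnitude larger.

```python
# Sketch of a sparse autoencoder for dictionary learning over activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, n_features=16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations):
        # ReLU leaves only a few features active per input (sparsity).
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder()
acts = torch.randn(4, 768)  # captured activation vectors (see hook sketch above)
features, recon = sae(acts)

# Training minimizes reconstruction error plus an L1 penalty enforcing sparsity,
# so each activation is explained by a handful of interpretable features.
loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
```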

With the ability to interpret what a model is thinking about in the middle of its process, researchers may have an opportunity to steer the AI away from bad outputs such as bias, misinformation, or directions to create a bioweapon. If researchers can interrupt the LLM’s processing of an input and inject a signal into the system, that signal could alter the direction of the process, possibly toward a more desirable output. AI companies do a lot of work to steer their models away from harmful outputs, but they mainly rely on an iterative process of altering the prompts (inputs) and studying how that affects the usefulness or safety of the output. They address problems from the outside in, not from the inside out. Anthropic, which was founded by a group of former OpenAI executives who were concerned about safety, is advancing a means of purposefully influencing the process by injecting signals that steer the model in a better direction.
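In code terms, that injection can be as simple as adding a scaled concept direction to the model’s hidden state while it generates. A hedged sketch, again on GPT-2 and with a random stand-in vector; a real application would use a feature direction identified by the method described above.

```python
# Sketch of activation steering: add a concept vector mid-forward-pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

steering_vector = torch.randn(768)  # stand-in for a learned concept direction

def steer(module, inputs, output):
    hidden = output[0] + 4.0 * steering_vector  # inject the signal
    return (hidden,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(steer)
out = model.generate(**tokenizer("The bridge", return_tensors="pt"),
                     max_new_tokens=20, do_sample=False)
handle.remove()
print(tokenizer.decode(out[0]))
```

Anthropic demonstrated exactly this kind of steering by amplifying its Golden Gate Bridge feature, producing a version of Claude that pulled nearly every answer back toward the bridge.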

Scale AI’s new $1 billion round highlights a focus on training data

Scale AI, which bills itself as the “data foundry for AI,” announced this week that it raised a $1 billion funding round, bringing the company’s valuation to nearly $14 billion. The round was led by the venture capital firm Accel, with participation from a slew of well-known names, including Y Combinator, Index Ventures, Founders Fund, Nvidia, and Tiger Global Management. New investors include Cisco Investments, Intel Capital, AMD Ventures, Amazon, and Meta.

As excitement about generative AI has grown, so has the realization among enterprises that generative AI models are only as good as the data they’re trained on. Scale benefits from both trends. The San Francisco company was working on generating well-annotated training data for AI models well before the appearance of ChatGPT at the end of 2022. Scale has developed techniques for producing synthetic training data, as well as data that is annotated with help from experts in areas such as physics.

Scale, which has worked extensively with agencies within the defense and intelligence communities, plans to use the new capital to pump out more AI training data to meet increasing demand. It also plans to build upon its prior work in helping enterprises evaluate their AI models.

Google shows how it may insert ads into its AI search results

Google announced last week that its version of AI search—now called AI Overviews—is a regular part of its storied search service. The update sent shockwaves through the advertising world and left brands keenly curious about how they might advertise in this new paradigm. AI Overviews, after all, are very different from the old “10 blue links” style of search results that Google helped popularize. They attempt to crib specific information from websites and from Google data sources (flights or maps data, perhaps) to offer a direct, self-contained answer to a user’s query.

A week after the Overviews announcement, Google says it’s ready to start testing new kinds of ads that can fit into AI Overviews. The company says it’ll soon start putting both Search and Shopping ads within AI Overviews, showing the ads to users in the U.S. The ads will be clearly labeled as “sponsored,” Google says, and will be included only when they’re “relevant to both the query and the information in the AI Overview.” The search giant says it’ll listen to feedback from advertisers and continue testing new ad formats for Overviews.

There’s a risk that the new ads will dilute the intent of AI-generated search results, which is to offer a direct answer to a question by pulling in the very best and most relevant information available. If users see that someone is paying for their information to appear within that answer, they may begin to question the credibility of the other information in the “Overview” presentation. To my eye, Google’s first two ideas for AI search ads look too much like products of the old “10 blue links” paradigm. 

More AI coverage from Fast Company: 

Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.
