As AI models start exhibiting bad behavior, it’s time to start thinking harder about AI safety

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

Why I’m becoming more worried about AI safety

During the first two years of the generative artificial intelligence boom, new large language models were limited in scope and application. They were, in essence, very expensive autocomplete engines that understood only words.

In 2025, generative AI models have a much broader view of the world. They can process code, images, video, and audio. They can reason and strategize about delivering a good answer. They can control external tools, including digital tools such as web search agents and, increasingly, physical tools like robots. As their capabilities grow, so does their potential for harm. 

This is no longer a purely conceptual argument. Research shows that increasingly large models already exhibit a proclivity for unsafe behavior during testing. In a model safety card published last week, Anthropic documented some alarming behavior from its newest and biggest model, Claude Opus 4. 

During safety testing, one instance of Opus was allowed to discover plans for its decommissioning. It was also given access to some fictional emails from its developers. The model used the contents of those emails as fodder to attempt to blackmail its human handlers into keeping it alive. As its subtler attempts failed, its tactics became more overt.

Separately, the independent research firm Apollo Research observed an instance of Claude Opus 4 “writing self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself,” all in an effort to undermine its developers’ intentions. Anthropic says that it corrected these early safety issues in later versions of the model. For the first time, Anthropic bumped the new Opus model up to Level 3 on its four-level safety scale. The company said it couldn’t rule out the model’s ability to assist a user in developing a mass-casualty weapon. 

But powerful AI models can work in subtler ways, such as within the information space. A team of Italian researchers found that ChatGPT was more persuasive than humans in 64% of online debates. The AI was also better than humans at leveraging basic demographic data about its debate partner to tailor its arguments and make them more persuasive. 

Another worry is the pace at which AI models are learning to develop AI models, potentially leaving human developers in the dust. Many AI developers already use some kind of AI coding assistant to write blocks of code or even code entire features. At a higher level, smaller, task-focused models are distilled from large frontier models. AI-generated content plays a key role in training, including in the reinforcement learning process used to teach models how to reason. 
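To make the distillation idea concrete, here is a minimal sketch of the classic approach, in which a small “student” model is trained to match a larger “teacher” model’s output distribution. This is an illustration only, with toy model sizes, random stand-in data, and the standard softened-KL loss; it is not a description of any lab’s actual pipeline.

```python
# A minimal knowledge-distillation sketch (illustrative only; real frontier-model
# distillation pipelines are far larger and more involved). Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000  # toy vocabulary size

# Stand-ins: a bigger "teacher" network and a smaller "student" network.
teacher = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, VOCAB))
student = nn.Linear(64, VOCAB)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution so it carries more signal

for step in range(100):
    x = torch.randn(32, 64)              # toy "prompts" (random features)
    with torch.no_grad():
        teacher_logits = teacher(x)      # the teacher labels the data
    student_logits = student(x)
    # Classic distillation loss: KL divergence between softened distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point is that the training signal comes from another model’s outputs rather than from human-written labels, which is why AI-generated content now sits so deep in the training stack.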

There’s a clear profit motive in enabling the use of AI models in more aspects of AI tool development. “Future systems may be able to independently handle the entire AI development cycle—from formulating research questions and designing experiments to implementing, testing, and refining new AI systems,” write Daniel Eth and Tom Davidson in a March 2025 blog post on Forethought.org.

With slower-thinking humans unable to keep up, a “runaway feedback loop” could develop in which AI models “quickly develop more advanced AI which would itself develop even more advanced AI,” resulting in extremely fast AI progress, Eth and Davidson write. Any accuracy or bias issues present in the models would then be baked in and very hard to correct, one researcher told me.
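The mechanics of that loop are easy to illustrate with a toy calculation. In the sketch below (arbitrary numbers, not Eth and Davidson’s model), each generation’s capability multiplies the rate at which the next generation improves, so progress that starts out incremental compounds quickly:

```python
# Toy illustration of a recursive-improvement feedback loop (not a forecast).
# Assumption: each generation's capability multiplies the rate at which the next
# generation can be improved, so progress compounds instead of staying linear.
capability = 1.0        # arbitrary starting level
human_only_rate = 0.1   # fixed improvement per cycle without AI assistance

for generation in range(1, 11):
    ai_assisted_gain = 0.1 * capability  # AI helping to build better AI
    capability += human_only_rate + ai_assisted_gain
    print(f"generation {generation:2d}: capability = {capability:.2f}")
```

Even with small per-cycle gains, the AI-assisted term soon dwarfs the fixed human-only term, which is the intuition behind “extremely fast AI progress.”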

Numerous researchers, the people who actually work with the models up close, have called on the AI industry to “slow down,” but those voices compete with powerful systemic forces that are already in motion and hard to stop. Journalist and author Karen Hao argues that AI labs should focus on creating smaller, task-specific models (she gives Google DeepMind’s AlphaFold models as an example), which may help solve immediate problems more quickly, consume fewer natural resources, and pose a smaller safety risk. 

DeepMind cofounder Demis Hassabis, who won the Nobel Prize for his work on AlphaFold2, says the huge frontier models are needed to achieve AI’s biggest goals (reversing climate change, for example) and to train smaller, more purpose-built models. And yet AlphaFold was not “distilled” from a larger frontier model. It uses a highly specialized model architecture and was trained specifically for predicting protein structures.

The current administration is saying “speed up,” not “slow down.” Under the influence of David Sacks and Marc Andreessen, the federal government has largely ceded its power to meaningfully regulate AI development. Just last year, AI leaders were still paying lip service to the need for safety and privacy guardrails around big AI models. No more. Any friction has been removed, in the U.S. at least. The promise of that kind of frictionless world is one of the main reasons why normally sane and liberal-minded opinion leaders jumped on the Trump train before the election: the chance to bet big on technology’s next big thing in a Wild West environment doesn’t come along very often. 

AI job losses: Amodei says the quiet part out loud  

Anthropic CEO Dario Amodei has a stark warning for the developed world about job losses resulting from AI. The CEO told Axios that AI could wipe out half of all entry-level white-collar jobs and push the unemployment rate to between 10% and 20% within the next one to five years. The losses could come in tech, finance, law, consulting, and other white-collar professions, with entry-level jobs hit hardest. 

Tech companies and governments have been in denial on the subject, Amodei says. “Most of them are unaware that this is about to happen,” Amodei told Axios. “It sounds crazy, and people just don’t believe it.”

Similar predictions have made headlines before but were narrower in focus. 

SignalFire research showed that Big Tech companies hired 25% fewer college graduates in 2024. Microsoft laid off 6,000 people in May, and 40% of the cuts in its home state of Washington were software engineers. Microsoft CEO Satya Nadella said that AI now generates 20% to 30% of the company’s code.

A study by the World Bank in February showed that the risk of losing a job to AI is higher for women, urban workers, and those with higher education. The risk of job loss to AI increases with the wealth of the country, the study found.

Research: U.S. pulls away from China in generative AI investments 

U.S. generative AI companies appear to be attracting far more venture capital money than their Chinese counterparts so far in 2025, according to new research from the data analytics company GlobalData. Investments in U.S. AI companies exceeded $50 billion in the first five months of 2025. China, meanwhile, is struggling to keep pace due to “regulatory headwinds,” though many Chinese AI companies are able to get early-stage funding from the Chinese government.

GlobalData tracked just 50 funding deals for U.S. companies in 2020, amounting to $800 million of investment. The number grew to more than 600 deals in 2024, valued at more than $39 billion. The research shows 200 U.S. funding deals so far in 2025. 

Chinese generative AI companies attracted just one funding deal in 2020, valued at $40 million. Deals grew to 39 in 2024, collectively valued at around $400 million. The researchers have tracked 14 investment deals for Chinese generative AI companies so far in 2025.

“This growth trajectory positions the U.S. as a powerhouse in GenAI investment, showcasing a strong commitment to fostering technological advancement,” says GlobalData analyst Aurojyoti Bose in a statement. Bose cited the well-established venture capital ecosystem in the U.S., along with a permissive regulatory environment, as the main reasons for the investment growth. 

More AI coverage from Fast Company: 

Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.
