As AI models start exhibiting bad behavior, it’s time to start thinking harder about AI safety

Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here.

Why I’m becoming more worried about AI safety

During the first two years of the generative artificial intelligence boom, new large language models were limited in scope and application. They were, in essence, very expensive autocomplete engines that understood only text.

In 2025, generative AI models have a much broader view of the world. They can process code, images, video, and audio. They can reason and strategize about delivering a good answer. They can control external tools, including digital tools such as web search agents and, increasingly, physical tools like robots. As their capabilities grow, so does their potential for harm. 

This is no longer a purely conceptual argument. Research shows that increasingly large models are already showing a proclivity for unsafe behavior during testing. In a model safety card published last week, Anthropic documented some alarming behavior from its newest and biggest model, Claude Opus 4. 

During safety testing, one instance of Opus was allowed to discover plans for its own decommissioning. It was also given access to some fictional emails of its developers. The model used the content of those emails as fodder to blackmail its human handlers into keeping it alive. As those attempts failed, its tactics shifted from subtle to overt.

Separately, the independent research firm Apollo Research observed an instance of Claude Opus 4 “writing self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself” with the goal of undermining its developers’ intentions. Anthropic says it corrected these early safety issues in later versions of the model. For the first time, Anthropic bumped the new Opus model up to Level 3 on its four-level safety scale. The company said it couldn’t rule out the model’s ability to assist a user in developing a mass-casualty weapon.

But powerful AI models can do harm in subtler ways, too, such as within the information space. A team of Italian researchers found that ChatGPT was more persuasive than human debaters in 64% of online debates. The AI was also better than humans at using basic demographic data about its human debate partner to tailor its arguments for maximum persuasiveness.

Another worry is the pace at which AI models are learning to develop AI models, potentially leaving human developers in the dust. Many AI developers already use some kind of AI coding assistant to write blocks of code or even code entire features. At a higher level, smaller, task-focused models are distilled from large frontier models. AI-generated content plays a key role in training, including in the reinforcement learning process used to teach models how to reason. 

There’s a clear profit motive in letting AI models take on more of the work of AI development itself. “Future systems may be able to independently handle the entire AI development cycle—from formulating research questions and designing experiments to implementing, testing, and refining new AI systems,” write Daniel Eth and Tom Davidson in a March 2025 blog post on Forethought.org.

With slower-thinking humans unable to keep up, a “runaway feedback loop” could develop in which AI models “quickly develop more advanced AI which would itself develop even more advanced AI,” resulting in extremely fast AI progress, Eth and Davidson write. Any accuracy or bias issues present in the models would then be baked in and very hard to correct, one researcher told me.

Numerous researchers—the people who actually work with the models up close—have called on the AI industry to “slow down,” but those voices compete with powerful systemic forces that are already in motion and hard to stop. Journalist and author Karen Hao argues that AI labs should focus on creating smaller, task-specific models (she gives Google DeepMind’s AlphaFold models as an example), which may solve immediate problems more quickly, require fewer natural resources, and pose a smaller safety risk.

DeepMind cofounder Demis Hassabis, who won the Nobel Prize for his work on AlphaFold2, says the huge frontier models are needed to achieve AI’s biggest goals (reversing climate change, for example) and to train smaller, more purpose-built models. And yet AlphaFold was not “distilled” from a larger frontier model. It uses a highly specialized model architecture and was trained specifically for predicting protein structures.

The current administration is saying “speed up,” not “slow down.” Under the influence of David Sacks and Marc Andreessen, the federal government has largely ceded its power to meaningfully regulate AI development. Just last year, AI leaders were still paying lip service to the need for safety and privacy guardrails around big AI models. No more. Any friction has been removed, in the U.S. at least. The promise of this kind of world is one of the main reasons why normally sane and liberal-minded opinion leaders jumped on the Trump train before the election—the chance to bet big on technology’s next big thing in a Wild West environment doesn’t come along that often.

AI job losses: Amodei says the quiet part out loud  

Anthropic CEO Dario Amodei has a stark warning for the developed world about job losses resulting from AI. He told Axios that AI could wipe out half of all entry-level white-collar jobs and push the unemployment rate to 10% to 20% within the next one to five years. The losses could come from tech, finance, law, consulting, and other white-collar professions, with entry-level jobs hit hardest.

Tech companies and governments have been in denial on the subject, Amodei says. “Most of them are unaware that this is about to happen,” Amodei told Axios. “It sounds crazy, and people just don’t believe it.”

Similar predictions have made headlines before but were narrower in focus. 

SignalFire research showed that Big Tech companies hired 25% fewer college graduates in 2024. Microsoft laid off 6,000 people in May, and 40% of the cuts in its home state of Washington were software engineers. Microsoft CEO Satya Nadella said that AI now generates 20% to 30% of the company’s code.

A study by the World Bank in February showed that the risk of losing a job to AI is higher for women, urban workers, and those with higher education. The risk of job loss to AI increases with the wealth of the country, the study found.

Research: U.S. pulls away from China in generative AI investments 

U.S. generative AI companies appear to be attracting far more venture capital money than their Chinese counterparts so far in 2025, according to new research from the data analytics company GlobalData. Investments in U.S. generative AI companies exceeded $50 billion in the first five months of 2025. Chinese companies, meanwhile, are struggling to keep pace amid “regulatory headwinds,” though many are able to get early-stage funding from the Chinese government.

GlobalData tracked just 50 funding deals for U.S. companies in 2020, amounting to $800 million of investment. The number grew to more than 600 deals in 2024, valued at more than $39 billion. The research shows 200 U.S. funding deals so far in 2025. 

Chinese generative AI companies attracted just one funding deal in 2020, valued at $40 million. That grew to 39 deals in 2024, valued at around $400 million. The researchers have tracked 14 investment deals for Chinese generative AI companies so far in 2025.

“This growth trajectory positions the U.S. as a powerhouse in GenAI investment, showcasing a strong commitment to fostering technological advancement,” says GlobalData analyst Aurojyoti Bose in a statement. Bose cited the well-established venture capital ecosystem in the U.S., along with a permissive regulatory environment, as the main reasons for the investment growth. 


Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.

https://www.fastcompany.com/91342791/as-ai-models-start-exhibiting-bad-behavior-its-time-to-start-thinking-harder-about-ai-safety
