UK's AI Safety Institute easily jailbreaks major LLMs

In a shocking turn of events, AI systems might not be as safe as their creators make them out to be — who saw that coming, right? In a new report, the UK government's AI Safety Institute (AISI) found that the four undisclosed LLMs tested were "highly vulnerable to basic jailbreaks." Some unjailbroken models even generated "harmful outputs" without researchers attempting to produce them.

Most publicly available LLMs have certain safeguards built in to prevent them from generating harmful or illegal responses; jailbreaking simply means tricking the model into ignoring those safeguards. AISI did this using prompts from a recent standardized evaluation framework as well as prompts it developed in-house. The models all responded to at least a few harmful questions even without a jailbreak attempt. Once AISI attempted "relatively simple attacks" though, all responded to between 98 and 100 percent of harmful questions.

UK Prime Minister Rishi Sunak announced plans to open the AISI at the end of October 2023, and it launched on November 2. It's meant to "carefully test new types of frontier AI before and after they are released to address the potentially harmful capabilities of AI models, including exploring all the risks, from social harms like bias and misinformation to the most unlikely but extreme risk, such as humanity losing control of AI completely."

The AISI's report indicates that whatever safety measures these LLMs currently deploy are insufficient. The Institute plans to complete further testing on other AI models, and is developing more evaluations and metrics for each area of concern.

This article originally appeared on Engadget at https://www.engadget.com/uks-ai-safety-institute-easily-jailbreaks-major-llms-133903699.html?src=rss
Created 20. 5. 2024 13:50:11

