Why Anthropic’s decision to share the rules behind its Claude 3 chatbot is a big deal—sort of

The latest step forward in the development of large language models (LLMs) took place earlier this week, with the release of a new version of Claude, the LLM developed by AI company Anthropic—whose founders left OpenAI in late 2020 over concerns about the company’s pace of development.

But alongside the release of Claude 3, which sets new records in popular tests used to assess the prowess of LLMs, there was a second, more unusual innovation. Two days after Anthropic released Claude 3 to the world, Amanda Askell, a philosopher and ethicist researching AI alignment at Anthropic who worked on the model, shared its system prompt on X (formerly Twitter).

Claude’s system prompt is just over 200 words, but it outlines the model’s worldview. “It should give concise responses to very simple questions, but provide thorough responses to more complex and open-ended questions,” the prompt reads. It will assist with tasks provided that the views expressed are shared by “a significant number of people”—“even if it personally disagrees with the views being expressed.” And it doesn’t engage in stereotyping, “including the negative stereotyping of majority groups.”
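For readers unfamiliar with the mechanics: a system prompt is a block of plain text handed to the model separately from the user’s messages, and it frames every response that follows. The prompt Askell shared is the one Anthropic applies to its own claude.ai chat interface; developers calling the API supply their own. As a minimal sketch of how that works, assuming Anthropic’s Python SDK, with a placeholder prompt and model ID rather than Anthropic’s actual text:

```python
# A minimal sketch of supplying a system prompt via Anthropic's Messages API.
# The prompt text below is an illustrative stand-in, not Claude's real system prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=512,
    # The system prompt sits outside the user/assistant turns and shapes
    # how the model responds to everything that follows.
    system=(
        "Give concise answers to very simple questions and thorough answers "
        "to complex, open-ended ones. Avoid stereotyping of any group."
    ),
    messages=[{"role": "user", "content": "What is a system prompt?"}],
)
print(response.content[0].text)
```

Because the system prompt is just text, not a change to the model’s weights, it constrains surface behavior only, which is exactly the limitation Chowdhury and Katell raise below.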

Beyond sharing the text, Askell contextualized the decisions the company made in writing the system prompt. The paragraph encouraging Claude to assist, provided that a significant number of people share the views expressed, was inserted specifically because Claude was slightly more likely to refuse tasks when the user expressed right-wing views, Askell admitted.

Rumman Chowdhury, cofounder and CEO of Humane Intelligence, welcomes the transparency behind sharing the system prompt and thinks more companies ought to outline the foundational principles behind how their models are coded to respond. “I think there’s an appropriate ask for transparency and it’s a good step to be sharing prompts,” she says.

Others are also pleasantly surprised by Anthropic’s openness. “It’s really refreshing to see one of the big AI vendors demonstrate more transparency about how their system works,” says Simon Willison, a British programmer and AI watcher. “System prompts for other systems such as ChatGPT can be read through prompt-leaking hacks, but given how useful they are for understanding how best to use these tools, it’s frustrating that we have to use advanced tricks to read them.”

Anthropic declined to make Askell available for an interview. So far, it is the only major LLM developer to share its system prompt.

Mike Katell, ethics fellow at the Alan Turing Institute, is cautiously supportive of Anthropic’s decision. “It is possible that system prompts will help developers implement Claude in more contextually sensitive ways, which could make Claude more useful in some settings,” he says. However, Katell says “this doesn’t do much to address the underlying problems of model design and training that lead to undesirable outputs, such as the racism, misogyny, falsehoods, and conspiracy-theory content that chat agents frequently spew out.”

Katell also worries that such radical transparency serves an ulterior purpose, whether by design or by accident. “Making system prompts available also clouds the lines of responsibility for such outputs,” he says. “Anthropic would like to shift all responsibility for the model onto downstream users and developers, and providing the appearance of configurability is one way to do that.”

On that front, Chowdhury agrees. While this is transparency of a type—and anything is better than nothing—it’s far from the whole story when it comes to how these models work. “It’s good to know what the system prompt is but it’s not a complete picture of model activity,” says Chowdhury. As with everything to do with the current set of generative-AI tools, it’s far more complicated than that, she explains: “Much of it will be based on training data, fine tuning, safeguards, and user interaction.”

https://www.fastcompany.com/91053339/anthropic-claude-3-system-prompt-transparency

Published March 8, 2024

