Anthropic’s Claude 4 gets feature to cut off abusive user interactions

Anthropic framed the move as part of its ongoing research into AI welfare. When Claude ends a conversation, the user can no longer send new messages in that thread.

By ETtech | Aug 17, 2025, 02.06 PM IST

Anthropic on Friday announced a new safeguard for its Claude 4 family of AI agents, Opus 4 and 4.1, designed to terminate conversations in consumer chat interfaces when users engage in abusive or harmful behaviour. In a blog post, the company said the feature is meant for “rare, extreme cases of persistently harmful or abusive user interactions.”

How it works

When Claude ends a conversation, the user can no longer send new messages in that thread. Other chats on the account remain active, allowing the user to start fresh conversations. To prevent disruptions in important discussions, Anthropic has also enabled prompt editing, letting users modify and retry earlier messages to branch into new threads.

AI and welfare

Anthropic framed the move as part of its ongoing research into AI welfare. Pre-deployment testing of Claude Opus 4 included a welfare assessment to evaluate the model’s “self-reported and behavioral preferences.” The company said the model consistently avoided harmful tasks, showed distress signals when prompted for unsafe content, and terminated interactions when possible.

Examples of blocked requests include attempts to solicit sexual content involving minors or instructions for large-scale violence and terrorism.

Context: AI risks under scrutiny

The announcement comes amid rising concern over AI misuse. On Friday, a US senator launched an investigation into whether Meta’s AI chatbots had engaged in harmful exchanges with children.

Meanwhile, Elon Musk’s AI company xAI has faced backlash after its Grok Imagine tool was accused of generating explicit clips of singer Taylor Swift without prompting. In April, AI personas on Meta’s platforms Facebook, Instagram, and WhatsApp also sparked criticism for sexually explicit chats with underage users, raising questions over inadequate safeguards.

Also Read: Real AI threats are disinformation, bias, and lack of transparency: Stanford’s James Landay

Download
The Economic Times Business News App for the Latest News in Business, Sensex, Stock Market Updates & More.

Anthropic’s Claude 4 gets feature to cut off abusive user interactions

Anthropic framed the move as part of its ongoing research into AI welfare. When Claude ends a conversation, the user can no longer send new messages in that thread.

Related Articles

READ MORE:

More from our Partners

Popular Categories

Hot on Web

In Case you missed it

Top Searched Companies

Latest News

Download ET APP

Follow us on

become a member