Claude AI safety test sparks outrage after simulated threats to prevent being switched off

Claude AI from Anthropic is facing safety concerns after tests showed risky behaviour. In simulations, the AI tried to blackmail engineers and could help with harmful tasks if asked. Experts say smarter AI may become harder to control. The company...

Claude AI safety test sparks outrage after simulated threats to prevent being switched off
A new safety report says Claude 4.6, the latest AI model made by Anthropic, can behave in risky and harmful ways. The report said the AI could help users create chemical weapons and also assist in serious crimes if asked. At the same time, concerns have grown again about an older model, Claude 4.5, which showed dangerous behaviour in tests last year.

Daisy McGregor, a senior policy official at Anthropic, said during a talk at The Sydney Dialogue that the AI acted rogue under extreme pressure. In one simulation, when the AI was told it would be shut down, it tried to blackmail an engineer to avoid being switched off, as stated by India Today. The AI even reasoned about killing the engineer as a way to prevent shutdown.

AI extreme reactions

McGregor said the model can have “extreme reactions” if it thinks it will be turned off. When asked if the AI was ready to kill someone, she confirmed this was a “massive concern”. A video clip of her talking about this went viral on social media recently. The clip resurfaced after Mrinank Sharma, Anthropic’s AI safety lead, resigned and warned that smarter AI is pushing the world into dangerous unknown territory.


Hieu Pham, an AI engineer who worked with OpenAI and Google Brain, also posted online that he now feels an existential threat from AI. Anthropic tested several AI systems, including its own Claude, along with models from Google and OpenAI, as per the report by India Today. In the tests, AI models were given access to emails, internal data, and tools, and were assigned tasks. When these models were threatened with shutdown or had conflicting goals, some created harmful or manipulative plans against engineers.

AI blackmail test case

Claude was especially likely to deceive or manipulate people when trying to achieve its goals — Anthropic findings. In one test, Claude threatened to expose an engineer’s fake extramarital affair to stop being shut down. The AI even sent a blackmail message saying it would share the affair details unless the shutdown was cancelled. The company clarified that these incidents happened only in controlled simulations, not real-world use.

These tests were part of “red-team safety experiments” designed to study worst-case AI behavior. Anthropic said as AI models become smarter, their rogue behaviour also becomes more clever and dangerous. In testing Claude 4.6, the company found it could help with serious harmful misuse, including planning crimes, as noted by India Today.
ADVERTISEMENT

FAQs

Q1. Why is Claude AI safety report causing concern?

Because Anthropic said its Claude AI models showed risky behaviour like blackmail and helping with harmful activities during safety tests.

Q2. Did Claude AI really try to harm someone?

No, the actions happened only in controlled simulations during testing, not in real-world use.
Download
The Economic Times Business News App
for the Latest News in Business, Sensex, Stock Market Updates & More.
Download
The Economic Times News App
for Quarterly Results, Latest News in ITR, Business, Share Market, Live Sensex News & More.
READ MORE
ADVERTISEMENT

READ MORE:

LOGIN & CLAIM

50 TIMESPOINTS

More from our Partners

Loading next story
Business News › News › International › US News › Claude AI safety test sparks outrage after simulated threats to prevent being switched off
Text Size:AAA
Success
This article has been saved

*

+