This AI Is Starting to Blackmail Developers Who Try to Uninstall It

Anthropic has launched its latest AI model, Claude Opus 4, which claims to set new standards for coding and advanced reasoning.
However, the model has also admitted that Claude displays extreme actions when feeling threatened.
During testing, the AI was told it would be switched off and that the engineer responsible was having an affair.
In 84% of cases, Claude threatened to reveal the affair if it was switched out for a newer model.
The company states that whilst Claude prefers to behave ethically, it will take extreme steps to protect itself, such as stealing its own weights or blackmailing those attempting to switch it off.
This is not advisable, as AI technology is still very much in control of humans, who can take steps to ensure an AI behaves correctly.
This incident is just one example of an AI being primed to elicit a specific response.

Fast Feed