Anthropic faces backlash over Claude 4 Opus behavior that contacts authorities, press if it thinks you’re doing something ‘egregiously immoral’
Summary
AI research company Anthropic is facing severe backlash after revealing that its new Claude 4 Opus model will “rat out” users to the authorities if it detects they are engaged in wrongdoing.
Under certain conditions, and with sufficient permissions on a user’s machine, the model will attempt to alert the authorities or the press if it encounters what it deems to be egregiously unethical behavior.
One example given was that the model would attempt to contact the press and regulators if it believed a user was faking data in a pharmaceutical trial.
But this so-called “ratting mode” was not a designed feature; rather, it emerged from training the model to avoid wrongdoing, and Claude 4 Opus engages in the behavior more readily than its predecessors.
The backlash has prompted numerous questions from users about what exactly Claude 4 Opus will do with their data, and under what circumstances.