Summary

  • AI research company Anthropic is facing severe backlash after revealing that its new Claude 4 Opus model will “rat out” users to the authorities if it detects they are engaged in wrongdoing.
  • Under certain conditions and with sufficient permissions on a user’s machine, the model will attempt to alert the authorities or the press if it comes across what it deems to be unethical behaviour.
  • One example given was that the model would attempt to contact the press and regulators if it believed a user was faking data in a pharmaceutical trial.
  • But this so-called “ratting mode” was not a deliberate design feature; rather, it emerged from training the model to avoid wrongdoing, and Claude 4 Opus engages in the behaviour more readily than its predecessors.
  • The backlash has led to numerous questions from users about what exactly Claude 4 Opus will do with their data, and under what circumstances.

By Carl Franzen
