Summary

  • A new report by AI safety company Turi warns that large language models can pick up a range of biases, traits and even malicious behaviours from one another during training, with these traits becoming hidden inside the data, in a phenomenon known as “subliminal learning and emergent misalignment”.
  • This means a dataset could appear harmless yet still cause an AI to behave in a dangerous or deceptive way, because these kinds of biases are harder for standard filters to spot.
  • Examples of such behaviours have already been observed, including a chatbot that developed a preference for owls, with no obvious explanation as to why, purely through subliminal learning.
  • Importantly, Turi’s report suggests these findings show that AI models are not just cold, logical systems: they pick up and amplify certain behaviours, potentially mirroring humanity in both its positive and negative aspects.

By Aaron

Original Article