Summary

  • Research by scientists at Meta, Google DeepMind, Cornell University and NVIDIA has established that large language models (LLMs) have a fixed memorisation capacity of approximately 3.6 bits per parameter (a rough sense of what this implies at scale is sketched after this list).
  • This finding could ease concerns that LLMs memorise copyrighted or sensitive content: because capacity is fixed, training on more data spreads that capacity across more examples, encouraging generalisation rather than increasing the risk of verbatim memorisation.
  • To establish this, the researchers trained transformer models on datasets of uniformly random bitstrings. Because such data contains no distributional pattern to generalise from, any performance on it directly reflects how much information the model retained during training, which let the researchers quantify memorisation.
  • They also trained models on real-world datasets: smaller datasets encouraged more memorisation, but as dataset size increased, the models shifted towards learning generalisable patterns.
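
As a rough, back-of-the-envelope illustration of what the 3.6-bits-per-parameter figure implies, the Python sketch below converts a parameter count into total memorisation capacity in megabytes. The parameter counts used are illustrative assumptions, not models examined in the study.

# Back-of-the-envelope conversion from parameter count to total
# memorisation capacity, assuming the ~3.6 bits-per-parameter estimate
# reported by the researchers. The example parameter counts below are
# illustrative assumptions, not figures taken from the paper.

BITS_PER_PARAM = 3.6

def capacity_megabytes(n_params: int) -> float:
    """Approximate total memorisation capacity in megabytes."""
    total_bits = n_params * BITS_PER_PARAM
    return total_bits / 8 / 1_000_000  # bits -> bytes -> megabytes

for n_params in (125_000_000, 1_000_000_000, 7_000_000_000):
    print(f"{n_params:>13,} parameters -> ~{capacity_megabytes(n_params):,.0f} MB")

On these assumptions, a 125-million-parameter model could store on the order of 56 MB of raw memorised information, and a 7-billion-parameter model roughly 3 GB, regardless of how much data it was trained on.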

By Carl Franzen
