Summary

  • A study from the University of Texas used 16 of the most widely used large language models (LLMs) to produce 576,000 code samples, and found that 440,000 of the package dependencies contained in those samples referenced non-existent packages.
  • For open-source models, 21% of the dependencies pointed to false packages.
  • Such dependencies could lead to dependency confusion attacks, in which a malicious package published under a legitimate-sounding name is pulled into the software supply chain and can go unnoticed because developers trust the familiar package name (see the sketch after this list).
  • The study found that 43% of the false references were repeated across 10 queries, and almost 60% of the time the errors were consistent, indicating they were not random.
  • This creates a golden opportunity for attackers: by identifying which package names are hallucinated consistently, they can publish malicious packages under those names and get them into the software supply chain.
  • In a recent article, Microsoft’s CTO predicted 95% of code will be AI-generated within five years, meaning the issue is only likely to grow.
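A minimal sketch of the underlying risk, not taken from the article: if an LLM-suggested dependency is not actually registered on a public index such as PyPI, anyone can claim that name and publish malicious code under it. The package names below are hypothetical illustrations.

```python
"""Check whether LLM-suggested package names are actually registered on PyPI."""
import urllib.error
import urllib.request


def exists_on_pypi(package_name: str) -> bool:
    """Return True if PyPI has a project registered under this name."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # A 404 means no such project exists -- an attacker could register the name.
        return False


if __name__ == "__main__":
    # "requests" is a real package; "requests-toolkit-pro" is a made-up,
    # hallucination-style name used here purely for illustration.
    for name in ["requests", "requests-toolkit-pro"]:
        status = "exists" if exists_on_pypi(name) else "unclaimed on PyPI"
        print(f"{name}: {status}")
```

An unclaimed name flagged by a check like this is exactly the kind of opening the repeated, non-random hallucinations described above would hand to an attacker.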

By Dan Goodin, Ars Technica

Original Article