Summary

  • Databricks has found a way to boost the performance of AI models without relying on clean data, the lack of which is a common problem in the field.
  • Instead, the company combined reinforcement learning, a technique that lets models improve through practice, with synthetic, AI-generated training data to improve models further.
  • Once a model has been trained to predict human preferences, the data generated with it can be used to improve the performance of other models without further labelled training data.
  • The company has used the technique, known as Test-time Adaptive Optimization (TAO), to improve language models in particular.
  • Databricks hopes the method will now be taken up by its customers, enabling them to deploy their own AI agents to perform tasks without being hindered by issues of data quality.
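
The general idea in the points above, using a preference model to turn unlabelled prompts into training data, can be illustrated with a minimal sketch. This is not Databricks's implementation: the best-of-N selection step is an assumption made for illustration, and toy_generate, toy_reward and build_synthetic_dataset are hypothetical names standing in for a real language model, a real learned preference model and a real data pipeline.

```python
import random
from typing import Callable, List, Tuple


def toy_generate(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for sampling one response from a language model."""
    filler = ["ok", "sure", "details", "steps", "example"]
    n_words = rng.randint(1, 5)
    return prompt + " -> " + " ".join(rng.choice(filler) for _ in range(n_words))


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a learned preference model.

    A real one would be trained on human preference judgements; this toy
    version simply scores longer responses higher.
    """
    return float(len(response))


def build_synthetic_dataset(
    prompts: List[str],
    generate: Callable[[str, random.Random], str],
    reward: Callable[[str, str], float],
    n_candidates: int = 4,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Sample several responses per prompt and keep the highest-scoring one.

    The result is a list of (prompt, best_response) pairs that needs no
    human-written labels. Assumption: best-of-N selection is used here only
    as one simple way to combine a preference model with generated data.
    """
    rng = random.Random(seed)
    dataset = []
    for prompt in prompts:
        candidates = [generate(prompt, rng) for _ in range(n_candidates)]
        best = max(candidates, key=lambda response: reward(prompt, response))
        dataset.append((prompt, best))
    return dataset


if __name__ == "__main__":
    pairs = build_synthetic_dataset(
        ["Summarise the report", "Draft a reply"], toy_generate, toy_reward
    )
    for prompt, response in pairs:
        print(prompt, "=>", response)
```

In a real system the selected pairs would then be used to fine-tune another model; that training step is left out of this sketch.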

By Will Knight
