How Distillation Makes AI Models Smaller and Cheaper
Summary
Chinese start-up DeepSeek has drawn attention for its AI chatbot R1, which rivals leading AI companies’ performance at a lower cost.
DeepSeek has been accused of using a technique called ‘distillation’ (also known as ‘knowledge distillation’) to learn from OpenAI’s proprietary o1 model without permission.
Distillation is a widely used technique in AI that shrinks a model with little loss of accuracy, and it has been part of computer science research for about a decade.
The idea began with a 2015 paper by three Google researchers, including AI pioneer Geoffrey Hinton, who referred to the information the teacher passes along as “dark knowledge”.
Distillation works by having a large ‘teacher’ model transfer what it knows to a smaller ‘student’ model, using soft targets: probability distributions over possible answers rather than single firm labels.
Because the soft targets reveal which wrong answers the teacher considers nearly right, the student learns the categories it is supposed to sort through more efficiently; a minimal sketch of the idea follows.
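As an illustration only, here is a minimal sketch of the soft-target idea, assuming a PyTorch setup; the temperature, loss weighting, and toy model sizes are illustrative assumptions, not details from the article.

```python
# A minimal sketch of soft-target distillation (illustrative, not the
# method used by any particular company). Assumes PyTorch is installed.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the soft-target loss (match the teacher's probabilities)
    with the usual hard-label loss."""
    # Soften both output distributions with the temperature, then match
    # them with KL divergence; scaling by T^2 keeps gradients comparable.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy "teacher" (larger) and "student" (smaller) classifiers over 10 classes.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))

x = torch.randn(8, 128)              # a batch of 8 example inputs
labels = torch.randint(0, 10, (8,))  # ground-truth class labels
with torch.no_grad():                # teacher is frozen; only its outputs are used
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()                      # gradients flow only into the student
```

In this sketch the student never sees the teacher's weights, only its output probabilities, which is why distillation can also be attempted against a model exposed solely through an API.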
Distillation is now offered as a service by companies including Google, OpenAI and Amazon.