Distillation (Knowledge Distillation)
Also known as: Distillation / Knowledge Distillation / 知識蒸留
Training a smaller 'student' model to mimic the output distribution of a larger 'teacher' model, compressing capabilities into a lighter-weight model suited for edge deployment or cost reduction.
Overview
Knowledge distillation, introduced by Hinton et al. in 2015, trains a small 'student' model on the soft labels (output probability distributions) produced by a large 'teacher' model. In the LLM era, distillation is used to transfer the reasoning capabilities of frontier models into smaller language models (SLMs); the DeepSeek-R1 distilled model family is a well-known recent example.
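The soft-label idea can be summarized in a few lines of code. Below is a minimal sketch of a Hinton-style distillation loss, assuming PyTorch; the temperature T, the weighting alpha, and the logits/labels passed in are illustrative placeholders rather than a specific library's API.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-label term: KL divergence between the softened teacher and student
    # distributions. Scaling by T*T keeps gradient magnitudes comparable
    # across temperatures (as suggested in Hinton et al., 2015).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Blend the two terms; alpha controls how much weight the teacher's
    # distribution gets relative to the ground-truth labels.
    return alpha * soft + (1.0 - alpha) * hard
```

In training, the teacher's logits are computed in inference mode and treated as fixed targets; only the student's parameters are updated with this loss.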
Business application
Distilling frontier-model responses (e.g., GPT-4o, Claude Opus) into a small local model can deliver near-equivalent performance on specific tasks while eliminating ongoing API fees, making it an effective strategy for on-premises deployment and cost optimization.
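In practice this is often done as response-based ("black-box") distillation: collect the teacher's answers to task prompts, then fine-tune the small model on those prompt-response pairs with standard supervised learning. The sketch below illustrates the data-collection step; `call_teacher_api` and the output path are hypothetical placeholders, not a real SDK call.

```python
import json

def build_distillation_dataset(prompts, call_teacher_api, out_path="distill_data.jsonl"):
    # Each record pairs a task prompt with the teacher's response. The student
    # is later fine-tuned (next-token prediction) on these pairs, which acts as
    # sequence-level distillation when only the teacher's text is available.
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            response = call_teacher_api(prompt)  # hypothetical frontier-model client
            record = {"prompt": prompt, "completion": response}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because only the teacher's generated text is used (not its logits), this approach works even when the frontier model is accessible solely through an API.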