Chinchilla

Chinchilla is a 70-billion-parameter language model trained by DeepMind and presented in the 2022 paper "Training Compute-Optimal Large Language Models" (Hoffmann et al.). Its primary contribution was empirical: by training a model 4x smaller than Gopher (280B) on 4x more data (1.4 trillion tokens) under the same compute budget, Chinchilla uniformly outperformed its larger counterpart. This demonstrated that most existing large models were significantly undertrained, and it established the "Chinchilla scaling laws" that now guide compute-optimal training across the industry.
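The compute-optimal recipe is often summarized by two rules of thumb: training FLOPs C ≈ 6·N·D (for N parameters and D tokens), and parameters and tokens scaling roughly equally, which works out to on the order of 20 training tokens per parameter. As a rough sketch under those two assumptions (the constants are approximations, not the paper's exact fitted exponents):

```python
# Sketch of the Chinchilla rule of thumb, assuming C ≈ 6*N*D
# and D ≈ 20*N (approximate compute-optimal token/parameter ratio).

TOKENS_PER_PARAM = 20  # approximate Chinchilla-optimal ratio

def compute_optimal_split(flops: float) -> tuple[float, float]:
    """Given a FLOP budget C with C = 6*N*D and D = 20*N, solve for N and D."""
    # C = 6 * N * (20 * N)  =>  N = sqrt(C / 120)
    n_params = (flops / (6 * TOKENS_PER_PARAM)) ** 0.5
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Sanity check against Chinchilla itself: 70B params, 1.4T tokens.
c_chinchilla = 6 * 70e9 * 1.4e12  # ≈ 5.9e23 FLOPs
n, d = compute_optimal_split(c_chinchilla)
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```

Plugging Chinchilla's own budget back in recovers roughly 70B parameters and 1.4T tokens, which is why the 20-tokens-per-parameter heuristic is so widely quoted.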

Significance

  • Showed existing models were too big and undertrained. Prior practice (influenced by the Kaplan et al. scaling laws) favored very large models trained on relatively few tokens; Chinchilla showed this allocation of compute was suboptimal.
  • 67.5% on MMLU — outperforming Gopher (60%), GPT-3 (43.9%), and Megatron-Turing NLG (33.9%) with a 4x smaller model.
  • Changed industry practice. After Chinchilla, the focus shifted from “make models bigger” to “train models longer on more data.” LLaMA (Meta), Mistral, and other subsequent models followed the Chinchilla-optimal training paradigm.
  • Practical benefits of smaller models. A compute-optimal model is smaller, meaning faster inference, lower serving costs, easier fine-tuning, and simpler deployment — benefits that extend well beyond training efficiency.