GPT-3
GPT-3 (Generative Pre-trained Transformer 3) is a 175-billion-parameter language model developed by OpenAI and introduced in the 2020 paper "Language Models are Few-Shot Learners". It was the first model to demonstrate in-context learning at scale — the ability to perform new tasks from just a few examples in the prompt, without any fine-tuning. GPT-3 established that scaling a Transformer by roughly 10x over previous models produces a qualitative shift in capability, not just an incremental improvement.
Significance
- 10x scale leap. At 175B parameters trained on 300B tokens, GPT-3 was 10x larger than any prior non-sparse language model. This scale proved critical for emergent capabilities.
- Few-shot learning without training. GPT-3 could translate languages, answer questions, write code, and attempt novel tasks from just a handful of examples — tasks that previously required task-specific fine-tuning.
- Foundation for modern LLMs. GPT-3 became the basis for InstructGPT (via RLHF) and eventually ChatGPT, demonstrating the path from raw language model to useful AI assistant.
- Scaling precedent. The GPT-3 results, combined with the later Chinchilla scaling analysis, showed that GPT-3 was substantially undertrained for its size — Chinchilla outperformed it with 70B parameters trained on 1.4 trillion tokens, roughly 4.7x GPT-3's 300 billion.
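The few-shot setup described above amounts to assembling worked examples into a single prompt and letting the model continue the pattern. A minimal sketch in Python — the helper function is hypothetical, not OpenAI's API, though the translation pairs ("sea otter" → "loutre de mer") echo the examples shown in the GPT-3 paper:

```python
# Minimal sketch of few-shot ("in-context") prompting as used with GPT-3:
# the task is specified entirely in the prompt, with no gradient updates.
# build_few_shot_prompt is an illustrative helper, not part of any real API.

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, k worked examples, and a new query
    into a single prompt string for a left-to-right language model."""
    lines = [instruction, ""]
    for src, tgt in examples:
        lines.append(f"English: {src}")
        lines.append(f"French: {tgt}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # the model is expected to continue from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

With zero examples this becomes "zero-shot" and with one it becomes "one-shot" — the paper evaluates all three regimes with exactly this prompt-construction scheme.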
Architecture
GPT-3 is a decoder-only Transformer with 96 layers, 96 attention heads, an embedding dimension of 12,288, and alternating dense and locally banded sparse attention patterns across its layers. It was trained on a filtered subset of Common Crawl, WebText2, Books1, Books2, and English-language Wikipedia — approximately 300 billion tokens in total.
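The headline 175B figure follows almost entirely from those dimensions. A back-of-the-envelope estimate, assuming the standard GPT layout (4·d² attention weights, a 4x-wide MLP giving 8·d² weights, the GPT-2 BPE vocabulary of ~50k tokens, and a 2,048-token context window; biases and layer norms ignored as negligible):

```python
# Rough parameter count for GPT-3 from its published dimensions.
# Assumes the standard Transformer layout: 4*d^2 for attention
# (Q, K, V, and output projections) and 8*d^2 for the 4x-wide MLP.
# Vocabulary and context sizes are assumptions carried over from GPT-2/GPT-3.

d_model = 12_288
n_layers = 96
vocab_size = 50_257   # GPT-2 BPE vocabulary, reused by GPT-3 (assumption)
n_ctx = 2_048         # GPT-3 context window

attn_params_per_layer = 4 * d_model ** 2
mlp_params_per_layer = 8 * d_model ** 2
block_params = n_layers * (attn_params_per_layer + mlp_params_per_layer)
embedding_params = (vocab_size + n_ctx) * d_model  # token + position embeddings

total = block_params + embedding_params
print(f"{total / 1e9:.1f}B parameters")  # ≈ 174.6B, i.e. the quoted 175B
```

The estimate lands within half a percent of the official count, which is why "12·L·d²" is a common rule of thumb for dense GPT-style models.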