GPT-3
GPT-3 (Generative Pre-trained Transformer 3) is a 175-billion-parameter language model developed by OpenAI and introduced in the 2020 paper "Language Models are Few-Shot Learners". It was the first model to demonstrate in-context learning at scale — the ability to perform new tasks from just a few examples in the prompt, without any fine-tuning. GPT-3 established that scaling a Transformer by roughly 10x over previous models produces a qualitative shift in capability, not just an incremental improvement.
Significance
- 10x scale leap. At 175B parameters trained on 300B tokens, GPT-3 was 10x larger than any prior non-sparse language model. This scale proved critical for emergent capabilities.
- Few-shot learning without training. GPT-3 could translate languages, answer questions, write code, and attempt novel tasks from just a handful of examples — tasks that previously required task-specific fine-tuning.
- Foundation for modern LLMs. GPT-3 became the basis for InstructGPT (via RLHF) and eventually ChatGPT, demonstrating the path from raw language model to useful AI assistant.
- Scaling precedent. The GPT-3 results, combined with the later Chinchilla scaling analysis, showed that GPT-3 was substantially undertrained for its size — Chinchilla outperformed it with 70B parameters trained on 1.4 trillion tokens, roughly 4.7x GPT-3's 300 billion.
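The few-shot setup described above amounts to assembling worked examples into a single prompt and letting the model continue the pattern. A minimal sketch in Python — the helper function is hypothetical, not OpenAI's API, though the translation pairs ("sea otter" → "loutre de mer") echo the examples shown in the GPT-3 paper:

```python
# Minimal sketch of few-shot ("in-context") prompting as used with GPT-3:
# the task is specified entirely in the prompt, with no gradient updates.
# build_few_shot_prompt is an illustrative helper, not part of any real API.

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, k worked examples, and a new query
    into a single prompt string for a left-to-right language model."""
    lines = [instruction, ""]
    for src, tgt in examples:
        lines.append(f"English: {src}")
        lines.append(f"French: {tgt}")
        lines.append("")
    lines.append(f"English: {query}")
    lines.append("French:")  # the model is expected to continue from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

With zero examples this becomes "zero-shot" and with one it becomes "one-shot" — the paper evaluates all three regimes with exactly this prompt-construction scheme.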
Architecture
GPT-3 is a decoder-only Transformer with 96 layers, 96 attention heads, an embedding dimension of 12,288, and alternating dense and locally banded sparse attention patterns across its layers. It was trained on a filtered subset of Common Crawl, WebText2, Books1, Books2, and English-language Wikipedia — approximately 300 billion tokens in total.
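The headline 175B figure follows almost entirely from those dimensions. A back-of-the-envelope estimate, assuming the standard GPT layout (4·d² attention weights, a 4x-wide MLP giving 8·d² weights, the GPT-2 BPE vocabulary of ~50k tokens, and a 2,048-token context window; biases and layer norms ignored as negligible):

```python
# Rough parameter count for GPT-3 from its published dimensions.
# Assumes the standard Transformer layout: 4*d^2 for attention
# (Q, K, V, and output projections) and 8*d^2 for the 4x-wide MLP.
# Vocabulary and context sizes are assumptions carried over from GPT-2/GPT-3.

d_model = 12_288
n_layers = 96
vocab_size = 50_257   # GPT-2 BPE vocabulary, reused by GPT-3 (assumption)
n_ctx = 2_048         # GPT-3 context window

attn_params_per_layer = 4 * d_model ** 2
mlp_params_per_layer = 8 * d_model ** 2
block_params = n_layers * (attn_params_per_layer + mlp_params_per_layer)
embedding_params = (vocab_size + n_ctx) * d_model  # token + position embeddings

total = block_params + embedding_params
print(f"{total / 1e9:.1f}B parameters")  # ≈ 174.6B, i.e. the quoted 175B
```

The estimate lands within half a percent of the official count, which is why "12·L·d²" is a common rule of thumb for dense GPT-style models.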