Chain-of-Thought Prompting
Chain-of-thought (CoT) prompting is a technique where large language models are prompted to produce intermediate reasoning steps before giving a final answer. Introduced by Wei et al. (2022), CoT demonstrated that LLMs can perform multi-step reasoning — they just need to be shown how to “think out loud.” The technique requires no training or fine-tuning; it works purely through the structure of the prompt.
The Core Idea
Standard prompting provides input-output examples:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: The answer is 11.
Chain-of-thought prompting includes the reasoning:
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. 5 + 6 = 11. The answer is 11.
When shown examples with reasoning steps, the model produces its own reasoning chains for new problems. This dramatically improves accuracy on tasks requiring arithmetic, logic, or multi-step inference.
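In code, a few-shot CoT prompt is just the exemplars concatenated ahead of the new question. A minimal sketch in Python, using the Wei et al. tennis-ball exemplar (the actual model call is omitted; only the prompt assembly is shown):

```python
# Each exemplar pairs a question with worked reasoning that ends in
# the canonical "The answer is N." sentence.
COT_EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]

def build_cot_prompt(question: str) -> str:
    """Prepend worked exemplars so the model imitates the reasoning format."""
    parts = [f"Q: {q}\nA: {a}" for q, a in COT_EXEMPLARS]
    parts.append(f"Q: {question}\nA:")  # the model continues with its own chain
    return "\n\n".join(parts)

prompt = build_cot_prompt("A jug holds 4 liters. How many liters do 3 jugs hold?")
```

The trailing `A:` invites the model to continue in the same Q/reasoning/answer pattern as the exemplars.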
Why It Works
CoT prompting exploits a fundamental property of autoregressive language models: each generated token is conditioned on all previous tokens, including previously generated ones. By generating intermediate reasoning steps, the model creates a “scratchpad” that keeps relevant information in context for the final answer. Without CoT, the model must jump from question to answer in a single step, compressing all reasoning into the hidden states.
In essence: generating reasoning steps converts a hard implicit computation into an easier explicit one.
Scale Dependence
CoT is an emergent capability that only appears in sufficiently large models. Small models produce fluent-sounding but logically incorrect reasoning chains that don’t improve (and can harm) final accuracy. The threshold varies by task, but generally models below ~100B parameters show limited benefit from CoT prompting on challenging reasoning tasks. This connects to the broader phenomenon of emergent capabilities with scale.
Variants and Extensions
The original CoT paper spawned a family of related techniques:
- Zero-shot CoT. Simply appending “Let’s think step by step” to a prompt elicits reasoning without any examples. Less effective than few-shot CoT but requires zero engineering.
- Self-consistency. Sample multiple reasoning chains and take the majority-vote answer. Reduces variance from individual reasoning errors.
- Tree of thought. Explore multiple reasoning paths in parallel, evaluate intermediate states, and backtrack from dead ends. More effective but more expensive.
- Reasoning models. DeepSeek-R1 and similar models are trained via reinforcement learning to produce extended reasoning traces automatically, without requiring CoT examples in the prompt. These models naturally generate <think> blocks with self-reflection and verification.
Relationship to AI Agents
CoT is foundational to how AI Agents operate. Agent loops are essentially structured chain-of-thought: observe, reason about what to do, act, evaluate the result, and repeat. The agent’s “planning” step is CoT applied to tool selection and task decomposition. The discovery that reinforcement learning can produce emergent reasoning (self-correction, backtracking, “aha moments”) suggests that CoT-like behavior can be developed through training incentives, not just prompt engineering.
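The observe–reason–act loop described above can be sketched as follows. The tool registry, stub model, and stopping convention here are illustrative assumptions, not the API of any particular agent framework:

```python
# Illustrative agent loop: structured chain-of-thought wrapped around tool use.
# `llm_reason` stands in for a model call; here it is a scripted stub.

TOOLS = {"calc": lambda expr: str(eval(expr))}  # toy tool registry (eval is for demo only)

def agent_loop(task: str, llm_reason, max_steps: int = 5) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # Reason: the model reads the full history (its scratchpad) and
        # decides the next action -- an explicit CoT step over tool selection.
        thought, action, arg = llm_reason("\n".join(history))
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg  # final answer
        # Act and observe: feed the tool result back into context.
        observation = TOOLS[action](arg)
        history.append(f"Observation: {observation}")
    return "gave up"

# Scripted stub model: compute with a tool, then finish with the result.
steps = iter([
    ("I need to multiply.", "calc", "6 * 7"),
    ("The tool returned 42; done.", "finish", "42"),
])
print(agent_loop("What is 6 * 7?", lambda history: next(steps)))  # → 42
```

Appending each thought and observation to `history` is the same scratchpad mechanism described earlier: every later decision is conditioned on the reasoning generated so far.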