NeurIPS 2022 · 2022
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
Google Research
Demonstrates that prompting LLMs with a few worked examples that spell out intermediate reasoning steps before the final answer ('chain-of-thought prompting') substantially improves accuracy on arithmetic, commonsense, and symbolic reasoning benchmarks. The improvement grows with model size and emerges only at sufficient scale (around 100B parameters). Establishes step-by-step reasoning as a critical prompting technique and a foundation for later 'reasoning model' designs.
Our summary in our own words — see the canonical source links below for the original abstract.
Chain-of-Thought is the technique that 'reasoning models' (OpenAI o1/o3, Anthropic Claude with extended thinking, DeepSeek R1) train into the model rather than relying on prompting alone. The category emerged directly from this paper's framing. Our /trend/agentic-ai-frameworks-2026 tracks the engineering acceleration in this category.
A prompting technique in which the model is prompted to articulate intermediate reasoning steps before producing a final answer. In this paper, the prompt does so through a few worked examples that include their reasoning. See /define/chain-of-thought for the full term definition.
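A minimal sketch of how such a prompt might be assembled, for illustration only. The names Exemplar, build_prompt, and extract_answer are hypothetical helpers, not code from the paper, and no model is called. The worked example is the tennis-ball problem commonly quoted from the paper, and the "The answer is X." suffix is assumed here as a convention that makes the final answer easy to parse.

```python
import re
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # intermediate steps, written out in natural language
    answer: str


def build_prompt(exemplars, question, with_reasoning=True):
    """Format few-shot exemplars, then the new question with an open 'A:'."""
    blocks = []
    for ex in exemplars:
        if with_reasoning:
            # Chain-of-thought: the exemplar shows its reasoning before the answer.
            target = f"{ex.reasoning} The answer is {ex.answer}."
        else:
            # Standard few-shot prompting: the exemplar shows only the answer.
            target = f"The answer is {ex.answer}."
        blocks.append(f"Q: {ex.question}\nA: {target}")
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def extract_answer(completion):
    """Return the text after 'The answer is', or None if the pattern is absent."""
    match = re.search(r"The answer is (.+?)\.(?:\s|$)", completion)
    return match.group(1) if match else None


ROGER = Exemplar(
    question=(
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    ),
    reasoning=(
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    ),
    answer="11",
)

NEW_QUESTION = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

cot_prompt = build_prompt([ROGER], NEW_QUESTION)
standard_prompt = build_prompt([ROGER], NEW_QUESTION, with_reasoning=False)
```

The two prompts differ only in whether the exemplar's answer is preceded by its reasoning. That is the comparison the paper draws between standard few-shot prompting and chain-of-thought prompting. Either string would then be sent to a language model of your choice, and extract_answer applied to the completion.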
The paper was written by Jason Wei and colleagues at Google Research and published at NeurIPS 2022 (arXiv:2201.11903).
Chain-of-thought prompting does not help at every model size. The paper shows the benefit emerges only at sufficient model scale (around 100B parameters); below that threshold, step-by-step prompting does not reliably improve reasoning accuracy.
Modern reasoning models (OpenAI o1/o3, Claude with extended thinking, DeepSeek R1) train extended chain-of-thought into the model as a native capability, rather than relying on few-shot prompting alone.