NeurIPS 2022 · 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou

Google Research

Abstract summary

Demonstrates that few-shot prompts whose exemplars spell out intermediate reasoning steps before the final answer ('chain-of-thought prompting') substantially improve accuracy on arithmetic, commonsense, and symbolic reasoning benchmarks. The improvement grows with model size and emerges only at sufficient scale (roughly 100B parameters). Establishes step-by-step reasoning as a core prompting technique and a foundation for later 'reasoning model' designs.

Our summary in our own words — see the canonical source links below for the original abstract.

Why we cite this paper

Chain-of-Thought is the technique that 'reasoning models' (OpenAI o1/o3, Anthropic Claude with extended thinking, DeepSeek R1) train into the model rather than relying on prompting alone. The category emerged directly from this paper's framing. Our /trend/agentic-ai-frameworks-2026 tracks the engineering acceleration in this category.

Key findings

1. Chain-of-thought prompting dramatically improves LLM performance on multi-step reasoning tasks.
2. The capability emerges only at sufficient model scale (~100B parameters).
3. Step-by-step reasoning can be elicited via few-shot prompting without model retraining.
4. Foundation for modern reasoning models that train extended chain-of-thought as a native capability.
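
Finding 3 can be made concrete with a minimal sketch, assuming a plain-text completion interface. The exemplar is the tennis-ball problem from the paper's introductory figure; the names `Exemplar` and `build_prompt` and the exact prompt layout are illustrative assumptions, not the authors' code. The only difference between the two prompts is whether the exemplar's answer includes the reasoning steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exemplar:
    question: str
    rationale: str  # intermediate reasoning steps
    answer: str     # final answer only


# Exemplar adapted from the paper's introductory figure.
TENNIS = Exemplar(
    question=(
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    ),
    rationale=(
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    ),
    answer="11",
)


def build_prompt(exemplars, question, chain_of_thought=True):
    """Assemble a few-shot prompt (hypothetical layout, for illustration only)."""
    blocks = []
    for ex in exemplars:
        # With chain_of_thought=False the rationale is dropped, which yields
        # a standard few-shot prompt with answer-only exemplars.
        reasoning = f"{ex.rationale} " if chain_of_thought else ""
        blocks.append(f"Q: {ex.question}\nA: {reasoning}The answer is {ex.answer}.")
    # The new question is left open so the model continues after "A:".
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


QUESTION = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)

standard_prompt = build_prompt([TENNIS], QUESTION, chain_of_thought=False)
cot_prompt = build_prompt([TENNIS], QUESTION, chain_of_thought=True)
print(cot_prompt)
```

Either string would be sent unchanged to a model; no weights are updated, which is why the technique needs no retraining.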

Canonical sources

https://arxiv.org/abs/2201.11903
https://www.semanticscholar.org/paper/1b6e810ce0afd0dd093f789d2b2742d047e316d5

Related glossary terms

Chain of Thought (CoT) · Foundation Model · Reasoning Model

Frequently Asked Questions

What is chain-of-thought prompting?

A prompting technique in which the prompt's few-shot exemplars show intermediate reasoning steps before the final answer, so the model writes out its own steps before answering a new question. See /define/chain-of-thought for the full term definition.
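
Because the reasoning precedes the answer, scoring a response means pulling the final answer out of the generated text. A minimal sketch, assuming completions end with a sentence of the form "The answer is N." as in the paper's exemplars; the function name and the regular expression are our own assumptions, not part of the paper:

```python
import re

# Assumed convention: the completion states "The answer is <number>".
ANSWER_PATTERN = re.compile(r"The answer is\s+\$?(-?[\d,]*\.?\d+)")


def extract_final_answer(completion):
    """Return the last stated numeric answer as a string, or None if absent."""
    matches = ANSWER_PATTERN.findall(completion)
    if not matches:
        return None
    # Take the last match in case the reasoning restates an earlier answer,
    # and drop thousands separators so "1,234" compares equal to "1234".
    return matches[-1].replace(",", "")


completion = (
    "The cafeteria had 23 apples originally. They used 20 to make lunch. "
    "So they had 23 - 20 = 3. They bought 6 more apples, so they have "
    "3 + 6 = 9. The answer is 9."
)
print(extract_final_answer(completion))
```

Returning `None` rather than guessing keeps unparseable completions visible, so they can be counted as errors instead of silently scored.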

Who wrote the chain-of-thought paper?

Jason Wei and colleagues at Google Research. The paper was published at NeurIPS 2022 (arXiv:2201.11903).

Does chain-of-thought work on small models?

Not according to the paper, which shows the benefit emerging only at sufficient model scale (around 100B parameters). Below that threshold, models tend to produce fluent but illogical reasoning chains, and step-by-step prompting does not reliably improve reasoning accuracy.

How does this relate to reasoning models?

Modern reasoning models (OpenAI o1/o3, Claude with extended thinking, DeepSeek R1) train extended chain-of-thought into the model as a native capability, rather than relying on few-shot prompting alone.

Signed The Data Nerd · pseudonymous narrator · methodology over personality

Read our own methodology paper

Code-Side Sourcing methodology, replicable on the open dataset.

Read /methodology

Other research papers

Attention Is All You Need

Language Models are Few-Shot Learners

Training language models to follow instructions with human feedback

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

LoRA: Low-Rank Adaptation of Large Language Models

Constitutional AI: Harmlessness from AI Feedback
