NeurIPS 2020 · 2020

Language Models are Few-Shot Learners

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal

OpenAI

Abstract summary

Introduces GPT-3, a 175B-parameter autoregressive language model, and demonstrates that scaling up a Transformer LM produces emergent few-shot in-context learning capability. Shows that a single model can perform many NLP tasks competitively without fine-tuning, simply by being shown a few examples in the prompt. Documents capability and scaling behaviors that defined the LLM era.

This summary is in our own words; see the canonical source links below for the original abstract.

Why we cite this paper

GPT-3 is the catalyst paper for the modern LLM era. The few-shot in-context learning paradigm became the dominant interaction pattern for AI products, and the 175B-parameter scale established the design space that frontier labs (Anthropic, OpenAI, Mistral, Cohere, Hugging Face) operate within. The applied-AI category we track at /signal/[anthropic/openai/etc.] would not exist without the GPT-3 demonstration.

Key findings

1. Few-shot learning emerges as a capability of sufficiently scaled language models, without explicit task-specific fine-tuning.
2. Performance scales smoothly with model size, training data, and compute, consistent with the scaling-law trend formalized earlier in 2020 by Kaplan et al.
3. A single LLM can perform translation, question answering, cloze completion, and simple arithmetic competitively given few-shot prompting.
4. The 175B-parameter scale established a new design space that frontier labs compete within.
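
One hedged way to read "scales smoothly" in finding 2: Kaplan et al. describe test loss as a power law in parameter count N when data and compute are not the bottleneck. The constants N_c and α_N below are fitted values from that work, not quantities stated on this page, and the form is shown only as a sketch of the trend.

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}
\quad\Longrightarrow\quad
\log L(N) = \alpha_N \left(\log N_c - \log N\right)
```

On log-log axes this is a straight line with slope -α_N, so each constant-factor increase in model size gives a roughly constant proportional drop in loss.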

Canonical sources

https://arxiv.org/abs/2005.14165

https://www.semanticscholar.org/paper/90abbc2cf38462b954ae1b772fac9532e2ccd8b0

Related glossary terms

Context Window · Fine-tuning · Foundation Model

Frequently Asked Questions

What is GPT-3?

GPT-3 is a 175B-parameter autoregressive language model developed by OpenAI and released in 2020. The paper documents its design and training, and demonstrates the few-shot in-context learning capability that became central to modern LLM applications.

Where is the canonical paper?

Available on arXiv (arXiv:2005.14165). One of the most-cited ML papers since publication.

Who published the GPT-3 paper?

OpenAI. Lead authors include Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, and Prafulla Dhariwal. It appeared at NeurIPS 2020.

What is few-shot in-context learning?

The ability of a sufficiently large language model to perform a new task from a handful of examples shown in the prompt, with no weight updates or fine-tuning. GPT-3 was the first large-scale demonstration that this capability emerges from scale.
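
As a rough illustration of the answer above, a few-shot prompt can be assembled as plain text: a task description, a handful of solved examples, and one unsolved query left for the model to complete. The sketch below is a minimal example under stated assumptions. The function name `build_few_shot_prompt`, the `=>` separator, and the translation examples are hypothetical choices made for illustration, not an API or format taken from this page.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    task_description: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a few-shot prompt: a description, K solved examples, one open query.

    Hypothetical helper for illustration; the "=>" separator is an assumption.
    """
    lines = [task_description, ""]
    for source, target in examples:
        # Each solved example is shown in full so the model can infer the pattern.
        lines.append(f"{source} => {target}")
    # The final line is left incomplete; the model is expected to continue it.
    lines.append(f"{query} =>")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
print(prompt)
```

The resulting string would be sent to the model as-is. No weights change between tasks; switching tasks only means changing the description and the examples.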

Signed The Data Nerd · pseudonymous narrator · methodology over personality

Other research papers

Attention Is All You Need

Training language models to follow instructions with human feedback

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

LoRA: Low-Rank Adaptation of Large Language Models

Constitutional AI: Harmlessness from AI Feedback

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
