NeurIPS 2020 · 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, et al.
OpenAI
Introduces GPT-3, a 175B-parameter autoregressive language model, and demonstrates that scaling up a Transformer LM produces emergent few-shot in-context learning capability. Shows that a single model can perform many NLP tasks without fine-tuning, in some cases competitively with fine-tuned systems, simply by being shown a few examples in the prompt. Documents the capability and scaling behaviors that defined the LLM era.
Our summary, in our own words. See the canonical source links for the original abstract.
GPT-3 is the catalyst paper for the modern LLM era. The few-shot in-context learning paradigm became the dominant interaction pattern for AI products, and the 175B-parameter scale established the design space that frontier labs (Anthropic, OpenAI, Mistral, Cohere, Hugging Face) operate within. The applied-AI category we track on our /signal/ pages (Anthropic, OpenAI, and others) would not exist without the GPT-3 demonstration.
GPT-3 is a 175B-parameter autoregressive language model developed by OpenAI and released in 2020. The paper documents its design and training and demonstrates the few-shot in-context learning capability that became central to modern LLM applications.
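As a rough sanity check on the 175B figure, a common back-of-envelope estimate counts about 12·d_model² weights per Transformer layer (4·d² for the attention projections, 8·d² for an MLP with 4× expansion) plus the embedding tables. The sketch below assumes the 175B configuration reported in the paper (96 layers, d_model = 12288, 2048-token context) and a GPT-2-style BPE vocabulary of 50,257 tokens; it ignores biases and LayerNorm parameters, so it is an approximation rather than the paper's exact accounting.

```python
# Back-of-envelope parameter count for a GPT-style decoder-only Transformer.
# Assumption: 175B-model hyperparameters as reported in the GPT-3 paper,
# with a GPT-2-style 50,257-token BPE vocabulary.

def approx_params(n_layers: int, d_model: int, vocab: int, n_ctx: int) -> int:
    """Count attention + MLP weight matrices per layer plus token and
    position embeddings. Biases and LayerNorm gains are ignored."""
    attention = 4 * d_model**2   # Q, K, V and output projections
    mlp = 8 * d_model**2         # d -> 4d and 4d -> d matrices
    embeddings = vocab * d_model + n_ctx * d_model
    return n_layers * (attention + mlp) + embeddings

total = approx_params(n_layers=96, d_model=12288, vocab=50257, n_ctx=2048)
print(f"{total:,} ~ {total / 1e9:.1f}B parameters")
```

The per-layer term dominates: roughly 173.9B of the total comes from the 96 Transformer blocks, and only about 0.6B from the embeddings.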
Available on arXiv (arXiv:2005.14165). It is one of the most-cited ML papers since its publication.
The paper comes from OpenAI. Lead authors include Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, and Prafulla Dhariwal. It appeared at NeurIPS 2020.
The ability of a sufficiently large language model to perform a new task from a handful of examples shown in the prompt, with no weight updates or fine-tuning. GPT-3 was the first large-scale demonstration that this capability emerges from scale.
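A minimal sketch of how a few-shot prompt is assembled, assuming a simple `input => output` line format similar to the English-to-French translation illustration in the paper. The function `build_prompt` and its separator are hypothetical choices for illustration, not an OpenAI API, and sending the string to a model is left out. The demonstrations live only in the prompt text; no weights are updated.

```python
from typing import Sequence, Tuple

def build_prompt(task_description: str,
                 demonstrations: Sequence[Tuple[str, str]],
                 query: str,
                 sep: str = " => ") -> str:
    """Hypothetical helper: format a K-shot prompt.
    K = len(demonstrations): 0 is zero-shot, 1 is one-shot, >1 is few-shot."""
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source}{sep}{target}")
    # Leave the answer slot empty (no trailing space) for the model to complete.
    lines.append(f"{query}{sep.rstrip()}")
    return "\n".join(lines)

demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]
prompt = build_prompt("Translate English to French:", demos, "cheese")
print(prompt)
```

Passing an empty list of demonstrations yields a zero-shot prompt from the same function. The paper evaluates zero-, one-, and few-shot settings, with the number of few-shot examples bounded by what fits in the model's 2048-token context window.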