NeurIPS 2020 · 2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, et al.

Facebook AI Research

Abstract summary

Introduces Retrieval-Augmented Generation (RAG): an architecture that combines a parametric memory (a pretrained sequence-to-sequence model, BART) with a non-parametric memory (a dense vector index of Wikipedia, queried with a Dense Passage Retrieval retriever). Demonstrates strong performance on knowledge-intensive NLP tasks while making visible which retrieved documents informed each generation. Establishes the design pattern of retrieving documents before generating.

Our summary in our own words — see the canonical source links below for the original abstract.
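For readers who want the combination of retriever and generator in symbols, the following is a sketch of the paper's RAG-Sequence formulation as we understand it. Here x is the input, y the generated output, z a retrieved passage, p_η the retriever, and p_θ the BART generator; d(z) and q(x) are the dense passage and query encodings. Check the canonical source for the exact notation.

```latex
p_{\text{RAG-Sequence}}(y \mid x) \;\approx\;
  \sum_{z \,\in\, \text{top-}k\left(p_\eta(\cdot \mid x)\right)}
  p_\eta(z \mid x)\; p_\theta(y \mid x, z),
\qquad
p_\eta(z \mid x) \;\propto\; \exp\!\left(\mathbf{d}(z)^{\top}\,\mathbf{q}(x)\right)
```

The sum runs only over the top-k retrieved passages, so the generator's prediction is a retrieval-weighted mixture rather than a function of model weights alone.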

Why we cite this paper

RAG is the architecture behind most enterprise LLM deployments in 2026. The retrieval layer, typically an embedding model plus a vector database (Pinecone, Weaviate, Qdrant, Milvus), is one of the densest engineering-acceleration surfaces we track at /sector/database and /trend/ai-native-databases-2026.

Key findings

1. Combining parametric (model weights) and non-parametric (retrieved documents) memory outperforms either alone on knowledge-intensive tasks.
2. Retrieval enables citation transparency: the model's outputs can be traced to specific source documents.
3. RAG addresses three core LLM limitations: knowledge cutoff dates, hallucination on out-of-distribution facts, and inability to cite sources.
4. The design pattern became the foundation of the modern vector-DB-backed LLM application stack.
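Findings 1 and 2 can be illustrated numerically. The sketch below is a minimal toy, not the paper's implementation: it assumes we already have a relevance score per retrieved document and the generator's log-likelihood of one candidate answer given each document. The function name and all numbers are hypothetical. It mixes the two signals and reports how much each document contributed, which is the basis for tracing an output back to its sources.

```python
import math

def rag_sequence_marginal(retriever_scores, gen_log_likelihoods):
    """Combine retriever and generator scores for one candidate answer.

    retriever_scores: raw relevance scores, one per retrieved document.
    gen_log_likelihoods: log p(answer | question, doc), one per document.
    Returns (marginal probability of the answer, per-document posterior weights).
    """
    # Softmax over the retrieved documents gives p(doc | question).
    max_score = max(retriever_scores)
    exp_scores = [math.exp(s - max_score) for s in retriever_scores]
    total = sum(exp_scores)
    doc_priors = [e / total for e in exp_scores]

    # Joint p(doc, answer | question) for each retrieved document.
    joints = [p * math.exp(ll) for p, ll in zip(doc_priors, gen_log_likelihoods)]
    marginal = sum(joints)

    # Share of the answer's probability attributable to each document.
    posteriors = [j / marginal for j in joints]
    return marginal, posteriors

# Hypothetical scores for three retrieved documents.
scores = [2.0, 1.0, 0.0]
log_liks = [math.log(0.9), math.log(0.1), math.log(0.05)]

marginal, posteriors = rag_sequence_marginal(scores, log_liks)
best_doc = max(range(len(posteriors)), key=posteriors.__getitem__)
print(f"p(answer) = {marginal:.3f}, most influential document index = {best_doc}")
```

The posterior weights sum to one, so they can be read as a rough attribution of the answer across the retrieved documents.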

Canonical sources

https://arxiv.org/abs/2005.11401
https://www.semanticscholar.org/paper/58ed1fbaabe027345f7bb3a6312d41c5aac63e22

Related glossary terms

Retrieval-Augmented Generation (RAG)
Embedding Model

Frequently Asked Questions

What is RAG?

Retrieval-Augmented Generation — an architecture where a model retrieves relevant documents from an external knowledge store before generating an answer. See /define/rag for the full term definition.

Who wrote the RAG paper?

Patrick Lewis and colleagues at Facebook AI Research (FAIR). The paper was published at NeurIPS 2020 (arXiv:2005.11401).

Why is RAG used in enterprise LLM applications?

It grounds generations in retrieved documents, which mitigates knowledge-cutoff staleness, reduces hallucination on out-of-distribution facts, and enables source citation — the three properties most enterprise deployments require.

What infrastructure does RAG depend on?

An embedding model plus a vector database (Pinecone, Weaviate, Qdrant, Milvus) for the retrieval layer. We track the engineering acceleration of that surface at /sector/database and /trend/ai-native-databases-2026.
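To make the retrieval layer concrete, here is a minimal sketch under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, `InMemoryIndex` is a hypothetical brute-force stand-in for a vector database, and the three-document corpus is invented for illustration. None of this reflects the API of Pinecone, Weaviate, Qdrant, or Milvus. It only shows the shape of the flow: embed, search, then assemble a prompt with source ids before generation.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class InMemoryIndex:
    """Hypothetical stand-in for a vector database: brute-force search."""

    def __init__(self):
        self._items = []  # list of (doc_id, text, vector)

    def add(self, doc_id, text):
        self._items.append((doc_id, text, embed(text)))

    def search(self, query, k=2):
        """Return the k most similar documents as (score, doc_id, text)."""
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, text) for doc_id, text, vec in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]

def build_prompt(question, hits):
    """Place retrieved passages, tagged with ids, ahead of the question."""
    sources = "\n".join(f"[{doc_id}] {text}" for _, doc_id, text in hits)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{sources}\nQuestion: {question}"
    )

# Toy corpus, invented for illustration.
index = InMemoryIndex()
index.add("doc-1", "BART is a pretrained sequence-to-sequence model.")
index.add("doc-2", "Dense Passage Retrieval encodes questions and passages as vectors.")
index.add("doc-3", "Wikipedia is a free online encyclopedia.")

question = "How does dense passage retrieval encode passages?"
hits = index.search(question, k=2)
prompt = build_prompt(question, hits)
print(prompt)
```

In a production stack the count vectors would be replaced by learned dense embeddings and the brute-force scan by an approximate nearest-neighbor index, but the retrieve-then-generate order stays the same.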


Signed The Data Nerd · pseudonymous narrator · methodology over personality


Other research papers

Attention Is All You Need

Language Models are Few-Shot Learners

Training language models to follow instructions with human feedback

LoRA: Low-Rank Adaptation of Large Language Models

Constitutional AI: Harmlessness from AI Feedback

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
