NeurIPS 2020 · 2020
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal
Facebook AI Research
Introduces Retrieval-Augmented Generation (RAG): an architecture that combines a pretrained sequence-to-sequence model (BART) as parametric memory with a non-parametric memory (a dense vector index of Wikipedia, queried by a Dense Passage Retrieval (DPR) retriever). Demonstrates strong performance on knowledge-intensive NLP tasks while making it visible which retrieved documents informed each generation. Establishes the design pattern of retrieving documents before generating.
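As a rough illustration of how the two memories combine: in the paper's RAG-Sequence variant, the retrieved passage z is treated as a latent variable, and the answer probability is approximated as p(y|x) ≈ Σ_z p(z|x) · p(y|x, z) over the top-k passages, with p(z|x) proportional to the exponentiated inner product between query and passage embeddings. The sketch below only mirrors that arithmetic. The function names are hypothetical, the scores and likelihoods are toy numbers rather than results from the paper, and normalizing over just the top-k passages is a simplification assumed here.

```python
import math

def softmax(scores):
    """Turn raw retrieval scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rag_sequence_prob(retrieval_scores, gen_probs, k=2):
    """Approximate p(y|x) = sum_z p(z|x) * p(y|x, z) over the top-k passages.

    retrieval_scores[i]: inner-product score of passage i for the query
    gen_probs[i]: generator likelihood of the answer given passage i
    (both are hypothetical inputs for this sketch)
    """
    # Pick the k highest-scoring passages (the retrieval step).
    top = sorted(range(len(retrieval_scores)),
                 key=lambda i: retrieval_scores[i], reverse=True)[:k]
    # Normalize scores over the retrieved passages only (assumed simplification).
    p_z = softmax([retrieval_scores[i] for i in top])
    # Weight each passage-conditioned likelihood by its retrieval probability.
    return sum(p * gen_probs[i] for p, i in zip(p_z, top))

# Toy numbers, not taken from the paper.
scores = [2.0, 2.0, -1.0]
likelihoods = [0.9, 0.1, 0.5]
p = rag_sequence_prob(scores, likelihoods, k=2)
```

Because the two retained passages have equal scores, each gets weight 0.5, so the result is the plain average of their likelihoods. The third passage is dropped by the top-k cut and contributes nothing.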
Our summary in our own words — see the canonical source links below for the original abstract.
RAG is the architecture behind a large share of enterprise LLM deployments in 2026. The retrieval layer, typically an embedding model plus a vector database (Pinecone, Weaviate, Qdrant, Milvus), is one of the most active engineering areas we track at /sector/database and /trend/ai-native-databases-2026.
Retrieval-Augmented Generation — an architecture where a model retrieves relevant documents from an external knowledge store before generating an answer. See /define/rag for the full term definition.
Patrick Lewis and colleagues at Facebook AI Research (FAIR). The paper was published at NeurIPS 2020 (arXiv:2005.11401).
It grounds generations in retrieved documents, which mitigates knowledge-cutoff staleness, reduces hallucination on out-of-distribution facts, and enables source citation — the three properties most enterprise deployments require.
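One common way to get source citation in practice is to number the retrieved passages inside the prompt so the model can refer to them as [n]. This is a deployment-side convention, not the mechanism from the original paper (which feeds the retrieved passage together with the input into BART). The helper name, prompt wording, and document IDs below are all hypothetical.

```python
def build_grounded_prompt(question, passages):
    """Number each retrieved passage so an answer can cite it as [n].

    passages: list of (source_id, text) pairs returned by some retriever.
    """
    lines = ["Answer using only the sources below. Cite sources as [n].", ""]
    for n, (source_id, text) in enumerate(passages, start=1):
        lines.append(f"[{n}] ({source_id}) {text}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)

# Hypothetical retrieved passages with made-up IDs.
passages = [
    ("doc-17", "RAG was published at NeurIPS 2020."),
    ("doc-42", "The retriever in RAG is based on Dense Passage Retrieval."),
]
prompt = build_grounded_prompt("Where was RAG published?", passages)
```

Keeping the source ID next to each numbered passage lets a post-processing step map a cited [n] back to the underlying document.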
An embedding model plus a vector database (Pinecone, Weaviate, Qdrant, Milvus) for the retrieval layer. We track the engineering acceleration of that surface at /sector/database and /trend/ai-native-databases-2026.
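To make the two pieces concrete, here is a minimal sketch of a retrieval layer with stand-ins for both: a hashed bag-of-words function in place of a real embedding model, and a brute-force in-memory index in place of a vector database. Neither reflects the API of Pinecone, Weaviate, Qdrant, or Milvus; `embed`, `InMemoryIndex`, and `DIM` are hypothetical names used only for illustration.

```python
import math
import zlib

DIM = 64  # toy embedding size chosen for this sketch

def embed(text):
    """Stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class InMemoryIndex:
    """Stand-in for a vector database: brute-force cosine similarity search."""

    def __init__(self):
        self._items = []  # list of (doc_id, vector, text)

    def add(self, doc_id, text):
        self._items.append((doc_id, embed(text), text))

    def search(self, query, k=2):
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), doc_id, text)
                  for doc_id, vec, text in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]

index = InMemoryIndex()
index.add("a", "dense passage retrieval over wikipedia")
index.add("b", "bart is a sequence to sequence model")
index.add("c", "vector databases store embeddings")
hits = index.search("dense passage retrieval over wikipedia", k=1)
```

A production system would replace the brute-force scan with an approximate nearest-neighbor index, which is the main job a vector database takes over at scale.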