ICLR 2022 · 2021
Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Microsoft
Introduces Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique that adds small trainable low-rank matrices to a frozen base model. Demonstrates that LoRA matches full fine-tuning performance on multiple benchmarks while updating only 0.1%–1% of parameters. Reduces training GPU memory by about 3× on GPT-3 175B, as reported in the paper, and shrinks per-task checkpoints by orders of magnitude.
Our summary in our own words — see the canonical source links below for the original abstract.
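The mechanism described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the class name `LoRALinear`, the layer sizes, and the 0.01 initialization scale are assumptions chosen for the example. It follows the paper's setup of a frozen weight W, a Gaussian-initialized A, a zero-initialized B, and an update scaled by alpha / r, so the adapted layer starts out identical to the base layer.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical helper for illustration; not an API from any library.
    """

    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen during fine-tuning
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init (0.01 is an assumption)
        self.B = np.zeros((d_out, r))                   # trainable, zero init so B @ A starts at 0
        self.scale = alpha / r

    def forward(self, x):
        # x has shape (batch, d_in); the low-rank path computes (x A^T) B^T.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_fraction(self):
        # Share of this layer's parameters that fine-tuning would update.
        return (self.A.size + self.B.size) / self.W.size


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))       # stand-in for a pretrained weight matrix
layer = LoRALinear(W, r=4, alpha=8)
x = rng.normal(size=(2, 64))
y = layer.forward(x)                # equals x @ W.T until B is trained away from zero
```

The toy layer is deliberately small, so its trainable fraction is far higher than the 0.1%–1% quoted above. The fraction shrinks as the weight matrix grows relative to the rank r.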
LoRA is the standard parameter-efficient fine-tuning method in 2026, deployed across Hugging Face's PEFT ecosystem and supported by the major open-weight LLM serving stacks. Activity around LoRA adapter ecosystems is one of the cleanest signals of practical AI-application velocity in our /trend/ai-coding-tools-2026 leaderboard.
LoRA stands for Low-Rank Adaptation, a parameter-efficient fine-tuning method that adds small trainable low-rank matrices to a frozen base model. See /define/lora for the full term definition.
The paper was written by Edward J. Hu and colleagues at Microsoft and published at ICLR 2022 (arXiv:2106.09685).
LoRA trains only 0.1%–1% of the base model's parameters while matching full fine-tuning quality on the paper's benchmarks. The paper reports about 3× lower GPU memory use during training on GPT-3 175B, and each adapter checkpoint is far smaller than a full copy of the model.
LoRA matters because its parameter efficiency makes specialization cheap, and adapters can be swapped at inference time over a shared base model, which enables multi-tenant serving. It is the standard PEFT method for open-weight models such as Llama, Mistral, Qwen, and Gemma.
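Two of the points above lend themselves to a short worked example: the trainable-parameter arithmetic, and serving several adapters on one shared base weight. The sketch below is illustrative only. The matrix sizes, the tenant names, the `scale` value, and the helper functions `lora_param_count`, `merge`, and `serve` are all hypothetical, and real serving stacks batch and cache adapters in more elaborate ways.

```python
import numpy as np


def lora_param_count(d_out, d_in, r):
    """Trainable parameters for one adapted matrix: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


def merge(W, A, B, scale):
    """Fold an adapter into the base weight so inference needs a single matmul."""
    return W + scale * (B @ A)


# Parameter arithmetic for one hypothetical 4096 x 4096 projection at rank 8.
d, r = 4096, 8
full = d * d                      # 16,777,216 weights in the frozen matrix
lora = lora_param_count(d, d, r)  # 65,536 trainable weights
fraction = lora / full            # 1/256, roughly 0.39% for this single matrix

# Multi-tenant sketch: one shared frozen W, one small (A, B) pair per tenant.
rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))
adapters = {
    "tenant_a": (rng.normal(size=(4, 32)), rng.normal(size=(32, 4))),
    "tenant_b": (rng.normal(size=(4, 32)), rng.normal(size=(32, 4))),
}
scale = 2.0  # stands in for alpha / r; the value is an assumption


def serve(x, tenant):
    """Apply the shared base weight plus the selected tenant's low-rank update."""
    A, B = adapters[tenant]
    return x @ W.T + scale * (x @ A.T) @ B.T


x = rng.normal(size=(3, 32))
out_a = serve(x, "tenant_a")
out_b = serve(x, "tenant_b")
merged_a = x @ merge(W, *adapters["tenant_a"], scale).T  # same result as out_a
```

The fraction computed here covers a single adapted matrix. The share of the whole model that is trained depends on how many matrices receive adapters and on the chosen rank. Keeping adapters unmerged lets tenants share one copy of W, while merging trades that flexibility for a single matmul per layer.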