Discoverability surfaces
RLHF (Reinforcement Learning from Human Feedback) is a training technique that aligns large language models with human-preferred outputs after pretraining. It has three steps: (1) collect human preferences over model outputs, (2) train a reward model to score outputs, and (3) optimize the base model against the reward model using PPO or a similar RL algorithm. RLHF is what turns a raw foundation model into an instruction-tuned, helpful-by-default assistant such as ChatGPT, Claude, or Gemini. Modern alternatives change parts of this pipeline: DPO and KTO optimize the model directly on human feedback data without an explicit reward model, while RLAIF replaces human preference labels with AI-generated ones.
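To make step (2) concrete, here is a minimal sketch of the pairwise loss commonly used to train a reward model on preference pairs. It assumes a Bradley-Terry formulation, which the definition above does not specify, and the function names are hypothetical. The rewards are plain floats standing in for the scores a real reward model would produce.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair.

    Equals -log(sigmoid(reward_chosen - reward_rejected)): small when the
    human-preferred output scores higher, large when it scores lower.
    """
    return softplus(-(reward_chosen - reward_rejected))


def mean_preference_loss(scored_pairs) -> float:
    """Average loss over (reward_chosen, reward_rejected) pairs."""
    losses = [pairwise_preference_loss(c, r) for c, r in scored_pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical reward scores for three (preferred, rejected) output pairs.
    pairs = [(2.0, 0.5), (0.1, 0.3), (1.2, 1.2)]
    print(f"mean loss: {mean_preference_loss(pairs):.4f}")
```

Minimizing this loss pushes the reward model to score preferred outputs above rejected ones. Step (3) then uses those scores as the reward signal for the RL algorithm.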
Programmatic SEO, AEO, GEO, AIO, and the schemas behind them.
A content strategy that generates hundreds or thousands of search-optimized pages from structured data using templates.
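As a rough illustration of template-driven page generation, the sketch below renders one HTML fragment per row of structured data. The template, the field names (`term`, `definition`), and the slug rule are all hypothetical choices made for this example.

```python
import html
from string import Template

# Hypothetical page template; a real site would use its own layout.
PAGE_TEMPLATE = Template(
    "<title>$term: definition | $site</title>\n"
    "<h1>$term</h1>\n"
    "<p>$definition</p>\n"
)


def render_pages(rows, site):
    """Return {slug: html_fragment} with one page per structured-data row."""
    pages = {}
    for row in rows:
        slug = row["term"].lower().replace(" ", "-")
        pages[slug] = PAGE_TEMPLATE.substitute(
            term=html.escape(row["term"]),
            definition=html.escape(row["definition"]),
            site=html.escape(site),
        )
    return pages


if __name__ == "__main__":
    rows = [
        {"term": "Programmatic SEO", "definition": "Pages generated from data & templates."},
        {"term": "JSON-LD", "definition": "A syntax for embedding structured data."},
    ]
    for slug, page in render_pages(rows, "Example Glossary").items():
        print(slug)
        print(page)
```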
The practice of structuring website content so that AI assistants and large language models (LLMs) can accurately cite it when answering user questions.
An open protocol that allows websites to notify search engines (Bing, Yandex, Seznam, Naver, and others) about new or updated content in real time.
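The entry above does not name the protocol. Assuming it refers to IndexNow, a submission is a JSON document POSTed to a participating endpoint. The sketch below only builds the request and does not send it. The endpoint URL, the `{key}.txt` key-file location, and the helper names are assumptions for illustration, so check the protocol documentation before relying on them.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Assumed shared endpoint; participating search engines also expose their own.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_payload(host, key, urls):
    """Build the JSON body for a bulk URL submission."""
    urls = list(urls)
    for url in urls:
        # Submitted URLs are expected to belong to the host being reported.
        if urlparse(url).netloc != host:
            raise ValueError(f"{url} is not on host {host}")
    return {
        "host": host,
        "key": key,
        # Assumes the key file is served from the site root as <key>.txt.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }


def build_request(payload, endpoint=INDEXNOW_ENDPOINT):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    payload = build_payload("example.com", "abc123", ["https://example.com/new-page"])
    request = build_request(payload)
    print(request.get_method(), request.full_url)
    print(json.dumps(payload, indent=2))
    # To actually submit: urllib.request.urlopen(request)
```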
Structuring content so that answer engines — Google's People-Also-Ask, Reddit pull-quotes, Quora top answers, ChatGPT search results, Perplexity citations — can extract a complete, self-contained answer in 40–80 words.
The subset of GEO/AEO targeted specifically at Google's AI Overviews (formerly SGE).
A Schema.
JavaScript Object Notation for Linked Data — the W3C-standard syntax for embedding structured data in web pages.
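As an illustration of that syntax, the sketch below builds a JSON-LD block for a glossary term and wraps it in the script tag used to embed it in a page. The choice of the Schema.org `DefinedTerm` type and the helper name are assumptions for this example, not something the entry prescribes.

```python
import json


def defined_term_jsonld(name, description, url):
    """Return a <script> element containing JSON-LD for one glossary term."""
    data = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "url": url,
    }
    body = json.dumps(data, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(
        defined_term_jsonld(
            "RLHF (Reinforcement Learning from Human Feedback)",
            "A training technique that aligns large language models "
            "with human-preferred outputs after pretraining.",
            "https://signals.gitdealflow.com/define/rlhf",
        )
    )
```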
A Schema.
This definition is published under CC BY 4.0. Cite as:
The Data Nerd. "RLHF (Reinforcement Learning from Human Feedback)." VC Deal Flow Signal Glossary, https://signals.gitdealflow.com/define/rlhf.