134 controlled-vocabulary terms used across the VC Deal Flow Signal site, grouped into six families. Each term has its own dedicated page with cross-references to the SSRN methodology paper and the formal signal primitives.
For the flat alphabetic listing with all definitions on one page, see /glossary.
The named category VC Deal Flow Signal defines.
Metrics, signal types, and decision rules from the methodology.
The named mechanism behind VC Deal Flow Signal.
The total number of commits to a startup's most active public GitHub repository over a rolling 14-day window.
The percentage change in commit velocity compared to the preceding 14-day window.
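A minimal sketch of the two definitions above. It assumes commit timestamps for the most active repository have already been fetched, which is not shown. The function names are hypothetical, each window is treated as a half-open interval ending at `as_of`, and the delta is left undefined when the preceding window has no commits.

```python
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(days=14)  # rolling window length from the definition


def commit_velocity(commit_times: list[datetime], as_of: datetime) -> int:
    """Count commits in the 14-day window (as_of - 14 days, as_of]."""
    start = as_of - WINDOW
    return sum(1 for t in commit_times if start < t <= as_of)


def velocity_delta_pct(commit_times: list[datetime], as_of: datetime) -> Optional[float]:
    """Percentage change versus the immediately preceding 14-day window."""
    current = commit_velocity(commit_times, as_of)
    previous = commit_velocity(commit_times, as_of - WINDOW)
    if previous == 0:
        return None  # no baseline commits, so a percentage change is undefined
    return (current - previous) / previous * 100.0


# Usage: 4 commits in the earlier window, 10 in the current one.
commits = [datetime(2024, 1, 5)] * 4 + [datetime(2024, 1, 20)] * 10
print(commit_velocity(commits, datetime(2024, 1, 29)))     # 10
print(velocity_delta_pct(commits, datetime(2024, 1, 29)))  # 150.0
```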
A sustained increase in a startup's engineering output relative to its own historical baseline.
Any data-driven indicator that helps an investor identify a promising startup before traditional deal sourcing channels surface it.
The change in the number of unique contributors to a startup's GitHub repository over time.
A signal type indicating that a startup's contributor growth rate exceeds 50% in a short window.
A signal type indicating that a startup has created three or more new public repositories in 30 days.
A signal type indicating that a startup's commit velocity has increased 150% or more versus its baseline.
A signal type indicating general engineering acceleration that does not fit the hiring burst, infrastructure buildout, or deploy spike categories.
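One way to express the signal-type thresholds above in code. This is a sketch under stated assumptions. Contributor growth is computed as a percentage change in unique contributors between two windows. A startup may match more than one type, since the definitions do not say the types are exclusive. The caller is assumed to have already established that the startup is accelerating. All names are hypothetical.

```python
def contributor_growth_pct(previous_contributors: int, current_contributors: int) -> float:
    """Percentage change in unique contributors between two windows."""
    if previous_contributors == 0:
        raise ValueError("no baseline contributors")
    return (current_contributors - previous_contributors) / previous_contributors * 100.0


def classify_signal(contributor_growth: float,
                    new_public_repos_30d: int,
                    velocity_delta: float) -> list[str]:
    """Map an already-accelerating startup to one or more signal types."""
    types = []
    if contributor_growth > 50:        # hiring burst: growth exceeds 50%
        types.append("hiring_burst")
    if new_public_repos_30d >= 3:      # infrastructure buildout: 3+ new repos in 30 days
        types.append("infrastructure_buildout")
    if velocity_delta >= 150:          # deploy spike: velocity up 150% or more
        types.append("deploy_spike")
    # Acceleration that fits none of the three categories is general acceleration.
    return types or ["general_acceleration"]


print(classify_signal(contributor_growth_pct(6, 10), 1, 40.0))  # ['hiring_burst']
```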
The deterministic exclusion rule that removes commits authored by automated accounts (Dependabot, Renovate, GitHub Actions, and any account whose name contains the substring 'bot') before any aggregation runs.
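A minimal sketch of the exclusion rule. It assumes each commit is a dict with an `author` login (a hypothetical shape) and that matching is case-insensitive, which the definition does not specify. GitHub logins such as `dependabot[bot]` carry a `[bot]` suffix, so the named accounts are compared after stripping it.

```python
NAMED_AUTOMATION = {"dependabot", "renovate", "github-actions"}


def is_automated(author_login: str) -> bool:
    """True if the commit author should be excluded before aggregation."""
    login = author_login.lower()
    if login.removesuffix("[bot]") in NAMED_AUTOMATION:
        return True
    return "bot" in login  # substring rule from the definition


def human_commits(commits: list[dict]) -> list[dict]:
    """Drop automated commits; run this before any velocity or Gini computation."""
    return [c for c in commits if not is_automated(c["author"])]
```

The substring rule is deliberately blunt: a human login such as `talbot` contains "bot" and is excluded too, which is the price of keeping the rule deterministic.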
The rule that an acceleration breakout must persist into a second 14-day window before VC Deal Flow Signal treats it as actionable.
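The persistence rule reduces to a check over consecutive windows. The following is a sketch that assumes breakout detection has already produced one boolean per 14-day window. How a single window qualifies as a breakout is not specified here, and the function name is hypothetical.

```python
def is_actionable(breakout_by_window: list[bool]) -> bool:
    """breakout_by_window holds one flag per consecutive 14-day window, oldest first.

    A breakout is actionable only when it appears in the latest window and
    also held in the window immediately before it.
    """
    return len(breakout_by_window) >= 2 and all(breakout_by_window[-2:])


print(is_actionable([False, True]))  # False: only one window so far
print(is_actionable([True, True]))   # True: persisted into a second window
```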
The Gini coefficient of commit distribution across contributors over the same 14-day window used for velocity.
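A sketch of the concentration measure using the standard sorted-values formula for the Gini coefficient, G = 2·Σ i·x_i / (n·Σ x) − (n + 1)/n with counts sorted ascending. The input is assumed to be the list of already bot-filtered author logins for commits in the window. A value of 0 means commits are spread evenly across contributors, and values approaching 1 mean one contributor dominates. The function name is hypothetical.

```python
from collections import Counter


def commit_gini(authors: list[str]) -> float:
    """Gini coefficient of per-contributor commit counts."""
    counts = sorted(Counter(authors).values())  # ascending commit counts
    n, total = len(counts), sum(counts)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(counts, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


print(commit_gini(["alice", "bob", "alice", "bob"]))  # 0.0
print(commit_gini(["alice"] * 3 + ["bob"]))           # 0.25
```

Under this formula a single-contributor window also returns 0, so a caller may want to treat n = 1 as a separate case.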
Programmatic SEO, AEO, GEO, AIO, and the schemas behind them.
A content strategy that generates hundreds or thousands of search-optimized pages from structured data using templates.
The practice of structuring website content so that AI assistants and large language models (LLMs) can accurately cite it when answering user questions.
An open protocol that allows websites to notify search engines (Bing, Yandex, Seznam, Naver, and others) about new or updated content in real time.
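A sketch of an IndexNow submission built with the standard library. It follows the published JSON body (`host`, `key`, `keyLocation`, `urlList`) and targets the shared `api.indexnow.org` endpoint. The host, key, and URLs are placeholders, and the key file is assumed to be served at `https://<host>/<key>.txt`. The request is only constructed here, not sent.

```python
import json
import urllib.request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_request(host: str, key: str, urls: list[str]) -> urllib.request.Request:
    payload = {
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",  # assumed key-file location
        "urlList": urls,
    }
    return urllib.request.Request(
        INDEXNOW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


req = build_indexnow_request("example.com", "abc123", ["https://example.com/glossary"])
# urllib.request.urlopen(req)  # uncomment to actually submit
```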
Structuring content so that answer engines — Google's People-Also-Ask, Reddit pull-quotes, Quora top answers, ChatGPT search results, Perplexity citations — can extract a complete, self-contained answer in 40–80 words.
The subset of GEO/AEO targeted specifically at Google's AI Overviews (formerly SGE).
A Schema.
JavaScript Object Notation for Linked Data — the W3C-standard syntax for embedding structured data in web pages.
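A minimal sketch of emitting JSON-LD for a glossary entry like the ones on this page, using the Schema.org `DefinedTerm` type. The term-set URL is a placeholder. Escaping `</` prevents a definition that contains that sequence from closing the `<script>` element early.

```python
import json


def defined_term_jsonld(name: str, description: str, term_set_url: str) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "DefinedTerm",
        "name": name,
        "description": description,
        "inDefinedTermSet": term_set_url,
    }
    # "<\/" is a valid JSON escape for "</" and is safe inside HTML <script>.
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


snippet = defined_term_jsonld(
    "Commit Velocity",
    "The total number of commits to a startup's most active public GitHub "
    "repository over a rolling 14-day window.",
    "https://example.com/glossary",  # placeholder URL
)
print(snippet)
```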
A Schema.
A Schema.
A Schema.
A Schema.
A Schema.
An HTML link-element annotation (or sitemap entry) that signals the language and region of a page to search engines.
A link-element annotation (rel=canonical) that designates one URL as the authoritative version of a page when multiple URLs serve the same content.
A plaintext file at the root of a website that tells web crawlers which paths they may or may not fetch.
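Python's standard library includes a robots.txt parser that is sufficient to check a crawl decision locally. The rules below are an illustrative file, not this site's. `urllib.robotparser` applies the first matching rule in file order, whereas Google documents a most-specific-match rule. Listing the more specific `Disallow` first makes both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "https://example.com/glossary"))        # True
print(parser.can_fetch("*", "https://example.com/private/report"))  # False
```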
An XML file that lists the canonical URLs a website wants indexed.
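A sketch that generates a minimal sitemap with the standard library, using the sitemaps.org 0.9 namespace. Only the required `<loc>` element is written. Optional fields such as `<lastmod>` are omitted, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal <urlset> document listing each canonical URL once."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes &, <, >
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_sitemap(["https://example.com/", "https://example.com/glossary"]))
```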
The current major version of the OpenAPI Specification — a vendor-neutral schema for describing HTTP APIs.
A vendor-extension property added to OpenAPI operations that names the corresponding MCP tool, so a single OpenAPI fetch maps every REST endpoint to its agent-callable equivalent.
A central DataCatalog manifest at /.
Emerging conventions for advertising a site's policy toward AI training and retrieval.
A newline-delimited JSON file where every line is one self-contained question-answer pair.
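A sketch of writing and reading such a file. The `question` and `answer` field names are assumptions for illustration. The defining property is one complete JSON object per line, which holds because `json.dumps` escapes any newline inside a string.

```python
import json


def to_jsonl(pairs: list[tuple[str, str]]) -> str:
    """Serialize (question, answer) pairs, one JSON object per line."""
    return "".join(
        json.dumps({"question": q, "answer": a}, ensure_ascii=False) + "\n"
        for q, a in pairs
    )


def from_jsonl(text: str) -> list[dict]:
    """Parse each non-blank line independently."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


doc = to_jsonl([("What is commit velocity?", "Commits over a rolling 14-day window.")])
print(doc, end="")
```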
A proposed standard for guiding LLMs and AI assistants to a site's most useful content surfaces in a single deterministic file.
The full-content companion to llms.
The collaborative vocabulary maintained by Google, Microsoft, Yahoo, and Yandex for structuring on-page metadata.
A Schema.
A Schema.
Net burn divided by net new ARR in the same period.
Net new ARR in a quarter, divided by the prior quarter's sales and marketing spend, then annualized.
The number of months it takes for the gross profit from a new customer to repay the fully-loaded cost of acquiring them.
Lifetime Value — the total gross profit a startup expects to earn from an average customer over the full relationship.
For SaaS, the ratio of expansion plus new ARR to contraction plus churned ARR in a period.
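Several of the ratios above reduce to a few lines of arithmetic. The following is a sketch with hypothetical function names. The LTV uses a common simplification, monthly gross profit per customer divided by monthly churn, which the definition itself does not prescribe.

```python
def burn_multiple(net_burn: float, net_new_arr: float) -> float:
    """Net burn divided by net new ARR over the same period."""
    return net_burn / net_new_arr


def cac_payback_months(cac: float, monthly_revenue_per_customer: float,
                       gross_margin: float) -> float:
    """Months of per-customer gross profit needed to repay acquisition cost."""
    return cac / (monthly_revenue_per_customer * gross_margin)


def ltv(monthly_revenue_per_customer: float, gross_margin: float,
        monthly_churn: float) -> float:
    """Simplified LTV: monthly gross profit per customer / monthly churn rate (assumption)."""
    return monthly_revenue_per_customer * gross_margin / monthly_churn


def saas_quick_ratio(new_arr: float, expansion_arr: float,
                     contraction_arr: float, churned_arr: float) -> float:
    """(new + expansion) / (contraction + churned)."""
    return (new_arr + expansion_arr) / (contraction_arr + churned_arr)


print(burn_multiple(2_000_000, 1_000_000))                  # 2.0
print(saas_quick_ratio(300_000, 100_000, 50_000, 150_000))  # 2.0
```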
The compounding sequence of dilutive events between founding and exit — SAFE conversions, option-pool refreshes, priced rounds, secondary offers — modeled as a stack so a founder can see their fully diluted ownership at each step.
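A sketch of the stack. It assumes each dilutive event is summarized by the fraction of the post-event fully diluted company that goes to the new holders, so each step multiplies existing ownership by (1 − fraction). Secondary sales, which transfer existing shares rather than issuing new ones, are not modeled. Event labels and fractions are illustrative.

```python
def dilution_stack(initial_ownership: float,
                   events: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Return (event, fully diluted ownership after the event) for each step."""
    ownership = initial_ownership
    steps = []
    for label, new_fraction in events:
        ownership *= 1 - new_fraction  # existing holders keep (1 - new_fraction)
        steps.append((label, ownership))
    return steps


for label, pct in dilution_stack(0.50, [
    ("SAFE conversion", 0.10),
    ("Option-pool refresh", 0.10),
    ("Series A", 0.20),
]):
    print(f"{label}: {pct:.1%}")  # 45.0%, 40.5%, 32.4%
```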
Annual Recurring Revenue — the annualized run rate of all active recurring contracts, excluding one-time fees, services, and usage spikes.
Monthly Recurring Revenue — ARR divided by 12, or the sum of all currently active monthly subscription contracts.
Net Revenue Retention — the percentage of ARR that a cohort of existing customers continues to deliver after a fixed window (typically 12 months), including expansion, contraction, and churn.
Gross Revenue Retention — the percentage of ARR a cohort retains after churn and contraction, but excluding expansion.
Customers or revenue lost in a period, divided by the count or revenue at the start of the period.
Customer Acquisition Cost — fully loaded sales and marketing spend in a period divided by new customers acquired in that period.
Revenue minus cost of goods sold, divided by revenue, expressed as a percentage.
Revenue minus all variable costs of serving a customer — COGS plus CAC plus variable support — divided by revenue.
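A sketch of the definitions above as plain functions. The names are hypothetical, and the caller supplies the period inputs. NRR and GRR use the same starting cohort ARR. The only difference is that GRR ignores expansion, so GRR cannot exceed 100% while NRR can.

```python
def mrr_from_arr(arr: float) -> float:
    return arr / 12


def nrr(start_arr: float, expansion: float, contraction: float, churned: float) -> float:
    """Cohort ARR retained after the window, including expansion."""
    return (start_arr + expansion - contraction - churned) / start_arr


def grr(start_arr: float, contraction: float, churned: float) -> float:
    """Cohort ARR retained after the window, excluding expansion."""
    return (start_arr - contraction - churned) / start_arr


def churn_rate(lost: float, at_start: float) -> float:
    """Works for either customer counts or revenue."""
    return lost / at_start


def cac(sales_and_marketing_spend: float, new_customers: int) -> float:
    return sales_and_marketing_spend / new_customers


def gross_margin(revenue: float, cogs: float) -> float:
    return (revenue - cogs) / revenue


def contribution_margin(revenue: float, cogs: float, acquisition_cost: float,
                        variable_support: float) -> float:
    return (revenue - cogs - acquisition_cost - variable_support) / revenue


print(nrr(1_000_000, 200_000, 50_000, 100_000))  # 1.05
print(grr(1_000_000, 50_000, 100_000))           # 0.85
```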
Ideal Customer Profile — the firmographic and behavioral description of the segment most likely to buy, retain, and expand.
The training technique that aligns large language models to human-preferred outputs after pretraining.
Prompting technique where a model is instructed (or trained) to articulate intermediate reasoning steps before producing a final answer.
Architecture where a model retrieves relevant documents from an external knowledge store (vector database, search index, or hybrid) before generating an answer.
Capability of an LLM to invoke external functions, APIs, or other tools via a structured output format (typically JSON).
The maximum number of tokens an LLM can attend to in a single inference pass — its working memory.
The wall-clock time between sending a prompt to an LLM and receiving the response.
Continued training of a foundation model on a smaller, task-specific dataset to specialize its behavior.
Parameter-efficient fine-tuning method that adds small low-rank matrices to a frozen base model.
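A toy sketch of the idea in plain Python. The frozen weight W is left untouched, and the update is the product of two small matrices, B (d_out × r) and A (r × d_in), scaled by alpha / r as in the original LoRA paper. Real implementations use a tensor library. The list-based matrices here only make the arithmetic visible.

```python
def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w, a, b, x, alpha: float) -> list[float]:
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    r = len(a)  # rank = number of rows in A
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]


def trainable_params(d_in: int, d_out: int, r: int) -> tuple[int, int]:
    """(LoRA trainable parameters, full fine-tuning parameters) for one d_out x d_in layer."""
    return r * (d_in + d_out), d_in * d_out


print(trainable_params(4096, 4096, 8))  # (65536, 16777216)
```

The LoRA paper initializes B to zero, so the adapter starts as a no-op and the model's initial outputs match the base model exactly.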
Training a smaller 'student' model to mimic a larger 'teacher' model's outputs.
Reducing the numerical precision of model weights (typically from 16-bit floats to 8-bit, 4-bit, or even 2-bit integers) to shrink memory footprint and speed up inference.
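A sketch of the simplest variant: symmetric per-tensor int8 quantization with a single scale factor. Production schemes, such as per-channel scales, 4-bit group quantization, and outlier handling, are more involved. This example only shows the round trip and its error bound.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


w = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Round-to-nearest keeps each reconstructed weight within half a step of the original.
print(max(abs(a - b) for a, b in zip(w, w_hat)) <= scale / 2 + 1e-12)
```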
A neural network that converts text (or images, audio, etc.
A large, broadly-trained neural network that serves as a base for downstream fine-tuning, prompting, or RAG.
An LLM whose trained weights are publicly available for download, fine-tuning, and self-hosting.
An LLM trained to use extended chain-of-thought as a native capability rather than as a prompting technique.
A model that accepts and/or generates multiple input types — typically text and images, sometimes audio and video.
An M&A transaction motivated by strategic synergy rather than financial return alone.
An acquisition primarily motivated by acquiring the team rather than the company's products or revenue.
Acquisition consideration paid over time, contingent on the acquired business hitting specified milestones — revenue targets, product launches, or team retention.
A financing round where the post-money valuation is lower than the previous round's post-money.
An interim financing round between two priced rounds, typically structured as convertible notes or SAFEs that convert at the next priced round's terms (often with a discount or valuation cap).
A structured liquidity event where existing shareholders are offered the chance to sell some or all of their shares at a fixed price, usually to new investors or to the company itself.
Sale of existing shares from an early investor or employee to a new buyer, distinct from primary issuance of new shares by the company.
Investor right to receive their original investment (or a multiple of it) back before common shareholders see any proceeds in an exit.
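A sketch of the payout for a single class of non-participating preferred stock. That structure is an assumption, since the definition covers others. The holder takes the larger of its preference (capped by what the exit actually returns) or its as-converted share of the proceeds, and common shareholders split whatever remains.

```python
def preferred_payout(exit_proceeds: float, invested: float,
                     multiple: float, as_converted_pct: float) -> float:
    preference = min(exit_proceeds, invested * multiple)
    as_converted = exit_proceeds * as_converted_pct
    return max(preference, as_converted)  # non-participating: one or the other, not both


for exit_value in (5_000_000, 100_000_000):
    pref = preferred_payout(exit_value, invested=10_000_000, multiple=1.0, as_converted_pct=0.20)
    print(exit_value, pref, exit_value - pref)  # proceeds, preferred, common
```

At the small exit the preference absorbs everything and common receives nothing. At the large exit the investor converts, because 20% of $100M exceeds the $10M preference.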
An investor's right to participate in future financing rounds at their existing ownership percentage, preserving their stake from dilution.
Investor protection against down rounds — adjusts the conversion price of preferred shares so existing investors aren't diluted as much when new shares are issued at a lower price.
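A sketch of the broad-based weighted-average adjustment, a widely used form, with hypothetical names: CP2 = CP1 × (A + B) / (A + C). Here A is the fully diluted share count outstanding before the new issue, B is the number of shares the new money would have bought at CP1, and C is the number of shares actually issued. A full-ratchet provision would instead reset CP2 to the new, lower price.

```python
def weighted_average_conversion_price(cp1: float, shares_before: float,
                                      new_money: float, new_shares_issued: float) -> float:
    b = new_money / cp1  # shares the new money would have bought at the old price
    return cp1 * (shares_before + b) / (shares_before + new_shares_issued)


# $2M raised at $0.50/share (4M new shares) against a $1.00 original price.
print(weighted_average_conversion_price(1.00, 10_000_000, 2_000_000, 4_000_000))
```

In this example the price resets to about $0.857, compared with $0.50 under a full ratchet.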
A single Git repository containing multiple distinct projects, applications, or libraries.
Code executed at globally-distributed compute nodes physically close to end users — typically running in V8 isolates, Wasm sandboxes, or lightweight VMs.
Software-delivery practice where every code change merged to main is automatically deployed to production, often within minutes.
Runtime toggle controlling whether a feature is exposed to a given user, request, or environment — independently from deployment.
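A common way to implement percentage rollouts is to hash the flag name together with the user ID so that each user lands in a stable bucket. The following is a sketch with a hypothetical function name; hosted flag services use their own bucketing schemes. Because the bucket depends only on the inputs, raising the percentage later only adds users and never switches existing ones off.

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_pct: float) -> bool:
    """Deterministically bucket a user into 0..9999 and compare with the rollout."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < rollout_pct * 100  # rollout_pct is on a 0-100 scale


print(is_enabled("new-dashboard", "user-42", 25))
```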
Observability technique that follows a single request as it crosses multiple services, recording the timing and metadata of each hop.
Prompting framework for LLM agents introduced by Yao et al.
Persistent state an LLM agent maintains across turns, sessions, or interactions.
Architecture where multiple LLM agents with distinct roles collaborate on a task, coordinated by a meta-agent or explicit workflow.
The discipline of measuring LLM agent capability, reliability, and safety across well-defined benchmarks.
Runtime constraints that limit agent behavior — output filtering, tool-call validation, spend limits, time budgets, and harmful-action detection.
An LLM agent designed to decompose a high-level goal into a sequence of sub-tasks before executing.
An LLM agent designed to execute a task over hours, days, or longer — across multiple sessions, with persistent state, and resilient to interruption.
Benchmark suite that measures LLM agent capability at calling, chaining, and reasoning over real-world APIs.
The legal document governing the relationship between a venture capital fund (the General Partner) and its investors (Limited Partners).
The capital that the General Partner (the VC firm) commits to its own fund, expressed as a percentage of fund size.
The General Partner's share of fund profits above the LPs' return of capital (and typically a preferred return hurdle of 6–8%).
The annual fee a venture fund charges its LPs to cover operating costs, typically 2% of committed capital during the investment period and 2% of invested capital during the harvest period.
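A sketch of both fund economics for a single fund, with hypothetical names. The carry function assumes a whole-fund waterfall, a simple (non-compounding) hurdle, and a full GP catch-up, which is one common structure. Real LPAs vary on each of these points.

```python
def management_fee(committed: float, invested: float,
                   in_investment_period: bool, rate: float = 0.02) -> float:
    """Annual fee: on committed capital during the investment period, on invested capital after."""
    return rate * (committed if in_investment_period else invested)


def carried_interest(distributions: float, contributed: float,
                     hurdle: float = 0.08, carry: float = 0.20) -> float:
    """GP carry on a whole-fund basis with a simple hurdle and full catch-up."""
    profit = distributions - contributed
    hurdle_amount = contributed * hurdle
    if profit <= hurdle_amount:
        return 0.0  # LPs keep everything until the hurdle is met
    # GP takes 100% of the next dollars until it holds `carry` of profit so far.
    catch_up = carry * hurdle_amount / (1 - carry)
    if profit - hurdle_amount <= catch_up:
        return profit - hurdle_amount  # still inside the catch-up band
    return carry * profit  # past catch-up: GP holds exactly `carry` of total profit


print(management_fee(100_000_000, 0, in_investment_period=True))
print(carried_interest(200_000_000, 100_000_000))
```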
The year a fund made its first investment, used by LPs to benchmark fund performance against peers raised in the same market environment.
A formal request from a venture fund's General Partner to its Limited Partners to wire committed but not-yet-funded capital to the fund.
One of the four DORA metrics measuring software-delivery performance.
One of the four DORA metrics — the time between a code commit and that code running successfully in production.
One of the four DORA metrics — the percentage of changes to production that result in degraded service, requiring hotfix, rollback, or remediation.
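Two of the DORA metrics above can be computed from deployment records, as in the following sketch. The record shapes are hypothetical. Summarizing lead time by the median is an assumption that keeps one slow change from dominating, since the definitions do not fix an aggregate.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time_for_changes(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median of (deployed_at - committed_at) across changes."""
    return median(deployed - committed for committed, deployed in changes)


def change_failure_rate(deploy_failed: list[bool]) -> float:
    """Share of production deployments that needed a hotfix, rollback, or remediation."""
    return sum(deploy_failed) / len(deploy_failed)


t0 = datetime(2024, 3, 1, 9, 0)
changes = [(t0, t0 + timedelta(hours=h)) for h in (1, 2, 5)]
print(lead_time_for_changes(changes))                    # 2:00:00
print(change_failure_rate([True, False, False, False]))  # 0.25
```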
Adversarial testing of LLMs by humans (or other AI systems) attempting to elicit harmful, unsafe, or undesired behaviors.
A prompt or sequence of prompts designed to bypass an LLM's safety training and elicit prohibited behaviors.
A structured benchmark measuring LLM capability on a specific task.
MCP, A2A, micropayments, identity, federation.
An open standard from Anthropic for exposing tools and data to large-language-model hosts (Claude Desktop, Cursor, agentic frameworks).
Google's Agent-to-Agent protocol — a JSON-RPC envelope plus an /.
The canonical wire transport for MCP servers running over HTTP — a JSON-RPC 2.
A stateless remote-procedure-call protocol encoded in JSON — request envelopes carry method, params, and id; response envelopes carry result or error.
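A sketch of building a request and unpacking a response with the standard library. JSON-RPC 2.0 also requires a `"jsonrpc": "2.0"` member on every envelope. The method name below is a placeholder. Error code -32601 is the spec's "Method not found".

```python
import itertools
import json

_ids = itertools.count(1)  # monotonically increasing request ids


def make_request(method: str, params: dict) -> tuple[int, str]:
    request_id = next(_ids)
    envelope = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return request_id, json.dumps(envelope)


def parse_response(raw: str, expected_id: int):
    """Return `result`, or raise if the envelope carries `error` or a mismatched id."""
    msg = json.loads(raw)
    if msg.get("id") != expected_id:
        raise ValueError("response id does not match the request id")
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return msg["result"]


req_id, body = make_request("get_signal", {"startup": "example"})  # placeholder method
print(body)
print(parse_response(json.dumps({"jsonrpc": "2.0", "result": {"ok": True}, "id": req_id}), req_id))
```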
An open standard for HTTP per-request micropayments using the existing 402 Payment Required status code.
An RFC 7033 protocol for discovering information about a user or resource at a domain by querying /.
A W3C-standard identifier scheme that lets entities prove control of an identifier without relying on a centralised registry.
The Fediverse-standard discovery protocol for federated services, served at /.
SSRN, Zenodo, OpenAlex, DOIs, licenses.
A 0–100 score computed from a GitHub user's public starring history, measuring how many validated unicorn outcomes the user starred before the funding, acquisition, or $1B-valuation event.
A working-paper deposit on SSRN (Social Science Research Network), the standard preprint server for finance and business research.
A persistent Digital Object Identifier minted by Zenodo (a CERN-operated open-access repository) for a dataset or software release.
An open scholarly knowledge graph from OurResearch that mirrors Microsoft Academic Graph's structure but with no institutional gating.
A non-profit registration agency for DOIs assigned to research data, software, and grey literature.
An ORCID iD is a persistent digital identifier for individual researchers, used to disambiguate authorship across publishers and preprint servers.
A Digital Object Identifier — a persistent identifier for an electronic document or dataset, resolved through doi.
Creative Commons Attribution 4.0.
A Schema.
Stage names, instruments, and core financial metrics.
The earliest venture-funding stage, typically a $250k–$2M round that funds the first six to twelve months of a startup's work — often before there is a product, sometimes before there is a team.
The first institutional venture round, typically $1M–$5M, that funds the build of an MVP and the search for product-market fit.
The first priced equity round following the seed, typically $5M–$20M raised against a $20M–$80M post-money valuation.
The second priced equity round, typically $15M–$50M raised against a $80M–$300M post-money valuation.
The investor who sets the price, terms, and structure of a venture round and typically writes the largest check.
A short, non-binding document outlining the principal terms of a venture investment — valuation, security type, board composition, anti-dilution, liquidation preference, and protective provisions.
An unpriced venture instrument introduced by Y Combinator in 2013: a contract that converts to equity at the next priced round, at a discount or under a valuation cap.
A short-term debt instrument that converts to equity at a later priced round.
The maximum company valuation at which a SAFE or convertible note converts into equity at the next priced round.
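A simplified sketch of conversion at the next priced round, in which the investor receives the lower of the discounted round price and the cap price. Here the cap price is the cap divided by the pre-round fully diluted share count. That is a simplification, because actual SAFE and note documents define the capitalization base precisely, and notes also accrue interest, which is not modeled. Names and figures are illustrative.

```python
from typing import Optional


def conversion_price(round_price: float, fully_diluted_shares: float,
                     valuation_cap: Optional[float] = None, discount: float = 0.0) -> float:
    """Lowest of the discounted price and, if a cap exists, the cap price."""
    candidates = [round_price * (1 - discount)]
    if valuation_cap is not None:
        candidates.append(valuation_cap / fully_diluted_shares)
    return min(candidates)


def shares_on_conversion(investment: float, price: float) -> float:
    return investment / price


# $500k SAFE, $10M cap, 20% discount; Series A priced at $2.00 with 10M shares pre-round.
price = conversion_price(2.00, 10_000_000, valuation_cap=10_000_000, discount=0.20)
print(price, shares_on_conversion(500_000, price))  # cap binds: 1.0 500000.0
```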
The net rate at which a startup spends cash, expressed in dollars per month.
The number of months a startup can operate at its current burn rate before running out of cash.
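The two definitions combine directly: runway is cash on hand divided by monthly net burn. The following is a sketch with hypothetical names. A company with zero or negative net burn is treated as having unlimited runway.

```python
import math


def net_burn(monthly_cash_out: float, monthly_cash_in: float) -> float:
    return monthly_cash_out - monthly_cash_in


def runway_months(cash: float, monthly_net_burn: float) -> float:
    if monthly_net_burn <= 0:
        return math.inf  # cash-flow positive: not burning
    return cash / monthly_net_burn


burn = net_burn(300_000, 100_000)
print(burn, runway_months(2_400_000, burn))  # 200000 12.0
```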
The capitalization table — a record of every share, option, warrant, and convertible instrument in a startup, broken down by holder.
The free Acceleration Watch: five venture-backed teams accelerating on the engineering signal, translated into plain English — 21 to 47 days before the deck circulates. No code-reading, no card.