The category we’re defining

Code-Side Sourcing.

Code-Side Sourcing is the practice of using public repository-velocity data as a leading indicator of venture-stage outcomes, surfacing fundraises 21 to 47 days before pitch decks circulate.

This page is the canonical definition. Every other surface on the site that previously said “alternative data” or “code-side momentum” now points here.

You do not have to read code

“Code-Side” refers to where the signal comes from (the engineering work), not to anything you have to do. You never read a line of code; the read is done for you. The output is plain business English: a named company, why it’s moving, and how long you likely have before the round.

The creed is one line: “We move on the engineering signal before the round — without reading a line of code.” If you’re a solo angel, scout, seed fund, corp-dev or PE operator who evaluates companies for a living but doesn’t want to pull up a merge graph, this category was built for you.

01 · Formal definition

Three properties make a sourcing channel “Code-Side”.

Code-Side Sourcing is a category of venture-capital deal sourcing in which the primary input is public, machine-readable engineering activity — commit graphs, contributor maps, repository creation patterns, deploy cadence — rather than warm introductions, pitch decks, or curated databases.

The category is defined by three properties: (1) the input data is public and reproducible from primary sources, (2) the signal arrives before the company actively markets the round, (3) the methodology is published and falsifiable, not opaque.

Code-Side Sourcing is not a tool; it is a sourcing channel. In the same way that a fund builds a Dream-100 list of LPs or runs a partner-meeting funnel, a fund running Code-Side Sourcing builds a repository watchlist, runs a weekly acceleration scan, and surfaces breakouts before they reach the warm-intro layer.

02 · What it replaces (and what it runs alongside)

Four sourcing channels predate Code-Side Sourcing. None of them disappear.

1. Warm-intro sourcing
The dominant sourcing channel for the last forty years. Output capped by the partner’s rolodex, latency measured in weeks, geography-locked to wherever the partner has lunch. Code-Side Sourcing runs in parallel — not as a replacement, but as the first parallel channel that doesn’t depend on the partner’s personal network.
2. Deck-based sourcing
A pitch deck is a marketing artifact written for the next round, weeks after the engineering team locked the trajectory. Decks are lagging by definition. Code-Side Sourcing reads the engineering work directly, so the signal arrives before the deck is drafted.
3. Database-based sourcing (Crunchbase, PitchBook, Tracxn)
Comprehensive but lagging — a database row appears after the round is announced or after a press mention triggers ingestion. The database is a system of record, not a system of discovery. Code-Side Sourcing is the system of discovery that feeds the database, not the other way around.
4. Alternative-data sourcing (web traffic, hiring scrapes, app downloads)
The closest existing analog. Strong for late-stage growth signals (B2C app traction, enterprise hiring waves), weak for early-stage technical companies whose product never lands on a public ranking before the Series A. Code-Side Sourcing is the alt-data category for engineering-led companies — the segment where commits and contributors leak the ramp before any consumer surface does.

03 · The five first principles

Anything calling itself Code-Side Sourcing must satisfy all five.

These aren’t our preferences — they’re the rules that keep the category from collapsing back into “another proprietary scoring algorithm” (the trap that swallowed Schwartz-Level-3 deal-flow tooling).

1. Acceleration over absolute volume
Cross-company commit-count comparisons are meaningless. A 50-commit week is a slowdown for one team and a breakout for another. Code-Side Sourcing measures each company against its own historical baseline — only sustained acceleration relative to self counts as a signal.
2. Confirm it twice before it counts
A one-off busy week — a hackathon, a launch sprint, a new-hire onboarding burst — means nothing on its own. The acceleration has to show up again in a second two-week window before it counts as a real signal (we call this two-period confirmation). This single rule eliminates the dominant source of false alarms.
3. The whole team speeding up, not one star engineer
There is a difference between a broad team all pushing harder — the strongest predictor — and a single star engineer carrying everyone else, which is mostly noise. We measure how evenly the work is spread across contributors (using a standard inequality measure, the Gini coefficient) and only count the broad-team case. Acceleration alone is half a signal; acceleration plus a broad team is the full signal.
4. Classification over scoring
A composite score collapses information. Code-Side Sourcing classifies each breakout into one of four signal types — Engineering Hiring Burst, Infrastructure Buildout, Deploy Frequency Spike, Framework Migration — because each carries a distinct fundraise-lead-time distribution. The classification is the prediction; the score is just the headline.
5. Open methodology over proprietary algorithm
If the methodology can’t be published, the buyer can’t verify it, can’t reproduce it, can’t argue with it. Code-Side Sourcing requires the methodology to be a published document — SSRN preprint, Zenodo dataset, regression code under CC BY 4.0 — so any buyer can audit the math before they trust the call.

The full mechanism — the five formal steps that implement these principles in code — is at /mechanism. The proof panel is at /research.
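The published formula lives at /mechanism. What follows is only a minimal Python sketch of how principles 1 to 3 could fit together, not the reference implementation. The 1.5x acceleration threshold, the 0.6 Gini cutoff, the function names, and the choice of baseline (the mean of every period before the last two) are all assumptions made for illustration.

```python
from statistics import mean

# Illustrative assumptions, NOT published values.
ACCEL_THRESHOLD = 1.5  # a period must run at 1.5x the company's own baseline
MAX_GINI = 0.6         # above this, work is treated as concentrated in few people


def gini(counts):
    """Gini coefficient of non-negative counts.

    0 means the work is spread perfectly evenly; values near 1 mean
    one contributor is doing almost everything.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over ascending values with 1-based rank i:
    # G = sum((2i - n - 1) * x_i) / (n * sum(x))
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


def confirmed_acceleration(period_commits, threshold=ACCEL_THRESHOLD):
    """True if the last two two-week periods both beat the company's own baseline.

    period_commits: commit counts per two-week period, oldest first.
    """
    if len(period_commits) < 4:
        return False  # not enough history to form a baseline
    baseline = mean(period_commits[:-2])
    if baseline == 0:
        return False
    # Principle 1: compare against self. Principle 2: both periods must pass.
    return all(p / baseline >= threshold for p in period_commits[-2:])


def is_breakout(period_commits, contributor_commits):
    """Principle 3: confirmed acceleration only counts if the team is broad."""
    return (
        confirmed_acceleration(period_commits)
        and gini(contributor_commits) <= MAX_GINI
    )
```

For example, a history of `[20, 22, 18, 20, 35, 40]` with recent commits spread as `[12, 10, 9, 9]` passes all three checks. A one-off spike such as `[20, 22, 18, 20, 45, 21]` fails the two-period confirmation, and a contributor split of `[1, 1, 1, 37]` fails the broad-team check.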

04 · Who practises Code-Side Sourcing

The category is small; the practice is older than the name.

VC Deal Flow Signal

Reference implementation

The site you’re reading. Live dashboard, weekly Acceleration Watch, SSRN-published methodology, 219-startup proof panel, free MCP server. The category and the reference implementation share an author — see /mechanism for the formula and /research for the panel.

Quant funds adapting GitHub data

Adjacent practitioners

Several quantitative venture funds run internal GitHub-momentum models against public APIs. Most don’t publish the methodology, which keeps the work outside the formal Code-Side Sourcing category — but the underlying activity is the same. Public categorisation lifts the floor for everyone.

Open-source contributor analytics tooling

Component vendors

Tools that surface contributor activity (CHAOSS, OpenSSF Scorecard, GitHub’s own Insights) feed components of the methodology but don’t close the loop into a sourcing channel. They are inputs to a Code-Side Sourcing pipeline, not the pipeline itself.

Funds you respect that already practise this without naming it

Latent practitioners

If a fund tells you their associate “watches GitHub for the team” or that they noticed a portfolio company’s ramp two months before the round, they’re practising Code-Side Sourcing without the name. Naming the category turns informal practice into a repeatable system.

05 · The reference implementation, rung by rung

How VC Deal Flow Signal ships the category in practice.

Each rung is a different operational form of Code-Side Sourcing. Most readers start at the free Acceleration Watch and ascend based on cadence, not feature.

Acceleration Watch

Free, every Sunday

Five named startups ranked by 14-day commit-velocity acceleration. The on-ramp for any practitioner of the category — most funds adopt the rhythm before they adopt the dashboard.

Live Dashboard

€9.97/mo founding

109+ ranked orgs across 20 sectors, refreshed Mondays 06:00 UTC. Sector filters, watchlists, the full acceleration ranking — the operational surface for a fund running Code-Side Sourcing as a weekly motion.

Sector Sweep

€1,997 once per sector

A 40-page written deep-dive on one sector with the top-25 ranked orgs, contributor maps, three pre-Crunchbase breakouts. The artefact a partner takes into an IC discussion.

First Look Pass

€7 once

The €7 tripwire — pick one sector, get a sector PDF in 24 hours. The cheapest way to feel the signal quality on a sector you already know before adopting the rhythm.

MCP server, OpenAPI 3.1, JSON/CSV API

Free agent layer

Six read-only tools inside Claude / Cursor / Windsurf. The agent surface lets any LLM-driven workflow call the same dataset without a screen — the agent-native form of Code-Side Sourcing.

06 · FAQ

Six questions a practitioner asks first.

Is Code-Side Sourcing the same as alternative data?

Code-Side Sourcing is a sub-category of alternative data, narrowed to engineering-side public repository activity. Most alternative-data products in venture capital surface late-stage growth signals (web traffic, app downloads, enterprise hiring). Code-Side Sourcing surfaces early-stage technical signals — the segment where alt-data has historically been weakest because the consumer surface doesn’t exist yet.

Does Code-Side Sourcing replace warm intros?

No, it runs in parallel. A fund running Code-Side Sourcing as a sourcing channel still uses warm intros for diligence, founder reference checks, and the actual investment conversation. The difference is the channel that surfaces the company in the first place — Code-Side Sourcing extends the funnel one step earlier than the warm-intro layer can reach.

Why is the lead time 21 to 47 days specifically?

That’s the interquartile range of the SSRN-published panel of 219 venture-backed startups across five quarterly periods. The median is 31 days. For 25% of the panel the lead time is shorter than 21 days, and for another 25% it is longer than 47 days. The full distribution and methodology are published at /research and ssrn.com/abstract=6606558.
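For readers unfamiliar with the term, here is a small Python sketch of how an interquartile range is read off a list of lead times. The numbers are synthetic placeholders, not the published panel, and the function name and the "inclusive" quartile method are assumptions; the paper may compute quartiles differently.

```python
from statistics import quantiles


def lead_time_quartiles(lead_days):
    """Return (q1, median, q3) for lead times measured in days.

    The interquartile range is the span from q1 to q3: the middle half
    of the observations falls inside it.
    """
    q1, median, q3 = quantiles(lead_days, n=4, method="inclusive")
    return q1, median, q3


# Synthetic lead times in days (signal date to round date), for illustration only.
example_lead_days = [12, 18, 21, 24, 28, 31, 33, 38, 44, 47, 52, 60]
q1, median, q3 = lead_time_quartiles(example_lead_days)
print(f"Interquartile range: {q1} to {q3} days, median {median}")
```

A quarter of the observations sit below the first value and a quarter above the last, which is the sense in which "21 to 47 days" describes the middle half of the panel.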

What kind of fund or investor is best suited to practise Code-Side Sourcing?

Funds and angels investing in technical companies — AI infrastructure, developer tools, technical SaaS, open-source-led businesses, infrastructure software. The category is least useful for B2C consumer-facing companies (where app stores and web traffic are stronger leading indicators) and Series-D+ growth investing (where the engineering ramp is no longer the marginal information).

Can a Code-Side Sourcing pipeline be built without subscribing?

Yes. The methodology is published under CC BY 4.0 — anyone can build their own pipeline. The SSRN paper, Zenodo dataset, and regression code are all open. The reason most practitioners subscribe is operational: building and maintaining the pipeline is engineering work that competes with investment work, and €9.97/mo is materially cheaper than the engineer-hour cost of self-hosting it.

Is the category defensible if the data is public?

The data is public. The category is defensible by methodology, panel proof, distribution channel, and operational rhythm, none of which can be copied from a Wikipedia search. The same logic that makes quantitative finance defensible (everyone can read 10-Ks; not everyone can build the model) applies to Code-Side Sourcing.

Where to go from here

The category is named. The reference implementation is shipping.

Start at the free Sunday digest. Or skip the wait: see the signal on a sector you already know for €7, or run the live dashboard for €9.97/mo. You never read a line of code — the read is done for you.

