The category we’re defining
Code-Side Sourcing is the practice of using public repository-velocity data as a leading indicator of venture-stage outcomes, surfacing fundraises 21 to 47 days before pitch decks circulate.
This page is the canonical definition. Every other surface on the site that previously said “alternative data” or “code-side momentum” now points here.
01 · Formal definition
Code-Side Sourcing is a category of venture-capital deal sourcing in which the primary input is public, machine-readable engineering activity — commit graphs, contributor maps, repository creation patterns, deploy cadence — rather than warm introductions, pitch decks, or curated databases.
The category is defined by three properties: (1) the input data is public and reproducible from primary sources, (2) the signal arrives before the company actively markets the round, (3) the methodology is published and falsifiable, not opaque.
Code-Side Sourcing is not a tool; it is a sourcing channel. Just as a fund builds a Dream-100 list of LPs or runs a partner-meeting funnel, a fund running Code-Side Sourcing builds a repository watchlist, runs a weekly acceleration scan, and surfaces breakouts before they reach the warm-intro layer.
02 · What it replaces (and what it runs alongside)
Warm introductions: the dominant sourcing channel for the last forty years. Output is capped by the partner’s rolodex, latency is measured in weeks, and geography is locked to wherever the partner has lunch. Code-Side Sourcing runs in parallel: not as a replacement, but as the first channel that doesn’t depend on the partner’s personal network.
A pitch deck is a marketing artifact written for the next round, weeks after the engineering team locked the trajectory. Decks are lagging by definition. Code-Side Sourcing reads the engineering work directly, so the signal arrives before the deck is drafted.
Curated databases: comprehensive but lagging. A database row appears after the round is announced or after a press mention triggers ingestion. The database is a system of record, not a system of discovery. Code-Side Sourcing is the system of discovery that feeds the database, not the other way around.
Alternative data: the closest existing analog. It is strong for late-stage growth signals (B2C app traction, enterprise hiring waves) and weak for early-stage technical companies whose product never lands on a public ranking before the Series A. Code-Side Sourcing is the alt-data category for engineering-led companies, the segment where commits and contributors leak the ramp before any consumer surface does.
03 · The five first principles
These aren’t our preferences — they’re the rules that keep the category from collapsing back into “another proprietary scoring algorithm” (the trap that swallowed Schwartz-Level-3 deal-flow tooling).
Cross-company commit-count comparisons are meaningless. A 50-commit week is a slowdown for one team and a breakout for another. Code-Side Sourcing measures each company against its own historical baseline — only sustained acceleration relative to self counts as a signal.
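A minimal Python sketch of the self-baseline idea, under stated assumptions: the baseline is taken here as the median of the company's own prior 14-day commit counts, and the function name, the `baseline_windows` parameter, and the sample numbers are all hypothetical. The published formula is at /mechanism and may differ.

```python
from statistics import median


def self_relative_acceleration(window_counts, baseline_windows=6):
    """Latest window's commit count divided by the company's own baseline.

    window_counts: commits per 14-day window, oldest first, latest last.
    Assumption: the baseline is the median of up to `baseline_windows`
    preceding windows (an illustrative choice, not the published formula).
    """
    if len(window_counts) < 2:
        raise ValueError("need at least one baseline window plus the current one")
    *history, current = window_counts
    baseline = median(history[-baseline_windows:])
    if baseline == 0:
        # No prior activity: any commits at all are an unbounded acceleration.
        return float("inf") if current > 0 else 0.0
    return current / baseline


# Hypothetical teams: both log 50 commits in the latest window.
team_a = [210, 190, 205, 200, 50]  # 50 is far below this team's usual pace
team_b = [12, 10, 11, 9, 50]       # 50 is far above this team's usual pace

print(self_relative_acceleration(team_a))  # below 1.0: a slowdown
print(self_relative_acceleration(team_b))  # above 1.0: an acceleration
```

The same raw count lands on opposite sides of 1.0, which is why the ratio to self, not the count, is the quantity to rank on.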
Hackathon weeks, launch sprints, and onboarding bursts all produce single-period commit spikes that mean nothing. Code-Side Sourcing requires the breakout to persist into a second 14-day window before it counts as actionable. This single rule eliminates the dominant source of false positives.
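The persistence rule can be sketched in a few lines of Python. This is an illustration only: each input value is assumed to be one 14-day window's commit count divided by the company's own baseline, and the `threshold` of 2.0 and the function name are hypothetical, not taken from the published methodology.

```python
def persistent_breakout(ratios, threshold=2.0):
    """True only if the two most recent 14-day windows both clear the threshold.

    ratios: self-relative acceleration per 14-day window, oldest first.
    threshold: hypothetical cut-off for what counts as a breakout window.
    """
    if len(ratios) < 2:
        return False  # a single window can never confirm persistence
    return all(r >= threshold for r in ratios[-2:])


# Hypothetical histories.
hackathon_spike = [1.0, 1.1, 3.5, 0.9]  # one hot window, then back to normal
sustained_ramp = [1.0, 1.1, 3.5, 2.8]   # the breakout carries into a second window

print(persistent_breakout(hackathon_spike))
print(persistent_breakout(sustained_ramp))
```

Requiring the second window trades two weeks of latency for the removal of single-period spikes.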
A Gini coefficient on the commit distribution separates broad-team acceleration (Gini < 0.30, the strongest predictor) from a single hero developer carrying the team (Gini > 0.70, mostly noise). Acceleration alone is half a signal — acceleration plus contributor breadth is the full signal.
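For readers who want to see the calculation, here is a short Python sketch of the standard Gini coefficient applied to commits per contributor. The 0.30 and 0.70 cut-offs come from the text above; the "mixed" label for the band in between, the function names, and the sample teams are assumptions added for illustration.

```python
def gini(commits_per_contributor):
    """Standard Gini coefficient: 0 means perfectly even, near 1 means one person does everything."""
    xs = sorted(commits_per_contributor)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one contributor with at least one commit")
    # Rank-weighted sum over the ascending-sorted counts (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def breadth_label(g):
    """Map a Gini value to the bands named in the text; 'mixed' is an assumed label."""
    if g < 0.30:
        return "broad-team"
    if g > 0.70:
        return "single-hero"
    return "mixed"


# Hypothetical commit counts per contributor over one window.
broad_team = [12, 10, 11, 9, 13, 10]
hero_team = [95, 2, 1, 1, 1]

print(breadth_label(gini(broad_team)))
print(breadth_label(gini(hero_team)))
```

Two teams with the same total acceleration can land in opposite bands, which is the sense in which acceleration alone is only half a signal.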
A composite score collapses information. Code-Side Sourcing classifies each breakout into one of four signal types — Engineering Hiring Burst, Infrastructure Buildout, Deploy Frequency Spike, Framework Migration — because each carries a distinct fundraise-lead-time distribution. The classification is the prediction; the score is just the headline.
If the methodology can’t be published, the buyer can’t verify it, can’t reproduce it, can’t argue with it. Code-Side Sourcing requires the methodology to be a published document — SSRN preprint, Zenodo dataset, regression code under CC BY 4.0 — so any buyer can audit the math before they trust the call.
The full mechanism — the five formal steps that implement these principles in code — is at /mechanism. The proof panel is at /research.
04 · Who practises Code-Side Sourcing
Reference implementation
The site you’re reading. Live dashboard, weekly Acceleration Watch, SSRN-published methodology, 219-startup proof panel, free MCP server. The category and the reference implementation share an author — see /mechanism for the formula and /research for the panel.
Adjacent practitioners
Several quantitative venture funds run internal GitHub-momentum models against public APIs. Most don’t publish the methodology, which keeps the work outside the formal Code-Side Sourcing category — but the underlying activity is the same. Public categorisation lifts the floor for everyone.
Component vendors
Tools that surface contributor activity (CHAOSS, OpenSSF Scorecard, GitHub’s own Insights) feed components of the methodology but don’t close the loop into a sourcing channel. They are inputs to a Code-Side Sourcing pipeline, not the pipeline itself.
Latent practitioners
If a fund tells you their associate “watches GitHub for the team” or that they noticed a portfolio company’s ramp two months before the round, they’re practising Code-Side Sourcing without the name. Naming the category turns informal practice into a repeatable system.
05 · The reference implementation, rung by rung
Each rung is a different operational form of Code-Side Sourcing. Most readers start at the free Acceleration Watch and ascend based on cadence, not feature.
Free, every Sunday
Five named startups ranked by 14-day commit-velocity acceleration. The on-ramp for any practitioner of the category — most funds adopt the rhythm before they adopt the dashboard.
€9.97/mo founding
109+ ranked orgs across 19 sectors, refreshed Mondays 06:00 UTC. Sector filters, watchlists, the full acceleration ranking — the operational surface for a fund running Code-Side Sourcing as a weekly motion.
€1,997 once per sector
A 40-page written deep-dive on one sector with the top-25 ranked orgs, contributor maps, three pre-Crunchbase breakouts. The artefact a partner takes into an IC discussion.
€7 once
The €7 tripwire — pick one sector, get a sector PDF in 24 hours. The cheapest way to feel the signal quality on a sector you already know before adopting the rhythm.
Free agent layer
Six read-only tools inside Claude / Cursor / Windsurf. The agent surface lets any LLM-driven workflow call the same dataset without a screen — the agent-native form of Code-Side Sourcing.
06 · FAQ
Code-Side Sourcing is a sub-category of alternative data, narrowed to engineering-side public repository activity. Most alternative-data products in venture capital surface late-stage growth signals (web traffic, app downloads, enterprise hiring). Code-Side Sourcing surfaces early-stage technical signals — the segment where alt-data has historically been weakest because the consumer surface doesn’t exist yet.
No. Code-Side Sourcing does not replace warm introductions; it runs in parallel. A fund running Code-Side Sourcing as a sourcing channel still uses warm intros for diligence, founder reference checks, and the actual investment conversation. The difference is the channel that surfaces the company in the first place: Code-Side Sourcing extends the funnel one step earlier than the warm-intro layer can reach.
The 21-to-47-day range is the interquartile range of lead times in the SSRN-published panel of 219 venture-backed startups across five quarterly periods. The median is 31 days. For 25% of the panel the lead time is shorter than 21 days, and for another 25% it is longer than 47 days. The full distribution and methodology are published at /research and ssrn.com/abstract=6606558.
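As an illustration of how an interquartile range is read off a set of lead times, here is a small Python sketch. The numbers are made up for the example and are NOT the published panel data; the function name is hypothetical, and the quartile convention used (the default "exclusive" method of Python's `statistics.quantiles`) is an assumption, since the paper's convention is not stated on this page.

```python
from statistics import quantiles

# Made-up lead times in days (signal date to fundraise date).
# Illustrative only: these are not values from the 219-startup panel.
lead_times_days = [9, 14, 20, 26, 30, 33, 38, 45, 51, 58, 70]


def lead_time_summary(lead_times):
    """Return the lower quartile, median, and upper quartile of the lead times."""
    q1, med, q3 = quantiles(lead_times, n=4)  # three cut points for four groups
    return {"q1": q1, "median": med, "q3": q3}


summary = lead_time_summary(lead_times_days)
print(summary)
```

The interquartile range is the span from `q1` to `q3`: half of the observations fall inside it, a quarter below, and a quarter above.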
The category is most useful for funds and angels investing in technical companies: AI infrastructure, developer tools, technical SaaS, open-source-led businesses, infrastructure software. It is least useful for B2C consumer-facing companies (where app stores and web traffic are stronger leading indicators) and Series-D+ growth investing (where the engineering ramp is no longer the marginal information).
Yes, a fund can run Code-Side Sourcing without subscribing. The methodology is published under CC BY 4.0, so anyone can build their own pipeline. The SSRN paper, Zenodo dataset, and regression code are all open. The reason most practitioners subscribe is operational: building and maintaining the pipeline is engineering work that competes with investment work, and €9.97/mo is materially cheaper than the engineer-hour cost of self-hosting it.
The data is public. The category is defensible by methodology, panel proof, distribution channel, and operational rhythm, none of which can be copied from a Wikipedia search. The same logic that makes quantitative finance defensible (everyone can read 10-Ks; not everyone can build the model) applies to Code-Side Sourcing.
Where to go from here
Start at the free Sunday digest. If the cadence fits the way you source, the rest of the ladder is two clicks away.