VC Deal Flow Signal uses publicly available GitHub data to identify startups showing unusual engineering momentum. This page explains exactly how we source, process, and rank that data, so investors can evaluate the signal quality before acting on it.
The GitHub REST API (v3) is our primary data source. We query the search/repositories endpoint to discover active startup organizations across 20 sector-specific topic clusters (e.g., machine-learning, fintech, cybersecurity). For each organization, we then pull data on its most active public repository from the stats/commit_activity and contributors endpoints, which GitHub exposes per repository rather than per organization.
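As a rough illustration of the requests involved, the sketch below builds URLs for the three GitHub REST endpoints named above. The helper names, the `pushed:` recency filter, and the sort order are assumptions made for the example, not a description of the production pipeline.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def search_url(topic: str, pushed_since: str, per_page: int = 100) -> str:
    """Build a search/repositories URL for one topic cluster.

    The `pushed:>=` qualifier is an assumed way to restrict to active repos.
    """
    query = f"topic:{topic} pushed:>={pushed_since}"
    params = {"q": query, "sort": "updated", "order": "desc", "per_page": per_page}
    return f"{API_ROOT}/search/repositories?{urlencode(params)}"


def commit_activity_url(owner: str, repo: str) -> str:
    """Weekly commit counts for the last year, for one repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}/stats/commit_activity"


def contributors_url(owner: str, repo: str) -> str:
    """Contributor list for one repository."""
    return f"{API_ROOT}/repos/{owner}/{repo}/contributors"
```

Both per-repository endpoints take an owner and a repository name, which is why the metrics are computed on an organization's most active repository.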
Filtering: We exclude large tech companies (Google, Microsoft, Meta, etc.), major open-source foundations, and organizations with patterns inconsistent with venture-backed startups. The goal is to surface companies in the pre-seed through Series B range.
Geography is derived from the GitHub organization profile location field, mapped to broad regions (US, UK, EU, APAC, Canada, LATAM, MENA).
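A minimal sketch of this kind of mapping, assuming a simple keyword lookup on the free-text location field. The keyword lists and the "Unknown" fallback are hypothetical; the actual mapping rules are not published here.

```python
import re
from typing import Optional

# Hypothetical, deliberately short keyword lists for illustration only.
REGION_KEYWORDS = {
    "US": ["united states", "usa", "san francisco", "new york"],
    "UK": ["united kingdom", "uk", "london"],
    "EU": ["germany", "berlin", "france", "paris"],
    "APAC": ["singapore", "india", "japan", "australia"],
    "Canada": ["canada", "toronto", "vancouver"],
    "LATAM": ["brazil", "mexico", "argentina"],
    "MENA": ["uae", "dubai", "egypt", "saudi arabia"],
}


def map_region(location: Optional[str]) -> str:
    """Map a GitHub profile location string to a broad region."""
    if not location:
        return "Unknown"
    text = location.lower()
    for region, keywords in REGION_KEYWORDS.items():
        for keyword in keywords:
            # Word boundaries stop "usa" from matching inside "Lausanne".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return region
    return "Unknown"
```

Because the location field is free text and optional, any approach like this will leave some organizations unmapped.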
Commit velocity is the total number of commits to an organization's most active public repository over a rolling 14-day window. We use GitHub's weekly commit_activity data (52 weeks of history) and sum two consecutive weeks to produce a 14-day figure.
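The 14-day figure can be sketched as follows, assuming input in the shape GitHub returns from stats/commit_activity: a list of weekly entries, each with a `week` Unix timestamp and a `total` commit count. The function name is illustrative.

```python
from typing import Dict, List


def commits_14d(weekly_activity: List[Dict]) -> int:
    """Sum the two most recent weekly totals into a 14-day commit count."""
    # Sort defensively by week start so the last two entries are the newest.
    weeks = sorted(weekly_activity, key=lambda w: w["week"])
    return sum(w["total"] for w in weeks[-2:])


sample = [
    {"week": 1711843200, "total": 9},
    {"week": 1712448000, "total": 11},
    {"week": 1713052800, "total": 18},
    {"week": 1713657600, "total": 22},
]
latest = commits_14d(sample)  # 18 + 22
```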
Velocity change is the percentage change in commit velocity compared to the preceding 14-day window. A startup with 40 commits this period and 20 commits last period shows a +100% velocity change. This is the primary ranking signal: it measures acceleration, not absolute volume.
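The calculation, including the worked example from the text, can be sketched like this. Returning `None` for a zero baseline is an assumption made for the example; the source does not say how that case is handled.

```python
from typing import List, Optional


def velocity_change(current: int, previous: int) -> Optional[float]:
    """Percentage change between two 14-day commit counts."""
    if previous == 0:
        return None  # assumed: no baseline, so the change is undefined
    return (current - previous) / previous * 100.0


def velocity_change_from_weeks(weekly_totals: List[int]) -> Optional[float]:
    """Compare the last two weeks with the two weeks before them.

    `weekly_totals` is ordered oldest to newest.
    """
    if len(weekly_totals) < 4:
        return None
    current = sum(weekly_totals[-2:])
    previous = sum(weekly_totals[-4:-2])
    return velocity_change(current, previous)


example = velocity_change(40, 20)  # 100.0, i.e. +100%
```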
Contributor count is the number of unique contributors to the organization's most active repository. Growth is estimated indirectly, by comparing commit volume over the most recent 6 weeks with the prior 6-week period. A rising contributor count often signals team expansion, which is a leading indicator of funding or product-market fit.
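A minimal sketch of the 6-week comparison, assuming a list of weekly commit totals ordered oldest to newest. The function name and the `None` return for short or empty baselines are assumptions for illustration.

```python
from typing import List, Optional


def growth_estimate(weekly_totals: List[int]) -> Optional[float]:
    """Percentage change in commit volume, recent 6 weeks vs. prior 6 weeks."""
    if len(weekly_totals) < 12:
        return None  # not enough history for two 6-week windows
    recent = sum(weekly_totals[-6:])
    prior = sum(weekly_totals[-12:-6])
    if prior == 0:
        return None  # assumed: undefined without a baseline
    return (recent - prior) / prior * 100.0


# 60 commits in the prior window, 90 in the recent one.
estimate = growth_estimate([10] * 6 + [15] * 6)
```

Note that this measures commit volume, not headcount, so it is a proxy for contributor growth rather than a direct count.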
New repository count is the number of public repositories the organization created in the last 30 days. A burst of new repos often signals infrastructure buildout, new product lines, or framework migrations.
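One way to compute this count, assuming repository records that carry the ISO 8601 `created_at` timestamp GitHub includes in its repository objects. The function name and the injectable `now` parameter are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def count_new_repos(repos: List[Dict], now: Optional[datetime] = None,
                    window_days: int = 30) -> int:
    """Count repositories created within the last `window_days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    count = 0
    for repo in repos:
        # GitHub timestamps look like "2026-04-15T09:30:00Z" (UTC).
        created = datetime.strptime(
            repo["created_at"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if cutoff <= created <= now:
            count += 1
    return count
```

Passing `now` explicitly makes the window reproducible, which helps when recomputing a past week's rankings.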
The single most predictive composite in the SSRN panel of 219 confirmed rounds is 14-day commit-velocity acceleration combined with low top-contributor concentration (Gini coefficient under 0.30 over the same 14-day window).
Orgs that meet both conditions are 3.4× more likely to announce a Series A within 60 days than orgs with high acceleration alone. In other words: velocity matters, but the shape of the velocity matters more. A team where one developer is doing 80% of the commits can spike just as hard as a team where eight developers are sharing the load — but only one of those teams looks like a fundraise candidate to a Series A partner.
Source: SSRN preprint abstract=6606558, panel n=219, regression stratified by stage. Lift survives a 90-day extension of the panel (next refresh: Q3 2026).
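To make the concentration measure concrete, here is a sketch of a Gini coefficient over per-contributor commit counts in the 14-day window, plus a check for the composite condition. The 0.30 ceiling comes from the text; the acceleration threshold `min_acceleration_pct`, the function names, and the rule that a single-contributor repo never qualifies are assumptions for illustration.

```python
from typing import Sequence


def gini(commit_counts: Sequence[int]) -> float:
    """Gini coefficient of commits per contributor (0 = perfectly even)."""
    values = sorted(commit_counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted form: G = 2*sum(i*x_i)/(n*sum(x)) - (n+1)/n
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def meets_composite(velocity_change_pct: float, commit_counts: Sequence[int],
                    min_acceleration_pct: float = 50.0,  # hypothetical threshold
                    max_gini: float = 0.30) -> bool:
    """High acceleration combined with low top-contributor concentration."""
    if len(commit_counts) < 2:
        return False  # assumed: a solo committer is maximally concentrated
    return (velocity_change_pct >= min_acceleration_pct
            and gini(commit_counts) < max_gini)


one_dev_heavy = [80, 5, 5, 5, 5]                 # one developer with 80% of commits
shared_load = [12, 13, 12, 13, 12, 13, 12, 13]   # eight developers sharing the load
```

With these inputs, the first team's Gini is about 0.60 and the second's about 0.02, so at the same acceleration only the second clears the 0.30 ceiling.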
Each startup is assigned one of four signal types based on which metric is driving the acceleration:
We estimate startup stage from contributor count as a rough proxy for team size: Pre-seed (1–7 contributors), Seed (8–19), Series A/B (20–49), Growth (50+). This is an approximation — not all contributors are employees, and not all employees contribute to public repos.
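The stage bands above translate directly into a lookup. The function name and the "Unknown" label for repositories with no contributors are assumptions for the example.

```python
def estimate_stage(contributor_count: int) -> str:
    """Map a contributor count to the approximate stage bands in the text."""
    if contributor_count < 1:
        return "Unknown"  # assumed label; the text does not cover zero
    if contributor_count <= 7:
        return "Pre-seed"
    if contributor_count <= 19:
        return "Seed"
    if contributor_count <= 49:
        return "Series A/B"
    return "Growth"
```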
Data is refreshed weekly (Monday mornings). The pipeline queries GitHub for the latest 52 weeks of commit history, recalculates all metrics, regenerates sector rankings, and rebuilds the site. Each sector page shows rankings for the current quarter and up to four previous quarters.
Private repos are invisible. Some startups keep all or most code in private repositories. Our signal only covers public engineering activity.
Commit volume is not code quality. High commit velocity can reflect rapid feature development, but also refactoring, documentation, or CI/CD noise. We mitigate this by measuring change from baseline rather than absolute counts.
Not investment advice. Engineering acceleration is a leading indicator of traction, not a guarantee of success. Always conduct your own due diligence before making investment decisions.
For the investor-facing version of this methodology, start with the definition and comparison pages, which restate the raw framework in buyer language. Then use the buyer's guide to decide whether the stack actually fits how you source.
Use the method in practice
Methodology tells you how the signal is computed. The next step is deciding how to use it in sourcing, what to compare it against, and how to test it against your own judgment before you trust it with real pipeline time. If the evidence is strong enough, the buyer-side question becomes workflow fit, not whether the signal exists at all.