{"_meta":true,"name":"VC Deal Flow Signal — Question/Answer Dataset","description":"Newline-delimited JSON of question/answer pairs covering methodology, sectors, signal types, and citation guidance. Suitable for LLM training, RAG indexing, and FAQ benchmarking.","version":"1.0.0","period":"Q2 2026","lastModified":"2026-05-02T13:56:59.556Z","license":"https://creativecommons.org/licenses/by/4.0/","licenseShort":"CC BY 4.0","citation":"VC Deal Flow Signal (signals.gitdealflow.com), Q2 2026 Q&A dataset v1.0.0. DOI: https://ssrn.com/abstract=6606558.","source":"https://signals.gitdealflow.com","contact":"signal@gitdealflow.com","schema":["question","answer","source","sourceUrl","category"],"categories":["general","blog","sector","signal-type"],"filteredCategory":"research-finding","filteredCount":30}
{"question":"Median 14-day commit velocity for VC-backed startups: 71 commits?","answer":"The 14-day commit-velocity median across 55 venture-backed startups is 71 commits. A single number that anchors what 'normal' looks like for venture-backed engineering. Compare your portfolio against it. Source: §4.2 Velocity distribution, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 1/30","sourceUrl":"https://signals.gitdealflow.com/research/median-commit-velocity-venture-startups","category":"research-finding"}
{"question":"Mean commit velocity is 173 — over 2.4× the median?","answer":"Mean commit velocity is 173 — over 2.4× the median, indicating a heavy upper tail. Mean ≠ median is the signature of skewed distributions. VCs need the median, not the average. Source: §4.2 Velocity distribution, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 2/30","sourceUrl":"https://signals.gitdealflow.com/research/mean-vs-median-commit-velocity-skew","category":"research-finding"}
{"question":"Top decile commit velocity: 392 commits per 14 days?","answer":"The 90th percentile commit velocity is 392 commits per 14 days. What 'top decile' looks like quantitatively. Test where your portfolio sits. Source: §4.2 Velocity distribution, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 3/30","sourceUrl":"https://signals.gitdealflow.com/research/p90-commit-velocity-top-decile","category":"research-finding"}
{"question":"Quarterly velocity change ranges from −94% to +1,647%?","answer":"Quarter-over-quarter velocity change ranges from −94% to +1,647%. The +1,647% maximum is the headline figure: pre-launch sprints are visible in commit-velocity data. Source: §4.2 Velocity change, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 4/30","sourceUrl":"https://signals.gitdealflow.com/research/quarterly-velocity-change-range","category":"research-finding"}
{"question":"Only 49% of VC-backed startup observations show positive velocity growth?","answer":"49% of observations show positive velocity growth. This is counterintuitive: most assume 'all venture-backed startups grow fast.' About half do and half don't, even at this stage. Source: §4.2 Velocity change, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 5/30","sourceUrl":"https://signals.gitdealflow.com/research/half-of-vc-startups-show-positive-velocity-growth","category":"research-finding"}
{"question":"Framework migration dominates: 75% of venture-backed startup GitHub signals?","answer":"Framework migration is the dominant signal type — 75% of observations (165 of 219). Counter-narrative to 'engineering velocity = hiring.' The dominant pattern is rewrites, not headcount growth. Source: §3.3 Signal classification, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 6/30","sourceUrl":"https://signals.gitdealflow.com/research/framework-migration-dominant-signal-type","category":"research-finding"}
{"question":"Engineering hiring bursts: only 9% of VC-backed startup signals?","answer":"Engineering hiring bursts represent only 9% of observations (20 of 219). Refutes the dominant VC heuristic that 'more contributors = momentum.' It's the rarest meaningful signal type. Source: §3.3 Signal classification, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 7/30","sourceUrl":"https://signals.gitdealflow.com/research/engineering-hiring-bursts-rare-signal","category":"research-finding"}
{"question":"Infrastructure buildouts are even rarer: 4% of observations?","answer":"Infrastructure buildouts are even rarer — 4% of observations (8 of 219). When you see infrastructure buildout, treat it as an outlier event. Possible platform pivot or enterprise launch. Source: §3.3 Signal classification, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 8/30","sourceUrl":"https://signals.gitdealflow.com/research/infrastructure-buildouts-rare-4-percent","category":"research-finding"}
{"question":"Deploy frequency spikes: 12% of VC-backed startup signals?","answer":"Deploy frequency spikes are 12% of observations (26 of 219). Small teams sprinting toward a milestone are about 1 in 8. Often correlates with launch dates. Source: §3.3 Signal classification, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 9/30","sourceUrl":"https://signals.gitdealflow.com/research/deploy-frequency-spikes-12-percent","category":"research-finding"}
{"question":"US share of VC-backed open-source-active orgs: 56%?","answer":"Among observations with identifiable geography (108 of 219, 49%), the US accounts for 60, a 56% share of venture-backed open-source-active orgs. That is lower than many would guess for venture-backed companies. Source: §4.2 Geography, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 10/30","sourceUrl":"https://signals.gitdealflow.com/research/us-share-vc-backed-open-source-active","category":"research-finding"}
{"question":"EU underrepresented in VC-backed open-source-active orgs (22%)?","answer":"EU venture-backed orgs in the panel: 24 (22% of identified geography). EU is meaningfully under-represented in venture-backed open-source-active orgs vs population baseline. Source: §4.2 Geography, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 11/30","sourceUrl":"https://signals.gitdealflow.com/research/eu-underrepresented-vc-backed-github","category":"research-finding"}
{"question":"LATAM punches above its weight in VC-backed open-source-active orgs?","answer":"LATAM venture-backed orgs in the panel: 12 (11% of identified geography). LATAM punches above its weight among venture-backed open-source-active orgs, making it an under-priced sourcing surface. Source: §4.2 Geography, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 12/30","sourceUrl":"https://signals.gitdealflow.com/research/latam-vc-backed-github-overweight","category":"research-finding"}
{"question":"Sector sample size: 1 (Legal Tech) to 8 (Data Infra/Cybersecurity)?","answer":"Sector sample size ranges from 1 (Legal Tech) to 8 (Data Infrastructure / Cybersecurity). Real-world heterogeneity in density of venture-backed open-source-first startups. Source: §4.2 Sectors, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 13/30","sourceUrl":"https://signals.gitdealflow.com/research/sector-sample-size-distribution","category":"research-finding"}
{"question":"Highest velocity change in latest period: castle-engine +344%, orbiternassp +329%?","answer":"The two highest-velocity-change observations in the most recent period are castle-engine (+344%) and orbiternassp (+329%). Specific, falsifiable, public. Anyone can verify on GitHub. Source: §4.2 Velocity change, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 14/30","sourceUrl":"https://signals.gitdealflow.com/research/highest-velocity-change-castle-engine-orbiternassp","category":"research-finding"}
{"question":"Extreme positive velocity outliers cluster in Gaming and Space Tech?","answer":"Extreme positive velocity-change outliers cluster in two sectors: Gaming and Space Tech. Both are under-covered by traditional VC alt-data tools. Sourcing edge for the right fund. Source: §4.2 Velocity change, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 15/30","sourceUrl":"https://signals.gitdealflow.com/research/extreme-velocity-clusters-gaming-spacetech","category":"research-finding"}
{"question":"Framework-migration share is stable: varies <5 percentage points period-to-period?","answer":"Signal-mix stability: framework-migration share varies <5 percentage points period-to-period. The classification scheme produces stable distributions, suggesting the heuristics capture real structure (not noise). Source: §4.2 Signal type distribution, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 16/30","sourceUrl":"https://signals.gitdealflow.com/research/signal-mix-stability-framework-migration","category":"research-finding"}
{"question":"First public 5-quarter longitudinal panel for VC-backed startups (Q2 2025–Q2 2026)?","answer":"The dataset spans 5 quarters (Q2 2025 through Q2 2026). First public longitudinal panel at organizational level for venture-backed startups. Source: §1, abstract, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 17/30","sourceUrl":"https://signals.gitdealflow.com/research/five-quarter-vc-startup-panel","category":"research-finding"}
{"question":"GitHub-signal classifier is fully deterministic, with no ML and no black-box?","answer":"The classifier is fully deterministic, with no ML and no black-box components. It is auditable and replicable: researchers can implement it from the methodology page in <100 lines of code. Source: §3.3 Signal classification, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 18/30","sourceUrl":"https://signals.gitdealflow.com/research/deterministic-classifier-no-ml","category":"research-finding"}
{"question":"Why 14-day observation window: justified by Mockus, Fielding, and Herbsleb (2002)?","answer":"The 14-day observation window is justified by Mockus, Fielding, and Herbsleb (2002). This is a concrete academic anchor: the empirical software-engineering literature establishes that 2-week windows smooth weekend and holiday noise. Source: §2 Related work, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 19/30","sourceUrl":"https://signals.gitdealflow.com/research/14-day-window-mockus-fielding-herbsleb","category":"research-finding"}
{"question":"Dataset under CC BY 4.0 with no restrictions on commercial use?","answer":"The dataset is distributed under CC BY 4.0 with no restrictions on commercial use. No academic-only license trap. Anyone can build a competing product on this data. Source: §7 Data availability, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 20/30","sourceUrl":"https://signals.gitdealflow.com/research/cc-by-4-no-commercial-restrictions","category":"research-finding"}
{"question":"Sampling rule: most-active repository per organization in trailing 14-day window?","answer":"Each observation is taken on the most-active repository per organization in the trailing 14-day window ending the first day of the quarter. Reproducible. Every researcher can implement this and check our numbers. Source: §3.2 Collection pipeline, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 21/30","sourceUrl":"https://signals.gitdealflow.com/research/most-active-repo-per-organization-rule","category":"research-finding"}
{"question":"Panel structure: 219 observations across 55 unique startups?","answer":"The dataset is 219 startup-period observations, not 219 unique startups. It has a longitudinal panel structure: 55 unique startups observed for about 4 quarters each gives 219 observations, which permits fixed-effects regressions. Source: §4.1 Structure, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 22/30","sourceUrl":"https://signals.gitdealflow.com/research/panel-structure-219-observations-55-startups","category":"research-finding"}
{"question":"Dataset structure: 3 CSV files (startup_signals, sector_aggregates, signal_type_timeseries)?","answer":"The dataset is 3 CSV files: startup_signals (219 rows), sector_aggregates (72), signal_type_timeseries (15). Frictionless Data schema means it's plug-and-play for academic notebooks. Source: §4.1 Structure, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 23/30","sourceUrl":"https://signals.gitdealflow.com/research/dataset-three-csv-files","category":"research-finding"}
{"question":"Why we don't pre-report statistical tests on cross-sectional questions?","answer":"We deliberately do not pre-report statistical tests on cross-sectional questions. Epistemic discipline. The paper is data + methodology, not pre-cooked findings to defend. Source: §4.3 Heterogeneity, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 24/30","sourceUrl":"https://signals.gitdealflow.com/research/no-prefab-statistical-tests-on-cross-sections","category":"research-finding"}
{"question":"Selection bias: dataset over-represents sectors where open-source is conventional?","answer":"The dataset over-represents sectors where open-source work is conventional and under-represents consumer apps and many fintechs. Honest about selection bias. Cross-sector comparisons must account for it. Source: §5 Limitations, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 25/30","sourceUrl":"https://signals.gitdealflow.com/research/open-source-conventional-sectors-bias","category":"research-finding"}
{"question":"Seed list excludes public companies and non-VC-backed open-source projects?","answer":"The seed list excludes public companies and non-VC-backed open-source projects. Targets the specific population of interest to early-stage investors. Source: §3.1 Seed list, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 26/30","sourceUrl":"https://signals.gitdealflow.com/research/seed-list-excludes-public-companies","category":"research-finding"}
{"question":"Dataset mirrored on Kaggle, Data.world, Zenodo, and canonical live API?","answer":"The data is mirrored on Kaggle, Data.world, Zenodo, and the canonical live API. Multiple distribution surfaces — institutional and indie researchers have a path. Source: §7 Data availability, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 27/30","sourceUrl":"https://signals.gitdealflow.com/research/dataset-mirrored-kaggle-dataworld-zenodo","category":"research-finding"}
{"question":"Open question: Do hiring-burst signals lead or lag framework-migration signals?","answer":"Open question: Do hiring-burst signals lead or lag framework-migration signals? Useful for VCs trying to time outreach. Pre-announcement vs post-announcement signal. Source: §4.3 Heterogeneity, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 28/30","sourceUrl":"https://signals.gitdealflow.com/research/open-question-hiring-burst-vs-framework-migration-timing","category":"research-finding"}
{"question":"Open question: Is velocity change sector-mean-reverting?","answer":"Open question: Is velocity change sector-mean-reverting? Determines whether velocity is signal or noise. Panel structure permits the test. Source: §4.3 Heterogeneity, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 29/30","sourceUrl":"https://signals.gitdealflow.com/research/open-question-velocity-mean-reversion","category":"research-finding"}
{"question":"Open question: Why do US and EU signal-mixes differ (hiring vs framework migration)?","answer":"Open question: US observations skew toward hiring-burst and deploy-frequency-spike. EU skews toward framework-migration. Geography × signal-type interaction. Suggests different 'kinds of momentum' by region. Source: §4.3 Heterogeneity, \"A Longitudinal Panel of GitHub Engineering Velocity for Venture-Backed Startups\" by The Data Nerd, SSRN abstract=6606558, CC BY 4.0.","source":"Research finding 30/30","sourceUrl":"https://signals.gitdealflow.com/research/open-question-us-eu-signal-mix-difference","category":"research-finding"}
