47 Alternative Data Sources for Angel Investors in 2026
Most angel investors check 3 sources. Here are 47 signals that catch startups 6-12 weeks before Crunchbase, from GitHub velocity to SEC Form D filings.
Key Takeaway
Most angel investors only check Crunchbase, LinkedIn, and the pitch deck. All three are lagging indicators. This guide covers 47 alternative data sources across 10 categories, each with an access note, typical lead time, and a starting URL. The categories cover public code signals, hiring, product telemetry, infrastructure buildouts, search and attention, funding precursors, community growth, regulatory filings, open-data scientific signals, and niche commercial APIs. Use this as a reference for building a sourcing stack that surfaces breakout companies 6-12 weeks before they hit a VC database.
Most angel investors only check 3 data sources: Crunchbase, LinkedIn, and the pitch deck. The problem: those are all lagging. By the time a startup shows up there, the signal is already priced into the round.
Here are 47 alternative data sources that surface breakout startups 6-12 weeks before they hit a VC database. We publish our methodology at ssrn.com/abstract=6606558.
**Table of contents**
- Public code signals (6)
- Hiring and team signals (5)
- Product telemetry (5)
- Infrastructure buildouts (5)
- Search and attention (5)
- Funding leading indicators (4)
- Community signals (3)
- Regulatory and IP (4)
- Open-data scientific (5)
- Niche commercial APIs (5)
1. Public code signals
The earliest signal of startup momentum is usually in public code. A team shipping fast leaves a timestamped trail that is hard to fake and easy to query.
**1. GitHub commit velocity, contributors, stars.** Track the rate of change in 14-day commit counts across a company's public repositories, not absolute output. A doubling in two weeks typically means a hiring burst or a product sprint. Contributor graph growth confirms headcount expansion; star velocity on a founder's personal repo often flags dev-tool launches weeks before Product Hunt. Access: free REST and GraphQL APIs; authenticated REST requests are limited to 5,000 per hour. Lead time: 6-12 weeks. Start at docs.github.com/en/rest.
**2. GitLab public project activity.** GitLab hosts a smaller but non-overlapping population, especially European dev-tool and infra companies. The activity API returns push events, issue creation, and merge requests with timestamps. Access: free API, public-project scope is open. Lead time: 4-10 weeks. Start at docs.gitlab.com/ee/api/events.html.
**3. npm package downloads.** Weekly download counts for a startup's published packages correlate with developer adoption. A 3x week-over-week jump on a niche package often precedes a Series A. Access: free downloads API. Lead time: 4-8 weeks. Start at api.npmjs.org/downloads.
**4. PyPI download statistics.** The Python equivalent of npm download counts, available through the public PyPI dataset on Google BigQuery, which exposes per-package daily downloads since 2016. Useful for ML, data, and scientific-tooling startups. Access: free via Google BigQuery (first 1TB per month is free). Lead time: 4-8 weeks. Start at pypistats.org.
**5. Docker Hub image pulls.** Public image pull counts reveal production adoption. A containerized product crossing 10k monthly pulls is usually past proof-of-concept. Access: pull counts visible on the public page; batch via the unofficial API. Lead time: 3-6 weeks. Start at hub.docker.com.
**6. Release cadence on GitHub Releases.** The gap between tagged releases is a clean proxy for shipping velocity. A move from monthly to weekly releases usually means a team scale-up. Access: free, part of the GitHub REST API. Lead time: 4-8 weeks. Start at docs.github.com/en/rest/releases.
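For the commit-velocity signal in item 1, the comparison of one 14-day window against the previous one can be sketched as follows. This is a minimal illustration, not the authors' scoring method: the function names are hypothetical, and the timestamps are made-up data standing in for what the GitHub REST commits endpoint would return.

```python
from datetime import datetime, timedelta, timezone


def parse_github_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as '2026-01-15T09:30:00Z'."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def commit_velocity_ratio(commit_times, now, window_days=14):
    """Commits in the latest window divided by commits in the window before it.

    Returns None when the earlier window is empty, because there is no
    baseline to compare against.
    """
    window = timedelta(days=window_days)
    recent_start = now - window
    prior_start = now - 2 * window
    recent = sum(1 for t in commit_times if recent_start <= t < now)
    prior = sum(1 for t in commit_times if prior_start <= t < recent_start)
    if prior == 0:
        return None
    return recent / prior


# Illustrative data only: 4 commits in the earlier window, 8 in the latest one.
now = datetime(2026, 2, 1, tzinfo=timezone.utc)
earlier = [now - timedelta(days=d) for d in (27, 24, 20, 16)]
latest = [now - timedelta(days=d) for d in (13, 12, 10, 9, 7, 5, 3, 1)]
ratio = commit_velocity_ratio(earlier + latest, now)
print(ratio)  # 2.0
```

A ratio of 2.0 corresponds to the "doubling in two weeks" described above. The same function works for release timestamps if you want a rough view of release cadence.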
2. Hiring and team signals
Headcount moves before revenue, and hiring intent shows up in public job boards weeks before offers are signed.
**7. LinkedIn employee count over time.** Track the delta in headcount and function mix. A jump in senior engineering hires is a classic pre-Series-A signal. Access: free via company pages; scraping is against ToS, so use the LinkedIn Sales Navigator API or a licensed provider. Lead time: 4-8 weeks. Start at linkedin.com/company.
**8. Hacker News Who Is Hiring threads.** Monthly thread where YC and non-YC startups post roles directly. A company appearing for the first time, especially with multiple roles, signals fresh capital. Access: free, use the HN Algolia API. Lead time: 3-6 weeks. Start at hn.algolia.com/api.
**9. AngelList Talent job postings.** Wellfound (formerly AngelList Talent) still surfaces early-stage roles with salary and equity ranges. The equity band often hints at the company's stage more clearly than the pitch deck. Access: free browsing, paid API. Lead time: 3-6 weeks. Start at wellfound.com.
**10. Gem.com outbound hiring signals.** Gem aggregates anonymized recruiter outbound activity. A spike in recruiter outreach for a company or from its recruiters is a compound signal: hiring, capital, and sector heat. Access: paid API, designed for recruiting teams. Lead time: 4-6 weeks. Start at gem.com.
**11. Indeed and general job-board velocity.** Indeed's free search surfaces the same roles across the long tail of job sites. The week-over-week change in posted-role count per company is noisy but useful for cross-checks. Access: free search, paid API. Lead time: 2-4 weeks. Start at indeed.com.
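The first-appearance check described for the Who Is Hiring threads in item 8 might look like the sketch below. It assumes you have already extracted company names and role counts from each monthly thread, which is the hard part and is not shown. The company names, the `min_roles` threshold, and the function name are all hypothetical.

```python
def first_appearances(threads, min_roles=2):
    """Flag companies that post for the first time with several roles.

    threads: list of (month, {company: role_count}) ordered oldest to newest.
    The oldest thread is used only as a baseline, since every company in it
    would otherwise look new.
    """
    seen = set()
    flagged = []
    for index, (month, postings) in enumerate(threads):
        if index > 0:
            for company, roles in sorted(postings.items()):
                if company not in seen and roles >= min_roles:
                    flagged.append((month, company, roles))
        seen.update(postings)
    return flagged


# Illustrative data only; these companies are invented.
threads = [
    ("2026-01", {"Acme": 1, "Borealis": 3}),
    ("2026-02", {"Acme": 2, "Cirrus": 4}),
    ("2026-03", {"Cirrus": 1, "Dune": 1}),
]
print(first_appearances(threads))  # [('2026-02', 'Cirrus', 4)]
```

Dune is new in March but posts a single role, so it stays below the threshold. Lowering `min_roles` to 1 would surface it at the cost of more noise.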
3. Product telemetry
Public product surfaces leak more than teams realize. Launch platforms, tech stacks, and traffic estimators all give early reads on adoption.
**12. Product Hunt launches.** Even failed PH launches tell you the founder has shipped and is running distribution. The velocity of a maker's past PH launches often indicates pace. Access: free GraphQL API. Lead time: 2-6 weeks. Start at api.producthunt.com.
**13. BuiltWith tech stack changes.** BuiltWith logs the JavaScript, CMS, analytics, and payment stack of every crawled domain over time. A move from Stripe to a custom billing stack usually means enterprise pivot. Access: free for single-domain lookups, paid API for bulk. Lead time: 4-8 weeks. Start at builtwith.com.
**14. Similarweb traffic estimates.** Directional traffic data that is noisy below 5k monthly visits but solid above. A 3x month-over-month lift on a product subdomain is almost always real growth. Access: free tier with limits, paid API. Lead time: 4-6 weeks. Start at similarweb.com.
**15. Wappalyzer tech detection.** Browser extension and API that fingerprints frontend and backend stacks. Useful to spot when a startup migrates from a no-code stack to a custom build, which usually means Series A traction. Access: free extension, paid API. Lead time: 4-8 weeks. Start at wappalyzer.com.
**16. App Annie, now data.ai, rank shifts.** Mobile-app App Store and Play Store rank data. A consumer app breaking into a category top-50 is visible 1-2 months before mainstream coverage. Access: free web views, paid API. Lead time: 4-6 weeks. Start at data.ai.
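The tech-stack changes in items 13 and 15 reduce to a diff between two snapshots of detected technologies. A minimal sketch, assuming you already have the technology names as plain strings from whichever provider you use; the `NO_CODE` set and both function names are assumptions for illustration, not a BuiltWith or Wappalyzer API.

```python
# Assumption: a hand-picked set of no-code site builders to watch for.
NO_CODE = {"Webflow", "Bubble", "Squarespace", "Wix"}


def stack_diff(before, after):
    """Return the technologies added and removed between two snapshots."""
    before, after = set(before), set(after)
    return {"added": sorted(after - before), "removed": sorted(before - after)}


def left_no_code(before, after):
    """True when a no-code tool was dropped and none remain in the new stack."""
    removed = set(stack_diff(before, after)["removed"])
    return bool(NO_CODE & removed) and not (NO_CODE & set(after))


# Illustrative snapshots only.
january = ["Webflow", "Google Analytics", "Stripe"]
march = ["Next.js", "Vercel", "Google Analytics", "Stripe"]

diff = stack_diff(january, march)
print(diff["added"])    # ['Next.js', 'Vercel']
print(diff["removed"])  # ['Webflow']
print(left_no_code(january, march))  # True
```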
4. Infrastructure buildouts
The pipes a startup builds are often the clearest sign of ambition. Infrastructure commitments precede revenue by months.
**17. AWS IP blocks and service footprint.** AWS publishes its IP ranges as JSON. Cross-referencing against Passive DNS reveals when a company spins up new regions, which is a capital-commitment signal. Access: free JSON feed, Passive DNS tools are freemium. Lead time: 6-10 weeks. Start at ip-ranges.amazonaws.com/ip-ranges.json.
**18. DNS record changes.** SecurityTrails, DNSDumpster, and RiskIQ all let you diff a company's DNS records over time. New subdomains like api.v2, eu.company.com, or enterprise.company.com all hint at product direction. Access: freemium across providers. Lead time: 4-8 weeks. Start at securitytrails.com.
**19. SSL certificate transparency logs.** Every publicly trusted TLS certificate is logged and searchable. A new certificate for a domain owned by a founder is a classic stealth-launch signal. Access: free via crt.sh. Lead time: 6-12 weeks. Start at crt.sh.
**20. Cloudflare RADAR.** Free dashboard of DNS, HTTP, and attack trends at the network level. The per-domain traffic tab gives directional traffic and is harder to game than Similarweb. Access: free. Lead time: 2-6 weeks. Start at radar.cloudflare.com.
**21. HTTP Archive.** Bi-monthly crawl of the top-million sites with full waterfall data. Useful to spot when a startup cleans up its frontend or adds a CDN, both common pre-launch moves. Access: free, data in BigQuery public datasets. Lead time: 4-8 weeks. Start at httparchive.org.
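The cross-referencing step in item 17 can be sketched with the standard library alone. The code below assumes the IPv4 layout of the published feed (a `prefixes` list whose entries carry `ip_prefix` and `region`) and assumes you have already resolved a company's hostnames to IP addresses. The prefixes in the sample are reserved documentation ranges used as placeholders, not real AWS blocks, and the function names are hypothetical.

```python
import ipaddress


def regions_for_ips(ip_ranges, addresses):
    """Map each region to the set of given IPv4 addresses that fall inside it."""
    networks = [
        (ipaddress.ip_network(entry["ip_prefix"]), entry["region"])
        for entry in ip_ranges["prefixes"]
    ]
    found = {}
    for address in addresses:
        ip = ipaddress.ip_address(address)
        for network, region in networks:
            if ip in network:
                found.setdefault(region, set()).add(address)
    return found


def new_regions(previous, current):
    """Regions present in the current snapshot but not in the previous one."""
    return sorted(set(current) - set(previous))


# Placeholder feed: documentation-only prefixes, not real AWS ranges.
ip_ranges = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"},
        {"ip_prefix": "198.51.100.0/24", "region": "eu-west-1", "service": "EC2"},
    ]
}

last_month = regions_for_ips(ip_ranges, ["192.0.2.10"])
this_month = regions_for_ips(ip_ranges, ["192.0.2.10", "198.51.100.7"])
print(new_regions(last_month, this_month))  # ['eu-west-1']
```

A linear scan over every prefix is fine for a handful of addresses. For bulk lookups you would want an interval tree or a radix trie instead.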
5. Search and attention
Attention is the oldest leading indicator. Search volume, forum momentum, and preprint citations all show demand before revenue.
**22. Google Trends.** Free normalized search-volume data for any term or brand. A rising 90-day chart for a startup's name, ahead of its sector, usually means the founder has figured out distribution. Access: free web UI, unofficial APIs. Lead time: 2-6 weeks. Start at trends.google.com.
**23. Reddit post velocity.** Reddit's search API exposes post and comment counts per subreddit per keyword over time. A startup getting organic r/selfhosted or r/ClaudeAI mentions is often 1-2 months ahead of its PR cycle. Access: free API with rate limits. Lead time: 3-6 weeks. Start at reddit.com/dev/api.
**24. Hacker News score curves.** The HN Algolia API returns every submission with its final score and comment count. A Show HN that crosses 200 points in a niche is a durable signal, especially for dev-tool companies. Access: free. Lead time: 2-8 weeks. Start at hn.algolia.com/api.
**25. arXiv citation lift.** Semantic Scholar and OpenAlex both expose citation counts per paper per month. A preprint that triples citations in 30 days often accompanies a spinout. Access: free via Semantic Scholar API. Lead time: 8-16 weeks. Start at api.semanticscholar.org.
**26. Substack subscriber growth.** Founders who run newsletters expose subscriber counts on their public pages. A jump from 5k to 20k over a quarter is a distribution signal that usually precedes a product launch. Access: free web scraping, ToS permitting. Lead time: 4-8 weeks. Start at substack.com.
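For the Show HN threshold in item 24, a small filter over search results is enough. This sketch assumes the HN Algolia search endpoint with its `tags` and `numericFilters` query parameters and the `points`, `title`, and `num_comments` fields on each hit; the sample hits and function names are invented for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def show_hn_search_url(query, min_points=200):
    """Build a search URL restricted to Show HN posts at or above a score."""
    params = {
        "query": query,
        "tags": "show_hn",
        "numericFilters": f"points>={min_points}",
    }
    return "https://hn.algolia.com/api/v1/search?" + urlencode(params)


def strong_launches(hits, min_points=200):
    """Keep hits at or above the threshold, highest score first."""
    keep = [hit for hit in hits if (hit.get("points") or 0) >= min_points]
    return sorted(keep, key=lambda hit: hit["points"], reverse=True)


# Illustrative hits only; titles and scores are made up.
hits = [
    {"title": "Show HN: Tiny vector store", "points": 240, "num_comments": 88},
    {"title": "Show HN: Yet another todo app", "points": 12, "num_comments": 3},
    {"title": "Show HN: Local-first sync engine", "points": 410, "num_comments": 150},
    {"title": "Show HN: Unscored draft", "points": None, "num_comments": 0},
]

top = strong_launches(hits)
print([hit["title"] for hit in top])
print(show_hn_search_url("vector store"))
```

Filtering both server-side and client-side is deliberate: the URL narrows what is downloaded, while `strong_launches` still works on cached or hand-collected results.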
6. Funding leading indicators
Regulatory filings are the closest thing to a legally required fundraise announcement, and they are free.
**27. SEC Form D filings.** US issuers relying on a Regulation D exemption must file Form D within 15 days of the first sale; rounds raised under other exemptions will not appear. The free EDGAR full-text search lets you query by issuer, promoter, or amount. Access: free. Lead time: 4-8 weeks before press coverage. Start at efts.sec.gov/LATEST/search-index.
**28. EDGAR full-text search.** Beyond Form D, EDGAR indexes S-1, 10-K, and 8-K filings from public companies that often disclose acquisitions, investments, or partnerships with private startups. Access: free. Lead time: 2-6 weeks. Start at efts.sec.gov/LATEST/search-index.
**29. Companies House UK.** Free register of every UK company, with director changes, charges, and confirmation statements. A new Series A usually shows up as an SH01 return of allotment of shares, often filed alongside amended articles that create a new preference share class; charges, by contrast, record secured debt. Access: free API. Lead time: 2-4 weeks. Start at find-and-update.company-information.service.gov.uk.
**30. French INPI.** France's national company register, equivalent to Companies House, with capital increases and articles of incorporation. Access: free. Lead time: 2-4 weeks. Start at data.inpi.fr.
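The 15-day Form D window in item 27 gives a latest expected filing date for any first sale you learn about. A minimal sketch, assuming calendar days and rolling a weekend due date forward to Monday; federal holidays are deliberately ignored here, so treat the result as approximate. The function name is hypothetical.

```python
from datetime import date, timedelta


def form_d_due_date(first_sale):
    """Latest expected Form D filing date for a given first-sale date.

    Adds 15 calendar days, then rolls forward past Saturday and Sunday.
    Assumption: federal holidays are not handled.
    """
    due = first_sale + timedelta(days=15)
    while due.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        due += timedelta(days=1)
    return due


print(form_d_due_date(date(2026, 3, 2)))  # 2026-03-17
print(form_d_due_date(date(2026, 3, 6)))  # 2026-03-23
```

The second example lands on a Saturday (21 March 2026) and rolls forward to the following Monday.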
7. Community signals
A startup's community growth rate is sometimes the only metric the founder cares about, and it is often publicly visible.
**31. Discord server growth.** Public Discord servers expose member counts via server invites. Track weekly counts; a 2x jump in a month often signals a launch. Access: free, use the Discord API with a bot token. Lead time: 2-6 weeks. Start at discord.com/developers/docs.
**32. Slack community join rate.** Many dev-tool startups run Slack communities with public invite links. Some expose member counts directly; for others, a simple periodic join-ping tracks the count. Access: free via the workspace's own public signals. Lead time: 3-6 weeks. Start at slack.com/community.
**33. Telegram subscriber velocity.** Public Telegram channels show subscriber counts directly. Track the weekly delta; in crypto and consumer categories, a rising channel often precedes a token or app launch. Access: free via the Telegram Bot API. Lead time: 2-4 weeks. Start at core.telegram.org/bots/api.
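The "2x jump in a month" rule from item 31 can be checked against a list of weekly member counts. The sketch assumes the invite lookup is called with counts enabled and returns an `approximate_member_count` field, as the Discord developer docs describe; the payload below is a hand-written stand-in, and the function names and sample counts are hypothetical.

```python
def member_count_from_invite(payload):
    """Read the approximate member count from an invite lookup response."""
    return payload.get("approximate_member_count")


def doubled_within(weekly_counts, weeks=4, factor=2.0):
    """Index of the first week whose count is at least `factor` times the
    count `weeks` earlier, or None if that never happens."""
    for i in range(weeks, len(weekly_counts)):
        base = weekly_counts[i - weeks]
        if base > 0 and weekly_counts[i] >= factor * base:
            return i
    return None


# Stand-in for one API response; values are illustrative.
payload = {"code": "example", "approximate_member_count": 2300}

# Five earlier weekly samples plus the latest reading.
weekly_counts = [1000, 1050, 1100, 1400, 1900, member_count_from_invite(payload)]
print(doubled_within(weekly_counts))  # 5
```

Week 4 (1,900 against a base of 1,000) falls just short of doubling; week 5 (2,300 against 1,050) crosses it. The same function applies to Telegram or Slack counts collected weekly.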
8. Regulatory and IP
Patent, clinical, and regulatory filings are the most underrated signal class. They are legally dated, structured, and free.
**34. USPTO patent filings.** Full-text search of every US patent and application, including inventor and assignee. A startup that is assigned three patents in six months is often preparing for an IPO or an acquisition. Access: free via PatentsView API. Lead time: 12-24 weeks. Start at patentsview.org/apis.
**35. FDA 510(k) clearances.** Medical-device clearances are publicly searchable. A 510(k) clearance is often the trigger for a Series A or B in medtech. Access: free. Lead time: 4-12 weeks. Start at fda.gov/medical-devices/510k-clearances.
**36. ClinicalTrials.gov.** Every US-regulated clinical trial is registered with sponsor, phase, and primary endpoint. A new Phase 2 trial from a private biotech is a capital-event leading indicator. Access: free API. Lead time: 8-16 weeks. Start at clinicaltrials.gov/data-api.
**37. EU MDR database.** EUDAMED tracks every medical device in the EU market. Registration of a new device by a private company often precedes EU commercial launch by a quarter. Access: free web search. Lead time: 6-12 weeks. Start at ec.europa.eu/tools/eudamed.
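For the Phase 2 signal in item 36, the filter below picks out industry-sponsored Phase 2 studies from a list of study records. The nested field names (`protocolSection`, `designModule.phases`, `sponsorCollaboratorsModule.leadSponsor`) reflect my reading of the ClinicalTrials.gov v2 API response and should be verified against the live schema before use. The sponsor names, NCT identifiers, and the `public_sponsors` exclusion list are invented for illustration.

```python
def private_phase2_trials(studies, public_sponsors=frozenset()):
    """Return (nct_id, sponsor) pairs for industry-led Phase 2 studies whose
    lead sponsor is not in the caller-supplied set of public companies."""
    matches = []
    for study in studies:
        protocol = study.get("protocolSection", {})
        phases = protocol.get("designModule", {}).get("phases", [])
        sponsor = protocol.get("sponsorCollaboratorsModule", {}).get("leadSponsor", {})
        nct_id = protocol.get("identificationModule", {}).get("nctId")
        if (
            "PHASE2" in phases
            and sponsor.get("class") == "INDUSTRY"
            and sponsor.get("name") not in public_sponsors
        ):
            matches.append((nct_id, sponsor.get("name")))
    return matches


def make_study(nct_id, phases, name, sponsor_class):
    """Build a record in the assumed shape; used only for the sample data."""
    return {
        "protocolSection": {
            "identificationModule": {"nctId": nct_id},
            "designModule": {"phases": phases},
            "sponsorCollaboratorsModule": {
                "leadSponsor": {"name": name, "class": sponsor_class}
            },
        }
    }


# Illustrative records only.
studies = [
    make_study("NCT00000001", ["PHASE2"], "Helix Bio", "INDUSTRY"),
    make_study("NCT00000002", ["PHASE1"], "Helix Bio", "INDUSTRY"),
    make_study("NCT00000003", ["PHASE2"], "State University", "OTHER"),
    make_study("NCT00000004", ["PHASE2"], "BigPharma Inc", "INDUSTRY"),
]

print(private_phase2_trials(studies, public_sponsors={"BigPharma Inc"}))
# [('NCT00000001', 'Helix Bio')]
```

The registry does not say whether a sponsor is privately held, so the exclusion list has to come from another source such as a ticker lookup.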
9. Open-data scientific
Science happens in public now. Preprints, dataset releases, and model uploads all reveal what research-heavy startups are actually building.
**38. SSRN papers.** Social Science Research Network hosts working papers across economics, finance, and legal research. A startup's founding team publishing on SSRN often indicates the academic anchor for the thesis. Access: free. Lead time: 12-24 weeks. Start at ssrn.com.
**39. bioRxiv preprints.** The main biology preprint server. A startup's scientific advisors publishing a preprint in the company's domain is often the first public trace of the thesis. Access: free API. Lead time: 12-26 weeks. Start at api.biorxiv.org.
**40. medRxiv preprints.** The clinical medicine counterpart to bioRxiv. Useful for surfacing digital-health and clinical-decision startups before they incorporate. Access: free. Lead time: 12-26 weeks. Start at medrxiv.org.
**41. OpenAlex index.** Open-data replacement for Microsoft Academic Graph, with every scholarly work indexed and citation graphs exposed. Useful for tracking a founder's publication trajectory. Access: free API. Lead time: 12-24 weeks. Start at openalex.org.
**42. Hugging Face models and datasets.** Every public model or dataset on HF has a timestamped upload and download history. A startup releasing a flagship open model is usually doing distribution before a product launch. Access: free API. Lead time: 4-12 weeks. Start at huggingface.co/api.
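A founder's publication trajectory, as mentioned for OpenAlex in item 41, can be summarized per year once you have a list of work records. This sketch assumes each record carries `publication_year` and `cited_by_count`, which is how OpenAlex work objects are documented; the sample records and the function name are illustrative only.

```python
from collections import defaultdict


def publication_trajectory(works):
    """Summarize works and citations by publication year, oldest first."""
    summary = defaultdict(lambda: {"works": 0, "citations": 0})
    for work in works:
        year = work.get("publication_year")
        if year is None:
            continue  # skip records without a usable year
        summary[year]["works"] += 1
        summary[year]["citations"] += work.get("cited_by_count") or 0
    return {year: summary[year] for year in sorted(summary)}


# Illustrative records only.
works = [
    {"publication_year": 2024, "cited_by_count": 12},
    {"publication_year": 2025, "cited_by_count": 40},
    {"publication_year": 2025, "cited_by_count": 5},
    {"publication_year": None, "cited_by_count": 3},
]

trajectory = publication_trajectory(works)
for year, stats in trajectory.items():
    print(year, stats)
```

Note that `cited_by_count` is a running total per paper, so older papers have had more time to accumulate citations. Compare like with like when reading the yearly figures.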
10. Niche commercial APIs
These are paid or semi-paid services that aggregate several of the above into investor-ready feeds. Only worth the cost once you are sourcing at scale.
**43. Crunchbase API.** The standard, though lagging. Still useful as a baseline and for entity disambiguation across other sources. Access: paid API, limited free tier via the website. Lead time: 0-2 weeks. Start at data.crunchbase.com/docs.
**44. Specter API.** Specter aggregates employee count, web traffic, funding, and tech-stack signals into a single company record. Good for mid-stage sourcing. Access: paid. Lead time: 2-6 weeks. Start at tryspecter.com.
**45. Synaptic.** Focused on consumer and mobile, pulls app-store data, web traffic, and paid-marketing spend into a unified feed. Access: paid, enterprise pricing. Lead time: 2-6 weeks. Start at synaptic.com.
**46. Predictleads.** Monitors company websites for job postings, technology changes, and press releases. Designed for sales teams but useful as a funding-intent feed. Access: paid API. Lead time: 2-4 weeks. Start at predictleads.com.
**47. 4Degrees.** CRM-native signal layer that pipes relationship and news data into a deal-flow pipeline. More useful to partners managing a portfolio than to solo angels. Access: paid. Lead time: 2-4 weeks. Start at 4degrees.ai.
Putting it together
No solo angel will monitor all 47 in real time. The point is to pick a subset that matches your sector and stage.
If you invest in dev tools, AI, data infrastructure, or developer-facing SaaS, categories 1 through 4 cover most of the public-code surface. That is the subset GitDealFlow aggregates across 20 sectors, scored and ranked weekly. The Prediction Game turns it into a public track record you can share.
For biotech, medtech, and research-heavy verticals, categories 8 and 9 give the longest lead time. For consumer, categories 3 and 5 move fastest.
Our methodology for using the first four categories to predict fundraises is published on SSRN at ssrn.com/abstract=6606558, with the underlying dataset open on Hugging Face. The single most important rule: measure change from a company's own baseline, not absolute output. That filters out the docs sprints, the CI noise, and the popularity that does not convert into product.
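One simple way to express "change from a company's own baseline" is a z-score against that company's recent history. This is a generic statistical sketch, not the scoring used in the published methodology; the function name and the sample numbers are assumptions chosen to show why a small team's jump can outrank a large team's routine output.

```python
from statistics import mean, stdev


def baseline_zscore(history, current):
    """How many standard deviations `current` sits above the company's own
    recent history. Returns None when the history is too short or flat."""
    if len(history) < 2:
        return None
    spread = stdev(history)
    if spread == 0:
        return None
    return (current - mean(history)) / spread


# Illustrative weekly commit counts for two hypothetical companies.
large_team_history = [400, 420, 380, 410, 390]
small_team_history = [10, 12, 8, 11, 9]

z_large = baseline_zscore(large_team_history, 430)
z_small = baseline_zscore(small_team_history, 30)
print(round(z_large, 2), round(z_small, 2))
```

The large team ships far more in absolute terms, yet its latest week is within two standard deviations of normal. The small team's week is an order of magnitude further from its own baseline, which is the kind of change the rule above is meant to surface.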