Data Infrastructure · sub-niche
Vector database engines.
Vector search engines optimized for specific workloads — high-dimensional, hybrid, or local.
Team-sized build · Hot: multiple deals per month
Why now
RAG is now everywhere. The vector DB seat is being decided. Switching costs are forming fast.
What the signal looks like
Repos with benchmarks against Pinecone / Weaviate / Qdrant, SDK adapters in TS + Python, and recall/latency tradeoff dashboards.
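For readers unfamiliar with what a recall/latency benchmark measures, here is a minimal sketch in plain Python. It assumes a hypothetical `search_fn(query, k)` that returns candidate ids from the engine under test, and it uses brute-force search as ground truth. All function names are illustrative and do not come from any vendor SDK.

```python
import time

def brute_force_top_k(query, vectors, k):
    """Exact nearest neighbours by squared Euclidean distance (ground truth)."""
    dists = [(sum((q - v) ** 2 for q, v in zip(query, vec)), i)
             for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(dists)[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate result also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def p95(samples):
    """95th percentile by the nearest-rank method (integer ceiling of 0.95 * n)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def benchmark(search_fn, queries, vectors, k=10):
    """Run every query through search_fn (hypothetical engine call) and
    report mean recall@k plus p95 latency in milliseconds."""
    recalls, latencies_ms = [], []
    for q in queries:
        start = time.perf_counter()
        approx = search_fn(q, k)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        recalls.append(recall_at_k(approx, brute_force_top_k(q, vectors, k)))
    return {"recall_at_k": sum(recalls) / len(recalls),
            "p95_ms": p95(latencies_ms)}
```

A tradeoff dashboard typically plots these two numbers against each other while an index parameter is swept: higher recall usually costs more latency.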
Public examples
We name public projects and categories only, never founders we track inside the paid product. The buyer’s edge stays inside the product.
- Qdrant / Weaviate forks
- TurboPuffer-style serverless vector DBs
- pgvector + pgvectorscale combos
What this displaces
An overprovisioned Pinecone bill and 200 ms p95 query latency.
Our build-vs-invest call
Hard to differentiate. The wedge is a specific workload (multi-tenant, low-latency, edge). Fund only teams with a prior search infrastructure background.
Common questions about this niche
- Buyer? AI app developers + ML platform teams.
- Pricing? $100 to $10k/mo, SaaS or self-hosted.
- Moat? Performance + ecosystem + cost.
More inside Data Infrastructure
- Real-time feature stores — Feature stores with sub-second freshness for online ML.
- Postgres extension marketplaces — Postgres is now the AI database. The extension ecosystem is the next platform.
- Columnar warehouse alternatives — Snowflake / BigQuery alternatives optimized for a specific shape — cheap, fast, or open.
- Change data capture tools — CDC pipelines that don't require a Kafka cluster.