AI & Machine Learning · sub-niche
Multimodal RAG stacks.
Text + image + table retrieval — the indexing layer that doesn't yet have a winner.
Why now
GPT-4o and Claude vision are good at single-document Q&A but weak at retrieval across a large corpus. The indexing layer for multimodal corpora is still largely unbuilt.
What the signal looks like
Repos with PDF parsing benchmarks in the README, a contributor list heavy on OCR/CV engineers, and growing test-fixture directories of real documents (insurance forms, lab reports, contracts).
Public examples
We name public projects and categories only, never founders we track inside the paid product. The buyer’s edge stays inside the product.
- ColPali-based document retrieval libraries
- LlamaParse-style PDF chunkers
- Vision-RAG benchmarks with reproducible scoring
What this displaces
Hand-rolled OCR → text → embed pipelines that lose layout context.
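To make "lose layout context" concrete, here is a minimal sketch, not taken from any specific project. It assumes a hypothetical OCR output of words with page coordinates (the `Word` type, the `WORDS` sample, and both function names are illustrative). The hand-rolled path flattens the words to a string before embedding; the layout-aware path keeps each table cell tied to its column header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: int  # left edge in page units (assumed OCR output field)
    y: int  # top edge in page units (assumed OCR output field)


# Hypothetical OCR output for a tiny two-column lab-report table.
WORDS = [
    Word("Test", 0, 0), Word("Result", 100, 0),
    Word("Glucose", 0, 20), Word("5.4", 100, 20),
    Word("Sodium", 0, 40), Word("140", 100, 40),
]


def flatten(words):
    """Hand-rolled path: keep reading order, discard coordinates."""
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in ordered)


def rows_with_headers(words):
    """Layout-aware path: group words by row, pair cells with column headers."""
    by_row = {}
    for w in words:
        by_row.setdefault(w.y, []).append(w)
    # Sort rows top to bottom, and cells left to right within each row.
    rows = [sorted(cells, key=lambda w: w.x) for _, cells in sorted(by_row.items())]
    header, body = rows[0], rows[1:]
    return [{h.text: c.text for h, c in zip(header, row)} for row in body]


print(flatten(WORDS))
print(rows_with_headers(WORDS))
```

The flattened string still contains every token, but nothing in it says that "140" is the result for "Sodium"; the row dictionaries keep that association available to whatever gets indexed.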
Our build-vs-invest call
Build vertical: a stack that wins on legal contracts beats a stack that's mediocre on everything. The defensible asset is the ingest pipeline plus the eval set on real documents.
Common questions about this niche
- Doesn't every LLM vendor ship vision now?
- They ship inference. They don't ship retrieval at scale. That's the gap.
- What's the signal that one stack is winning?
- The same eval set scoring 30%+ better with the same model. With the model held constant, the difference is the retrieval, not the model.
- What's the moat?
- The eval set, then the ingest pipeline, then the API stickiness.
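The "same eval set, same model" comparison in the answers above can be sketched as follows. This is an illustration under assumptions: the text does not say which score is used, so recall@k stands in for it, and the query ids, page ids, and the two stack outputs (`GOLD`, `STACK_A`, `STACK_B`) are toy data, not results.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant ids that appear in the top-k ranked ids."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)


def mean_recall(run, gold, k):
    """Average recall@k over every query in the shared eval set."""
    return sum(recall_at_k(run[q], gold[q], k) for q in gold) / len(gold)


def relative_gain(baseline, candidate):
    """Relative improvement of candidate over baseline (0.30 means 30% better)."""
    return (candidate - baseline) / baseline


# Toy eval set: query id -> ids of the pages that answer it (illustrative only).
GOLD = {"q1": ["p3"], "q2": ["p7", "p8"], "q3": ["p1"], "q4": ["p5", "p6"]}

# Ranked retrieval output from two hypothetical stacks feeding the same model.
STACK_A = {"q1": ["p3", "p2"], "q2": ["p7", "p9"], "q3": ["p4", "p6"], "q4": ["p2", "p5"]}
STACK_B = {"q1": ["p3", "p2"], "q2": ["p7", "p8"], "q3": ["p4", "p6"], "q4": ["p5", "p6"]}

score_a = mean_recall(STACK_A, GOLD, k=2)
score_b = mean_recall(STACK_B, GOLD, k=2)
gain = relative_gain(score_a, score_b)
print(f"A={score_a:.3f} B={score_b:.3f} gain={gain:.0%}")
```

Because the eval set and the downstream model are fixed, any gap between the two scores is attributable to the retrieval layer, which is the comparison the 30%+ threshold refers to.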
More inside AI & Machine Learning
- LLM eval harnesses — Reproducible eval suites that an AI-native team can drop into CI and trust by lunchtime.
- Agent orchestration frameworks — The 'LangChain for X' slot is still wide open — pick a vertical, ship the runtime, win the wedge.
- Retrieval-augmented search libraries — RAG-as-a-library — bring-your-own embedding, bring-your-own vector store, win on developer ergonomics.
- Fine-tuning tools for non-ML teams — Take fine-tuning out of the notebook. Product teams want to point at JSONL and get a deployable adapter.