AI & Machine Learning · sub-niche
On-device LLM runtimes.
Privacy, latency, cost — three reasons every app eventually wants a 3-8B model running on the user's machine.
Why now
Apple Silicon and consumer GPUs are now fast enough to run models of this size locally. The runtime that ships the cleanest mobile + desktop + browser experience wins the long tail of privacy-bound apps.
What the signal looks like
Repos with C++/Rust core, contributor list of WebGPU/Metal/CUDA specialists, and benchmarks against llama.cpp in the README.
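As a rough illustration, the three signals above can be turned into a simple screen. This is a minimal sketch, not how any real product scores repos: `RepoSnapshot`, `signal_checks`, `matches_signal`, and the keyword lists are all hypothetical names invented for this example, and the inputs are assumed to be collected by hand or by a separate scraper.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists; a real screen would need broader matching.
CORE_LANGUAGES = {"C++", "Rust"}
GPU_SKILLS = {"webgpu", "metal", "cuda"}


@dataclass
class RepoSnapshot:
    """Hypothetical, pre-collected facts about one repository."""
    primary_language: str
    contributor_skills: set[str] = field(default_factory=set)
    readme_text: str = ""


def signal_checks(repo: RepoSnapshot) -> dict[str, bool]:
    """Return one boolean per signal named in the text above."""
    readme = repo.readme_text.lower()
    skills = {s.lower() for s in repo.contributor_skills}
    return {
        # C++ or Rust core
        "systems_core": repo.primary_language in CORE_LANGUAGES,
        # at least one WebGPU / Metal / CUDA specialist among contributors
        "gpu_specialists": bool(skills & GPU_SKILLS),
        # README mentions benchmarks and llama.cpp (crude substring match)
        "benchmarks_vs_llama_cpp": "llama.cpp" in readme and "benchmark" in readme,
    }


def matches_signal(repo: RepoSnapshot) -> bool:
    """True only when all three checks pass."""
    return all(signal_checks(repo).values())
```

A repo that passes all three checks is a candidate for a closer look, not a verdict; the substring match on the README is deliberately crude and will produce false positives.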
Public examples
We name public projects and categories only, never founders we track inside the paid product. The buyer’s edge stays inside the product.
- llama.cpp forks with WebGPU bindings
- MLX-based mobile runtimes
- ONNX Runtime extensions for new architectures
What this displaces
Cloud-hosted inference for tasks that don't need it (autocomplete, redaction, transcription).
Our build-vs-invest call
Heavy lift to build, but the moat compounds — every supported architecture and platform combo adds defensibility. Fund teams with prior systems experience (compiler, kernel, graphics). Don't fund teams whose only background is fine-tuning notebooks.
Common questions about this niche
- Isn't llama.cpp already winning?
- For desktop, mostly. Mobile, browser, and embedded are still being decided. There's room for a portable runtime above llama.cpp.
- Is this a feature of the OS?
- Apple and Google will ship their own. But the cross-platform runtime — Mac + Windows + Linux + iOS + Android + WebGPU — is a third-party slot.
- What's the wedge product?
- Usually a developer SDK first, then a consumer app that uses it as proof.