Data Infrastructure · Startup idea
Most RAG stacks ship without an eval suite. The product that makes RAG testable (retrieval precision, answer faithfulness, context relevance) wins the team that is already burning $5K/mo on the wrong embeddings.
Why now
Ragas, TruLens, and DeepEval exist but the DX is rough. The next generation makes eval a 5-minute setup, not a 5-day science project.
The idea you could build today
An SDK in TypeScript + Python that wraps any RAG pipeline. Capture the retrieved chunks + final answer for every query. Score each capture against an LLM-graded rubric (retrieval precision, faithfulness, context relevance). Show how those scores drift over time in a dashboard.
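The capture-and-score loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference design: `EvalWrapper`, `Trace`, `RUBRIC`, and `overlap_judge` are hypothetical names, and the judge here is a token-overlap stand-in so the example runs offline. A real build would swap in an LLM call that grades each rubric criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The three rubric criteria named in the idea above.
RUBRIC = ("retrieval_precision", "faithfulness", "context_relevance")


@dataclass
class Trace:
    """One captured pipeline run: query, retrieved chunks, final answer, scores."""
    query: str
    chunks: List[str]
    answer: str
    scores: Dict[str, float] = field(default_factory=dict)


class EvalWrapper:
    """Hypothetical wrapper: runs the pipeline, records a Trace, scores it."""

    def __init__(
        self,
        retrieve: Callable[[str], List[str]],
        generate: Callable[[str, List[str]], str],
        judge: Callable[[str, Trace], float],
    ) -> None:
        self.retrieve = retrieve
        self.generate = generate
        self.judge = judge  # assumed to return a score in [0, 1] per criterion
        self.traces: List[Trace] = []

    def __call__(self, query: str) -> str:
        chunks = self.retrieve(query)
        answer = self.generate(query, chunks)
        trace = Trace(query, chunks, answer)
        trace.scores = {c: self.judge(c, trace) for c in RUBRIC}
        self.traces.append(trace)
        return answer

    def mean_scores(self) -> Dict[str, float]:
        """Average each criterion over all captured traces (empty if none)."""
        n = len(self.traces)
        if n == 0:
            return {}
        return {c: sum(t.scores[c] for t in self.traces) / n for c in RUBRIC}


def _tokens(text: str) -> set:
    return set(text.lower().split())


def overlap_judge(criterion: str, trace: Trace) -> float:
    """Offline stand-in for an LLM judge, based on token overlap only."""
    query = _tokens(trace.query)
    context = _tokens(" ".join(trace.chunks))
    answer = _tokens(trace.answer)
    if criterion == "retrieval_precision":
        # Share of chunks that have at least one token in common with the query.
        if not trace.chunks:
            return 0.0
        hits = sum(1 for chunk in trace.chunks if _tokens(chunk) & query)
        return hits / len(trace.chunks)
    if criterion == "faithfulness":
        # Share of answer tokens that appear in the retrieved context.
        return len(answer & context) / len(answer) if answer else 0.0
    if criterion == "context_relevance":
        # Share of query tokens that appear in the retrieved context.
        return len(query & context) / len(query) if query else 0.0
    raise ValueError(f"unknown criterion: {criterion}")


# Toy pipeline: a fixed two-chunk "retriever" and a constant "generator".
docs = ["paris is the capital of france", "bananas are yellow"]
pipeline = EvalWrapper(
    retrieve=lambda q: docs,
    generate=lambda q, chunks: "paris",
    judge=overlap_judge,
)
pipeline("capital of france")
print(pipeline.traces[0].scores)
```

Because the judge is just a callable, the same wrapper works with an LLM grader, and the per-trace scores it stores are the raw series a drift dashboard would plot.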
The three repos already trying
AI-Powered Photos App for the Decentralized Web. We are on a mission to protect your freedom and privacy.
Framework migration · +109% 14-day velocity Δ · 100 contributors
Framework migration · +92% 14-day velocity Δ · 5 contributors
Engineering hiring burst · +55% 14-day velocity Δ · 97 contributors
Matched against the current-period startup signal panel (ai-ml, developer-tools). Rankings shift weekly as the underlying GitHub activity moves. Read the methodology.
The seed-round pattern hiding in the trendline
The seed-stage tell is an open-source RAG-eval repo that shows commit velocity in its "LLM-as-judge" module and ships a TypeScript SDK in the same release.
Ragas, TruLens, and DeepEval are the leaders, and all are Python-only. The TypeScript-first wedge is open: most agent stacks are TS now.
Use the signal, not just the idea
The repos above re-rank automatically as commit velocity, contributor growth, and new-repo creation move. Want the data feed for this idea wired into your own stack? The MCP server exposes every signal as a tool any agent host can query.
Updated 2026-05-18. The framing is editorial; the “three repos already trying” slot is generated from the live signal panel. Anonymity rule: we name public GitHub orgs, never individual founders or stealth teams.