Agent Infrastructure · Startup idea
Unit tests don't work for agents. The eval platform that lets a team define a regression suite over real conversations — with model-graded scoring — is the next category leader.
Why now
Every team shipping agents has homegrown eval scripts. The pattern is always the same: capture real traffic, label outcomes, replay the labeled traces against new prompts, and compare scores. Nobody has productized it well enough yet.
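The capture, label, replay, compare loop can be sketched in a few lines. This is a minimal illustration, not any particular team's implementation: `LabeledTrace`, `exact_match`, `replay`, and `compare` are hypothetical names, and the exact-match scorer is a stand-in for whatever model-graded judge a team would actually plug in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LabeledTrace:
    """One captured conversation turn plus the operator's label (hypothetical shape)."""
    user_input: str
    expected: str  # operator-assigned description of the desired outcome


def exact_match(output: str, expected: str) -> float:
    """Stand-in scorer; a model-graded judge would replace this."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def replay(
    agent: Callable[[str], str],
    traces: List[LabeledTrace],
    scorer: Callable[[str, str], float] = exact_match,
) -> float:
    """Replay every labeled trace through `agent` and return the mean score."""
    if not traces:
        raise ValueError("need at least one labeled trace")
    total = sum(scorer(agent(t.user_input), t.expected) for t in traces)
    return total / len(traces)


def compare(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    traces: List[LabeledTrace],
) -> Dict[str, float]:
    """Score two agent versions on the same traces and report the difference."""
    base = replay(baseline, traces)
    cand = replay(candidate, traces)
    return {"baseline": base, "candidate": cand, "delta": cand - base}
```

Here `baseline` and `candidate` are any two callables that map a user input to an agent reply, for example the same agent under the old and new prompt. A negative `delta` is the regression signal.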
The idea you could build today
Two surfaces: (1) an SDK that records any agent run as a replayable trace, and (2) a web app where the operator labels traces, defines a model-graded rubric, and re-runs the traces against any prompt or model change. Pricing is per evaluated trace, not per seat.
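As a rough sketch of what the two surfaces might look like in code, assuming an in-memory trace sink and an injectable judge: `record_trace`, `TRACES`, and `grade` are hypothetical names invented for illustration, and the `judge` callable stands in for a call to whichever grading model a team uses.

```python
import functools
import json
import time
import uuid
from typing import Any, Callable, Dict, List

# In-memory sink for illustration; a real SDK would ship traces to a backend.
TRACES: List[Dict[str, Any]] = []


def record_trace(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that records each call to an agent function as a trace."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        TRACES.append({
            "id": str(uuid.uuid4()),
            "agent": fn.__name__,
            "input": {"args": list(args), "kwargs": kwargs},
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper


def grade(trace: Dict[str, Any], rubric: List[str],
          judge: Callable[[str], str]) -> float:
    """Ask the judge one YES/NO question per criterion; return the fraction passed."""
    if not rubric:
        raise ValueError("rubric must contain at least one criterion")
    passed = 0
    for criterion in rubric:
        prompt = (
            f"Criterion: {criterion}\n"
            f"Input: {json.dumps(trace['input'])}\n"
            f"Output: {trace['output']}\n"
            "Answer YES or NO."
        )
        if judge(prompt).strip().upper().startswith("YES"):
            passed += 1
    return passed / len(rubric)
```

Keeping the judge as a plain callable means the same rubric can be re-run unchanged when the grading model is swapped, and a deterministic fake judge can be used in tests.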
The three repos already trying
AI-Powered Photos App for the Decentralized Web. We are on a mission to protect your freedom and privacy.
Framework migration: +109% 14-day velocity Δ, 100 contributors
Framework migration: +92% 14-day velocity Δ, 5 contributors
Engineering hiring burst: +55% 14-day velocity Δ, 97 contributors
Matched against the current-period startup signal panel (ai-ml, developer-tools). Rankings shift weekly as the underlying GitHub activity moves.
The seed-round pattern hiding in the trendline
Eval-suite OSS repos with sudden contributor surges around "LLM-as-judge" modules are the seed-round tells. The repos that ship a CI integration in the same window are the strong leads.
Those repos are the leaders. The opportunity is the long tail: teams that don't want a $500/mo SaaS and would happily self-host a smaller product. That is the OSS-first wedge.
Use the signal, not just the idea
The repos above re-rank automatically as commit velocity, contributor growth, and new-repo creation move. Want the data feed for this idea wired into your own stack? The MCP server exposes every signal as a tool any agent host can query.
Updated 2026-05-18. The framing is editorial; the “three repos already trying” slot is generated from the live signal panel. Anonymity rule: we name public GitHub orgs, never individual founders or stealth teams.