# I cut my MCP server from 8 tools to 5 and the hallucinations stopped
Three weeks of tool-count post-mortem on @gitdealflow/mcp-signal. Why REST endpoints aren't user intents, why two of my tool names were costing me selection accuracy, and the data on what changed.
Key Takeaway
Shipped @gitdealflow/mcp-signal with 8 tools mirroring REST endpoints. Watched Claude hallucinate parameters, pick the wrong tool, and stitch together 5-call chains for single-intent prompts. Three weeks of cuts later: 5 tools, two renamed for verb-clarity, selection accuracy from 66% to 98.5%, average tool calls per intent from 2.4 to 1.1. The heuristic: design one tool per user intent, not one per resource. REST endpoints are an implementation detail; intents are what the model selects against.
I shipped @gitdealflow/mcp-signal in two hours. Eight tools, mirrored one-to-one off our REST API. Felt clean. Looked clean. The first time I plugged it into Claude and asked "what's the trending startup in fintech this week," it called `get_startup` with a `sector` parameter that doesn't exist, hallucinated a result, and confidently quoted me numbers nobody had ever calculated.
I wasn't building an MCP server. I was building a very expensive random-number generator with a JSON wrapper.
The next three weeks were a tool-count post-mortem. The end state was 5 tools, two of them renamed for verb-clarity, and a selection accuracy that went from "wrong about a third of the time" to "I genuinely cannot remember the last time it picked the wrong one." Here is what actually mattered.
## The 8 tools that didn't work
The starting menu, copied straight from the REST routes:
```
list_startups
get_startup
list_signals
get_signal
get_methodology
get_trending
list_sectors
get_sector
```
Reasonable on paper. Each tool described a real capability. The schemas were valid. The descriptions read like decent docstrings. I was even proud of how cleanly the surface mapped onto the API.
The problem became obvious the first time I watched a real conversation. The user asked something agentic — "show me the top fintech startups this week, and tell me what makes them interesting." A single intent, a single ranked list. Claude did this:
- Called `list_sectors` (probably to confirm "fintech" is a valid sector)
- Called `list_startups` with a `sector` parameter (does not exist on `list_startups`, schema rejected it)
- Retried with a different parameter shape
- Eventually gave up and called `get_trending`
- Made up a `top_n` parameter that does not exist either
- Returned a "this is what is trending" answer that was actually four random startups from cache
Five tool calls for a single intent. Three of them with hallucinated parameters. Zero of them returned the data the user actually wanted.
This is the part nobody talks about when they say "MCP just works." It works in demos because demo prompts map cleanly to one tool. It stops working the moment a user asks something composite — which is most user prompts.
## What the model is actually doing
I had been thinking about MCP tools as endpoints. The model thinks about them as items on a menu it has to read every single turn.
Eight tools means eight schemas in context. Each schema includes a description, a parameter list, parameter types, parameter descriptions, return type, return description. Even with terse docstrings, that runs ~600 tokens per tool. Eight tools ≈ 5,000 tokens of menu before the user has said anything.
Worse, the model has to weigh all eight against each other while it picks one. The picking process is, in effect, a similarity contest between the user's prompt and each tool's name and description. If two tools share half their vocabulary (`list_startups` and `get_startup`, say, both heavy on the word "startup"), the model's confidence between them collapses to something close to a coin flip.
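To make that overlap concrete, here is roughly what the colliding pair looks like as MCP tool definitions. The names are from the original menu; the descriptions and schemas are reconstructions for illustration, not the shipped ones.

```typescript
// Reconstruction of the colliding pair. Both descriptions lean on the same
// vocabulary ("startup", "deal-flow", "index"), which is exactly what turns
// the model's choice between them into a near coin flip.
const collidingTools = [
  {
    name: "list_startups",
    description: "List startups tracked by the deal-flow signal index.",
    inputSchema: {
      type: "object",
      properties: {
        limit: { type: "number", description: "Maximum number of startups to return." },
      },
    },
  },
  {
    name: "get_startup",
    description: "Get a single startup from the deal-flow signal index.",
    inputSchema: {
      type: "object",
      properties: {
        startup_id: { type: "string", description: "ID of the startup to fetch." },
      },
      required: ["startup_id"],
    },
  },
];
```

Read those two descriptions the way the model has to, with no memory of your REST routes, and the confusion stops being surprising.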
Most "the AI hallucinated a tool call" stories I have heard in the last quarter are this exact failure. Not a model failure. A menu design failure.
## The cuts
Three tools got dropped in the first pass. Two more got renamed.
**`list_startups` and `get_startup` were collapsed.** The model was confusing them on every other turn. The tell: when I logged the model's reasoning, it would describe what it wanted as "a list-style get of startups in fintech", which is the list tool's job, filtered down to a sector. But it kept calling `get_startup` with a `sector` parameter, because the names were too close.
I killed `get_startup` entirely. If you want a single startup, you call the list tool with a filter and `limit=1`. The single-resource endpoint had been costing me selection accuracy without buying any real capability.
**`list_sectors` and `get_sector` went the same way.** Almost nobody — including the model — wanted a single sector. They wanted a list to pick from, which used to be `list_sectors`'s job. I rolled both into a single tool I named `search_startups_by_sector` — a verb-noun-prepositional-phrase shape that the model parses extremely cleanly. "Find me fintech startups" → unambiguous match.
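A sketch of the merged tool. The `sector` and `limit` parameters follow the behavior described above (a filter plus `limit=1` replaces the old single-startup fetch); the exact schema, and the idea of folding the old `list_sectors` answer into an enum on the parameter, are my assumptions rather than the shipped definition.

```typescript
// Sketch, not the shipped schema. The enum standing in for the old
// list_sectors capability is an assumption; the sector values are examples.
const searchStartupsBySector = {
  name: "search_startups_by_sector",
  description:
    "Find startups in a given sector, ranked by signal strength. " +
    "Use limit=1 to fetch a single startup.",
  inputSchema: {
    type: "object",
    properties: {
      sector: {
        type: "string",
        description: "Sector to filter by.",
        enum: ["fintech", "healthtech", "devtools", "climate"],
      },
      limit: {
        type: "number",
        description: "Maximum number of startups to return. Defaults to 10.",
      },
    },
    required: ["sector"],
  },
};
```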
**`list_signals` got renamed to `get_startup_signal`.** This was the subtle one. The user almost never says "signals" in a prompt — they say "what's the engineering activity look like for X" or "is this team building." The word "signals" is internal jargon. The rename made the model start picking the right tool on prompts that did not even contain the word signal, because it parsed "startup" from context and matched on that.
**`get_trending` got renamed to `get_trending_startups`.** Same idea. Verb-adjective-noun where the noun is a word the user actually said is a much stronger lock than verb-adjective alone.
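For concreteness, the shape of both renames, with descriptions rewritten in the user's vocabulary. The strings are paraphrases of the intent described above, not the shipped text.

```typescript
// Before: names that lean on internal jargon ("signals") or stop at
// verb-adjective ("trending") with no noun the user actually said.
const before = [
  { name: "list_signals", description: "List signals for a startup." },
  { name: "get_trending", description: "Get trending entries from the index." },
];

// After: verb-noun names anchored on words users actually type, with
// descriptions that use their phrasing ("engineering activity",
// "is this team building") instead of the internal term "signals".
const after = [
  {
    name: "get_startup_signal",
    description:
      "Get the engineering-activity signal for a startup: how actively the team appears to be building.",
  },
  {
    name: "get_trending_startups",
    description: "Get this week's trending startups, ranked by signal.",
  },
];
```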
## The 5 that work
```
get_trending_startups
search_startups_by_sector
get_startup_signal
get_signals_summary
get_methodology
```
Two things worth calling out beyond the renames.
**`get_signals_summary` is a new tool.** It does not have a 1:1 REST endpoint. It exists because users kept asking "give me a one-paragraph summary of what's interesting this week" and the model kept stitching together three calls to fake it. I built the summary tool. The model now makes one call.
That last point is the heuristic I would give anyone shipping an MCP server: look at the actual conversational intents your users have, and design one tool per intent. Resources are an implementation detail. Intents are what the model is selecting against.
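Server-side, the intent-shaped tool is not much code. A minimal sketch, assuming the TypeScript MCP SDK's `McpServer.tool()` registration; `fetchTrending`, `fetchSignal`, and `summarize` are hypothetical stand-ins for the real data layer, not functions from the repo.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

// Hypothetical data-layer stand-ins; names and shapes are illustrative.
declare function fetchTrending(week?: string): Promise<{ id: string; name: string }[]>;
declare function fetchSignal(startupId: string): Promise<{ score: number; note: string }>;
declare function summarize(
  startups: { id: string; name: string }[],
  signals: { score: number; note: string }[],
): string;

const server = new McpServer({ name: "mcp-signal", version: "0.0.0" });

// The aggregation the model used to fake with three separate calls now
// happens inside one handler, so a single intent costs a single tool call.
server.tool(
  "get_signals_summary",
  "One-paragraph summary of what is interesting in the index this week.",
  { week: z.string().optional().describe("ISO week; defaults to the current week") },
  async ({ week }) => {
    const trending = await fetchTrending(week);
    const signals = await Promise.all(trending.map((s) => fetchSignal(s.id)));
    return { content: [{ type: "text", text: summarize(trending, signals) }] };
  },
);
```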
**Verb-noun, not noun-noun.** Even the kept tools got their names re-checked against this rule. `get_methodology` survived because users do say "methodology" — but if I noticed selection drift I would rename it `describe_signal_methodology` to anchor on the verb.
## The data, three weeks later
I have logging on every tool call. Before the cuts, on a sample of 200 real prompts:
- 132 / 200 (66%) ended in a correct tool selection on the first call
- 68 / 200 (34%) involved at least one hallucinated parameter or wrong-tool selection
- Average tool calls per user intent: 2.4
After the cuts, on the same kinds of prompts:
- 197 / 200 (98.5%) ended in a correct tool selection on the first call
- 3 / 200 (1.5%) involved a wrong-tool selection (all three were obscure edge cases)
- Average tool calls per user intent: 1.1
The token cost of menu inflation went from ~5,000 input tokens per turn to ~2,800. Selection accuracy effectively saturated. And — this is the part I underestimated — the model's *latency on the first token* dropped noticeably, because it was no longer chewing through eight schemas before picking one.
## The thing I would tell past me
If I could go back to the morning I shipped the first version, I would tell myself two things.
First: your tool count is a liability, not an asset. Every tool you add costs the model reasoning, costs the user latency, and costs you accuracy. Every tool needs to earn its place by mapping to a distinct user intent that no other tool maps to.
Second: REST API endpoints are not user intents. The clean mental model is: "what would a user say to express this need," not "what HTTP route serves this resource." The mapping is rarely 1:1. Most APIs have more endpoints than they have distinct user intents — ours had eight endpoints and five intents — and shipping the extra three as MCP tools is just paying the menu tax for nothing.
I am at five tools and I think four of them are load-bearing. The fifth (`get_methodology`) only fires maybe 1 in 50 conversations, and I am watching it. If selection accuracy on the other four starts degrading, that is the next cut.
The MCP spec lets you ship as many tools as you want. The model does not reward you for shipping more.
## How to use this
If you are shipping an MCP server, the audit is fast:
- Log every tool call from a representative week of real conversations.
- Cluster the user prompts by intent in plain English (ignore the tool the model picked).
- Count distinct intents.
- If your tool count exceeds your intent count, you have menu inflation. Cut to match.
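A minimal sketch of the counting step, assuming a JSONL log with one object per conversation turn and hand-applied intent labels from the clustering pass. The field names are assumptions, not the actual log format.

```typescript
import { readFileSync } from "node:fs";

// Assumed log shape; field names are illustrative, not from the real server.
interface LoggedTurn {
  intent: string;                                  // plain-English intent label
  tool_calls: { name: string; schema_ok: boolean }[];
  correct_first_call: boolean;                     // did the first call pick the right tool?
}

const turns: LoggedTurn[] = readFileSync("tool_calls.jsonl", "utf8")
  .split("\n")
  .filter(Boolean)
  .map((line) => JSON.parse(line));

const intents = new Set(turns.map((t) => t.intent));
const firstCallCorrect = turns.filter((t) => t.correct_first_call).length;
const avgCalls =
  turns.reduce((sum, t) => sum + t.tool_calls.length, 0) / turns.length;

console.log(`distinct intents: ${intents.size}`);
console.log(`first-call accuracy: ${((100 * firstCallCorrect) / turns.length).toFixed(1)}%`);
console.log(`average tool calls per intent: ${avgCalls.toFixed(1)}`);
// If the server exposes more tools than `intents.size`, that gap is the menu inflation to cut.
```

Compare the intent count against the tool count on your server; that comparison is the whole audit.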
The repo for the five-tool version is at github.com/kindrat86/mcp-deal-flow-signal. The schemas, the descriptions, and the changelog are all there.
If you have shipped an MCP server, what is your tool count and how did you arrive at it? Public reporting on this trade-off is surprisingly thin and I am collecting examples.