Why Caching Matters for MCP Servers
Every tool call through an MCP server has a cost. If the server calls an external API, you pay in latency, rate limit budget, and often a per-request charge. If an agent makes the same query multiple times in one session (and agents do this more often than you'd expect), you pay that cost each time for the same result.
A simple cache in front of frequently called tools can cut response times for repeated calls from seconds to milliseconds, and every cache hit is one external API request you don't pay for. The question isn't whether to cache. It's where, what, and for how long.
Cache at the Right Layer
You can cache at the MCP server level (the server itself caches responses before returning them), at the client level (the agent framework caches tool call results), or at the infrastructure level (a Redis or in-memory cache between the server and the external API). Each has tradeoffs.
Server-level caching is the easiest to implement and the most common. The MCP server keeps a local cache (in-memory or on disk) and returns cached results for identical requests within the TTL window. This works well for read-heavy tools like database queries or API lookups where the data doesn't change every second.
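As a rough illustration, a server-level cache can be as small as a dictionary keyed by tool name and arguments. The sketch below is one possible shape, not part of any MCP SDK: `TTLCache`, `get_or_call`, and the injectable `clock` are names made up for this example, and it assumes tool arguments are JSON-serializable.

```python
import json
import time
from typing import Any, Callable


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    @staticmethod
    def make_key(tool_name: str, arguments: dict) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} map to the same key
        return tool_name + ":" + json.dumps(arguments, sort_keys=True)

    def get_or_call(self, tool_name: str, arguments: dict, fetch: Callable[[], Any]) -> Any:
        key = self.make_key(tool_name, arguments)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still inside the TTL window
        value = fetch()  # cache miss or expired entry: do the real work
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

Inside a tool handler, the external call would be wrapped along the lines of `cache.get_or_call("get_exchange_rate", args, lambda: fetch_rate(args))`, where `fetch_rate` is a hypothetical stand-in for whatever function actually hits the API.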
Client-level caching is useful when the same agent session makes repeated identical tool calls. The agent framework can check "did I already call this tool with these exact parameters?" and reuse the previous result without making a new server call at all. This avoids both network overhead and server processing.
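On the client side, the "did I already call this?" check might look like the sketch below. It assumes the agent framework exposes some function that sends a tool call to the server. `call_tool` and `with_session_cache` are hypothetical names for this example, not a real framework API.

```python
import json
from typing import Any, Callable

# Hypothetical signature: (tool_name, arguments) -> result from the MCP server.
ToolCaller = Callable[[str, dict], Any]


def with_session_cache(call_tool: ToolCaller) -> ToolCaller:
    """Wrap call_tool so identical calls within one session reuse the first result."""
    seen: dict[str, Any] = {}  # lives only as long as the wrapper, i.e. one session

    def cached_call(tool_name: str, arguments: dict) -> Any:
        # Same tool and same parameters (in any key order) produce the same key.
        key = json.dumps([tool_name, arguments], sort_keys=True)
        if key not in seen:
            seen[key] = call_tool(tool_name, arguments)  # only misses reach the server
        return seen[key]

    return cached_call
```

Creating one wrapper per agent session keeps the cache from leaking results across conversations. This sketch has no expiry, so it only suits sessions that are short relative to how fast the data changes.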
Choosing TTL Values
Time-to-live (TTL) depends on how fast the underlying data changes. For a currency exchange rate API, maybe 5 minutes. For a database schema query, maybe an hour. For a documentation lookup, maybe a day. The right TTL balances freshness against performance.
When in doubt, start with a short TTL and extend it once you've confirmed that the data doesn't change frequently enough to cause problems. A 60-second TTL is almost always safe and still eliminates duplicate calls within a conversation.
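One way to keep these choices visible is a small per-tool TTL table with a conservative default. The tool names below are invented for illustration, and the values simply mirror the examples above.

```python
# Hypothetical tool names; TTLs are in seconds.
TOOL_TTL_SECONDS = {
    "get_exchange_rate": 5 * 60,      # rates move, so keep this short
    "describe_schema": 60 * 60,       # schemas change rarely
    "lookup_docs": 24 * 60 * 60,      # documentation is close to static
}

# Short fallback for any tool without an explicit entry.
DEFAULT_TTL_SECONDS = 60


def ttl_for(tool_name: str) -> int:
    """Return the TTL for a tool, falling back to the short default."""
    return TOOL_TTL_SECONDS.get(tool_name, DEFAULT_TTL_SECONDS)
```

Starting every new tool on the default and promoting it to a longer TTL later matches the "start short, extend once confirmed" approach.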
Cache Invalidation
The hard part of caching is knowing when to throw away cached data. For MCP servers, the simplest approach is TTL-based expiration: cached data expires after a set time, and the next request fetches fresh data. This is easy to implement and works for most cases.
For more dynamic data, you might want event-based invalidation. If the agent writes data through one tool, the cache for related read tools should be cleared. For example, if the agent creates a new record through a database write tool, the cache for list/query tools should be invalidated so the next read reflects the change.
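A minimal sketch of event-based invalidation, assuming you can declare up front which read tools each write tool affects. The `INVALIDATES` mapping, the tool names, and the `InvalidatingCache` class are all hypothetical, and TTL handling is left out to keep the example short.

```python
import json
from typing import Any, Callable

# Hypothetical mapping: which read tools go stale when a given write tool runs.
INVALIDATES = {
    "create_record": ["list_records", "query_records"],
}


class InvalidatingCache:
    def __init__(self) -> None:
        # tool name -> {serialized arguments -> cached result}
        self._by_tool: dict[str, dict[str, Any]] = {}

    def read(self, tool_name: str, arguments: dict, fetch: Callable[[], Any]) -> Any:
        bucket = self._by_tool.setdefault(tool_name, {})
        key = json.dumps(arguments, sort_keys=True)
        if key not in bucket:
            bucket[key] = fetch()
        return bucket[key]

    def write(self, tool_name: str, execute: Callable[[], Any]) -> Any:
        result = execute()  # the write itself is never cached
        # Drop every cached result for the read tools this write affects.
        for read_tool in INVALIDATES.get(tool_name, []):
            self._by_tool.pop(read_tool, None)
        return result
```

Grouping entries by tool name makes it cheap to clear everything a write might have touched, at the cost of occasionally discarding results that were still valid.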
What Not to Cache
Don't cache tool calls with side effects (things that create, update, or delete). Don't cache responses that include time-sensitive data like authentication tokens. And don't cache error responses, because a temporary API failure shouldn't prevent the agent from retrying. Caching is also part of the same performance picture as rate limit handling: every cache hit is one fewer request counted against the external API's limit.
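These rules can be collected into a single check that runs before anything is written to the cache. The sketch below is an assumption about how you might structure it: the tool names are invented, and `ToolResult` with its `is_error` flag is a stand-in for whatever result type your server uses.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical tool names; fill these in from your own server's tool list.
SIDE_EFFECT_TOOLS = {"create_record", "update_record", "delete_record"}
SENSITIVE_TOOLS = {"get_auth_token"}


@dataclass
class ToolResult:
    """Stand-in result type for this example."""
    content: Any
    is_error: bool = False


def should_cache(tool_name: str, result: ToolResult) -> bool:
    if tool_name in SIDE_EFFECT_TOOLS:
        return False  # replaying a cached result would skip the create/update/delete
    if tool_name in SENSITIVE_TOOLS:
        return False  # tokens and similar data go stale or leak if stored
    if result.is_error:
        return False  # let the next call retry instead of replaying the failure
    return True
```

An explicit deny list like this fails open for new tools. If that is too risky for your server, inverting it into an allow list of known read-only tools is the more conservative choice.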