What the Context Window Holds
Every time your AI assistant uses an MCP server, the tool call and its result go into the context window. A database query result might be 2,000 tokens. A file read might be 5,000 tokens. A web search result might be 3,000 tokens. After five or six tool calls, you've consumed 20,000+ tokens of context just on tool results, before counting the conversation itself.
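To make that accumulation concrete, here is a minimal sketch of a token ledger for tool results. The 4-characters-per-token ratio is a rough rule of thumb, not a real tokenizer; actual counts depend on the model.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text.
    Use your model's actual tokenizer for real numbers."""
    return len(text) // 4

tool_results: list[str] = []   # raw strings returned by MCP tool calls
context_tokens_used = 0

def record_tool_result(result: str) -> None:
    """Append a tool result and tally its approximate context cost."""
    global context_tokens_used
    tool_results.append(result)
    context_tokens_used += estimate_tokens(result)
    print(f"{len(tool_results)} results, ~{context_tokens_used} tokens used")
```

Six calls averaging 3,000 to 4,000 tokens each puts you at the 20,000-token mark described above, before any conversation text is counted.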
The context window is finite. When it fills up, older content gets pushed out or summarized. If the model loses access to an earlier tool result that it needs for a later step, the quality of its reasoning degrades. This is one reason why agents struggle with long multi-step tasks.
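Runtimes typically handle a full window by evicting or compressing the oldest content first. Here is a minimal sketch of the eviction half, using the same rough token estimator as above; the budget number and message structure are illustrative, not any particular runtime's behavior:

```python
from collections import deque

def estimate_tokens(text: str) -> int:
    return len(text) // 4  # same rough heuristic as above

CONTEXT_BUDGET = 20_000  # illustrative, not a real model's limit

def trim_to_budget(messages: deque[str], budget: int = CONTEXT_BUDGET) -> None:
    """Drop the oldest entries until the estimated total fits the budget.

    Whatever gets dropped is simply gone: if a later step needed that
    earlier tool result, the model now reasons without it."""
    while messages and sum(estimate_tokens(m) for m in messages) > budget:
        messages.popleft()  # evict the oldest entry first
```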
Bigger Windows Help, But Cost More
Models with larger context windows (100K+ tokens) can hold more tool results simultaneously. This means they can reference earlier findings while processing new information, which improves reasoning quality for complex tasks. A model that can hold all its research in context at once produces better synthesis than one that has to work from summaries of earlier steps.
But larger contexts cost more. API pricing is typically per-token, so a model processing 100K tokens of context costs roughly ten times as much as one processing 10K tokens at the same rate. For agents that make many tool calls, context cost can dominate the total expense.
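As back-of-envelope arithmetic (the per-token price below is purely illustrative; check your provider's actual rates), remember that in a naive agent loop the whole context is re-sent, and re-billed, on every call:

```python
PRICE_PER_1K_INPUT_TOKENS = 0.003  # dollars; hypothetical rate

def input_cost(context_tokens: int, calls: int) -> float:
    """Input-token cost when the full context is re-sent on each call."""
    return context_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS * calls

print(input_cost(100_000, 20))  # 6.0  (100K-token context, 20 turns)
print(input_cost(10_000, 20))   # 0.6  (same loop at 10K tokens)
```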
Practical Strategies
Ask your MCP servers to return concise results. A database query that returns 100 rows when you only need the top 10 wastes context space. Many servers support result limiting; when they don't, you can instruct the model in your prompt to request fewer rows itself, for example by adding a LIMIT clause to its queries.
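As a sketch of what that looks like in practice with the official `mcp` Python SDK: the server command, tool name, and arguments below are hypothetical, so substitute whatever your actual server exposes.

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Hypothetical server launch command; use your own MCP server here.
    params = StdioServerParameters(command="my-db-server")
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # "query" and its arguments are assumptions; list the server's
            # tools to find the real names. The LIMIT keeps the result to
            # 10 rows instead of 100, saving thousands of context tokens.
            result = await session.call_tool(
                "query",
                arguments={"sql": "SELECT * FROM orders ORDER BY total DESC LIMIT 10"},
            )
            print(result)

asyncio.run(main())
```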
For long-running tasks, consider having the agent summarize intermediate results before continuing. This compresses earlier tool results into shorter summaries, freeing context space for the next steps. It's not perfect (summaries lose detail), but it's better than losing context entirely.
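A minimal sketch of that compression step, assuming a generic `llm_complete` helper standing in for whichever model API you use:

```python
def llm_complete(prompt: str) -> str:
    """Stand-in for your actual model call; hypothetical helper."""
    raise NotImplementedError

def compress_history(tool_results: list[str], max_tokens: int = 200) -> list[str]:
    """Replace each earlier tool result with a short summary.

    Detail is lost, but key findings stay available in context instead
    of being evicted wholesale when the window fills up."""
    summaries = []
    for result in tool_results:
        prompt = (
            f"Summarize the following tool result in under {max_tokens} "
            f"tokens, keeping concrete facts and numbers:\n\n{result}"
        )
        summaries.append(llm_complete(prompt))
    return summaries
```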
Choose your model's context size based on your typical task complexity. Simple lookups and quick questions work fine with smaller contexts. Complex research and multi-step analysis benefit from larger windows. Matching context size to task complexity optimizes the cost-quality tradeoff.
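One way to operationalize the match is a simple routing heuristic; the tiers and thresholds below are illustrative guesses, not recommendations for specific models:

```python
def pick_context_tier(expected_tool_calls: int, avg_result_tokens: int = 3_000) -> str:
    """Route a task to a context-size tier by its estimated token budget.

    Thresholds are illustrative; tune them to your models and pricing."""
    budget = expected_tool_calls * avg_result_tokens
    if budget < 8_000:
        return "small-context model"   # quick lookups
    if budget < 50_000:
        return "medium-context model"  # typical multi-step tasks
    return "large-context model"       # deep research and synthesis

print(pick_context_tier(2))   # small-context model
print(pick_context_tier(10))  # medium-context model
```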
Related Reading
- The Cost Economics of Running AI Agents
- Memory Systems in AI Agents: Short-Term vs Long-Term
- What Nobody Tells You About MCP Server Latency