Build an AI Agent That Writes and Runs Tests

The Testing Agent Loop

The basic loop is straightforward: the agent reads your code, writes a test for it, runs the test, checks whether it passed, and if it failed, reads the error output and fixes the test. This loop is exactly the kind of repetitive, iterative work that agents handle well. You don't need anything exotic to build it. A file system MCP server and a terminal MCP server are enough to get started.

The agent reads source files through the file system server, writes test files to the same location, and executes the test runner through the terminal server. It reads the output, decides if the test passed, and takes the next step accordingly.
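
As a rough illustration, the loop can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: `agent_loop`, `RunResult`, and the two callables are hypothetical names, and the fake functions at the bottom stand in for a real model call and a real test run through the MCP servers.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    passed: bool
    output: str


def agent_loop(
    source_code: str,
    write_test: Callable[[str, Optional[str]], str],
    run_test: Callable[[str], RunResult],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Write a test, run it, and retry with the failure output as feedback.

    Returns (passed, attempts_used). The attempt cap keeps the loop bounded.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        test_code = write_test(source_code, feedback)  # hypothetical model call
        result = run_test(test_code)                   # hypothetical terminal call
        if result.passed:
            return True, attempt
        feedback = result.output  # the error output drives the next attempt
    return False, max_attempts


# Stand-ins for illustration only (assumptions, not real agent components).
SOURCE = "def add(a, b):\n    return a + b\n"


def fake_write_test(source_code: str, feedback: Optional[str]) -> str:
    # The first draft asserts the wrong value; the retry corrects it.
    expected = 5 if feedback is None else 4
    return f"assert add(2, 2) == {expected}"


def fake_run_test(test_code: str) -> RunResult:
    namespace: dict = {}
    try:
        exec(SOURCE + test_code, namespace)
    except Exception as exc:
        return RunResult(passed=False, output=repr(exc))
    return RunResult(passed=True, output="ok")
```

With these stand-ins, `agent_loop(SOURCE, fake_write_test, fake_run_test)` fails on the first attempt and passes on the second. In a real setup the two callables would wrap the model and the test runner.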

Connecting the Right Tools

You'll want at least two MCP servers: one for file access and one for command execution. The file server lets the agent read source code and write test files. The command server lets it run your test framework (Jest, pytest, go test, whatever you use). Some setups add a third server for documentation or code search, which gives the agent more context about the codebase.

The command server is where you need to be careful with permissions. The agent needs to run test commands, but you probably don't want it running arbitrary shell commands. Scope it down to your test runner and related tools. Permission scoping matters here more than in most agent setups because command execution is inherently powerful.
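
One way to scope commands down is an allowlist check that runs before anything is executed. The sketch below is an assumption about how such a guard might look, not a feature of any particular MCP server: `ALLOWED_PREFIXES` and `check_command` are hypothetical names, and the listed runners are examples only.

```python
import shlex
from typing import List

# Hypothetical allowlist: only these command prefixes may run.
ALLOWED_PREFIXES = [
    ("pytest",),
    ("python", "-m", "pytest"),
    ("npx", "jest"),
    ("go", "test"),
]

# Characters a shell would interpret; rejected as defense in depth.
SHELL_CONTROL_CHARS = set(";&|`$<>")


def check_command(command: str) -> List[str]:
    """Return the parsed argv if the command is allowed, else raise PermissionError."""
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. an unbalanced quote
        raise PermissionError(f"unparseable command: {exc}") from exc
    if not argv:
        raise PermissionError("empty command")
    if any(ch in SHELL_CONTROL_CHARS for token in argv for ch in token):
        raise PermissionError("shell control characters are not allowed")
    for prefix in ALLOWED_PREFIXES:
        if tuple(argv[: len(prefix)]) == prefix:
            return argv
    raise PermissionError(f"command not on the allowlist: {argv[0]}")
```

The returned list is meant to be passed directly to the process launcher (for example `subprocess.run(argv)` without `shell=True`), so nothing is reinterpreted by a shell. A prefix check like this does not vet the runner's own flags, so a stricter setup might also restrict arguments.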

Writing Good Tests, Not Just Tests

The hard part isn't getting the agent to produce test files. It's getting it to produce useful tests. Without guidance, agents tend to write tests that are either trivially obvious (testing that a function returns what it returns) or overly coupled to implementation details (breaking whenever you refactor).

Provide examples of your team's testing style. Point the agent at your existing tests and say "write tests that follow this pattern." Include your testing guidelines in the agent's context. The better the agent understands what good tests look like in your codebase, the more useful its output will be.

The Iteration Step

When a test fails, the agent reads the failure output and decides whether the test is wrong or the code is wrong. If the test is wrong (a bad assertion or a wrong assumption about behavior), the agent fixes the test and re-runs it. If the code has a bug, the agent can flag it for review or, if you trust it enough, propose a fix.

Start with the agent only fixing its own tests. Let it iterate on failures caused by its own mistakes. Once you're confident in that loop, gradually expand it to suggesting code fixes too. Trust in a testing agent is best built in incremental steps.
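
The staged rollout described here can be made explicit as a small policy function. This is a sketch under assumed names (`Diagnosis`, `Trust`, `Action`, `next_action` are all hypothetical). The diagnosis itself would come from the agent's reading of the failure output, which this code does not attempt.

```python
from enum import Enum


class Diagnosis(Enum):
    TEST_WRONG = "test_wrong"  # bad assertion or wrong assumption about behavior
    CODE_WRONG = "code_wrong"  # the code under test has a bug


class Trust(Enum):
    TESTS_ONLY = "tests_only"        # starting point: the agent edits only its own tests
    SUGGEST_FIXES = "suggest_fixes"  # later stage: the agent may propose code fixes


class Action(Enum):
    FIX_TEST = "fix_test"
    FLAG_FOR_REVIEW = "flag_for_review"
    PROPOSE_CODE_FIX = "propose_code_fix"


def next_action(diagnosis: Diagnosis, trust: Trust) -> Action:
    """Map a failure diagnosis and the current trust level to the agent's next step."""
    if diagnosis is Diagnosis.TEST_WRONG:
        return Action.FIX_TEST  # always allowed: the agent repairs its own mistake
    if trust is Trust.SUGGEST_FIXES:
        return Action.PROPOSE_CODE_FIX
    return Action.FLAG_FOR_REVIEW  # default: a human looks at suspected bugs
```

Keeping the trust level as an explicit setting means expanding the agent's scope is a one-line configuration change rather than a prompt rewrite.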
