ML Testing
473 AI tools in the ML Testing category
@wa008/ui-audit-mcp
GitHub Actions
MCP server for iOS app UI evaluation and testing, powered by idb + xcrun simctl
@axlsdk/axl
boulder_midweek
TypeScript SDK for orchestrating Agentic Systems
katt
raphaelpor
CLI tool that tests the output of agentic AI tools
@dankelleher/mcp-eval
dankelleher
A CLI to evaluate MCP server performance
ashby-mcp
dewierwan
MCP server for Ashby ATS — candidate evaluation workflow
meta-prompter-mcp
delexw
A prompt evaluation tool available as both an MCP server and a CLI.
claw-harness
GitHub Actions
Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.
sip-benchmark
chufenghuang
Internal CLI tool for evaluating Sip AI persona agent performance
@chainsafe/benchmark
wemeetagain
> This is an independently maintained fork of [@dapplion/benchmark](https://github.com/dapplion/benchmark). This repo now maintains its own versioning and release schedule as `@chainsafe/benchmark`. It was forked from the base of `@dapplion/benchmark@1
@versatly/skillbench
g9pedro
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
@2501-ai/cli
zhuk-aa
odor
catpea
Static blog generator with parallel encoding, incremental builds, atomic writes, and an AI agent for spellcheck, tagging, summarization, and quality evaluation.
@satoshibits/doc-lint
satoshibits
Documentation linter that assembles evaluation prompts from concern schemas
js-chess-engine
josefjadrny
Simple and fast Node.js chess engine with configurable AI and no dependencies
skilltest
lsaraiva
The testing framework for Agent Skills. Lint, test triggering of, and evaluate your SKILL.md files.
probeai
k08200
CLI tool for testing and evaluating AI coding agents
agentv
christso
CLI entry point for AgentV
@kodus/agent-readiness
gamalinosqui
Evaluate how prepared your codebase is for autonomous AI coding agents
@agentid-protocol/core
sharifventures
AgentID core SDK - cryptographic identity, manifests, signing, verification, and policy evaluation for AI agents
skillscore
joeynyc
A CLI tool that evaluates AI agent skills and produces quality scores