ML Testing
447 AI tools in the ML Testing category
@orq-ai/n8n-nodes-orq
GitHub Actions
n8n community node for Orq.ai - AI deployment and prompt management platform
katt
raphaelpor
CLI tool that tests the output of agentic AI tools
@axlsdk/axl
boulder_midweek
TypeScript SDK for orchestrating Agentic Systems
@wa008/ui-audit-mcp
GitHub Actions
MCP server for iOS app UI evaluation and testing, powered by idb + xcrun simctl
@ai-sdk-tool/parser
GitHub Actions
AI SDK middleware for tool call parsing
@satoshibits/doc-lint
satoshibits
Documentation linter that assembles evaluation prompts from concern schemas
js-chess-engine
josefjadrny
Simple and fast Node.js chess engine with configurable AI and no dependencies
meta-prompter-mcp
delexw
A prompt evaluation tool available as both an MCP server and a CLI.
skillscore
joeynyc
A CLI tool that evaluates AI agent skills and produces quality scores
@2501-ai/cli
zhuk-aa
claw-harness
GitHub Actions
Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.
mongodb-assistant-eval
nlarew
Evaluation library for the MongoDB Assistant API.
@wix/evalforge-types
wix-ci-publisher
Unified types for EvalForge agent evaluation system
@llmbench/cli
dfbustosus
Evaluate, compare, and benchmark LLMs from your terminal
odor
catpea
Static blog generator with parallel encoding, incremental builds, atomic writes, and an AI agent for spellcheck, tagging, summarization, and quality evaluation.
agentv
christso
CLI entry point for AgentV
probeai
k08200
CLI tool for testing and evaluating AI coding agents
@versatly/skillbench
g9pedro
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
skilltest
lsaraiva
The testing framework for Agent Skills. Lint, test triggering, and evaluate your SKILL.md files.
@kodus/agent-readiness
gamalinosqui
Evaluate how prepared your codebase is for autonomous AI coding agents