ML Testing
446 AI tools in the ML Testing category
@orq-ai/n8n-nodes-orq
GitHub Actions
n8n community node for Orq.ai - AI deployment and prompt management platform
@ai-sdk-tool/parser
GitHub Actions
AI SDK middleware for tool call parsing
@chainsafe/benchmark
wemeetagain
> This is an independently maintained fork of [@dapplion/benchmark](https://github.com/dapplion/benchmark). This repo now maintains its own versioning and release schedule as `@chainsafe/benchmark`. It was forked from the base of `@dapplion/benchmark@1
@2501-ai/cli
zhuk-aa
odor
catpea
Static blog generator with parallel encoding, incremental builds, atomic writes, and an AI agent for spellcheck, tagging, summarization, and quality evaluation.
@buoy-gg/highlight-updates
lovesworking
Control React DevTools highlight updates feature from your app
js-chess-engine
josefjadrny
Simple and fast Node.js chess engine with configurable AI and no dependencies
eslint-plugin-vitest-globals
saqqdy
An ESLint extension for Vitest globals
skillscore
joeynyc
A CLI tool that evaluates AI agent skills and produces quality scores
probeai
k08200
CLI tool for testing and evaluating AI coding agents
agentv
christso
CLI entry point for AgentV
@kodus/agent-readiness
gamalinosqui
Evaluate how prepared your codebase is for autonomous AI coding agents
meta-prompter-mcp
delexw
A prompt evaluation tool available as both an MCP server and a CLI.
@wix/evalforge-types
wix-ci-publisher
Unified types for EvalForge agent evaluation system
sip-benchmark
chufenghuang
Internal CLI tool for evaluating Sip AI persona agent performance
claw-harness
GitHub Actions
Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.
@versatly/skillbench
g9pedro
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
skilltest
lsaraiva
The testing framework for Agent Skills. Lint, test triggering, and evaluate your SKILL.md files.
@agentid-protocol/core
sharifventures
AgentID core SDK - cryptographic identity, manifests, signing, verification, and policy evaluation for AI agents
mongodb-assistant-eval
nlarew
Evaluation library for the MongoDB Assistant API.