ML Testing
467 AI tools in the ML Testing category
@chainsafe/benchmark
wemeetagain
> This is an independently maintained fork of [@dapplion/benchmark](https://github.com/dapplion/benchmark). This repo now maintains its own versioning and release schedule as `@chainsafe/benchmark`. It was forked from the base of `@dapplion/benchmark@1
eslint-plugin-vitest-globals
saqqdy
An extension of Vitest globals for ESLint
claw-harness
GitHub Actions
Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.
@codspeed/core
adriencaccia
The core Node library used to integrate with Codspeed runners
@machinespirits/eval
lmagee
Evaluation system for Machine Spirits tutor - benchmarking, rubric evaluation, and analysis tools
jest-plugin-set
negativetwelve
Declarative JS tests with lazy evaluation using jest.
browserless
kikobeats
The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.
@kodus/agent-readiness
gamalinosqui
Evaluate how prepared your codebase is for autonomous AI coding agents
@agentshield-ai/openclaw-plugin
markbriers
AgentShield real-time security evaluation plugin for OpenClaw. Intercepts tool calls before execution and evaluates them against Sigma detection rules.
agentv
christso
CLI entry point for AgentV
@mankinds/sdk
mankinds
TypeScript SDK for Mankinds AI Evaluation API
mongodb-assistant-eval
nlarew
Evaluation library for the MongoDB Assistant API.
skillscore
joeynyc
A CLI tool that evaluates AI agent skills and produces quality scores
odor
catpea
Static blog generator with parallel encoding, incremental builds, atomic writes, and an AI agent for spellcheck, tagging, summarization, and quality evaluation.
faceoff
jdmarshall
Compare performance across multiple versions of your code
@llmbench/cli
dfbustosus
Evaluate, compare, and benchmark LLMs from your terminal
ts-benchmark
mohammad-_-ahmad
A command-line interface for monitoring the performance of TypeScript.
@versatly/skillbench
g9pedro
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
@2501-ai/cli
zhuk-aa
probeai
k08200
CLI tool for testing and evaluating AI coding agents