ML Testing
472 AI tools in the ML Testing category
formualizer
psu3d0
Embeddable spreadsheet engine — parse, evaluate & mutate Excel workbooks in the browser. 320+ functions, Arrow-powered.
claw-harness
GitHub Actions
Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.
subscript
dy
Tiny expression parser & evaluator
@meridianlabs/inspect-scout-viewer
GitHub Actions
Inspect Scout viewer for evaluation logs.
logicguru-engine
sachinsharmawebdev
Advanced JSON-based rule engine with nested conditions, async evaluation, and flexible action system. Perfect for business rules, workflows, and decision automation.
@codspeed/core
adriencaccia
The core Node library used to integrate with Codspeed runners
simple-eval
p0lip
Simple JavaScript expression evaluator
@apple-pie/slice
urisvirott
Slice is a TypeScript-first React UI kit with theme tokens, utility hooks, optional Zustand stores, Storybook docs, and benchmark tooling.
@versatly/skillbench
g9pedro
CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements
tinybench
GitHub Actions
🔎 A simple, tiny and lightweight benchmarking library!
@atom8n/n8n-benchmark
atom8n-user
Cli for running benchmark tests for n8n
esbench
kaciras
A modern JavaScript benchmarking library
jest-plugin-set
negativetwelve
Declarative JS tests with lazy evaluation using jest.
@future-agi/sdk
nvjkkartik
We help GenAI teams maintain high accuracy for their models in production.
@ghost_agent/core
anthonybautista
Core backend package for GhostAgent extraction.
nairon-bench
_obaid_
AI workflow benchmarking CLI
advanced-prompt-template-lang
hve
A powerful TypeScript template engine designed for AI prompt generation with expression evaluation, conditional logic, and control flow directives.
skillscore
joeynyc
A CLI tool that evaluates AI agent skills and produces quality scores
@agentid-protocol/core
sharifventures
AgentID core SDK - cryptographic identity, manifests, signing, verification, and policy evaluation for AI agents
@index9/mcp
johnwils
Search, inspect, and benchmark 300+ AI models from your editor