ML Testing
479 AI tools in the ML Testing category
@openfeature/ofrep-web-provider
toddbaert
This provider is designed to use the [OpenFeature Remote Evaluation Protocol (OFREP)](https://openfeature.dev/specification/appendix-c).
@index9/mcp
johnwils
Search, inspect, and benchmark 300+ AI models from your editor
jbr
rubensworks
Just a Benchmark Runner
consys
fireboltcaster
consys is a flexible tool to evaluate models using generic and readable constraints.
claw-harness
GitHub Actions
Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.
@machinespirits/eval
lmagee
Evaluation system for Machine Spirits tutor - benchmarking, rubric evaluation, and analysis tools
time-span
sindresorhus
Simplified high resolution timing
@sgnl-ai/set-transmitter
sgnl-developer
HTTP transmission library for Security Event Tokens (SET) with CAEP/SSF support
nairon-bench
_obaid_
AI workflow benchmarking CLI
probeai
k08200
CLI tool for testing and evaluating AI coding agents
@dapplion/benchmark
dapplion
Ensures that new code does not introduce performance regressions with CI. Tracks:
@tscircuit/autorouting-dataset-01
seveibar
A set of tscircuit problems to benchmark autorouting (currently 16 circuits in `lib/`).
@react-querybuilder/core
jakeboone02
React Query Builder component for constructing queries and filters, with utilities for executing them in various database and evaluation contexts
@openfeature/flipt-web-provider
toddbaert
[Flipt](https://www.flipt.io/) is an open-source, developer-friendly feature flagging solution that allows easy management and fast feature evaluation.
jest-plugin-set
negativetwelve
Declarative JS tests with lazy evaluation using Jest.
react-native-performance
oblador
Measure React Native performance
hypertune
miraan
[Hypertune](https://www.hypertune.com/) is the most flexible platform for feature flags, A/B testing, analytics, and app configuration. Built with full end-to-end type safety, Git-style version control and local, synchronous, in-memory flag evaluation. Op
nia-web-eval-agent-mcp
arlanrakh
NIA AI Web Evaluation Agent MCP Server - Autonomous browser testing and debugging
skilltest
lsaraiva
The testing framework for Agent Skills. Lint, test triggering, and evaluate your SKILL.md files.
@radaros/core
bharatbxhipment
Core framework for building AI agents with tools, memory, and multi-model support