ML Testing

467 AI tools in the ML Testing category

All (467) · MCP Servers (10) · Skills (451) · Agents (6)

@chainsafe/benchmark

wemeetagain

> This is an independently maintained fork of [@dapplion/benchmark](https://github.com/dapplion/benchmark). This repo now maintains its own versioning and release schedule as `@chainsafe/benchmark`. It was forked from the base of `@dapplion/benchmark@1

Skill · ML Testing

1 dir

eslint-plugin-vitest-globals

saqqdy

An extension of Vitest globals for ESLint.

Skill · ML Testing

17 · 1 dir

claw-harness

GitHub Actions

Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.

Skill · ML Testing

1 dir

@codspeed/core

adriencaccia

The core Node library used to integrate with CodSpeed runners.

Skill · ML Testing

26 · 1 dir

@machinespirits/eval

lmagee

Evaluation system for the Machine Spirits tutor: benchmarking, rubric evaluation, and analysis tools.

Skill · ML Testing

1 dir

jest-plugin-set

negativetwelve

Declarative JS tests with lazy evaluation using Jest.

Skill · ML Testing

107 · 1 dir

browserless

kikobeats

The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, extract text and HTML with a production-ready API.

Skill · ML Testing

1.8K · 1 dir

@kodus/agent-readiness

gamalinosqui

Evaluate how prepared your codebase is for autonomous AI coding agents

Skill · ML Testing

1 dir

@agentshield-ai/openclaw-plugin

markbriers

AgentShield real-time security evaluation plugin for OpenClaw. Intercepts tool calls before execution and evaluates them against Sigma detection rules.

Skill · ML Testing

1 · 1 dir

agentv

christso

CLI entry point for AgentV

Skill · ML Testing

11 · 1 dir

@mankinds/sdk

mankinds

TypeScript SDK for Mankinds AI Evaluation API

Skill · ML Testing

1 dir

mongodb-assistant-eval

nlarew

Evaluation library for the MongoDB Assistant API.

Skill · ML Testing

1 dir

skillscore

joeynyc

A CLI tool that evaluates AI agent skills and produces quality scores

Agent · ML Testing

2 · 1 dir

odor

catpea

Static blog generator with parallel encoding, incremental builds, atomic writes, and an AI agent for spellcheck, tagging, summarization, and quality evaluation.

Agent · ML Testing

1 dir

faceoff

jdmarshall

Compare performance across multiple versions of your code

Skill · ML Testing

1 · 1 dir

@llmbench/cli

dfbustosus

Evaluate, compare, and benchmark LLMs from your terminal

Skill · ML Testing

1 · 1 dir

ts-benchmark

mohammad-_-ahmad

A command-line interface for monitoring the performance of TypeScript.

Skill · ML Testing

1 · 1 dir

@versatly/skillbench

g9pedro

CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements

Skill · ML Testing

1 dir

@2501-ai/cli

zhuk-aa

[![npm version](https://img.shields.io/npm/v/@2501-ai/cli.svg)](https://www.npmjs.com/package/@2501-ai/cli) [![HumanEval Score](https://img.shields.io/badge/HumanEval-96.95%25-brightgreen.svg)](https://www.2501.ai/research/full-humaneval-benchmark) [![Lic

Skill · ML Testing

1 dir

probeai

k08200

CLI tool for testing and evaluating AI coding agents

Skill · ML Testing

1 · 1 dir