>_Skillful

ML Testing

473 AI tools in the ML Testing category

@wa008/ui-audit-mcp

GitHub Actions

MCP server for iOS app UI evaluation and testing, powered by idb + xcrun simctl

MCP Server | ML Testing
2 dirs

@axlsdk/axl

boulder_midweek

TypeScript SDK for orchestrating Agentic Systems

Skill | ML Testing
2 dirs

katt

raphaelpor

CLI tool that tests the output of agentic AI tools

Skill | ML Testing
2 dirs

@dankelleher/mcp-eval

dankelleher

A CLI to evaluate MCP server performance

MCP Server | ML Testing
20 | 2 dirs

ashby-mcp

dewierwan

MCP server for Ashby ATS — candidate evaluation workflow

MCP Server | ML Testing
2 dirs

meta-prompter-mcp

delexw

A prompt evaluation tool available as both an MCP server and a CLI.

MCP Server | ML Testing
24 | 1 dir

claw-harness

GitHub Actions

Testing framework for OpenClaw bots. Spin up real agents, load skills, drive multi-turn prompts, and capture results.

Skill | ML Testing
1 dir

sip-benchmark

chufenghuang

Internal CLI tool for evaluating Sip AI persona agent performance

Skill | ML Testing
1 dir

@chainsafe/benchmark

wemeetagain

> This is an independently maintained fork of [@dapplion/benchmark](https://github.com/dapplion/benchmark). This repo now maintains its own versioning as `@chainsafe/benchmark` and release schedule. It was forked from the base of `@dapplion/benchmark@1

Skill | ML Testing
1 dir

@versatly/skillbench

g9pedro

CLI benchmark system for tracking skill versions, scoring performance, and comparing improvements

Skill | ML Testing
1 dir

@2501-ai/cli

zhuk-aa

[![npm version](https://img.shields.io/npm/v/@2501-ai/cli.svg)](https://www.npmjs.com/package/@2501-ai/cli) [![HumanEval Score](https://img.shields.io/badge/HumanEval-96.95%25-brightgreen.svg)](https://www.2501.ai/research/full-humaneval-benchmark) [![Lic

Skill | ML Testing
1 dir

odor

catpea

Static blog generator with parallel encoding, incremental builds, atomic writes, and an AI agent for spellcheck, tagging, summarization, and quality evaluation.

Agent | ML Testing
1 dir

@satoshibits/doc-lint

satoshibits

Documentation linter that assembles evaluation prompts from concern schemas

Skill | ML Testing
1 dir

js-chess-engine

josefjadrny

Simple and fast Node.js chess engine with configurable AI and no dependencies

Skill | ML Testing
156 | 1 dir

skilltest

lsaraiva

The testing framework for Agent Skills. Lint, test triggering, and evaluate your SKILL.md files.

Skill | ML Testing
1 dir

probeai

k08200

CLI tool for testing and evaluating AI coding agents

Skill | ML Testing
1 | 1 dir

agentv

christso

CLI entry point for AgentV

Skill | ML Testing
11 | 1 dir

@kodus/agent-readiness

gamalinosqui

Evaluate how prepared your codebase is for autonomous AI coding agents

Skill | ML Testing
1 dir

@agentid-protocol/core

sharifventures

AgentID core SDK - cryptographic identity, manifests, signing, verification, and policy evaluation for AI agents

Skill | ML Testing
1 | 1 dir

skillscore

joeynyc

A CLI tool that evaluates AI agent skills and produces quality scores

Agent | ML Testing
2 | 1 dir