A benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
Cross-referenced across 55 tracked directories
Popularity Rank: #645
Listed In: 1 / 55
Adoption Stage: Emerging
First Seen: 3/13/2026
Recently added to the ecosystem
An automatic evaluator for instruction-following language models, using the Nous benchmark suite.
A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.
Aims to track, rank, and evaluate LLMs and chatbots as they are released.
An evaluation benchmark focused on ancient Chinese language comprehension.