A lightweight LLM evaluation suite that Hugging Face has been using internally.
A Challenging, Contamination-Free LLM Benchmark.
An open-source library for evaluating task performance of language models and prompts.
Eval tools by OpenAI.
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.