Benchmarking Large Language Models
Cross-referenced across 55 tracked directories
Popularity Rank: #3807
Listed In: 1 / 55
Adoption Stage: Emerging
Created: 5/28/2023
GitHub Stars: 105
Score: 100/100
0 dependency vulnerabilities found
Jeffrey Ip
The LLM Evaluation Framework
Google, LLC
LLM Comparator: An interactive visualization tool for side-by-side LLM evaluation
Awesome Gen AI Tools: LLM Benchmarks: MMLU, HellaSwag, BBH, and Beyond - Confident AI
Awesome Gen AI Tools: Cleanlab Trustworthy Language Model: Score the trustworthiness of any LLM response
Forks: 21
Open Issues: 22
Last Commit: 6/20/2025
Recently added to the ecosystem