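ReLE Evaluation: a capability evaluation of Chinese AI large models (continuously updated). It currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongCat, gemma3, and mistral. Beyond leaderboards, it also provides a large-model defect library of more than 2 million entries, so the wider community can study, analyze, and improve large models.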
Cross-referenced across 55 tracked directories
Popularity Rank: #100
Listed In: 2 / 55
Adoption Stage: Growing
Created: 6/4/2023
GitHub Stars: 5,815
Score: 100/100
0 dependency vulnerabilities found
A Challenging, Contamination-Free LLM Benchmark.
An Automatic Evaluator for Instruction-following Language Models using Nous benchmark suite.
A benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
Aims to track, rank, and evaluate LLMs and chatbots as they are released.
Forks: 234
Open Issues: 13
Last Commit: 4/3/2026
Gaining traction in the ecosystem