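ReLE Evaluation: a continuously updated capability benchmark for Chinese-language large AI models. It currently covers 359 models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongCat, gemma3, and mistral. Besides leaderboards, it provides a large-model defect library of more than 2 million entries, so the community can study, analyze, and improve large models.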
Cross-referenced across 55 tracked directories
Popularity Rank: #96
Listed In: 2 / 55
Adoption Stage: Emerging
First Seen: 3/7/2026
GitHub Stars: 5,766
Score: 100/100
0 dependency vulnerabilities found
aims to track, rank, and evaluate LLMs and chatbots as they are released.
an evaluation benchmark focused on ancient Chinese language comprehension.
a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
An Automatic Evaluator for Instruction-following Language Models using Nous benchmark suite.
Forks: 233
Open Issues: 13
Last Commit: 3/22/2026
Recently added to the ecosystem
Cross-Posting Opportunities
Could also be listed in these directories: