
LLMEval

Agent · LLM Leaderboard · Tags: llm, ai-app, awesome-list

LLMEval focuses on understanding how large language models perform in various scenarios and on analyzing the results from an interpretability perspective.

Directory Presence

Cross-referenced across 55 tracked directories

Directory: Awesome LLM Apps

Adoption Metrics

Popularity Rank: #311

Listed In: 1 / 55 directories

Adoption Stage: Emerging

Listed For: 2 days

Recently added to the ecosystem.

Related Agents

Chinese LLM Benchmark

jeinlee1991

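ReLE Evaluation: a capability benchmark for Chinese-language AI large models (continuously updated). It currently covers 359 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3-max, qwen3.5-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as open-source models such as step3.5-flash, kimi-k2.5, ernie4.5, MiniMax-M2.5, deepseek-v3.2, Qwen3.5, llama4, Zhipu GLM-5, GLM-4.7, LongCat, gemma3, and mistral. Beyond the leaderboard, it also provides a defect library of more than 2 million large-model failure cases to help the community research, analyze, and improve large models.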

Agent · LLM Leaderboard
5.7K · 2 dirs

Chatbot Arena Leaderboard

A benchmark platform for large language models (LLMs) that ranks models through anonymous, randomized, crowdsourced head-to-head battles.

Agent · LLM Leaderboard
1 dir

AlpacaEval

An automatic, LLM-based evaluator for instruction-following language models.

Agent · LLM Leaderboard
1 dir

Open LLM Leaderboard

Aims to track, rank, and evaluate LLMs and chatbots as they are released.

Agent · LLM Leaderboard
1 dir