Awesome-LLM-Inference
A curated list of awesome LLM inference papers with code.
Awesome-LLM-3D
A curated list of multimodal large language models in the 3D world, including 3D understanding, reasoning, generation, and embodied agents.
LLMDatahub
a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset.
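Awesome-Chinese-LLM
A collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at lower cost, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.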
LLM4Opt
Applying large language models (LLMs) to diverse optimization tasks (Opt) is an emerging research area. This is a collection of references and papers on LLM4Opt.
awesome-language-model-analysis
This paper list focuses on the theoretical or empirical analysis of language models, e.g., the learning dynamics, expressive capacity, interpretability, generalization, and other interesting topics.
Chatbot Arena Leaderboard
a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
Open LLM Leaderboard
aims to track, rank, and evaluate LLMs and chatbots as they are released.
AlpacaEval
An automatic evaluator for instruction-following language models.
ACLUE
an evaluation benchmark focused on ancient Chinese language comprehension.
BeHonest
A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.
Berkeley Function-Calling Leaderboard
evaluates LLMs' ability to call external functions/tools.
CompassRank
CompassRank is dedicated to exploring the most advanced language and visual models, offering a comprehensive, objective, and neutral evaluation reference for the industry and research.
CompMix
a benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes).
DreamBench++
a benchmark for evaluating the performance of large language models (LLMs) in various tasks related to both textual and visual imagination.
FELM
a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).
InfiBench
a benchmark designed to evaluate large language models (LLMs) specifically in their ability to answer real-world coding-related questions.
LawBench
a benchmark designed to evaluate large language models in the legal domain.
LLMEval
focuses on understanding how LLMs perform in various scenarios and analyzing results from an interpretability perspective.
M3CoT
a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.