Awesome-LLM-Inference

A curated list of LLM inference papers with code.

AgentOther Papers

15 1 dir

Awesome-LLM-3D

A curated list of multi-modal large language models in the 3D world, including 3D understanding, reasoning, generation, and embodied agents.

AgentOther Papers

2.1K 1 dir

LLMDatahub

a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset

AgentOther Papers

3.4K 1 dir

Awesome-Chinese-LLM

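A collection of open-source Chinese large language models, focusing on smaller-scale models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.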

AgentOther Papers

22K 1 dir

LLM4Opt

Applying large language models (LLMs) to diverse optimization tasks (Opt) is an emerging research area. This is a collection of references and papers on LLM4Opt.

AgentOther Papers

349 1 dir

awesome-language-model-analysis

This paper list focuses on the theoretical or empirical analysis of language models, e.g., the learning dynamics, expressive capacity, interpretability, generalization, and other interesting topics.

AgentOther Papers

99 1 dir

Chatbot Arena Leaderboard

a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.

AgentLLM Leaderboard

1 dir

Open LLM Leaderboard

aims to track, rank, and evaluate LLMs and chatbots as they are released.

AgentLLM Leaderboard

1 dir

AlpacaEval

An Automatic Evaluator for Instruction-following Language Models using Nous benchmark suite.

AgentLLM Leaderboard

1 dir

ACLUE

an evaluation benchmark focused on ancient Chinese language comprehension.

AgentLLM Leaderboard

33 1 dir

BeHonest

A pioneering benchmark specifically designed to assess honesty in LLMs comprehensively.

AgentLLM Leaderboard

1 dir

Berkeley Function-Calling Leaderboard

evaluates LLMs' ability to call external functions/tools.

AgentLLM Leaderboard

1 dir

CompassRank

CompassRank is dedicated to exploring the most advanced language and visual models, offering a comprehensive, objective, and neutral evaluation reference for the industry and research.

AgentLLM Leaderboard

1 dir

CompMix

a benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes).

AgentLLM Leaderboard

1 dir

DreamBench++

a benchmark for evaluating the performance of large language models (LLMs) in various tasks related to both textual and visual imagination.

AgentLLM Leaderboard

1 dir

FELM

a meta-benchmark that evaluates how well factuality evaluators assess the outputs of large language models (LLMs).

AgentLLM Leaderboard

1 dir

InfiBench

a benchmark designed to evaluate large language models (LLMs) specifically in their ability to answer real-world coding-related questions.

AgentLLM Leaderboard

1 dir

LawBench

a benchmark designed to evaluate large language models in the legal domain.

AgentLLM Leaderboard

1 dir

LLMEval

focuses on understanding how large language models perform in various scenarios and analyzing results from an interpretability perspective.

AgentLLM Leaderboard

1 dir

M3CoT

a benchmark that evaluates large language models on a variety of multimodal reasoning tasks, including language, natural and social sciences, physical and social commonsense, temporal reasoning, algebra, and geometry.

AgentLLM Leaderboard

1 dir