Published Tools: 16
Total Stars: 616,369
Weekly Downloads: 0
GitHub Followers: 3,435
Public Repos: 42
Avg Security: 100/100
Published Tools
15 Skills, 1 Agent across 3 categories
vllm-tpu
vLLM Team
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-cpu-amxbf16
vLLM CPU inference engine (AVX512 + VNNI + BF16 + AMX optimized)
vllm-cpu-avx512bf16
vLLM CPU inference engine (AVX512 + VNNI + BF16 optimized)
vllm-cpu-avx512vnni
vLLM CPU inference engine (AVX512 + VNNI optimized)
vllm-cpu-avx512
vLLM CPU inference engine (AVX512 optimized)
vllm-cpu
vLLM Team
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-hust
vLLM Team
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm
vLLM Team
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-omni
vLLM-Omni Team
A framework for efficient model inference with omni-modality models
vllm-sr-sim
vLLM Semantic Router Team
vLLM Semantic Router fleet simulator for capacity planning, SLO validation, and what-if analysis
llm-katan
Yossi Ovadia
LLM Katan - Lightweight LLM Server for Testing - Real tiny models with FastAPI and HuggingFace
llmcompressor
Neuralmagic, Inc.
A library for compressing large language models using the latest techniques and research in the field, covering both training-aware and post-training approaches. The library is designed to be flexible and easy to use on top of PyTorch and HuggingFace Transformers, allowing for quick experimentation.
vllm-ascend
vLLM-Ascend team
vLLM Ascend backend plugin
vllm-sr
vLLM-SR Team
vLLM Semantic Router - Intelligent routing for Mixture-of-Models
wxy-test
vLLM Team
A high-throughput and memory-efficient inference and serving engine for LLMs
vllm-cpu-nightly
vLLM Team
A high-throughput and memory-efficient inference and serving engine for LLMs