LLM Inference
15 AI tools in the LLM Inference category
SGLang
SGLang is a fast serving framework for large language models and vision language models.
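SGLang exposes an OpenAI-compatible HTTP API once a server is launched (e.g. via `python -m sglang.launch_server`). A minimal client sketch, assuming a server on port 30000 and a placeholder model id:

```python
import json
from urllib import request

# Build a request against SGLang's OpenAI-compatible chat endpoint.
# Model id and port are placeholders; match them to your launch command.
payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder
    "messages": [{"role": "user", "content": "What is SGLang?"}],
    "max_tokens": 64,
}
req = request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once a server is actually running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint follows the OpenAI schema, the same payload works with any OpenAI-compatible client library by pointing its base URL at the local server.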
TGI
A toolkit for deploying and serving large language models (LLMs).
TensorRT-LLM
NVIDIA framework for LLM inference.
FasterTransformer
NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).
MInference
Speeds up long-context LLM inference by computing attention with approximate, dynamic sparse methods, reducing pre-filling latency on an A100 by up to 10x while maintaining accuracy.
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
mistral.rs
Blazingly fast LLM inference.
SkyPilot
Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
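SkyPilot jobs are declared as task YAML files and launched with `sky launch`, which provisions the cheapest available cloud and GPU matching the requested resources. A hypothetical task sketch (model id, GPU type, and vLLM serving command are illustrative placeholders, not part of SkyPilot itself):

```yaml
# task.yaml — hypothetical SkyPilot task serving an LLM with vLLM.
resources:
  accelerators: A100:1   # placeholder GPU request

setup: |
  pip install vllm

run: |
  python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct --port 8000
```

Launching with `sky launch task.yaml` handles provisioning, setup, and execution; the same file runs unchanged across supported clouds.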
DeepSpeed-Mii
MII enables low-latency, high-throughput inference, similar to vLLM; it is powered by DeepSpeed.
Text-Embeddings-Inference
Inference for text embeddings in Rust (HFOIL license).
Infinity
Inference for text embeddings in Python.
LMDeploy
A high-throughput and low-latency inference and serving framework for LLMs and VLMs.
Liger-Kernel
Efficient Triton Kernels for LLM Training.
prima.cpp
A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.
deploy-llms-with-ansible
Easily deploy any LLM on a VM with minimal configuration, using Ansible.