exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
mistral.rs
Blazingly fast LLM inference.
SkyPilot
Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
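A minimal sketch of launching a GPU job through SkyPilot's Python API; the setup/run commands, accelerator type, and cluster name below are illustrative assumptions, not part of any specific recipe.

```python
# Sketch: provision a cloud GPU and run a job with SkyPilot's Python API.
# The commands, accelerator spec, and cluster name are placeholder assumptions.
import sky

task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python serve_llm.py --port 8000",
)
task.set_resources(sky.Resources(accelerators="A100:1"))

# SkyPilot picks a cloud/region with available GPUs, provisions it, and runs the task.
sky.launch(task, cluster_name="llm-serve")
```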
DeepSpeed-MII
MII provides low-latency and high-throughput inference, similar to vLLM, powered by DeepSpeed.
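A minimal sketch of local generation with MII's non-persistent pipeline API, assuming a recent MII release and a GPU large enough for the (example) model:

```python
# Sketch: local text generation with DeepSpeed-MII's non-persistent pipeline.
# The model name and generation settings are example assumptions.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
responses = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=64)
for response in responses:
    print(response.generated_text)
```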
Text-Embeddings-Inference
Inference for text embeddings in Rust, under the HFOIL license.
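A small sketch of querying a Text-Embeddings-Inference server over its HTTP `/embed` endpoint; it assumes the server was already started separately (e.g. via its Docker image) and the port and inputs are placeholders.

```python
# Sketch: request embeddings from a running Text-Embeddings-Inference server.
# Assumes a server with an embedding model loaded is listening on localhost:8080.
import requests

resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": ["What is deep learning?", "LLM inference on any device"]},
    timeout=30,
)
resp.raise_for_status()
embeddings = resp.json()  # one embedding vector (list of floats) per input string
print(len(embeddings), len(embeddings[0]))
```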
Infinity
Inference for text embeddings in Python.
Liger-Kernel
Efficient Triton Kernels for LLM Training.
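A hedged sketch of patching a Hugging Face Llama model with Liger's Triton kernels before loading it; the model id and dtype are example assumptions.

```python
# Sketch: swap in Liger's Triton kernels for Llama modules before instantiating the model.
# Assumes liger-kernel and transformers are installed and a CUDA GPU is available.
import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

apply_liger_kernel_to_llama()  # patches Llama layers (RMSNorm, SwiGLU, etc.) in place

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # example model id
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```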
prima.cpp
A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.
deploy-llms-with-ansible
Easily deploy any LLM on a VM with minimal configuration, using Ansible.
Swiss Army Llama
Comprehensive set of tools for working with local LLMs for various tasks.
wechat-chatgpt
Use ChatGPT on WeChat via Wechaty.
Serge
A chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!
IntelliServer
Simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.
Search with Lepton
Build your own conversational search engine using less than 500 lines of code by [LeptonAI](https://github.com/leptonai).
Robocorp
Create, deploy and operate Actions using Python anywhere to enhance your AI agents and assistants. Batteries included with an extensive set of libraries, helpers and logging.
Tune Studio
Playground for devs to finetune & deploy LLMs
talkd.ai dialog
Simple API for deploying any RAG or LLM that you want, with support for adding plugins.
Wllama
WebAssembly binding for llama.cpp, enabling in-browser LLM inference.
GPUStack
An open-source GPU cluster manager for running LLMs
MNN-LLM
An on-device inference framework, including LLM inference on device (mobile phone/PC/IoT).