Agents
Autonomous AI agents that perform tasks independently.
A library for accelerating Transformer model training on NVIDIA GPUs.
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning, iterative DPO, LoRA, RingAttention, and RFT).
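One of the methods listed, DPO, reduces RLHF to a simple loss over preference pairs. A minimal NumPy sketch of the DPO loss (illustrative only; the function name, variable names, and the beta value are assumptions, not this framework's API):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model; beta scales the
    implicit KL penalty against the reference.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)): small when the policy prefers the chosen response
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

# The loss shrinks as the policy favors the chosen response more than the reference does.
loss_good = dpo_loss(-5.0, -9.0, -6.0, -6.0)   # policy prefers the chosen response
loss_bad  = dpo_loss(-9.0, -5.0, -6.0, -6.0)   # policy prefers the rejected response
```

Unlike PPO, this objective needs no reward model or sampling loop, which is why iterative DPO pipelines are comparatively cheap to run.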
A framework that specializes in efficient fine-tuning. Its GitHub page provides ready-to-use fine-tuning templates for various LLMs, letting you train on your own data for free in Google Colab.
Open-source framework for fine-tuning and evaluating LLMs. It simplifies experimenting with different training configurations, makes results easy to reproduce and share, and supports features like LoRA, QLoRA, DeepSpeed, PEFT, and multi-GPU setups.
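LoRA, one of the features listed, freezes the pretrained weights and learns only a low-rank correction. A NumPy sketch of the core idea (illustrative, not this framework's code; dimensions and rank are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                 # trainable up-projection, zero-initialized

def lora_forward(x, scale=1.0):
    # Base output plus the low-rank correction (B @ A) @ x
    return W @ x + scale * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapted layer starts out identical to the base layer.
assert np.allclose(lora_forward(x), W @ x)
```

Only `rank * (d_in + d_out)` parameters are trained instead of `d_in * d_out`, which is what makes LoRA and QLoRA fit on small GPUs.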
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).
Speeds up long-context LLM inference by computing attention with approximate, dynamically determined sparse patterns, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
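The underlying trick, approximating dense attention by attending only to the highest-scoring keys per query, can be sketched in NumPy (illustrative only; the real system selects sparse patterns dynamically per head rather than with a fixed top-k):

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=8):
    """Attend only to the top-k keys per query instead of all of them."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])            # (n_queries, n_keys)
    # Mask out everything below each query's k-th highest score
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    masked = np.where(scores >= kth, scores, -np.inf)
    # Softmax over the surviving entries only
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(16, 32)) for _ in range(3))
out = topk_sparse_attention(Q, K, V, k=8)
```

With k equal to the number of keys this reduces to ordinary dense attention; the latency win comes from skipping the masked score columns entirely at long context lengths.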
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
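The memory saving from quantized weights comes from storing each tensor in 8 (or fewer) bits plus a scale factor. A minimal per-row int8 round-trip in NumPy (a sketch of the general idea, not this project's specific scheme):

```python
import numpy as np

def quantize_int8(W):
    """Symmetric per-row int8 quantization: W is approximated by q * scale."""
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 256)).astype(np.float32)
q, scale = quantize_int8(W)

# int8 storage is 4x smaller than float32 (plus one scale per row),
# and the reconstruction error is at most half a quantization step.
assert q.nbytes * 4 == W.nbytes
```

Real implementations refine this with grouped scales, lower bit widths, and dequantization fused into the matmul kernel, but the storage arithmetic is the same.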
Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution -- all with a simple interface.
MII, powered by DeepSpeed, provides low-latency and high-throughput inference, similar to vLLM.
Inference for text embeddings in Rust, under the HFOIL license.
A high-throughput and low-latency inference and serving framework for LLMs and VLMs.
A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.
Easily deploy any LLM on a VM with minimal configuration, using Ansible.
Comprehensive set of tools for working with local LLMs for various tasks.
A chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!
Simplifies the evaluation of LLMs by providing a unified microservice to access and test multiple AI models.
Build your own conversational search engine using less than 500 lines of code by [LeptonAI](https://github.com/leptonai).
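The pattern behind such a search engine is short: retrieve relevant snippets, then build a grounded prompt for the LLM. A toy keyword-overlap retriever in Python (illustrative only; the real project calls a search API and an LLM endpoint, and all names here are made up):

```python
def retrieve(query, docs, top_n=2):
    """Rank documents by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return scored[:top_n]

def build_prompt(query, snippets):
    """Assemble a citation-style prompt grounding the LLM in the snippets."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

docs = [
    "vLLM is a high-throughput LLM serving engine.",
    "LoRA fine-tunes models with low-rank adapters.",
    "SGLang serves language and vision-language models.",
]
query = "what serves LLM models fast?"
prompt = build_prompt(query, retrieve(query, docs))
```

Swapping the overlap scorer for a real search API and sending the prompt to an LLM is essentially all the extra code the full project needs.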