Autonomous AI agents that perform tasks independently
Ongoing research training transformer models at scale.
A native PyTorch library for large model training.
DeepSpeed version of NVIDIA's Megatron-LM that adds support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others.
A native PyTorch library for LLM fine-tuning.
Generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains.
Efficient Training for Big Models.
Mesh TensorFlow: Model Parallelism Made Easier.
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
A library for accelerating Transformer model training on NVIDIA GPUs.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
A framework that specializes in efficient fine-tuning. On its GitHub page, you can find ready-to-use fine-tuning templates for various LLMs, allowing you to easily fine-tune on your own data for free in Google Colab.
SGLang is a fast serving framework for large language models and vision language models.
NVIDIA framework for LLM inference (transitioned to TensorRT-LLM).
Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
Blazingly fast LLM inference.
Run LLMs and batch jobs on any cloud. Get maximum cost savings, highest GPU availability, and managed execution, all with a simple interface.
MII enables low-latency, high-throughput inference, similar to vLLM, powered by DeepSpeed.
Inference for text embeddings in Rust (HFOIL license).
Inference for text embeddings in Python.
A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.
Easily deploy any LLM on a VM with minimal configuration, using Ansible.
Comprehensive set of tools for working with local LLMs for various tasks.
Use ChatGPT on WeChat via Wechaty.