instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
Meta Lingua
A lean, efficient, and easy-to-hack codebase for researching LLMs.
LitGPT
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
nanotron
Minimalistic large language model 3D-parallelism training.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
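To give a flavor of how DeepSpeed is driven, here is a minimal configuration sketch written as a Python dict. The field names follow DeepSpeed's documented config schema; the specific values are illustrative, not tuned recommendations.

```python
import json

# Minimal DeepSpeed configuration sketch: ZeRO stage-2 partitions
# optimizer states and gradients across data-parallel workers,
# and fp16 enables mixed-precision training.
# Values below are illustrative placeholders, not recommendations.
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states + gradients
        "overlap_comm": True,          # overlap gradient reduction with backward
        "contiguous_gradients": True,  # reduce memory fragmentation
    },
}

# DeepSpeed consumes this either as a JSON file passed on the command
# line via --deepspeed_config, or as a dict passed to
# deepspeed.initialize(model=..., config=ds_config).
print(json.dumps(ds_config, indent=2))
```

Stage 2 is a common middle ground: it saves substantial memory versus plain data parallelism while avoiding the parameter-partitioning communication overhead of stage 3.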
Megatron-LM
Ongoing research training transformer models at scale.
torchtitan
A native PyTorch Library for large model training.
Megatron-DeepSpeed
DeepSpeed version of NVIDIA's Megatron-LM that adds additional support for several features such as MoE model training, Curriculum Learning, 3D Parallelism, and others.
torchtune
A Native-PyTorch Library for LLM Fine-tuning.
NeMo Framework
Generative AI framework built for researchers and PyTorch developers working on Large Language Models (LLMs), Multimodal Models (MMs), Automatic Speech Recognition (ASR), Text to Speech (TTS), and Computer Vision (CV) domains.
BMTrain
Efficient Training for Big Models.
Mesh TensorFlow
Mesh TensorFlow: Model Parallelism Made Easier.
GPT-NeoX
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
Transformer Engine
A library for accelerating Transformer model training on NVIDIA GPUs.
OpenRLHF
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
Unsloth
A framework specializing in efficient fine-tuning. Its GitHub page provides ready-to-use fine-tuning notebooks for various LLMs, so you can train on your own data for free on Google Colab.
SGLang
SGLang is a fast serving framework for large language models and vision language models.
TGI
A toolkit for deploying and serving Large Language Models (LLMs).
FasterTransformer
NVIDIA's framework for LLM inference (transitioned to TensorRT-LLM).
MInference
Speeds up long-context LLM inference by computing attention with approximate and dynamic sparse methods, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.