LLM Training Frameworks
20 AI tools in the LLM Training Frameworks category
veRL
A flexible and efficient RL training framework for LLMs.
ROLL
An efficient and user-friendly scaling library from Alibaba for reinforcement learning with large language models.
trl
A Hugging Face library for training transformer language models with reinforcement learning, with trainers for methods such as SFT, DPO, and PPO.
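A minimal supervised fine-tuning sketch with TRL (the model and dataset names are placeholders, and the exact constructor arguments vary across TRL versions):

```python
# Minimal SFT sketch with TRL; assumes a recent TRL version where
# SFTTrainer accepts a Hub model name string. Model/dataset are placeholders.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # any causal LM on the Hugging Face Hub
    train_dataset=dataset,
)
trainer.train()
```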
Unsloth
A framework specialized in efficient fine-tuning. Its GitHub page provides ready-to-use fine-tuning templates for various LLMs, letting you train on your own data for free on Google Colab.
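A sketch of Unsloth's fine-tuning setup, following the pattern in its Colab templates (the checkpoint name and LoRA hyperparameters are illustrative; check the current notebooks for exact arguments):

```python
# Sketch of Unsloth's LoRA fine-tuning setup; names follow its Colab
# templates, but the checkpoint and hyperparameters here are illustrative.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,            # LoRA rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# The resulting model plugs into a standard Hugging Face/TRL training loop.
```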
Transformer Engine
A library for accelerating Transformer model training on NVIDIA GPUs.
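A small sketch of Transformer Engine's FP8 autocast, adapted from its quickstart (requires an FP8-capable NVIDIA GPU such as Hopper; the scaling-recipe values are illustrative):

```python
# FP8 forward/backward pass with Transformer Engine, adapted from the
# quickstart; recipe values are illustrative, not tuned.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

model = te.Linear(768, 768, bias=True).cuda()  # drop-in FP8-aware layer
inp = torch.randn(32, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)
out.sum().backward()
```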
Megatron-DeepSpeed
DeepSpeed version of NVIDIA's Megatron-LM, adding support for features such as MoE model training, curriculum learning, and 3D parallelism.
nanotron
A minimalistic library for 3D-parallel training of large language models.
torchtune
A native PyTorch library for LLM fine-tuning.
Axolotl
Open-source framework for fine-tuning and evaluating LLMs. It simplifies the process of experimenting with different training configurations and makes it easy to reproduce and share results, supporting features like LoRA, QLoRA, DeepSpeed, PEFT, and multi-GPU setups.
NeMo Framework
A generative AI framework built for researchers and PyTorch developers working on large language models (LLMs), multimodal models (MMs), automatic speech recognition (ASR), text-to-speech (TTS), and computer vision (CV).
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
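The core DeepSpeed pattern is to wrap an existing PyTorch model with deepspeed.initialize and drive training through the returned engine; a minimal sketch, assuming a toy model and an inline config (real runs are typically launched with the deepspeed CLI launcher and a tuned config):

```python
# Minimal DeepSpeed loop; the tiny Linear model and inline config are
# placeholders for a real LLM and a tuned ds_config.
import torch
import deepspeed

model = torch.nn.Linear(10, 1)  # stand-in for a real model

ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for _ in range(10):
    x = torch.randn(8, 10, device=model_engine.device)
    y = torch.randn(8, 1, device=model_engine.device)
    loss = torch.nn.functional.mse_loss(model_engine(x), y)
    model_engine.backward(loss)  # engine-managed backward (ZeRO, mixed precision)
    model_engine.step()          # optimizer step + LR scheduling
```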
BMTrain
Efficient training for big models.
GPT-NeoX
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
torchtitan
A native PyTorch library for large model training.
OpenRLHF
An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning, iterative DPO, LoRA, RingAttention, RFT).
LitGPT
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
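Alongside its CLI recipes, LitGPT exposes a small Python API; a sketch based on its README quickstart (the checkpoint name is illustrative and must first be downloaded via LitGPT):

```python
# LitGPT Python API sketch, following the README quickstart;
# the checkpoint name is illustrative.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("Explain 3D parallelism in one sentence.", max_new_tokens=64)
print(text)
```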
Mesh TensorFlow
A language for distributed deep learning that makes model parallelism easier.
Meta Lingua
A lean, efficient, and easy-to-hack codebase for LLM research.
Megatron-LM
Ongoing research on training transformer models at scale.
maxtext
A simple, performant, and scalable JAX LLM.