MIRL is a flexible reinforcement learning framework for training large language models with multimodal capabilities. It is built on verl, extending that framework to support diverse modalities and annotation formats.
- Support for multiple annotation formats beyond standard text
- Native support for Geometry3k format for mathematical reasoning tasks
- Flexible annotation pipeline for custom formats (see the sketch after this list)
- Audio support (implemented): Train models with audio understanding and generation capabilities
- Extensible architecture: Framework designed to accommodate arbitrary modalities
- Active development for additional modality support
- Diffusion Language Models: Planned support for training diffusion-based language models
- Unified training pipeline for both autoregressive and diffusion architectures
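
To make the annotation pipeline concrete, here is a minimal sketch of what a custom multimodal annotation record might look like. The field names (`data_source`, `prompt`, `images`, `ground_truth`) and the parquet output are assumptions modeled on typical verl-style data pipelines, not MIRL's confirmed schema:

```python
import pandas as pd

# Hypothetical annotation record; field names are illustrative, not MIRL's actual schema.
records = [{
    "data_source": "geometry3k",                # tag used to select a reward function
    "prompt": "In triangle ABC, angle A = 60 degrees and angle B = 75 degrees. Find angle C.",
    "images": ["geometry3k/images/0001.png"],   # paths to the visual inputs
    "ground_truth": "45",                       # reference answer scored by the RL reward
}]

# verl-style pipelines commonly consume parquet files; assumed here as well.
pd.DataFrame(records).to_parquet("train.parquet")
```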
- CUDA-compatible GPU (recommended: A100, H100, or similar)
- CUDA 12.1 or higher
- Python 3.10 - 3.12
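
Before installing, you can quickly verify the prerequisites above. The snippet below assumes an NVIDIA driver is present; the `nvidia-smi` header reports the driver's supported CUDA version:

```python
import platform
import subprocess

# Pre-flight check for the requirements listed above.
print("Python:", platform.python_version())  # should be 3.10 - 3.12
subprocess.run(["nvidia-smi"])               # header line shows the CUDA version (12.1+)
```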
To get started with MIRL, first clone the repository and navigate to the project directory:
```bash
git clone https://github.com/DDVD233/mirl
cd mirl
```

Then, follow these steps to set up the environment and install the necessary dependencies:
- Create a new conda environment

  ```bash
  conda create -n mirl python=3.11
  conda activate mirl
  ```
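
  To confirm the new environment's interpreter is active and within the supported range, a quick check:

  ```python
  import sys

  # Should print the conda env's interpreter and a version within 3.10 - 3.12.
  assert (3, 10) <= sys.version_info[:2] <= (3, 12), sys.version
  print(sys.executable)
  ```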
- Install uv and vLLM

  ```bash
  pip install uv
  uv pip install vllm --torch-backend=auto
  ```
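
  As a sanity check (assuming a typical setup where PyTorch is pulled in as a vLLM dependency), you can confirm the install succeeded and the GPU is visible:

  ```python
  import torch
  import vllm

  # torch is installed alongside vLLM; both should import cleanly.
  print("vLLM:", vllm.__version__)
  print("CUDA available:", torch.cuda.is_available())
  if torch.cuda.is_available():
      print("Device:", torch.cuda.get_device_name(0))
  ```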
- Install Flash Attention

  ```bash
  git clone https://github.com/Dao-AILab/flash-attention
  cd flash-attention
  MAX_JOBS=16 python setup.py install
  ```
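
  If the build succeeded, the compiled extension should import cleanly; a quick check:

  ```python
  import flash_attn

  # An ImportError here usually means the CUDA extension failed to build.
  print("flash-attn:", flash_attn.__version__)
  ```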
- Install requirements

  ```bash
  pip install -r requirements.txt
  ```
- (Optional) Configure WandB for experiment tracking

  ```bash
  wandb login
  ```

  Follow the prompts to set up your WandB account.
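
  Equivalently, you can log in and smoke-test tracking from Python (the project and run names below are placeholders, not MIRL conventions):

  ```python
  import wandb

  # Same effect as `wandb login`; prompts for an API key if none is cached.
  wandb.login()

  # Hypothetical project/run names, for illustration only.
  run = wandb.init(project="mirl-experiments", name="tracking-smoke-test")
  run.log({"step": 0, "reward": 0.0})
  run.finish()
  ```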
MIRL is a fork of verl (Volcano Engine Reinforcement Learning), which provides the foundational HybridFlow framework and efficient RLHF training infrastructure.
This project inherits the Apache 2.0 License from the original verl framework.