
MIRL: Multisensory Intelligence Reinforcement Learning for LLMs

MIRL is a flexible reinforcement learning framework for training large language models with multimodal capabilities. This framework is built upon verl, extending its capabilities to support diverse modalities and annotation formats.

Key Features

🎯 Enhanced Annotation Support

  • Support for multiple annotation formats beyond standard text
  • Native support for Geometry3k format for mathematical reasoning tasks
  • Flexible annotation pipeline for custom formats
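As an illustration of what a multimodal annotation might look like, here is a Geometry3k-style record pairing a diagram with a question and answer. The field names and path below are hypothetical sketches to convey the shape of such a record, not MIRL's actual schema:

```python
# Hypothetical sketch of a Geometry3k-style annotation record:
# a diagram image paired with a question and its answer.
# Field names and the image path are illustrative, not MIRL's real schema.
record = {
    "images": ["geometry3k/0001/img_diagram.png"],  # diagram image path(s)
    "prompt": "In the figure, find the measure of angle ABC.",
    "answer": "115",
    "data_source": "geometry3k",
}
print(sorted(record.keys()))
```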

🌐 Multimodal Training

  • Audio support (implemented): Train models with audio understanding and generation capabilities
  • Extensible architecture: Framework designed to accommodate arbitrary modalities
  • Active development for additional modality support

🚀 Future Roadmap

  • Diffusion Language Models: Planned support for training diffusion-based language models
  • Unified training pipeline for both autoregressive and diffusion architectures

Getting Started

Prerequisites

  • CUDA-compatible GPU (recommended: A100, H100, or similar)
  • CUDA 12.1 or higher
  • Python 3.10 - 3.12
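A quick way to confirm your interpreter falls in the supported range before installing (a minimal sketch; adjust the bounds if the supported versions change):

```python
import sys

# Prerequisites list Python 3.10 - 3.12; check the running interpreter.
supported = (3, 10) <= sys.version_info[:2] <= (3, 12)
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {supported}")
```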

Installation

To get started with MIRL, first clone the repository and navigate to the project directory:

git clone https://github.com/DDVD233/mirl
cd mirl

Then, follow these steps to set up the environment and install the necessary dependencies:

  1. Create a new conda environment

    conda create -n mirl python=3.11
    conda activate mirl
  2. Install uv and vLLM

    pip install uv
    uv pip install vllm --torch-backend=auto
  3. Install Flash Attention

    git clone https://github.com/Dao-AILab/flash-attention
    cd flash-attention
    MAX_JOBS=16 python setup.py install
  4. Install requirements

    pip install -r requirements.txt
  5. (Optional) Configure WandB for experiment tracking

    wandb login

    Follow the prompts to link your WandB account.
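After completing the steps above, one hedged sanity check is to confirm that the key dependencies can be located by the interpreter (package names taken from the install steps; `flash_attn` is the import name for the flash-attention package, and the optional `wandb` is omitted):

```python
import importlib.util

# Check that each installed dependency resolves, without actually importing it.
# "flash_attn" is the import name for the flash-attention package.
required = ["torch", "vllm", "flash_attn"]
missing = [name for name in required if importlib.util.find_spec(name) is None]
print("missing packages:", missing or "none")
```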

Acknowledgments

MIRL is a fork of verl (Volcano Engine Reinforcement Learning), which provides the foundational HybridFlow framework and efficient RLHF training infrastructure.

License

This project inherits the Apache 2.0 License from the original verl framework.
