[Feature] Multi-Lora support to allow asynchronous RL finetuning for multiple tasks. #609

@HwVanICI

Description

Checklist

  • This feature will maintain backward compatibility with the current APIs in
    areal/api/. If not, please raise a refactor issue first.

Motivation

Currently, AReaL mainly focuses on single-task RL finetuning. Running multiple RL finetuning tasks therefore means running each task separately, which is the only viable approach today and which multiplies the memory, computation, and hardware requirements. In many scenarios, however, these tasks can share resources and be executed far more efficiently through Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA.

Therefore, there is a pressing need to support efficient PEFT finetuning techniques to enable a cloud-based system for finetuning multiple RL tasks in a multi-tenant, multi-user Finetuning as a Service (FaaS) setting.

In this RFC, we propose RL-based multi-LoRA finetuning that allows for the parallelized finetuning of multiple LoRA adapters for different RL tasks. This will enable resources to be shared more efficiently, thereby avoiding the inefficiencies associated with running them separately.

Proposed change

From a high level, we propose to add the following functionality to support multi-LoRA RL finetuning:

  • Multi-LoRA rollout support in the vLLM inference engine.
  • Multi-LoRA training support in the FSDP finetuning engine.
  • Optimizations and performance enhancements.
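As an illustration of the rollout side, the sketch below (all names are hypothetical, not existing AReaL or vLLM APIs) shows how rollout requests from different RL tasks could be grouped by their registered LoRA adapter before being handed to a single shared inference engine:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RolloutRequest:
    task_id: str  # the RL task this prompt belongs to
    prompt: str


class AdapterRouter:
    """Hypothetical router: groups rollout requests by the LoRA adapter
    registered for each task, so one inference engine can serve all tasks."""

    def __init__(self):
        self.task_to_adapter = {}

    def register(self, task_id: str, adapter_id: int):
        self.task_to_adapter[task_id] = adapter_id

    def batch_by_adapter(self, requests):
        """Group requests by adapter id so each batch can be submitted
        with a single adapter attached."""
        batches = defaultdict(list)
        for req in requests:
            batches[self.task_to_adapter[req.task_id]].append(req)
        return dict(batches)
```

In a real integration, each batch would be submitted to the inference engine with the corresponding adapter attached (e.g. via vLLM's LoRA request mechanism) rather than returned as plain dicts.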

To this end, we propose a multi-milestone RFC to enable full support for multi-LoRA RL finetuning in AReaL:

Milestone 1: Basic LoRA Functionalities for Ascend-vLLM

  • Support single-LoRA weight updates via load-from-disk
  • Support single-LoRA weight updates via broadcast
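One reason LoRA weight updates are cheap to ship, whether loaded from disk or sent over a broadcast, is that only the adapter parameters need to move. A minimal sketch, assuming the common PEFT naming convention in which adapter parameter names contain `lora_`:

```python
def lora_state_dict(state_dict: dict) -> dict:
    """Filter a full model state dict down to its LoRA parameters.

    Assumes the common convention that adapter parameter names contain
    'lora_'; only these small matrices need to be written to disk or
    broadcast to the inference engine, not the full base model.
    """
    return {name: t for name, t in state_dict.items() if "lora_" in name}
```

Either update path (load-from-disk or broadcast) would operate on this reduced state dict.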

Milestone 2: Advanced Multi-LoRA Features

  • Enable multi-LoRA weight updates via load-from-disk or broadcast
  • Support training multiple LoRAs either sequentially or concurrently
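The difference between sequential and concurrent multi-LoRA training can be sketched as two step-ordering policies (a deliberate simplification; a real trainer would run optimizer steps for each adapter at each scheduled slot):

```python
from itertools import cycle


def sequential_schedule(tasks, steps_per_task):
    """Finish all training steps of one adapter before starting the next."""
    order = []
    for task in tasks:
        order.extend([task] * steps_per_task)
    return order


def concurrent_schedule(tasks, steps_per_task):
    """Interleave adapters step-by-step (round-robin), so all tasks
    make progress together and rollouts stay fresher for each task."""
    total = len(tasks) * steps_per_task
    return [task for _, task in zip(range(total), cycle(tasks))]
```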

Milestone 3: Inference-side Optimizations

  • Implement task-specific rollout interruption
  • Improve rollout management and scheduling for load balancing and reduced rollout staleness
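To make task-specific interruption concrete, a minimal staleness tracker might look as follows (hypothetical per-task version counters, not an existing AReaL component):

```python
class RolloutManager:
    """Hypothetical staleness tracker: each task records the policy version
    its in-flight rollouts were started with; rollouts whose version lags
    the trainer's by more than `max_staleness` are interrupted for that
    task only, leaving other tasks' rollouts untouched."""

    def __init__(self, max_staleness: int = 1):
        self.max_staleness = max_staleness
        self.rollout_version = {}  # task_id -> policy version at rollout start
        self.trainer_version = {}  # task_id -> latest trained version

    def start_rollout(self, task_id: str):
        self.rollout_version[task_id] = self.trainer_version.get(task_id, 0)

    def on_weights_updated(self, task_id: str):
        self.trainer_version[task_id] = self.trainer_version.get(task_id, 0) + 1

    def should_interrupt(self, task_id: str) -> bool:
        lag = (self.trainer_version.get(task_id, 0)
               - self.rollout_version.get(task_id, 0))
        return lag > self.max_staleness
```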

Milestone 4: Training-side Optimizations

  • Optimize resource utilization via bubble reduction and balanced task-level allocation
  • Implement job scheduling for training
  • Optimize LoRA and optimizer state loading/offloading
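LoRA and optimizer-state offloading could follow a simple LRU policy, since only a bounded number of adapters fit in device memory at once. A sketch with plain dicts standing in for device and host memory (all names hypothetical):

```python
from collections import OrderedDict


class AdapterStateCache:
    """Hypothetical on-device cache: keeps at most `capacity` adapters'
    LoRA weights and optimizer states resident; the least-recently-used
    adapter is offloaded (here: evicted to a host-side dict)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.device = OrderedDict()  # adapter_id -> state, resident on device
        self.host = {}               # offloaded states

    def fetch(self, adapter_id: str):
        if adapter_id in self.device:
            self.device.move_to_end(adapter_id)  # mark as recently used
        else:
            # Load from host (or initialize fresh state on a miss).
            state = self.host.pop(adapter_id, {"weights": None, "opt": None})
            self.device[adapter_id] = state
            if len(self.device) > self.capacity:
                evicted_id, evicted = self.device.popitem(last=False)
                self.host[evicted_id] = evicted  # offload the LRU adapter
        return self.device[adapter_id]
```

The same access pattern would apply with real tensors, with the host dict replaced by pinned CPU memory or disk.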

Each milestone will correspond to concrete PR deliverables, with details as follows:

Milestone overview:

  • M1 — Basic LoRA Functionalities: single-LoRA weight updates via load-from-disk/broadcast
  • M2 — Advanced Multi-LoRA: multi-LoRA updates and concurrent/sequential training support
  • M3 — Inference Optimizations: task-specific rollout interruption and scheduling improvements
  • M4 — Training Optimizations: bubble reduction, balanced allocation, job scheduling, state offloading

Additional Information

See the following prior art and references:
