Checklist
- This feature will maintain backward compatibility with the current APIs in
areal/api/. If not, please raise a refactor issue first.
Motivation
Currently, AReaL focuses mainly on single-task RL finetuning. When multiple RL finetuning tasks need to run together, the memory, compute, and hardware requirements grow substantially, because the only viable option today is to run each task as a separate job. In many scenarios, however, these tasks could share resources far more efficiently through Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA.
Therefore, there is a pressing need to support efficient PEFT finetuning so that a cloud-based system can serve multiple RL finetuning tasks in a multi-tenant, multi-user Finetuning-as-a-Service (FaaS) setting.
In this RFC, we propose RL-based multi-LoRA finetuning, which finetunes multiple LoRA adapters for different RL tasks in parallel. This allows resources to be shared efficiently and avoids the overhead of running each task as a separate job.
Proposed change
At a high level, we propose adding the following functionality to support multi-LoRA RL finetuning:
- Multi-LoRA rollout support in the vLLM inference engine (see the sketch after this list).
- Multi-LoRA training support in the FSDP finetuning engine.
- Optimizations and performance enhancements.
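To make the first item concrete, the sketch below shows what multi-LoRA rollout could look like using vLLM's existing LoRA serving interface (`LoRARequest`). The model name, adapter paths, and engine arguments are placeholders for illustration; the actual AReaL rollout-engine integration is left to the milestones below.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; enable_lora / max_loras are standard vLLM engine
# arguments for serving several adapters over one set of base weights.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    enable_lora=True,
    max_loras=4,        # adapters resident on GPU at once
    max_lora_rank=16,
)
sampling = SamplingParams(temperature=1.0, max_tokens=256)

# One LoRARequest per RL task; vLLM batches generation across adapters.
task_a = LoRARequest("task_a", 1, "/shared/adapters/task_a")  # hypothetical paths
task_b = LoRARequest("task_b", 2, "/shared/adapters/task_b")

out_a = llm.generate(["<task A prompt>"], sampling, lora_request=task_a)
out_b = llm.generate(["<task B prompt>"], sampling, lora_request=task_b)
```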
To this end, we propose a multi-milestone plan to enable full support for multi-LoRA RL finetuning in AReaL:
Milestone 1: Basic LoRA Functionalities for Ascend-vLLM
- Support single-LoRA weight updates via load-from-disk (see the sketch below)
- Support single-LoRA weight updates via broadcast
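As a minimal sketch of the load-from-disk path, the trainer could persist only the small adapter checkpoint and the rollout engine could pick it up for the next batch. The function name, the version-counter trick, and the shared-directory layout are illustrative assumptions, not the planned AReaL interface; only the vLLM and PEFT calls shown are existing APIs.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest
from peft import PeftModel


def push_adapter_from_disk(
    trainer_model: PeftModel,
    llm: LLM,
    prompts: list[str],
    sampling: SamplingParams,
    adapter_dir: str,
    version: int,
):
    """Persist the trained adapter to disk, then roll out with the new weights.

    The version counter gives each refresh a new lora_int_id so vLLM does not
    keep serving a stale cached copy of the adapter.
    """
    trainer_model.save_pretrained(adapter_dir)  # adapter-only checkpoint
    request = LoRARequest(f"task_a_v{version}", version, adapter_dir)
    return llm.generate(prompts, sampling, lora_request=request)
```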
Milestone 2: Advanced Multi-LoRA Features
- Enable multi-LoRA weight updates via load-from-disk or broadcast
- Support training multiple LoRAs either sequentially or concurrently (see the sketch after this list)
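The sketch below illustrates one way to host several task-specific adapters on a single frozen base model using HF PEFT's multi-adapter support. The base model, adapter names, and the sequential loop are illustrative assumptions; how this wraps around AReaL's FSDP finetuning engine is exactly what this milestone will define.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; FSDP wrapping is omitted in this sketch.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16
)


def lora_cfg() -> LoraConfig:
    return LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")


# Register one adapter per RL task on the same frozen base weights.
model = get_peft_model(base, lora_cfg(), adapter_name="task_a")
model.add_adapter("task_b", lora_cfg())

# Sequential variant: activate one adapter at a time and run its RL updates.
# A concurrent variant would instead route each micro-batch to its task's adapter.
for task in ("task_a", "task_b"):
    model.set_adapter(task)
    # ... run the RL update steps for `task` on its own rollouts ...
```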
Milestone 3: Inference-side Optimizations
- Implement task-specific rollout interruption
- Improve rollout management and scheduling for load balancing and reduced rollout staleness
Milestone 4: Training-side Optimizations
- Optimize resource utilization via bubble reduction and balanced task-level allocation
- Implement job scheduling for training
- Optimize LoRA and optimizer state loading/offloading (see the sketch below)
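For the offloading item, a minimal sketch of the idea is to move an inactive task's optimizer state off the GPU while another task trains. The helper name and the CPU/GPU round trip are assumptions for illustration; the actual policy (when to offload, which tasks, overlap with compute) is the subject of this milestone.

```python
import torch


def move_optimizer_state(optimizer: torch.optim.Optimizer, device: str) -> None:
    """Move all optimizer state tensors (e.g. Adam's exp_avg / exp_avg_sq)
    to `device`, freeing GPU memory while another LoRA task is training."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device, non_blocking=True)


# Usage sketch: park task A's optimizer on CPU, train task B, then restore.
# move_optimizer_state(task_a_optimizer, "cpu")
# ... train task B ...
# move_optimizer_state(task_a_optimizer, "cuda")
```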
Each milestone will correspond to concrete PR deliverables, with details as follows:
| Milestone | Description |
|---|---|
| M1 — Basic LoRA Functionalities | Single-LoRA weight updates via load-from-disk/broadcast |
| M2 — Advanced Multi-LoRA | Multi-LoRA updates & concurrent/sequential training support |
| M3 — Inference Optimizations | Task-specific rollout interruption + scheduling improvements |
| M4 — Training Optimizations | Bubble reduction, allocation, job scheduling, state offloading |
Additional Information
See the following prior art and references:
- Motivation for LoRA PEFT: https://arxiv.org/abs/2106.09685
- Multi-LoRA kernel research: https://arxiv.org/abs/2505.14620