Checklist
- This feature will maintain backward compatibility with the current APIs in
areal/api/. If not, please raise a refactor issue first.
Motivation
Currently, AReaL focuses mainly on single-task RL finetuning. When multiple RL finetuning tasks need to run together, the memory, compute, and hardware requirements grow substantially, because the only viable option today is to run each task as a separate job. In many scenarios, however, these tasks could share resources far more efficiently through Parameter-Efficient Fine-Tuning (PEFT) techniques such as LoRA.
Therefore, there is a pressing need to support efficient PEFT finetuning so that a cloud-based system can serve multiple RL finetuning tasks in a multi-tenant, multi-user Finetuning-as-a-Service (FaaS) setting.
In this RFC, we propose RL-based multi-LoRA finetuning, which finetunes multiple LoRA adapters for different RL tasks in parallel. This allows resources to be shared efficiently and avoids the overhead of running each task as a separate job.
Proposed change
At a high level, we propose adding the following functionality to support multi-LoRA RL finetuning:
- Multi-LoRA rollout support in the vLLM inference engine (see the sketch after this list).
- Multi-LoRA training support in the FSDP finetuning engine.
- Optimizations and performance enhancements.
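To make the first item concrete, the sketch below shows what multi-LoRA rollout could look like using vLLM's existing LoRA serving interface (`LoRARequest`). The model name, adapter paths, and engine arguments are placeholders for illustration; the actual AReaL rollout-engine integration is left to the milestones below.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; enable_lora / max_loras are standard vLLM engine
# arguments for serving several adapters over one set of base weights.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    enable_lora=True,
    max_loras=4,        # adapters resident on GPU at once
    max_lora_rank=16,
)
sampling = SamplingParams(temperature=1.0, max_tokens=256)

# One LoRARequest per RL task; vLLM batches generation across adapters.
task_a = LoRARequest("task_a", 1, "/shared/adapters/task_a")  # hypothetical paths
task_b = LoRARequest("task_b", 2, "/shared/adapters/task_b")

out_a = llm.generate(["<task A prompt>"], sampling, lora_request=task_a)
out_b = llm.generate(["<task B prompt>"], sampling, lora_request=task_b)
```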
To this end, we propose a multi-milestone plan to enable full support for multi-LoRA RL finetuning in AReaL:
Milestone 1: Basic LoRA Functionalities for Ascend-vLLM
- Support single-LoRA weight updates via load-from-disk (see the sketch below)
- Support single-LoRA weight updates via broadcast
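As a minimal sketch of the load-from-disk path, the trainer could persist only the small adapter checkpoint and the rollout engine could pick it up for the next batch. The function name, the version-counter trick, and the shared-directory layout are illustrative assumptions, not the planned AReaL interface; only the vLLM and PEFT calls shown are existing APIs.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest
from peft import PeftModel


def push_adapter_from_disk(
    trainer_model: PeftModel,
    llm: LLM,
    prompts: list[str],
    sampling: SamplingParams,
    adapter_dir: str,
    version: int,
):
    """Persist the trained adapter to disk, then roll out with the new weights.

    The version counter gives each refresh a new lora_int_id so vLLM does not
    keep serving a stale cached copy of the adapter.
    """
    trainer_model.save_pretrained(adapter_dir)  # adapter-only checkpoint
    request = LoRARequest(f"task_a_v{version}", version, adapter_dir)
    return llm.generate(prompts, sampling, lora_request=request)
```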
Milestone 2: Advanced Multi-LoRA Features
- Enable multi-LoRA weight updates via load-from-disk or broadcast
- Support training multiple LoRAs either sequentially or concurrently (see the sketch after this list)
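The sketch below illustrates one way to host several task-specific adapters on a single frozen base model using HF PEFT's multi-adapter support. The base model, adapter names, and the sequential loop are illustrative assumptions; how this wraps around AReaL's FSDP finetuning engine is exactly what this milestone will define.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; FSDP wrapping is omitted in this sketch.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16
)


def lora_cfg() -> LoraConfig:
    return LoraConfig(r=16, lora_alpha=32,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")


# Register one adapter per RL task on the same frozen base weights.
model = get_peft_model(base, lora_cfg(), adapter_name="task_a")
model.add_adapter("task_b", lora_cfg())

# Sequential variant: activate one adapter at a time and run its RL updates.
# A concurrent variant would instead route each micro-batch to its task's adapter.
for task in ("task_a", "task_b"):
    model.set_adapter(task)
    # ... run the RL update steps for `task` on its own rollouts ...
```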
Milestone 3: Inference-side Optimizations
- Implement task-specific rollout interruption
- Improve rollout management and scheduling for load balancing and reduced rollout staleness
Milestone 4: Training-side Optimizations
- Optimize resource utilization via bubble reduction and balanced task-level allocation
- Implement job scheduling for training
- Optimize LoRA and optimizer state loading/offloading (see the sketch below)
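For the offloading item, a minimal sketch of the idea is to move an inactive task's optimizer state off the GPU while another task trains. The helper name and the CPU/GPU round trip are assumptions for illustration; the actual policy (when to offload, which tasks, overlap with compute) is the subject of this milestone.

```python
import torch


def move_optimizer_state(optimizer: torch.optim.Optimizer, device: str) -> None:
    """Move all optimizer state tensors (e.g. Adam's exp_avg / exp_avg_sq)
    to `device`, freeing GPU memory while another LoRA task is training."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device, non_blocking=True)


# Usage sketch: park task A's optimizer on CPU, train task B, then restore.
# move_optimizer_state(task_a_optimizer, "cpu")
# ... train task B ...
# move_optimizer_state(task_a_optimizer, "cuda")
```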
Each milestone will correspond to concrete PR deliverables, with details as follows:
| Milestone | Description |
|---|---|
| M1 — Basic LoRA Functionalities | Single-LoRA weight updates via load-from-disk/broadcast |
| M2 — Advanced Multi-LoRA | Multi-LoRA updates & concurrent/sequential training support |
| M3 — Inference Optimizations | Task-specific rollout interruption + scheduling improvements |
| M4 — Training Optimizations | Bubble reduction, allocation, job scheduling, state offloading |
Additional Information
See the following prior art and references:
- Motivation for LoRA PEFT: https://arxiv.org/abs/2106.09685
- Multi-LoRA kernel research: https://arxiv.org/abs/2505.14620