🚀 The feature, motivation and pitch
Following up on our earlier discussion under the roadmap, I’m opening this issue to propose LoRA support for the vLLM alignment workflow. Multimodal RL projects (e.g., mm_grpo) are also looking to adopt vllm-omni as a rollout engine. Since MM RL typically fine-tunes only the LoRA adapters (e.g., FlowGRPO, DiffusionNFT), this integration would directly enable RL training workflows in which weights are updated dynamically.
Having LoRA in vllm-omni also aligns with the base vLLM design and brings additional benefits:
- Dynamic adaptation: load/unload adapters without restart
- Memory efficiency: smaller memory footprint than full model copies
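To make the memory-efficiency point concrete, here is a back-of-the-envelope comparison of adapter size vs. a full weight copy for a single projection matrix (the dimensions and rank below are hypothetical, not tied to any specific model):

```python
# Illustrative arithmetic only (not vLLM code): parameter count of one
# LoRA adapter vs. a full copy of the weight matrix it adapts.
d, k = 4096, 4096   # hypothetical output/input dims of one projection
r = 16              # hypothetical LoRA rank

full_params = d * k           # full-rank weight copy: d x k
lora_params = r * (d + k)     # low-rank factors B (d x r) and A (r x k)

print(full_params)                  # 16777216
print(lora_params)                  # 131072
print(full_params // lora_params)   # 128x smaller for this layer
```

At rank 16 the adapter holds ~0.8% of the parameters of the matrix it modifies, which is why swapping adapters between RL training steps is far cheaper than swapping full model copies.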
Happy to take on the work if no one else is currently assigned.
Alternatives
No response
Additional context
No response
Before submitting a new issue...