Vision Reinforcement Learning + Memory Efficient RL #3326

shimmyshimmer · Maintainer · 2025-09-16T16:13:39Z

We're excited to announce vision model support for RL, plus even faster and more memory-efficient RL!

Unsloth now supports vision/multimodal RL with Gemma 3 and Qwen2.5-VL. Thanks to Unsloth's unique weight sharing and custom kernels, VLM RL with Unsloth is 1.5–2× faster, uses 90% less VRAM, and supports 10× longer context lengths than FA2 setups, with no accuracy loss. Try it in our Qwen2.5-VL GRPO notebook.

Full details in our blog post: https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl

This update also introduces Qwen's GSPO algorithm.
Our new vision RL support is also faster and more memory efficient: our new kernels and algorithms allow faster RL for both text and vision LLMs with 50% less VRAM and 10× more context.
Introducing a new RL feature called 'Standby'. Previously, RL required splitting the GPU between training and inference. With Unsloth Standby, you no longer have to, and Standby uniquely limits speed degradation compared to other implementations, sometimes even making training faster! Read our Blog
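For readers new to GSPO, here is a rough illustration of how it differs from GRPO. This is a minimal sketch that assumes the formulation from Qwen's GSPO paper and is not Unsloth's implementation. In GRPO, the importance ratio between the new and old policy is computed and clipped per token. In GSPO, the ratio is computed once per sampled response as a length-normalized sequence likelihood ratio, and clipping is applied to that sequence-level ratio. Advantages are still normalized within a group of responses to the same prompt. All function names (`group_advantages`, `sequence_ratio`, `gspo_objective`) and the `clip_eps` value are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    # Normalize rewards within one group of responses to the same prompt:
    # A_i = (r_i - mean(r)) / (std(r) + eps).
    # Using the population std is an assumption; eps avoids division by zero
    # when every response in the group gets the same reward.
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def sequence_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    # Length-normalized sequence importance ratio (GSPO-style):
    # s_i = exp((1/|y_i|) * sum_t [log pi_new(y_t) - log pi_old(y_t)])
    # Inputs are per-token log-probs of one sampled response under each policy.
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("need equal-length, non-empty per-token log-prob lists")
    diff = sum(a - b for a, b in zip(logp_new, logp_old))
    return math.exp(diff / len(logp_new))


def gspo_objective(new_logps, old_logps, rewards, clip_eps: float = 0.2) -> float:
    # Clipped surrogate averaged over the group (a quantity to maximize).
    # clip_eps=0.2 is illustrative only, not a recommended setting.
    advs = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logps, old_logps, advs):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(s * adv, s_clipped * adv)
    return total / len(advs)


# Usage: two one-token responses to one prompt, rewarded 1 and 0.
# The first response's ratio is 2 and gets clipped to 1.2; the second's is 1.
print(gspo_objective([[0.0], [0.0]], [[-math.log(2)], [0.0]], [1.0, 0.0]))  # roughly 0.1
```

The design point is that the reward is assigned per response, so GSPO applies the importance ratio and clipping per response too; the 1/|y_i| normalization keeps ratios for long and short responses on a comparable scale.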

We also released Aider Polyglot benchmarks for our DeepSeek-V3.1 Dynamic GGUFs, and Unsloth quants consistently perform better than others. Blog

Don't forget to also join our Reddit: r/unsloth 🥰

This discussion was created from the release Vision Reinforcement Learning + Memory Efficient RL.
