APRIL (Active Partial Rollouts) is a compute-efficient method to accelerate rollout generation in reinforcement learning training for Large Language Models (LLMs). By addressing the critical "long-tail" problem in RL training where a few samples with exceptionally long responses cause the entire batch to stall, APRIL delivers:
- 20-35% improvement in rollout throughput
- 2-5% higher final model accuracy
- Faster convergence during training
- Hardware agnostic - supports both NVIDIA and AMD GPUs
In on-policy RL training (RLHF/GRPO/DAPO), the rollout phase dominates runtime, typically accounting for over 90% of total training time. Due to the highly variable response lengths across samples, synchronous training paradigms suffer from severe GPU underutilization as faster-generating workers sit idle waiting for the longest-running instances to complete.
APRIL revolutionizes rollout efficiency through an innovative mechanism:
- Over-provisioning: Deliberately initiate more rollout requests than needed (N' > N)
- Active interruption: Once the target batch size is reached, actively stop remaining unfinished rollouts
- Intelligent recycling: Store partial results in a buffer and resume generation in the next iteration
- Seamless integration: Works with existing RL frameworks without modifying inference kernels
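
The mechanism above can be sketched as a single scheduling loop. This is an illustrative sketch only, not APRIL's actual implementation: `generate_chunk` and the request/buffer layout are hypothetical stand-ins for the real inference engine, and only the over-provision/interrupt/recycle control flow mirrors the description above.

```python
import random

def generate_chunk(req, max_len=8):
    # Toy stand-in for one inference-engine step: appends a token and
    # finishes at a random length to mimic variable response lengths.
    req["tokens"].append(0)
    if len(req["tokens"]) >= max_len or random.random() < 0.3:
        req["done"] = True

def april_rollout_step(prompts, rollout_batch_size, over_sampling_batch_size, buffer):
    # 1. Over-provisioning: resume buffered partial rollouts first, then start
    #    fresh prompts until N' = over_sampling_batch_size requests are in flight.
    active = buffer[:over_sampling_batch_size]
    del buffer[:len(active)]
    while len(active) < over_sampling_batch_size and prompts:
        active.append({"prompt": prompts.pop(), "tokens": [], "done": False})

    finished = []
    # 2. Generate until the target batch size N = rollout_batch_size is reached.
    while len(finished) < rollout_batch_size:
        for req in active:
            if req["done"]:
                continue
            generate_chunk(req)
            if req["done"]:
                finished.append(req)
                if len(finished) == rollout_batch_size:
                    break
    # 3. Active interruption + recycling: abort the rest and keep their partial
    #    generations in the buffer for the next iteration.
    buffer.extend(r for r in active if not r["done"])
    return finished
```

Because unfinished requests carry their partial tokens back into the buffer, no generation work is discarded; the next iteration resumes them instead of regenerating from scratch.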
- Plug-and-play: Enable with just two command-line flags (`--partial-rollout` and `--over-sampling-batch-size`)
- Algorithm-agnostic: Compatible with GRPO, DAPO, GSPO, and other popular RL algorithms
- Framework-ready: Already integrated into the slime framework
- System-level optimization: Operates at the scheduling layer, complementary to kernel-level optimizations
- Production-tested: Evaluated on multiple LLMs including DeepSeek-R1, Qwen3, and GLM-4
- AMD Docker Image
DOCKER_IMG=rlsys/april:AMD_exp_docker_image
For more details, please refer to the AMD Dockerfile.
- NVIDIA Docker Image
DOCKER_IMG=rlsys/april:NV_exp_docker_image
For more details, please refer to the NVIDIA Dockerfile.
docker run --rm \
--gpus all \
--ipc=host \
--shm-size=16g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-it $DOCKER_IMG \
/bin/bash

git clone https://github.com/RLsys-Foundation/APRIL.git
cd APRIL
pip install -e .

Run a training example with APRIL enabled:
# Example: Qwen3-4B with DAPO
bash scripts/partial_rollout/qwen/grpo/run-qwen3-4B-dapo-partial.sh

# Enable APRIL optimization
--partial-rollout
# Set over-sampling batch size (should be > rollout_batch_size)
--over-sampling-batch-size 64 # e.g., 2x the rollout_batch_size
# Standard rollout batch size
--rollout-batch-size 32

For detailed parameter explanations, see arguments.py.
| Dataset | Model | Algorithm | Throughput Gain | Accuracy Improvement |
|---|---|---|---|---|
| DAPO-Math-17k | Qwen3-4B | DAPO | +17% | +2.3% |
| DeepScaleR | Qwen3-4B | GRPO | +21% | +3.1% |
| DeepMath-103K | Qwen3-4B | GSPO | +35% | +4.7% |
| Agent Tasks | DeepSeek-1.5B | GRPO | +23% | +2.8% |
APRIL not only improves training efficiency but also achieves:
- Faster convergence: Reaches target accuracy 15-20% faster
- Higher final accuracy: 2-5% improvement in final model performance
- Stable training: No additional instability despite partial off-policy samples
| Component | Path | Description |
|---|---|---|
| Rollout Engine | `slime/rollout/sglang_example.py` | Manages generation with active interruption |
| Buffer System | `slime/ray/buffer.py` | Stores and prioritizes partial rollouts |
| Scheduler | `slime/ray/rollout.py` | Orchestrates over-sampling and batch management |
| Training Backend | `slime/backends/` | Supports both Megatron and FSDP |
While APRIL introduces roughly 40% off-policy tokens per iteration, extensive experiments show:
- No significant training instability
- Improved final model accuracy
- Consistent convergence patterns
Note: For extremely long sequences (e.g., multi-turn agent tasks), additional validation may be needed.
APRIL operates at the system scheduling layer and is fully compatible with:
- Kernel optimizations (FlashAttention, continuous batching)
- Inference engines (vLLM, SGLang, TensorRT-LLM)
- Speculative decoding techniques
- Model parallelism strategies
APRIL is hardware-agnostic and tested on:
- NVIDIA GPUs: H100
- AMD GPUs: MI300X/MI325
APRIL/
├── imgs/                      # Documentation images
│   ├── APRIL.png              # Project logo
│   └── partial_scheduling.png # Architecture diagrams
├── scripts/
│   └── partial_rollout/       # Training scripts
│       ├── deepseek/          # DeepSeek model experiments
│       ├── qwen/              # Qwen model experiments
│       └── README.md          # Script documentation
├── slime/                     # Core framework
│   ├── backends/              # Training backends
│   │   ├── fsdp_utils/        # FSDP implementation
│   │   └── megatron_utils/    # Megatron-LM support
│   ├── rollout/
│   │   ├── sglang_example.py  # Core rollout implementation
│   │   └── rm_hub/            # Reward model integrations
│   ├── ray/                   # Distributed orchestration
│   │   ├── buffer.py          # Partial rollout buffer
│   │   └── rollout.py         # Rollout scheduling
│   └── utils/                 # Utilities and helpers
├── docs/                      # Documentation
│   ├── en/                    # English docs
│   └── zh/                    # Chinese docs
└── tools/                     # Model conversion utilities
- Over-provisioning Phase: Request N' = αN rollouts (α typically 1.5-2.0)
- Active Monitoring: Track completion status across all workers
- Intelligent Interruption: Send abort signal when N samples complete
- Buffer Management: Store partial results with generation state
- Seamless Resumption: Continue partial rollouts in next iteration
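
The buffer in step 4 must decide which partials to resume first. The sketch below is a hypothetical, minimal version (the real implementation lives in `slime/ray/buffer.py`); the longest-first policy is an assumption chosen here to illustrate one way to shrink the long tail, since the longest partials are the ones most at risk of being interrupted repeatedly.

```python
import heapq

class PartialRolloutBuffer:
    """Minimal sketch of a partial-rollout buffer (hypothetical API).

    Pops the partial with the most generated tokens first, so the
    longest-running generations are resumed as early as possible.
    """

    def __init__(self):
        self._heap = []   # max-heap on partial length, via negated key
        self._count = 0   # insertion counter used as a stable tie-breaker

    def put(self, prompt, tokens):
        # Store (negated length, tie-breaker, payload); heapq is a min-heap,
        # so negating the length yields longest-first ordering.
        heapq.heappush(self._heap, (-len(tokens), self._count, prompt, tokens))
        self._count += 1

    def get(self):
        _, _, prompt, tokens = heapq.heappop(self._heap)
        return prompt, tokens

    def __len__(self):
        return len(self._heap)
```

Other policies (FIFO, shortest-first) are equally easy to express by changing the heap key; which one APRIL actually uses is determined by the framework, not by this sketch.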
APRIL is designed as a drop-in enhancement for existing RL training pipelines:
- Minimal code changes: Enable with command-line flags
- Framework agnostic: Works with OpenRLHF, verl, Areal, slime
- Automatic optimization: Self-tuning based on workload characteristics
If you use APRIL in your research, please cite our paper:
@misc{zhou2025aprilactivepartialrollouts,
title={APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation},
author={Yuzhen Zhou and Jiajun Li and Yusheng Su and Gowtham Ramesh and Zilin Zhu and Xiang Long and Chenyang Zhao and Jin Pan and Xiaodong Yu and Ze Wang and Kangrui Du and Jialian Wu and Ximeng Sun and Jiang Liu and Qiaolin Yu and Hao Chen and Zicheng Liu and Emad Barsoum},
year={2025},
eprint={2509.18521},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2509.18521},
}

We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
APRIL builds upon the excellent work of:
- slime - The base RL training framework
- SGLang - High-performance inference backend
- Megatron-LM - Distributed training backend
For questions and support:
- Open an issue on GitHub




