Hello everyone!
Thank you for your interest in ROLL.
We continue to iterate and improve the ROLL project. Below is a summary of recent updates, categorized for your reference.
Highlights:
- Alignment with the GEM environment definition, plus out-of-the-box, extensible Agentic Tool Use training.
- Support for Qwen3-Next model training.
- vLLM dynamic FP8 rollout support and remove_padding to improve training efficiency.
- Support for SFT (Supervised Fine-Tuning) pipeline.
- Added support for the Wan2_2 Reward FL pipeline and RL training for raw-image diffusion models.
New features in PR #172
Agent / GEM / Tool Use
- Aligned with the GEM environment definition and adjusted the env manager to better support environment and tool interactions, improving customization flexibility.
- Added Agentic Tool Use training examples and integrated ToolUse documentation.
- Added step-wise reinforcement (step reinforce) support, enabling new stepwise training capabilities.
- Added redundant-environment capability: two-dimensional redundancy via num_env_groups and group_size increases tolerance to environment failures.
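The two redundancy dimensions can be pictured with a small sketch. The knob names num_env_groups and group_size come from the release notes; the concrete values and the helper arithmetic below are illustrative, not ROLL's actual defaults:

```python
# Hypothetical values; only the knob names num_env_groups / group_size
# come from the release notes.
num_env_groups = 8   # first redundancy dimension: independent groups
group_size = 4       # second redundancy dimension: instances per group

total_envs = num_env_groups * group_size  # 32 environment instances in total

# If one instance in a group fails, its peers in the same group can still
# serve rollouts; if a whole group fails, the remaining groups can.
max_tolerated_failures_per_group = group_size - 1
```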
Models & backends
- Added Qwen3-Next model implementation and training support (including fixes for checkpoint saving).
- Added support for vLLM 0.10.2 and dynamic FP8 rollout for vLLM.
- Added support for multiple sglang versions (including sglang 0.5.2 and 0.4.10.post2).
- Provided a Dockerfile example for Torch 2.8 and updated mcore to 0.13.
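"Dynamic" FP8 here means the quantization scale is computed from the live tensor at rollout time, so no offline calibration pass is needed. A minimal numpy sketch of per-tensor dynamic scaling for the e4m3 format (illustrative only; not ROLL's or vLLM's actual implementation):

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value of the float8 e4m3 format

def dynamic_fp8_scale(w: np.ndarray) -> float:
    # Dynamic scaling: derive the scale from the tensor's current max magnitude,
    # so no calibration dataset is required.
    return float(max(np.abs(w).max(), 1e-12) / E4M3_MAX)

w = np.random.randn(16, 16).astype(np.float32)
s = dynamic_fp8_scale(w)
w_scaled = np.clip(w / s, -E4M3_MAX, E4M3_MAX)  # now fits the fp8 dynamic range
```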
Pipelines & training algorithms
- Added SFT pipeline support.
- Added Wan2_2 reward FL pipeline support.
- Added use_remove_padding support in the Megatron strategy (a tail-trimming optimization) to improve performance.
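The remove_padding idea is to pack only the valid (non-pad) tokens into one flat sequence and keep cumulative lengths for reconstruction, as in flash-attn's unpad_input. A minimal sketch with illustrative names and shapes (not ROLL's internal code):

```python
import numpy as np

def remove_padding(input_ids: np.ndarray, attention_mask: np.ndarray):
    # Keep only valid tokens, flattened in row-major order, plus cumulative
    # sequence lengths (cu_seqlens) so the batch boundaries are recoverable.
    flat = input_ids[attention_mask.astype(bool)]
    seqlens = attention_mask.sum(axis=1)
    cu_seqlens = np.concatenate([[0], np.cumsum(seqlens)])
    return flat, cu_seqlens

ids = np.array([[5, 6, 0, 0], [7, 8, 9, 0]])
mask = np.array([[1, 1, 0, 0], [1, 1, 1, 0]])
flat, cu = remove_padding(ids, mask)
# flat -> [5, 6, 7, 8, 9]; cu -> [0, 2, 5]
```

The attention kernel then runs over the packed tokens only, skipping the padding "tail" entirely.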
Other features, bug fixes, and refinements
- Improved remove_padding support to reduce padding overhead.
- Added a roll debug flag to improve metric recording.
- Switched the default strategy for the LLM-judge reward worker from HF to vLLM to improve efficiency.
- Adjusted entropy loss computation to avoid unnecessary calculations.
- Changed default loss aggregation mode to seq-mean-token-mean (loss_agg_mode).
- Support passing is_lora when broadcasting parameters.
- Added include_stop_str_in_output, stop_strings and other stop-handling configuration options.
- Exposed environment metrics with aggregate_metrics control.
- Restructured agentic directories: merged roll/agentic into roll/pipeline/agentic to avoid split logic.
- Fixed webshop env state handling bug.
- Fixed vLLM version comparison logic, isolated cache roots for multiple vLLM actor_workers, and resolved vLLM compile conflicts.
- Fixed dataset load lock errors, math_env exceptions, and various other stability issues.
- Fixed ROLL hang in colocate mode on XPU.
- Fixed potential loss of environment variables when forwarding vLLM env vars to RayWorkerWrapper.
- Fixed the convert script and Qwen3-Next checkpoint saving.
- Fixed potential gradient loss caused by mask_mean/mask_sum handling dim=None.
- Deprecated: torch251 / vllm0.7.3 / sglang0.4.3 have been removed from the repository.
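For the new default loss_agg_mode, seq-mean-token-mean averages token losses within each sequence first and then across sequences, so long responses do not dominate the batch loss. A hedged sketch of that aggregation (names and values illustrative):

```python
import numpy as np

def seq_mean_token_mean(token_loss: np.ndarray, mask: np.ndarray) -> float:
    # Per-sequence mean over valid tokens, then mean over sequences;
    # each sequence contributes equally regardless of its length.
    per_seq = (token_loss * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1)
    return float(per_seq.mean())

loss = np.array([[1.0, 3.0, 0.0], [2.0, 0.0, 0.0]])
mask = np.array([[1, 1, 0], [1, 0, 0]])
# per-sequence means: (1+3)/2 = 2.0 and 2/1 = 2.0 -> batch loss 2.0
```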