🚀 [2025/09/25] Recent Updates Summary for ROLL Project #173

@PanAndy

Hello everyone!
Thank you for your interest in ROLL.
We continue to iterate and improve the ROLL project. Below is a summary of recent updates, categorized for your reference.

Highlights:

New features in PR #172

  • Agent / GEM / Tool Use

    • Aligned with the GEM environment definition and adjusted the env manager (GEM) to better support environment and tool interactions, improving customization flexibility.
    • Added Agentic Tool Use training examples and integrated ToolUse documentation.
    • Added step-wise reinforcement support (step reinforce) to enable new stepwise training capabilities.
    • Redundant environment capability: supports two-dimensional redundancy via num_env_groups and group_size to increase tolerance to environment failures.
  • Models & backends

    • Added Qwen3-Next model implementation and training support (including fixes for checkpoint saving).
    • Added support for vLLM 0.10.2 and for dynamic FP8 rollout with vLLM.
    • Added support for multiple sglang versions (including sglang 0.5.2, 0.4.10.post2).
    • Provided a Dockerfile example for Torch 2.8 and updated mcore to 0.13.
  • Pipelines & training algorithms

    • Added SFT pipeline support.
    • Added Wan2_2 reward FL pipeline support.
    • Added support for use_remove_padding in the Megatron strategy, which trims tail padding to improve performance.
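
To illustrate the two-dimensional redundancy item above, here is a minimal plain-Python sketch. It assumes that num_env_groups and group_size describe a grid of num_env_groups × group_size environment instances, and that a group remains usable as long as enough of its instances survive. The helper names (`build_env_grid`, `usable_groups`) and the `min_per_group` threshold are hypothetical and are not part of ROLL's API; the actual env manager may handle failures differently.

```python
from itertools import product


def build_env_grid(num_env_groups: int, group_size: int) -> list[tuple[int, int]]:
    """Return a (group_id, slot_id) pair for every environment instance."""
    return list(product(range(num_env_groups), range(group_size)))


def usable_groups(
    grid: list[tuple[int, int]],
    failed: set[tuple[int, int]],
    min_per_group: int,
) -> dict[int, list[int]]:
    """Map group_id -> surviving slot_ids, keeping only groups that still
    have at least `min_per_group` healthy instances (hypothetical rule)."""
    alive: dict[int, list[int]] = {}
    for group_id, slot_id in grid:
        if (group_id, slot_id) not in failed:
            alive.setdefault(group_id, []).append(slot_id)
    return {g: slots for g, slots in alive.items() if len(slots) >= min_per_group}


# 4 groups x 3 instances per group = 12 environment instances in total.
grid = build_env_grid(num_env_groups=4, group_size=3)

# Hypothetical failures: one instance in group 0, two instances in group 2.
failed = {(0, 1), (2, 0), (2, 2)}

groups = usable_groups(grid, failed, min_per_group=2)
print(sorted(groups))  # group ids that still have enough healthy instances
```

In this sketch, extra slots within a group absorb isolated failures, while extra groups absorb the loss of a whole group; that is one plausible reading of "two-dimensional" redundancy.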

Other features, bug fixes, and refinements

  • Improved remove_padding support to reduce padding overhead.
  • Added a roll debug flag to improve metric recording.
  • Switched the default strategy for the LLM judge reward worker from HF to vLLM to improve efficiency.
  • Adjusted entropy loss computation to avoid unnecessary calculations.
  • Changed default loss aggregation mode to seq-mean-token-mean (loss_agg_mode).
  • Added support for passing is_lora when broadcasting parameters.
  • Added include_stop_str_in_output, stop_strings and other stop-handling configuration options.
  • Exposed environment metrics with aggregate_metrics control.
  • Restructured agentic directories: merged roll/agentic into roll/pipeline/agentic to avoid split logic.
  • Fixed webshop env state handling bug.
  • Fixed vLLM version comparison logic, isolated cache roots for multiple vLLM actor_workers, and resolved vLLM compile conflicts.
  • Fixed dataset load lock errors, math_env exceptions, and various other stability issues.
  • Fixed ROLL hang in colocate mode on XPU.
  • Fixed potential loss of environment variables when forwarding vLLM env vars to RayWorkerWrapper.
  • Fixed convert script and qwen3next checkpoint saving.
  • Fixed potential gradient loss caused by mask_mean/mask_sum handling dim=None.
  • Deprecated: torch251 / vllm0.7.3 / sglang0.4.3 have been removed from the repository.
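
For readers unfamiliar with the loss_agg_mode change listed above, the following plain-Python sketch contrasts two aggregation modes. It assumes that seq-mean-token-mean follows the common convention: average the loss over the valid tokens of each sequence, then average those per-sequence values. The function names are illustrative only. ROLL's own implementation operates on tensors and is not reproduced here.

```python
def masked_mean(values: list[float], mask: list[int]) -> float:
    """Mean of `values` over positions where `mask` is 1 (one sequence)."""
    total = sum(v * m for v, m in zip(values, mask))
    count = sum(mask)
    return total / count if count else 0.0


def seq_mean_token_mean(loss: list[list[float]], mask: list[list[int]]) -> float:
    """Token-level mean inside each sequence, then mean across sequences."""
    per_seq = [masked_mean(row, mrow) for row, mrow in zip(loss, mask)]
    return sum(per_seq) / len(per_seq)


def token_mean(loss: list[list[float]], mask: list[list[int]]) -> float:
    """Single mean over all valid tokens in the batch, for comparison."""
    total = sum(v * m for row, mrow in zip(loss, mask) for v, m in zip(row, mrow))
    count = sum(m for mrow in mask for m in mrow)
    return total / count if count else 0.0


# Two sequences: a long one with low loss and a short one with high loss.
loss = [[1.0, 1.0, 1.0, 1.0], [4.0, 0.0, 0.0, 0.0]]
mask = [[1, 1, 1, 1], [1, 0, 0, 0]]

print(seq_mean_token_mean(loss, mask))  # each sequence weighted equally
print(token_mean(loss, mask))           # each valid token weighted equally
```

Under this assumption, every sequence contributes equally regardless of length, whereas a batch-wide token mean gives longer sequences more weight.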
