## Highlights

### Model Engine
As noted in #3624, the model engine is a service that provides APIs for manipulating a parallel and distributed model through a single controller. This release provides a prototype of this idea with an FSDP + Ulysses backend and a Megatron-Core backend. The implementation lives under https://github.com/volcengine/verl/tree/main/verl/workers/engine. Currently, only the SFT trainer is implemented on top of the model engine; in upcoming releases, we will implement the RL trainer with it as well.
Please refer to https://verl.readthedocs.io/en/latest/workers/model_engine.html for the design and for instructions on adding more model engine backends.
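The single-controller idea can be sketched as follows. This is an illustrative toy, not verl's actual API (all class and method names here are hypothetical): the controller drives training through backend-agnostic calls, while each backend (FSDP + Ulysses, Megatron-Core, ...) hides its own sharding details behind the same interface.

```python
# Hypothetical sketch of a model-engine-style interface; names are
# illustrative stand-ins, not verl's real classes.
from abc import ABC, abstractmethod


class BaseModelEngine(ABC):
    @abstractmethod
    def init_model(self):
        """Build and shard the model on the engine's workers."""

    @abstractmethod
    def train_batch(self, batch: dict) -> dict:
        """Run forward/backward + optimizer step, return metrics."""


class DummyEngine(BaseModelEngine):
    """Toy backend: the 'loss' is just the mean of the input values."""

    def init_model(self):
        self.step = 0

    def train_batch(self, batch):
        self.step += 1
        loss = sum(batch["values"]) / len(batch["values"])
        return {"loss": loss, "step": self.step}


# The single controller only ever talks to the abstract interface.
engine = DummyEngine()
engine.init_model()
metrics = engine.train_batch({"values": [1.0, 2.0, 3.0]})
print(metrics)  # {'loss': 2.0, 'step': 1}
```

Swapping `DummyEngine` for a different backend would leave the controller loop unchanged, which is the point of the abstraction.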
### Rollout Server
As agentic reinforcement learning emerges as a predominant research area, verl's rollout is transitioning from SPMD mode to server mode, which is more efficient for multi-turn rollout and tool calling. In version 0.6, we made several major changes to the rollout servers:
- SGLang: #3090 completely separates the SGLang process from the trainer process in SPMD mode and introduces a server adapter to synchronize weights between the trainer and SGLang server. Furthermore, #3456 migrates SGLang to native server mode, enabling full-fledged features and optimizations for online serving.
- vLLM: While the vLLM model_runner remains within the trainer process, #3456 also transitions vLLM to native server mode. We may explore completely separating the vLLM process from the trainer process in future releases.
Building on native server mode, #3530 adds DP+EP support for large MoE models.
To improve extensibility, #3285 refactors the BaseRollout interface and deprecates all sharding managers. This refactor ensures the training engine remains agnostic of the inference engine during weight synchronization, making it easier to integrate new inference engines (e.g., TensorRT-LLM) without modifying the training engine.
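The decoupling above can be sketched with a toy weight-synchronization flow (all names here are hypothetical, not verl's actual interfaces): the trainer only emits a generic stream of named weights, so any inference server that can consume such a stream plugs in without the training engine knowing about it.

```python
# Illustrative sketch of inference-engine-agnostic weight sync.
def trainer_named_weights():
    # Stand-in for gathering full weights from FSDP/Megatron shards;
    # the consumer never sees how they were sharded.
    yield "layer.0.weight", [0.1, 0.2]
    yield "layer.1.weight", [0.3]


class FakeRolloutServer:
    """Toy stand-in for an inference server (vLLM, SGLang, ...)."""

    def __init__(self):
        self.weights = {}

    def update_weights(self, named_weights):
        # Only the generic (name, tensor) contract matters here.
        for name, w in named_weights:
            self.weights[name] = w


server = FakeRolloutServer()
server.update_weights(trainer_named_weights())
print(sorted(server.weights))  # ['layer.0.weight', 'layer.1.weight']
```

Integrating a new inference engine then amounts to implementing the consumer side of this contract, rather than writing a per-engine sharding manager.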
### Newly Supported Models
- Qwen3 VL
- GPT OSS
### Algorithm
- GSPO
- Token-level TIS: #2953 introduces token-level importance sampling to mitigate the gap between rollout and training.
- Sequence-level TIS: #3694 adds more comprehensive metrics for monitoring the distribution mismatch between rollout and training, and introduces sequence-level importance sampling.
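The core of truncated importance sampling (TIS) can be shown in a few lines. This is a minimal sketch, not verl's exact implementation (the function name and the cap value are illustrative): each token's gradient term is reweighted by the ratio of the trainer's probability to the rollout engine's probability, clipped from above to bound variance.

```python
# Token-level truncated importance sampling weights (sketch).
import math


def tis_weights(train_logprobs, rollout_logprobs, cap=2.0):
    """Per-token weights min(p_train / p_rollout, cap).

    The ratio corrects for the numerical mismatch between the
    training engine and the rollout engine; the cap truncates
    outlier ratios to keep gradient variance bounded.
    """
    return [
        min(math.exp(lt - lr), cap)
        for lt, lr in zip(train_logprobs, rollout_logprobs)
    ]


w = tis_weights([-1.0, -2.0, -0.5], [-1.0, -1.0, -3.0])
print([round(x, 3) for x in w])  # [1.0, 0.368, 2.0]
```

Sequence-level variants apply an analogous clipped ratio computed over the whole response rather than per token.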
### Recipe
Some awesome recipes have been added in v0.6:
## Breaking changes and deprecations
### nD Dispatch method
Previously, we implemented a set of predefined dispatch methods, including `ONE_TO_ALL`, `DP_COMPUTE_DATA_PROTO`, `MEGATRON_COMPUTE_DATA_PROTO`, etc. `DP_COMPUTE_DATA_PROTO` and `MEGATRON_COMPUTE_DATA_PROTO` are tightly coupled to the underlying distributed strategies, and writing a separate dispatch method for each strategy does not scale. In this release, we propose a new API to unify all distributed strategies. The general steps are:
- Define device meshes or process groups
- Register dispatch and collect info by calling `_register_dispatch_collect_info` inside the worker
- Add registration for methods using `@register(dispatch_mode=make_nd_compute_dataproto_dispatch_fn(mesh_name=mesh_name))`
Please refer to https://github.com/volcengine/verl/blob/main/tests/single_controller/test_device_mesh_register.py for an example.
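The steps above can be sketched schematically. This toy (every name below is a simplified stand-in, not verl's real decorators) shows the shape of the mechanism: data is split across the ranks of a named mesh, each rank computes on its shard, and the results are collected back, so one dispatch mechanism covers any strategy that can describe itself as a device mesh.

```python
# Schematic sketch of nD dispatch/collect over a named mesh.
MESHES = {"dp": 4}  # mesh name -> number of ranks (illustrative)


def register(mesh_name):
    """Toy stand-in for the dispatch-mode registration decorator."""

    def deco(fn):
        def wrapped(data):
            n = MESHES[mesh_name]
            k = len(data) // n  # assumes len(data) divisible by n
            shards = [data[i * k:(i + 1) * k] for i in range(n)]  # dispatch
            outs = [fn(shard) for shard in shards]                # per-rank compute
            return [y for out in outs for y in out]               # collect
        return wrapped

    return deco


@register(mesh_name="dp")
def compute(shard):
    # Per-rank computation on its local shard.
    return [x * 2 for x in shard]


print(compute([1, 2, 3, 4, 5, 6, 7, 8]))  # [2, 4, 6, 8, 10, 12, 14, 16]
```

In verl the split/collect logic runs over Ray workers bound to real device meshes rather than an in-process loop, but the caller-side shape is the same: one decorated method, any mesh.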
### ShardingManager
ShardingManager is deprecated and will be removed in the next release.
## Important bug fixes
- Fix a hang when training VLMs (e.g., Qwen VL) on mixed text and image data
- Fix `DataProto` getstate bug
## What's Changed
- [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code by @eric-haibin-lin in #2621
- [ci] test: add CriticWorker unit test, make some util CPU friendly by @eric-haibin-lin in #2717
- [ray] feat: RayWorkerGroup support set worker env by @NKcqx in #2685
- [sglang] fix: Adding strict naming sanity for sglang by @zhaochenyang20 in #2719
- [misc] chore: bump main branch version to v0.5.0.dev by @eric-haibin-lin in #2718
- [megatron] fix: resolve backward propagation error in megatron_actor due to shared logits tensor in-place modification by @HelloWorld686 in #2484
- [tool] fix: geo3k create return by @nanjiangwill in #2714
- [doc] feat: Add agent-lightning in the list of "awesome works using verl by @wizardlancet in #2726
- [ci] fix: checkpoint_convertor ci miss a hf model download by @ETOgaosion in #2730
- [recipe] chore: add retool training script by @wuxibin89 in #2732
- [ci] fix: release ascend test time, fix one step off-policy CI by @ETOgaosion in #2731
- [doc] feat: add resizable sidebar and improve layout by @Tingberer in #2577
- [docker] feat: upgrade to torch 2.7, sglang 0.4.8 by @ETOgaosion in #2617
- [megatron] feat: a bunch of optimzation on vram, sequence packing by @ISEEKYAN in #2678
- [CI] feat: add `mypy` to pre-commit by @frrad in #2614
- [doc] style: change resize handle from gradient to plain color by @Tingberer in #2746
- refactor: Make sure to keep the type checking by @YeonwooSung in #2634
- [rollout] feat: remove chat scheduler by @wuxibin89 in #2725
- [perf] feat: add optional role selection in discrete mode for NPU Profiler by @tongtong0613 in #2750
- [doc] feat: add retool blog by @eric-haibin-lin in #2761
- [algo] refactor: don't special-case `compute_policy_loss` by @frrad in #2701
- [BREAKING] [rollout] chore: remove default rollout selection by @vermouth1992 in #2757
- [misc] fix: Handle N-D arrays and complex objects in union_numpy_dict by @MikeDean2367 in #2768
- [recipe] fix: fix retool SFT dataset by @vermouth1992 in #2764
- [doc] fix: fix typo in agentic RL documentation by @kibitzing in #2777
- [cfg] fix: fix failing rollout config test on main by @eric-haibin-lin in #2771
- [docker] feat: upgrade vllm to 0.9.1 by @ETOgaosion in #2747
- [recipe] fix: fix issue when running split ppo by @as12138 in #2745
- [recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process by @none0663 in #2739
- [recipe] feat: add QWen2.5-7b-instruct retool by @vermouth1992 in #2800
- [recipe] feat: @register_policy_loss("geo_mean"); Geometric-Mean Policy Optimization by @MzeroMiko in #2795
- [tool] fix: Typo fix -- Rename `to_openai_function_tool_schema` to `get_openai_tool_schema` by @wizeng23 in #2806
- [perf] feat: Padding before batch post-process in agent-loop to save time by @PopSoda2002 in #2773
- [vllm,rollout] fix: vllm rollout lock file permission by @clearhanhui in #2805
- [training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn by @kibitzing in #2741
- [vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue by @leo-pony in #2782
- Revert "[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process" by @ETOgaosion in #2813
- [algo] feat: add GSPO-token policy loss computation function by @0x404 in #2775
- [sglang] fix: support the configuration of attention_backend in sglang by @tardis-key in #2818
- [rollout] feat: pass all dataset fields to agent loop run by @wuxibin89 in #2810
- [docker] feat: Upgrade sglang 0.4.9 + transformers 4.53.2 by @ETOgaosion in #2794
- [sglang] fix: fix missing engine_kwargs by @vermouth1992 in #2823
- [perf, doc] feat: Add profiling continous steps in one database by @davidmlw in #2695
- [ci] fix: vllm no dataset by @ETOgaosion in #2831
- [tool] fix: load MCP tools in async rollout mode by @mathewjhan in #2821
- [rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset by @vllbc in #2740
- [CI] feat: update npu image to vLLM-ascend-v0.7.3.post1+mindspeed0.12.1 by @Crispig in #2838
- [training_utils] feat: Support `assert_case` for sandbox fusion by @HollowMan6 in #2374
- [recipe] feat: support qwen3-8B/14B DAPO training on ASCEND NPU by @zhihe-wang in #2836
- [doc] feat: add verl multinode SkyPilot example by @panf2333 in #2849
- [megatron] feat: Add MindSpeed support on the NPU device by @CurryRice233 in #2707
- [misc] feat: optimize GRPO-family algorithms with torch.stack and improve tensor creation consistency by @chi2liu in #2827
- [fsdp] feat: optimize fsdp2 by @vermouth1992 in #2843
- [recipe] feat: modify dapo deepseek megatron script by @vermouth1992 in #2711
- [megatron] fix: remove the demising critic.model.enable_gradient_checkpointing flags in the scripts by @HollowMan6 in #2864
- [fsdp,megatron,sglang] feat: Accelerate and Simplify Update weights logic and bump SGLang to 0.4.9.post6 by @hebiao064 in #2720
- [ci] fix: fix fsdp test in transformers 4.54.1 by @vermouth1992 in #2874
- [trainer, hardware] chore: add pin_memory_device when pin_memory is enabled by @zheliuyu in #2871
- [data] feat: dump train/test example as JSON by @wantbook-book in #2666
- [misc] refactor: Add `AbstractRewardManager` abstract class by @frrad in #2763
- [doc] fix: Fix the role assignment error in the interaction demo file and doc. by @Qiao0124 in #2476
- [trainer, ci] fix: fix error variable in new engine impl and add ci test by @ShareLer in #2647
- [misc] feat: add nccl timeout configuration to fsdp workers by @shinytang6 in #2321
- [trainer] fix: move UID generation before batch processing for future conditioning support by @nanjiangwill in #2880
- [sglang] chore: bump transformer formers 4.54.0 and fix QWen VL issues by @hebiao064 in #2869
- [doc] fix: multi turn argument is not available by @techkang in #2883
- [tool, sglang] feat: add tool create info by @nanjiangwill in #2870
- [trainer] chore: Add ground truth data to generation dumps in RayPPOTrainer by @looput in #2353
- [ci] fix: retry type check on cpu by @ETOgaosion in #2887
- [fsdp, trainer] fix: save config parameters to wandb in SFT by @EasonZhong668 in #2884
- [misc] feat: support logging rollout prob vs. actor probs in multi-turn for debugging purpose, follow up of #1712 by @TomQunChao in #2808
- [FSDP] feat: Allows specifying a different reference model by @ethen8181 in #2050
- [rollout] feat: add rollout_skip to skip rollout by reusing previously generated sequences by @wlf-darkmatter in #2602
- [ray] feat: support directly register dispatch device mesh by @vermouth1992 in #2893
- [doc] fix: Specify rollout engine in quickstart.rst by @TonyLianLong in #2905
- [BREAKING] [ray, megatron] feat: remove RayMegatronWorker by @vermouth1992 in #2895
- [megatron] refactor: simplify module init in megatron_workers, extract common operations by @ETOgaosion in #2400
- [rollout, sglang] fix: fix encoding logic bug by @nanjiangwill in #2901
- [megatron] fix: qwen2vl megatron fused forward param bug by @Yangruipis in #2595
- [sglang] fix: remove unnecessary maybe_set_triton_cache_manager by @hebiao064 in #2926
- [misc] refactor: deprecate sharding manager (part 1) by @vermouth1992 in #2912
- [megatron] feat: support for pipeline layout with vpp in mcore 0.13.0 by @yzlnew in #2749
- [fsdp] fix: call reshard() to resolve no shard attribute by @weifengpy in #2941
- [megatron] chore: update example 671B script, no offline dist-ckpt needed any more by @ISEEKYAN in #2945
- [tool] feat: handle cases when func calling without params by @Tavish9 in #2936
- [sglang] feat: add dapo multi-turn as alternative baseline by @zhaochenyang20 in #2952
- [megatron] fix: retain MLA config in mcore config converter by @Yangruipis in #2933
- [ci] fix: limit e2e_one_step_off_policy timeout by @ETOgaosion in #2964
- [rollout] fix: Fix local rank binding issue when setting RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES by @Crispig in #2967
- [doc] fix: fix typo in docs/preparation/prepare_data.rst by @nariaki3551 in #2957
- [misc] fix: fix DataProto getstate bug by @vermouth1992 in #2962
- [sglang] fix: Fix No command 'hf' found for dapo multi-turn as alternative baseline by @none0663 in #2973
- [megatron] feat: Allow override optimizer config by @ETOgaosion in #2959
- [rollout] feat: add cudagraph_capture_sizes option to customize cuda graph memory by @chenhaiq in #2956
- [trainer] refactor: make main_ppo TaskRunner more modular by @eric-haibin-lin in #2885
- [data] fix: fix bug of '_io.BytesIO' object has no attribute 'startswith' by @xylcbd in #2430
- [trainer] fix: only load memory in micro batch by @chenhaiq in #2908
- [misc] feat: Added: "tensorboard" to the requirements.txt by @RasulAlakbarli in #2900
- [ray, trainer] fix: fix working_dir when launching via uv by @Tavish9 in #2859
- [rollout,vllm] fix: max_num_seqs not take effect by @wuxibin89 in #2960
- [rollout,trainer] feat: offload param before wake up inference engine by @chenhaiq in #2977
- [doc] feat: update contact and news by @eric-haibin-lin in #2993
- [rollout] fix: avoid repeated multiplication by n for GRPO by @zdhNarsil in #2881
- [BREAKING] [perf] refactor: Profiler api refactor by @ETOgaosion in #2894
- [ray] fix: Fix function name in worker helper by @MrAta in #2868
- [model] fix: Handle flash_attn_supports_top_left_mask import for older transformers by @liqiongyu in #2985
- [trainer] feat: Specify apply_chat_template_kwargs from config by @HollowMan6 in #2998
- [rollout,vllm] feat: unify vllm and sglang method to async by @wuxibin89 in #2982
- [sglang]fix: Reduce memory footprint during rollout by adding load_grad=False when loading megatron weights. by @HaochenYuan in #3007
- [perf] refactor: part 2 - Profiler ci test and fixes by @ETOgaosion in #3001
- [recipe] feat: add deepeyes recipe by @Maxwell-Jia in #2398
- [trainer] fix: reduce memory footprint by moving data to the device only in mini batch by @ji-huazhong in #3011
- [ci] fix: add `flash_attn_supports_top_left_mask` to ignore list by @vermouth1992 in #3004
- [misc] feat: Support trackio by @yzlnew in #3017
- [perf] feat: Add rollout longtail observation metrics by @ETOgaosion in #3009
- [rollout] fix: Add soft node affinity to the agent loop workers by @JoostvDoorn in #3006
- [misc] chore: add gpu memory to deepseek script by @vermouth1992 in #3022
- [misc] chore: add GPU memory to names that train large models by @vermouth1992 in #3023
- [rollout] feat: add rollout config by @vermouth1992 in #3010
- [hardware, recipe] chore: support retool sft &update peft sft perf on npu by @zheliuyu in #3000
- [trainer,rollout,doc] feat: reduce minimum gpus to 96 for deepseek-v3 by @techkang in #3019
- [recipe] fix: make LangGraph agent example runnable out-of-the-box by @philippnormann in #3029
- [ci] fix: try fix vllm test network issue by @ETOgaosion in #3031
- [fsdp] fix: set _set_allocator_settings to True to avoid fsdp2 oom by @chenhaiq in #3020
- [doc] feat: Add VTool-R1 in the list of "awesome works using verl by @JingchengYang4 in #3036
- [misc] feat: add B200 and GB200 flops count by @vermouth1992 in #3041
- [rollout] feat: support over sampling rollout in SGLang Rollout by @zhaochenyang20 in #2929
- [doc] feat: add benchmark for deepseek by @techkang in #3046
- [rollout] feat: remove over-catched errors in SGLang rollout by @zhaochenyang20 in #3047
- [rollout,vllm] feat: support multi-modal in agent loop by @wuxibin89 in #3016
- [hardware] add flops count support for A3 device by @codemayq in #3053
- [trainer] fix: Remove redundant 'data.to()' codes by @A1waysBeenHere in #3051
- [BREAKING][rollout] feat: allow users pass all vllm/sglang engine args by @techkang in #3037
- [doc] fix: optimize ascend docs by @zheliuyu in #3063
- [ray] feat: remove worker group register center by @wuxibin89 in #3066
- [tool] fix: support non-ascii characters in search results by @Necolizer in #3044
- [ray] feat: add support for ray init kwargs by @Tavish9 in #3049
- [rollout] fix: vllm sleep level=2 bug by @techkang in #3082
- [fsdp] fix: add missing mixed precision configuration to FSDPEngineConfig by @xxrjun in #3068
- [fsdp] fix: patch fsdp2 to support hf transformer==4.54.0 and above by @weifengpy in #3072
- [sglang] fix: Qwen VLM Baseline by @zhaochenyang20 in #3083
- Update ray_trainer.py by @zlH518 in #3092
- [sglang] fix: Qwen VLM Baseline and sgl CI by @zhaochenyang20 in #3101
- [BREAKING] [rollout] feat: add a separate rollout worker by @vermouth1992 in #3071
- [recipe] fix: checkpoint in last step might be ignored to save in dapo by @syt-nju in #3034
- [fsdp, trainer, ckpt] feat: support custom model init and merging for FSDP by @Tavish9 in #3012
- [perf] fix: fix npu profiler and add mstx UT by @tongtong0613 in #3052
- [doc] feat: Add Kimina-Prover-RL to awesome work by @thibautbar in #3108
- [misc] fix: fix precommit by @vermouth1992 in #3109
- [doc, perf] feat: add profiling doc by @ETOgaosion in #3113
- [trainer, worker] fix: setting old log probs equal to log probs for on policy training by @sahilpatelsp in #3119
- Fix python version by @Zzhiter in #3103
- [trainer] fix: only load memory in micro batch for megatron backend by @none0663 in #3106
- [rollout] feat: use rollout worker in MegatronWorker by @vermouth1992 in #3111
- [rollout] feat: compute reward score in agent loop by @wuxibin89 in #3055
- [ci] fix: fix precommit by @vermouth1992 in #3128
- [trainer] fix: only load memory in micro batch for compute_log_prob, compute_values and update_critic by @none0663 in #3094
- [trainer] fix: move `testing` out of `step` timings by @Tialo in #3117
- [megatron] fix: add temperature parameter for logits scaling by @gxy-gxy in #3133
- [megatron] fix: mbridge save/load by @ETOgaosion in #2519
- [recipe] fix: make compute of `step` consistent across all trainers by @Tialo in #3132
- [misc] fix: update peft's version in requirements-npu.txt by @zheliuyu in #3127
- [rollout] fix: numpy.int64 serialization error in Weave tracing during validation by @U-rara in #3112
- [sglang] feat: make sglang properly handle the `max_num_seqs` configuration by @binary-husky in #3134
- [doc] feat: documentation Update, Ray Job Management Commands by @none0663 in #3131
- [ci] fix: model tests, transformers 4.55 has troubles with backward by @ETOgaosion in #3139
- [megatron] fix: fix megatron micro_batch_size assertion by @vermouth1992 in #3142
- [rollout] fix: KeyError "CPU" init agent loop workers by @KivenChen in #3141
- [fsdp, sglang] fix: Using Agreesive Empty Cache instead by @zhaochenyang20 in #3136
- [recipe] feat: support qwen2.5-32B DAPO training script on ASCEND NPU by @ZLiao097 in #3146
- [rollout] feat: add response token logprobs in agent loop output by @wuxibin89 in #3151
- [fsdp, trainer, tool] feat: add memory snapshot & visualization support for debugging GPU memory leaks by @zhaochenyang20 in #3099
- [sglang] fix: fall back to default FSDP1 by @zhaochenyang20 in #3156
- [sglang] fix: remove unused padding in SGLang rollout by @PopSoda2002 in #3138
- [doc] fix: add qwen3moe-30b script and fix error in qwen3-235b by @chenhaiq in #3174
- [misc] feat: Add L40S and A40 flop counts by @fjosw in #3177
- [megatron] feat: set_expandable_segments for megatron by @vermouth1992 in #3181
- [WIP]: Setting DAPO baseline in SGLang multi-turn RL by @zhaochenyang20 in #3175
- [Optimize]Safe tool parameter access standardization in SGLang rollout by @Zzhiter in #3196
- [misc] feat: Add RL-PLUS to awesome work list by @YihongDong in #3197
- [rollout] feat: use dummy load_format when init AsyncServer by @vermouth1992 in #3184
- [rollout, sglang] feat: Add sync mode for bash by @PopSoda2002 in #3186
- [rollout] fix: add missing extra_reward_info to AgentLoopOuput by @wuxibin89 in #3194
- [doc] fix: set use_dist_checkpointing to False for ref model in qwen3moe-30b script by @none0663 in #3198
- [env] fix: Improve License Check Hook Flexibility by @slimfrkha in #3202
- Revert "[rollout] feat: use dummy load_format when init AsyncServer" by @vermouth1992 in #3207
- [recipe] feat: Add Qwen3 30B MoE NPU recipe by @Shangwei-Li in #3189
- [perf] fix: fix profiler discrete mode unavailability by @tongtong0613 in #3188
- [docker] feat: update to vllm 0.10.0, mcore 0.13, transformers 4.55.4 by @ETOgaosion in #3192
- [data] fix: update parquet_files type check to support multi-file input by @looput in #3211
- [rollout] fix: apply copy_to_local before init hf config by @ZornWang in #3204
- [doc] fix: fix a documentation typo for nsys by @davidmlw in #3214
- [trainer] refactor: PPO config validation fast fail by @slimfrkha in #3187
- [megatron] refactor: refactor MegatronPPOActor by @vermouth1992 in #3206
- [env, sglang] feat: Bump new sglang version to fix vlm OOM by @PopSoda2002 in #3216
- [ci] fix: fix type convergence check by @ETOgaosion in #3219
- [rollout] fix: Restore the parameter 'limit_images' in RolloutConfig by @sty-yyj in #3217
- [BREAKING][vllm, fsdp] feat: add Rollout-Training Mismatch Fix -- Truncated importance sampling by @yaof20 in #2953
- [doc] fix: fix slack invitation link by @eric-haibin-lin in #3230
- [trainer] fix: Unified use of the def to() in Class DataProto by @A1waysBeenHere in #3227
- [fsdp, training_utils] Fix: LoRA w/ VLMs when Using Layered Summon by @kfallah in #3231
- [recipe] feat: Add InfiGUI-G1 recipe for MLLM GUI grounding by @Sirius-L1 in #3242
- [sglang] feat: add native sgl server by @ChangyiYang in #3090
- [megatron, model] feat: add MegatronEngine, MegatronEngineForCausalLM by @vermouth1992 in #3235
- [hardware] fix: Call synchronization when using the td.to("cpu") operation on NPU to avoid potential precision issues by @ji-huazhong in #3222
- [ckpt] fix: TypeError when save VL model ckpt by @Maxwell-Jia in #3268
- [recipe] fix: Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. by @ZLiao097 in #3252
- add gptoss grpo example script by @rich-junwang in #3212
- [worker] fix: Fix missing `rollout_log_probs` argument in policy loss functions by @kAIto47802 in #3274
- [data] fix: `None` has no attribute `get` when `extra_info` in Parquet is NaN by @Mighten in #3272
- [misc] fix: use uid for grouping in validation to avoid prompt confusion in multimodal tasks by @Maxwell-Jia in #3280
- [training_utils] fix: allow empty image_key/video_key in rl dataset by @HollowMan6 in #3281
- [hardware] fix: update source in dockerfile.rocm by @yushengsu-thu in #3284
- [fsdp] feat: add NPU fusion kernels for Qwen3 MoE by @Shangwei-Li in #3221
- [fsdp, model] feat: support FSDP model engine by @vermouth1992 in #3270
- [rollout] feat: Refactor agentloop multiturn by @plutoZZZZ in #3171
- [perf] feat: add npu silu &expand the scope of patch models by @zheliuyu in #3260
- [doc] fix: add rStar2-Agent as work using verl by @feifeibear in #3298
- [BREAKING][rollout] feat: Added asynchronous reward model calculation in agent loop by @echo-rain in #3152
- [ci, model] feat: add qwen3 CI testcase on ASCEND NPU by @tardis-key in #3300
- [single_controller, ray] fix: shut ray down after initializes it by @lantian7 in #3317
- [rollout] feat: deprecate all rollout sharding manager by @wuxibin89 in #3285
- [trainer] fix: `ray.state.available_resources_per_node` is deprecated by @HollowMan6 in #3313
- [trainer] fix: Correct off-by-one error in SFT loss mask slicing by @BlankCheng in #3287
- [doc] feat: Adding PACS to the Awesome work by @Geaming2002 in #3327
- [model] feat: polish model engine by @vermouth1992 in #3321
- [misc] feat: create issue template for verl by @techkang in #3330
- [doc]Update README.md, add related works by @The-Hierophant in #3331
- [recipe] fix: bugfix of refactor omissions by @baymax591 in #3328
- [training_utils] fix: Using a non-tuple sequence for multidimensional indexing is deprecated by @HollowMan6 in #3314
- [recipe] fix: (dapo_ray_trainer) use global_steps to determine is_last_step when resuming (gen_steps not restored) by @zpqiu in #3336
- [deployment, doc] feat: Add SkyPilot integration examples by @alex000kim in #3333
- [doc] fix: Update skypilot_examples.rst by @vermouth1992 in #3344
- [trainer] feat: support sft_trainer with model engine by @vermouth1992 in #3341
- [model] feat: support ByteDance Seed-OSS 36B model by @chenhaiq in #3347
- [vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue by @baymax591 in #3345
- [rollout, vllm, sglang] fix: allow user customization of `repetition_penalty` to avoid watchdog timeout during GRPO rollout by @Mighten in #3309
- [doc] fix: fix typo in skypilot_examples.rst by @alex000kim in #3368
- [trainer] feat: add CI for accuracy alignment of SFT trainer with model engine by @vermouth1992 in #3363
- [worker,sglang] refactor: deprecate fsdp/megatron reward model with server mode by @yyDing1 in #3352
- [training_utils] fix: stop using `math` naming under reward score by @HollowMan6 in #3378
- [misc] fix: set default value of ETP to 1 by @vermouth1992 in #3371
- [deployment] Fix deepseek671B grpo script by @HaochenYuan in #3383
- [trainer] fix: Fix ClearML logging by @Tialo in #3384
- [model, megatron] feat: Add glm air support and make new model directly use mbridge by @ETOgaosion in #3359
- [ci] fix: cpu unit test, etp config breaking change by @ETOgaosion in #3390
- [model] refactor: polishing FSDP model engine by @vermouth1992 in #3394
- [model] feat: polish megatron engine by @vermouth1992 in #3401
- [trainer] fix: avoid loading duplicated custom reward function to fix issue #3150 by @fshp971 in #3404
- [doc] fix: edit one step off policy readme with original work by @mnoukhov in #3414
- [ci] refactor: add ci test for refactored reward worker and add some args to GenRM config by @yyDing1 in #3385
- [vllm] fix: use VLLM_SLEEP_LEVEL=1 on ASCEND NPU by @Roseisrosie in #3355
- [rollout] chore: Add enable_prefix_caching into config by @wlf-darkmatter in #3395
- [rollout] fix: raise error if processing multimodal data without vlm processor by @techkang in #3370
- [recipe] fix: Add gts argument for recipe _dump_generations by @moehanabi in #3348
- [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 1 by @vermouth1992 in #2733
- [fsdp, recipe] feat: add grpo reward model example using HH-RLHF dataset by @ccclyu in #3417
- [model] feat: replace DataProto with TensorDict in engine by @vermouth1992 in #3422
- [tool] feat: support local gsm8k dataset in example/data_preprocess by @tardis-key in #3362
- [worker] refactor: move the implementation of rm to workers.roles and polish by @yyDing1 in #3423
- [doc] feat: add SimpleVLA-RL link in readme by @feifeibear in #3433
- [doc] fix: table column in document by @chenhaiq in #3430
- [worker, sglang] feat: support generative reward model (server mode) by @yyDing1 in #3441
- [trainer] feat: VL support freeze vision model by @maijia-cwh in #3178
- [trainer] fix: Loss calculations for grad accumulation steps by @puneeshkhanna in #3332
- [worker] fix: respect free_cache_engine flag by @HollowMan6 in #3442
- [ci] feat: move more tests to volcano engine by @vermouth1992 in #3455
- [sglang, tool] fix: fix text only bug by @nanjiangwill in #3448
- [trainer, fsdp, megatron] feat: Support one step off async rl on Ascend NPU by @ji-huazhong in #2924
- [trainer,rollout] fix: model weights will not be loaded when vllm_sleep_level=2 and using lora by @techkang in #3461
- [model] feat: Add `Apertus` by @EduardDurech in #3295
- [model] feat: add FSDP/Megatron critic worker with model engine by @vermouth1992 in #3439
- [megatron,recipe] feat: support Qwen3-30B (MoE) DAPO training on ASCEND NPU by @wlf-darkmatter in #3203
- [sglang, rollout] feat: enable token-in-token-out for SGLang engine by @nanjiangwill in #2759
- [model] feat: add qwen3 grpo 8b/32b script on ASCEND NPU by @tardis-key in #3310
- [ci] chore: add codeowner by @tardis-key in #3473
- [rollout] fix: make agent loop reward worker thread-safe by @wuxibin89 in #3454
- [perf, megatron] chore: bind NUMA by @conver334 in #3471
- [ray] refactor: Accelerate Tensor serialization by converting to np.ndarray by @baymax591 in #3425
- [1/N][rollout] feat: support vllm/sglang native http server by @wuxibin89 in #3456
- [perf] fix: Init some attrs earlier in Profiler by @moehanabi in #3482
- [perf, megatron] fix: bugfix if nvml can not import by @baymax591 in #3490
- [ray, single_controller] refactor: Accelerate ray.put with thread by @baymax591 in #3495
- [model] fix: fix device by @vermouth1992 in #3500
- [training_utils] refactor: extract checkpoint handler into a separate file for reuse by @vermouth1992 in #3505
- [recipe] feat: Add qwen2.5-7b DAPO NPU example script by @FightingZhen in #3501
- [model, ci] feat: add qwen3-8b ppo script on ASCEND NPU by @xvxuopop in #3502
- [data] feat: support customizable loss mask in multi-turn sft dataset by @vermouth1992 in #3507
- [model] fix: refactor qwen2vl patches & support no-image input for fsdp by @hiyouga in #3496
- [megatron] Add TIS support to megatron backend by @sharonyu-115 in #3513
- [model] fix: qwen2vl for transformers 4.52.* by @hiyouga in #3524
- [doc] fix: Update Qwen3-30B-A3B info in ascend_quick_start.rst by @tardis-key in #3514
- [doc] chore: Update ascend quick start document by @FightingZhen in #3527
- [doc] chore: Update owners for ascend_tutorial documents by @FightingZhen in #3528
- [worker] fix: get all `multi_modal_inputs` keys within a microbatch by @HollowMan6 in #3315
- [ci] feat: using local dataset to avoid network issue by @vermouth1992 in #3533
- [Megatron] fix: compatible to mcore0.15 by @ISEEKYAN in #3534
- [chore] fix typo by @1195343015 in #3535
- [ci] feat: fix more ci by @vermouth1992 in #3537
- [recipe] fix: Fix main_spin.py bugs by @NuoJohnChen in #3543
- [model] feat: support parameter generator for model engine by @vermouth1992 in #3529
- [megatron] chore: add a docker image for with mcore0.15 and TE2.7 by @ISEEKYAN in #3540
- [ci] feat: update ci by @vermouth1992 in #3552
- [recipe] fix: spin fsdp_workers.py bugs by @NuoJohnChen in #3544
- [recipe] fix: init self.model_config in fsdp worker of one-step-off policy by @zlwang-cs in #3556
- [docker] feat: dockerfile rocm7 initial commit by @vickytsang in #3547
- [trainer,rollout] fix: ensure LoRA weights are loaded when vllm_sleep_level=2 and without using layerd_summon by @piood in #3541
- [ci] fix: fix more ci by pin transformers version by @vermouth1992 in #3582
- [trainer] refactor: move rollout log to inheritable trainer by @ccclyu in #3576
- [CI] chore: Update e2e_ascend CI config by @FightingZhen in #3532
- [misc] feat: remove redundant default params by @techkang in #3577
- [megatron] fix: fix bug when holding empty parameters with custom pipeline layout by @HaochenYuan in #3565
- [ci] fix: fix e2e_sppo ci by @vermouth1992 in #3587
- [sglang] fix: Support SGLang>=0.5.2 by @EduardDurech in #3526
- [megatron] feat: use flash as default attention_backend by @ISEEKYAN in #3578
- [doc] fix: add faq doc to avoid vllm issue 22103 by @chenhaiq in #3595
- [misc] chore: Update CODEOWNERS by @vermouth1992 in #3594
- [megatron] fix: revert megatron actor refactor by @vermouth1992 in #3553
- [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 2 by @houminz in #3567
- [algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup by @EduardDurech in #3555
- [CI] chore: reopen ppo test in e2e_ascend CI by @FightingZhen in #3588
- [trainer] fix: Import flash attn utils for Transformers higher than 4.55.0 by @A1waysBeenHere in #3596
- [recipe] feat: CollabLLM integration for multiturn training by @Wuyxin in #3574
- [doc] feat: add model engine doc by @vermouth1992 in #3611
- [ci] chore: Use local dataset and models in e2e_ascend CI by @FightingZhen in #3601
- [rollout] fix: remove code responsible for tool response duplication by @mgilmore-relace in #3604
- [doc] fix: fix doc by @vermouth1992 in #3614
- [worker] fix: correctly determine is_vlm_model if sp > 1 by @HollowMan6 in #3282
- [rollout, tool] feat: export rollout rewards to total rewards by @Tavish9 in #3563
- [ci] fix: use local models/configs/datasets to increase stability by @vermouth1992 in #3616
- [ci] fix: fix sanity ci by @vermouth1992 in #3626
- [doc] feat: Adding Table-R1 to the Awesome work by @FlowRays in #3627
- [ci] feat: upgrade sglang to 0.5.2 by @wuxibin89 in #3613
- [ci] feat: increase timeout of e2e_sft by @vermouth1992 in #3630
- [tool] feat: support load local datasets when preparing datasets by @ji-huazhong in #3621
- [CI] fix: changed the model used in the PPO test case to Qwen2.5-0.5B to avoid the huggingface download error by @ji-huazhong in #3631
- [recipe] fix: Fix a Typo in One_Step_Off_Policy and Add async of Generative Reward Model in Response Generation by @ZhichaoWang970201 in #3369
- [megatron] feat: add mindspeed engine and support sft by @ji-huazhong in #3599
- [rollout] refactor: Update rollout and reward configs to reuse vllm/sglang replicas by @yyDing1 in #3625
- [trainer] fix: Ref to #3596. More import fix for transformers version higher than 4.55.0 by @A1waysBeenHere in #3608
- [2/N][rollout] feat: support vllm/sglang DP+EP in server mode by @wuxibin89 in #3530
- [model] feat: add glm4v by @lambertwjh in #3291
- [algo, perf] feat: Vectorize GRPO Advantage Estimator - 13~26x Speedup by @CedricHwong in #3635
- [megatron, worker] fix: use `extract_multi_modal_inputs` method for handling `multi_modal_inputs` by @HollowMan6 in #3641
- [rollout,vllm] fix: Add LoRA Loading to Async vLLM by @kfallah in #3639
- [megatron, training_utils] fix: encoder pp is removed in mcore >= 0.14 by @HollowMan6 in #3640
- [recipe] feat: add multiturn scripts for vllm backend; fix progress bar in dapo by @jiaqiw09 in #3644
- [sglang] feat: adapt for sglang+verl by @lbk-sys in #3506
- [model] fix: stuck issue with mixed text-image data by @HollowMan6 in #3670
- [ci] fix: disable workflows with self-host machines to run on fork by @HollowMan6 in #3677
- [rollout] fix: qwen2_vl position_ids shape mismatch by @m-Just in #3653
- [model] feat: add qwen3vl by @hiyouga in #3681
- [ci] fix: merge pre-commit-full into pre-commit by @HollowMan6 in #3684
- [ci] fix: fix checkpoint converter ci by @vermouth1992 in #3685
- [model] fix: qwen3vl patch by @hiyouga in #3686
- [trainer] feat: Enabled fused adamw by @puneeshkhanna in #3692
- [worker] fix: support for vllm V0 deprecation version by @HollowMan6 in #3687
- [rollout,sglang] fix: get_tool_call_parser_type for gpt-oss models in sglang rollout by @HJSang in #3661
- [rollout] fix: add batch_data_id default value check in AsyncRolloutRequest by @pandengyao in #3657
- [rollout] fix: Add LoRA datatype based on rollout model type to the LoRA config by @mgilmore-relace in #3675
- [rollout] feat: support async mode for multimodal data inference by @xichengpro in #3702
- [worker] refactor: Add `kwargs` to checkpoint-related functions in `BaseEngine` and its subclasses by @hongpeng-guo in #3662
- [worker] fix: create a new event loop if none exists by @ji-huazhong in #3703
- [recipe] fix: move all collabllm files into recipe directory by @chenhaiq in #3706
- [megatron, model] fix: VLMs using mbridge together with fused kernels by @HollowMan6 in #3700
- [data] fix: merge metrics from all workers in DataProto.concat() by @szrlee in #3699
- [misc] fix: model reassign to inner model in vllm patch file by @ccclyu in #3668
- [misc] fix: Allow HF model ID with `use_shm` by @EduardDurech in #3663
- [megatron] feat: add ascend megatron merge support by @jiaqiw09 in #3722
- [fsdp] fix: Handle dict type for per_tensor_param in LoRA weight sync by @pourion in #3712
- [rollout] feat: Add gpt-oss tool parser to enable agent loop training for gpt-oss models by @HJSang in #3705
- [rollout] feat: add default agent name for agent loop by @wuxibin89 in #3716
- [rollout] chore: Misc changes for extending internal compatibility by @pengwu22 in #3701
- [misc] feat: support build DataProto from TensorDict by @ji-huazhong in #3726
- [misc] feat: support offline generation with server mode by @vermouth1992 in #3732
- [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 3 by @houminz in #3600
- [ci] feat: increase sft e2e time by @vermouth1992 in #3738
- [model] fix: qwen3vl training stuck with mixed text-image data by @HollowMan6 in #3734
- [model] fix: qwen3vl models shape mismatch error with SP by @HollowMan6 in #3735
- [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type by @bxyang in #3739
- [BREAKING][rollout, trainer, algo] feat: comprehensive rollout importance sampling implementation by @szrlee in #3694
- [rollout] refactor: rename "clip" mode back to "mask" mode by @szrlee in #3750
- [sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy by @KAMiPan in #3531
- Add ARES and Revisual-R1, two awesome multimodal reasoning works using verl, by @CSfufu in #3755
- Add Meta-Bandit-LLM, an awesome long-horizon multi-turn interactive use case of verl, by @sanxing-chen in #3756
New Contributors
- @NKcqx made their first contribution in #2685
- @HelloWorld686 made their first contribution in #2484
- @wizardlancet made their first contribution in #2726
- @Tingberer made their first contribution in #2577
- @MikeDean2367 made their first contribution in #2768
- @kibitzing made their first contribution in #2777
- @MzeroMiko made their first contribution in #2795
- @clearhanhui made their first contribution in #2805
- @panf2333 made their first contribution in #2849
- @chi2liu made their first contribution in #2827
- @wantbook-book made their first contribution in #2666
- @Qiao0124 made their first contribution in #2476
- @techkang made their first contribution in #2883
- @looput made their first contribution in #2353
- @EasonZhong668 made their first contribution in #2884
- @TomQunChao made their first contribution in #2808
- @ethen8181 made their first contribution in #2050
- @wlf-darkmatter made their first contribution in #2602
- @nariaki3551 made their first contribution in #2957
- @xylcbd made their first contribution in #2430
- @RasulAlakbarli made their first contribution in #2900
- @zdhNarsil made their first contribution in #2881
- @MrAta made their first contribution in #2868
- @liqiongyu made their first contribution in #2985
- @HaochenYuan made their first contribution in #3007
- @Maxwell-Jia made their first contribution in #2398
- @philippnormann made their first contribution in #3029
- @JingchengYang4 made their first contribution in #3036
- @codemayq made their first contribution in #3053
- @A1waysBeenHere made their first contribution in #3051
- @xxrjun made their first contribution in #3068
- @zlH518 made their first contribution in #3092
- @syt-nju made their first contribution in #3034
- @sahilpatelsp made their first contribution in #3119
- @Zzhiter made their first contribution in #3103
- @Tialo made their first contribution in #3117
- @gxy-gxy made their first contribution in #3133
- @binary-husky made their first contribution in #3134
- @KivenChen made their first contribution in #3141
- @ZLiao097 made their first contribution in #3146
- @fjosw made their first contribution in #3177
- @YihongDong made their first contribution in #3197
- @slimfrkha made their first contribution in #3202
- @Shangwei-Li made their first contribution in #3189
- @ZornWang made their first contribution in #3204
- @sty-yyj made their first contribution in #3217
- @yaof20 made their first contribution in #2953
- @kfallah made their first contribution in #3231
- @Sirius-L1 made their first contribution in #3242
- @ChangyiYang made their first contribution in #3090
- @rich-junwang made their first contribution in #3212
- @kAIto47802 made their first contribution in #3274
- @Mighten made their first contribution in #3272
- @echo-rain made their first contribution in #3152
- @lantian7 made their first contribution in #3317
- @BlankCheng made their first contribution in #3287
- @baymax591 made their first contribution in #3328
- @alex000kim made their first contribution in #3333
- @fshp971 made their first contribution in #3404
- @mnoukhov made their first contribution in #3414
- @Roseisrosie made their first contribution in #3355
- @moehanabi made their first contribution in #3348
- @maijia-cwh made their first contribution in #3178
- @puneeshkhanna made their first contribution in #3332
- @EduardDurech made their first contribution in #3295
- @xvxuopop made their first contribution in #3502
- @sharonyu-115 made their first contribution in #3513
- @1195343015 made their first contribution in #3535
- @NuoJohnChen made their first contribution in #3543
- @zlwang-cs made their first contribution in #3556
- @piood made their first contribution in #3541
- @houminz made their first contribution in #3567
- @Wuyxin made their first contribution in #3574
- @mgilmore-relace made their first contribution in #3604
- @FlowRays made their first contribution in #3627
- @ZhichaoWang970201 made their first contribution in #3369
- @lambertwjh made their first contribution in #3291
- @CedricHwong made their first contribution in #3635
- @jiaqiw09 made their first contribution in #3644
- @lbk-sys made their first contribution in #3506
- @m-Just made their first contribution in #3653
- @HJSang made their first contribution in #3661
- @pandengyao made their first contribution in #3657
- @szrlee made their first contribution in #3699
- @pourion made their first contribution in #3712
- @pengwu22 made their first contribution in #3701
- @bxyang made their first contribution in #3739
- @KAMiPan made their first contribution in #3531
- @CSfufu made their first contribution in #3755
- @sanxing-chen made their first contribution in #3756
Full Changelog: v0.5.0...v0.6.0
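Several entries in this release (#2953, #3513, #3694) concern rollout importance sampling, which corrects for the gap between the probabilities of the inference engine that generated a response and those of the trainer that computes gradients. As a rough, hypothetical sketch only (the helper name and shapes below are illustrative, not verl's actual API), token-level truncated importance sampling caps each token's probability ratio at a threshold:

```python
import math

def truncated_is_weights(logp_train, logp_rollout, clip_c=2.0):
    # Per-token importance ratio exp(logp_train - logp_rollout),
    # truncated at clip_c to bound the variance introduced by the
    # rollout/training probability mismatch.
    return [min(math.exp(t - r), clip_c)
            for t, r in zip(logp_train, logp_rollout)]

# When the two engines agree the weight is 1; large mismatches are capped.
print(truncated_is_weights([0.0, 2.0], [0.0, 0.0]))  # [1.0, 2.0]
```

The capped weights would then scale the per-token policy-gradient loss; see the PRs above for the actual configuration and implementation details.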
What's Changed
- [cfg] refactor: add ActorConfig, EngineConfig, and ActorWorker unit test, refactor validation code by @eric-haibin-lin in #2621
- [ci] test: add CriticWorker unit test, make some util CPU friendly by @eric-haibin-lin in #2717
- [ray] feat: RayWorkerGroup support set worker env by @NKcqx in #2685
- [sglang] fix: Adding strict naming sanity for sglang by @zhaochenyang20 in #2719
- [misc] chore: bump main branch version to v0.5.0.dev by @eric-haibin-lin in #2718
- [megatron] fix: resolve backward propagation error in megatron_actor due to shared logits tensor in-place modification by @HelloWorld686 in #2484
- [tool] fix: geo3k create return by @nanjiangwill in #2714
- [doc] feat: Add agent-lightning in the list of "awesome works using verl by @wizardlancet in #2726
- [ci] fix: checkpoint_convertor ci miss a hf model download by @ETOgaosion in #2730
- [recipe] chore: add retool training script by @wuxibin89 in #2732
- [ci] fix: release ascend test time, fix one step off-policy CI by @ETOgaosion in #2731
- [doc] feat: add resizable sidebar and improve layout by @Tingberer in #2577
- [docker] feat: upgrade to torch 2.7, sglang 0.4.8 by @ETOgaosion in #2617
- [megatron] feat: a bunch of optimizations on VRAM, sequence packing by @ISEEKYAN in #2678
- [CI] feat: add `mypy` to pre-commit by @frrad in #2614
- [doc] style: change resize handle from gradient to plain color by @Tingberer in #2746
- refactor: Make sure to keep the type checking by @YeonwooSung in #2634
- [rollout] feat: remove chat scheduler by @wuxibin89 in #2725
- [perf] feat: add optional role selection in discrete mode for NPU Profiler by @tongtong0613 in #2750
- [doc] feat: add retool blog by @eric-haibin-lin in #2761
- [algo] refactor: don't special-case `compute_policy_loss` by @frrad in #2701
- [BREAKING] [rollout] chore: remove default rollout selection by @vermouth1992 in #2757
- [misc] fix: Handle N-D arrays and complex objects in union_numpy_dict by @MikeDean2367 in #2768
- [recipe] fix: fix retool SFT dataset by @vermouth1992 in #2764
- [doc] fix: fix typo in agentic RL documentation by @kibitzing in #2777
- [cfg] fix: fix failing rollout config test on main by @eric-haibin-lin in #2771
- [docker] feat: upgrade vllm to 0.9.1 by @ETOgaosion in #2747
- [recipe] fix: fix issue when running split ppo by @as12138 in #2745
- [recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process by @none0663 in #2739
- [recipe] feat: add QWen2.5-7b-instruct retool by @vermouth1992 in #2800
- [recipe] feat: @register_policy_loss("geo_mean"); Geometric-Mean Policy Optimization by @MzeroMiko in #2795
- [tool] fix: Typo fix -- Rename `to_openai_function_tool_schema` to `get_openai_tool_schema` by @wizeng23 in #2806
- [perf] feat: Padding before batch post-process in agent-loop to save time by @PopSoda2002 in #2773
- [vllm,rollout] fix: vllm rollout lock file permission by @clearhanhui in #2805
- [training_utils] fix: enforce 1D object array shape for non-tensor data in collate_fn by @kibitzing in #2741
- [vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue by @leo-pony in #2782
- Revert "[recipe] feat: Add sleep/wakeup mode for gen rm vllm service and add tqdm showing process" by @ETOgaosion in #2813
- [algo] feat: add GSPO-token policy loss computation function by @0x404 in #2775
- [sglang] fix: support the configuration of attention_backend in sglang by @tardis-key in #2818
- [rollout] feat: pass all dataset fields to agent loop run by @wuxibin89 in #2810
- [docker] feat: Upgrade sglang 0.4.9 + transformers 4.53.2 by @ETOgaosion in #2794
- [sglang] fix: fix missing engine_kwargs by @vermouth1992 in #2823
- [perf, doc] feat: Add profiling continuous steps in one database by @davidmlw in #2695
- [ci] fix: vllm no dataset by @ETOgaosion in #2831
- [tool] fix: load MCP tools in async rollout mode by @mathewjhan in #2821
- [rollout] fix: fix tool_agent_loop gsm8k task not use ground_truth in dataset by @vllbc in #2740
- [CI] feat: update npu image to vLLM-ascend-v0.7.3.post1+mindspeed0.12.1 by @Crispig in #2838
- [training_utils] feat: Support `assert_case` for sandbox fusion by @HollowMan6 in #2374
- [recipe] feat: support qwen3-8B/14B DAPO training on ASCEND NPU by @zhihe-wang in #2836
- [doc] feat: add verl multinode SkyPilot example by @panf2333 in #2849
- [megatron] feat: Add MindSpeed support on the NPU device by @CurryRice233 in #2707
- [misc] feat: optimize GRPO-family algorithms with torch.stack and improve tensor creation consistency by @chi2liu in #2827
- [fsdp] feat: optimize fsdp2 by @vermouth1992 in #2843
- [recipe] feat: modify dapo deepseek megatron script by @vermouth1992 in #2711
- [megatron] fix: remove the deprecated critic.model.enable_gradient_checkpointing flags in the scripts by @HollowMan6 in #2864
- [fsdp,megatron,sglang] feat: Accelerate and Simplify Update weights logic and bump SGLang to 0.4.9.post6 by @hebiao064 in #2720
- [ci] fix: fix fsdp test in transformers 4.54.1 by @vermouth1992 in #2874
- [trainer, hardware] chore: add pin_memory_device when pin_memory is enabled by @zheliuyu in #2871
- [data] feat: dump train/test example as JSON by @wantbook-book in #2666
- [misc] refactor: Add `AbstractRewardManager` abstract class by @frrad in #2763
- [doc] fix: Fix the role assignment error in the interaction demo file and doc by @Qiao0124 in #2476
- [trainer, ci] fix: fix error variable in new engine impl and add ci test by @ShareLer in #2647
- [misc] feat: add nccl timeout configuration to fsdp workers by @shinytang6 in #2321
- [trainer] fix: move UID generation before batch processing for future conditioning support by @nanjiangwill in #2880
- [sglang] chore: bump transformers to 4.54.0 and fix QWen VL issues by @hebiao064 in #2869
- [doc] fix: multi turn argument is not available by @techkang in #2883
- [tool, sglang] feat: add tool create info by @nanjiangwill in #2870
- [trainer] chore: Add ground truth data to generation dumps in RayPPOTrainer by @looput in #2353
- [ci] fix: retry type check on cpu by @ETOgaosion in #2887
- [fsdp, trainer] fix: save config parameters to wandb in SFT by @EasonZhong668 in #2884
- [misc] feat: support logging rollout prob vs. actor probs in multi-turn for debugging purpose, follow up of #1712 by @TomQunChao in #2808
- [FSDP] feat: Allows specifying a different reference model by @ethen8181 in #2050
- [rollout] feat: add rollout_skip to skip rollout by reusing previously generated sequences by @wlf-darkmatter in #2602
- [ray] feat: support directly register dispatch device mesh by @vermouth1992 in #2893
- [doc] fix: Specify rollout engine in quickstart.rst by @TonyLianLong in #2905
- [BREAKING] [ray, megatron] feat: remove RayMegatronWorker by @vermouth1992 in #2895
- [megatron] refactor: simplify module init in megatron_workers, extract common operations by @ETOgaosion in #2400
- [rollout, sglang] fix: fix encoding logic bug by @nanjiangwill in #2901
- [megatron] fix: qwen2vl megatron fused forward param bug by @Yangruipis in #2595
- [sglang] fix: remove unnecessary maybe_set_triton_cache_manager by @hebiao064 in #2926
- [misc] refactor: deprecate sharding manager (part 1) by @vermouth1992 in #2912
- [megatron] feat: support for pipeline layout with vpp in mcore 0.13.0 by @yzlnew in #2749
- [fsdp] fix: call reshard() to resolve no shard attribute by @weifengpy in #2941
- [megatron] chore: update example 671B script, no offline dist-ckpt needed any more by @ISEEKYAN in #2945
- [tool] feat: handle cases when func calling without params by @Tavish9 in #2936
- [sglang] feat: add dapo multi-turn as alternative baseline by @zhaochenyang20 in #2952
- [megatron] fix: retain MLA config in mcore config converter by @Yangruipis in #2933
- [ci] fix: limit e2e_one_step_off_policy timeout by @ETOgaosion in #2964
- [rollout] fix: Fix local rank binding issue when setting RAY_EXPERIMENTAL_NOSET_ASCEND_RT_VISIBLE_DEVICES by @Crispig in #2967
- [doc] fix: fix typo in docs/preparation/prepare_data.rst by @nariaki3551 in #2957
- [misc] fix: fix DataProto getstate bug by @vermouth1992 in #2962
- [sglang] fix: Fix No command 'hf' found for dapo multi-turn as alternative baseline by @none0663 in #2973
- [megatron] feat: Allow override optimizer config by @ETOgaosion in #2959
- [rollout] feat: add cudagraph_capture_sizes option to customize cuda graph memory by @chenhaiq in #2956
- [trainer] refactor: make main_ppo TaskRunner more modular by @eric-haibin-lin in #2885
- [data] fix: fix bug of '_io.BytesIO' object has no attribute 'startswith' by @xylcbd in #2430
- [trainer] fix: only load memory in micro batch by @chenhaiq in #2908
- [misc] feat: Added: "tensorboard" to the requirements.txt by @RasulAlakbarli in #2900
- [ray, trainer] fix: fix working_dir when launching via uv by @Tavish9 in #2859
- [rollout,vllm] fix: max_num_seqs not take effect by @wuxibin89 in #2960
- [rollout,trainer] feat: offload param before wake up inference engine by @chenhaiq in #2977
- [doc] feat: update contact and news by @eric-haibin-lin in #2993
- [rollout] fix: avoid repeated multiplication by n for GRPO by @zdhNarsil in #2881
- [BREAKING] [perf] refactor: Profiler api refactor by @ETOgaosion in #2894
- [ray] fix: Fix function name in worker helper by @MrAta in #2868
- [model] fix: Handle flash_attn_supports_top_left_mask import for older transformers by @liqiongyu in #2985
- [trainer] feat: Specify apply_chat_template_kwargs from config by @HollowMan6 in #2998
- [rollout,vllm] feat: unify vllm and sglang method to async by @wuxibin89 in #2982
- [sglang] fix: Reduce memory footprint during rollout by adding load_grad=False when loading megatron weights by @HaochenYuan in #3007
- [perf] refactor: part 2 - Profiler ci test and fixes by @ETOgaosion in #3001
- [recipe] feat: add deepeyes recipe by @Maxwell-Jia in #2398
- [trainer] fix: reduce memory footprint by moving data to the device only in mini batch by @ji-huazhong in #3011
- [ci] fix: add `flash_attn_supports_top_left_mask` to ignore list by @vermouth1992 in #3004
- [misc] feat: Support trackio by @yzlnew in #3017
- [perf] feat: Add rollout longtail observation metrics by @ETOgaosion in #3009
- [rollout] fix: Add soft node affinity to the agent loop workers by @JoostvDoorn in #3006
- [misc] chore: add gpu memory to deepseek script by @vermouth1992 in #3022
- [misc] chore: add GPU memory to names that train large models by @vermouth1992 in #3023
- [rollout] feat: add rollout config by @vermouth1992 in #3010
- [hardware, recipe] chore: support retool sft &update peft sft perf on npu by @zheliuyu in #3000
- [trainer,rollout,doc] feat: reduce minimum gpus to 96 for deepseek-v3 by @techkang in #3019
- [recipe] fix: make LangGraph agent example runnable out-of-the-box by @philippnormann in #3029
- [ci] fix: try fix vllm test network issue by @ETOgaosion in #3031
- [fsdp] fix: set _set_allocator_settings to True to avoid fsdp2 oom by @chenhaiq in #3020
- [doc] feat: Add VTool-R1 in the list of "awesome works using verl by @JingchengYang4 in #3036
- [misc] feat: add B200 and GB200 flops count by @vermouth1992 in #3041
- [rollout] feat: support over sampling rollout in SGLang Rollout by @zhaochenyang20 in #2929
- [doc] feat: add benchmark for deepseek by @techkang in #3046
- [rollout] feat: remove over-catched errors in SGLang rollout by @zhaochenyang20 in #3047
- [rollout,vllm] feat: support multi-modal in agent loop by @wuxibin89 in #3016
- [hardware] add flops count support for A3 device by @codemayq in #3053
- [trainer] fix: Remove redundant 'data.to()' codes by @A1waysBeenHere in #3051
- [BREAKING][rollout] feat: allow users pass all vllm/sglang engine args by @techkang in #3037
- [doc] fix: optimize ascend docs by @zheliuyu in #3063
- [ray] feat: remove worker group register center by @wuxibin89 in #3066
- [tool] fix: support non-ascii characters in search results by @Necolizer in #3044
- [ray] feat: add support for ray init kwargs by @Tavish9 in #3049
- [rollout] fix: vllm sleep level=2 bug by @techkang in #3082
- [fsdp] fix: add missing mixed precision configuration to FSDPEngineConfig by @xxrjun in #3068
- [fsdp] fix: patch fsdp2 to support hf transformer==4.54.0 and above by @weifengpy in #3072
- [sglang] fix: Qwen VLM Baseline by @zhaochenyang20 in #3083
- Update ray_trainer.py by @zlH518 in #3092
- [sglang] fix: Qwen VLM Baseline and sgl CI by @zhaochenyang20 in #3101
- [BREAKING] [rollout] feat: add a separate rollout worker by @vermouth1992 in #3071
- [recipe] fix: checkpoint in last step might be ignored to save in dapo by @syt-nju in #3034
- [fsdp, trainer, ckpt] feat: support custom model init and merging for FSDP by @Tavish9 in #3012
- [perf] fix: fix npu profiler and add mstx UT by @tongtong0613 in #3052
- [doc] feat: Add Kimina-Prover-RL to awesome work by @thibautbar in #3108
- [misc] fix: fix precommit by @vermouth1992 in #3109
- [doc, perf] feat: add profiling doc by @ETOgaosion in #3113
- [trainer, worker] fix: setting old log probs equal to log probs for on policy training by @sahilpatelsp in #3119
- Fix python version by @Zzhiter in #3103
- [trainer] fix: only load memory in micro batch for megatron backend by @none0663 in #3106
- [rollout] feat: use rollout worker in MegatronWorker by @vermouth1992 in #3111
- [rollout] feat: compute reward score in agent loop by @wuxibin89 in #3055
- [ci] fix: fix precommit by @vermouth1992 in #3128
- [trainer] fix: only load memory in micro batch for compute_log_prob, compute_values and update_critic by @none0663 in #3094
- [trainer] fix: move `testing` out of `step` timings by @Tialo in #3117
- [megatron] fix: add temperature parameter for logits scaling by @gxy-gxy in #3133
- [megatron] fix: mbridge save/load by @ETOgaosion in #2519
- [recipe] fix: make compute of `step` consistent across all trainers by @Tialo in #3132
- [misc] fix: update peft's version in requirements-npu.txt by @zheliuyu in #3127
- [rollout] fix: numpy.int64 serialization error in Weave tracing during validation by @U-rara in #3112
- [sglang] feat: make sglang properly handle the `max_num_seqs` configuration by @binary-husky in #3134
- [doc] feat: documentation update, Ray Job Management Commands by @none0663 in #3131
- [ci] fix: model tests, transformers 4.55 has troubles with backward by @ETOgaosion in #3139
- [megatron] fix: fix megatron micro_batch_size assertion by @vermouth1992 in #3142
- [rollout] fix: KeyError "CPU" init agent loop workers by @KivenChen in #3141
- [fsdp, sglang] fix: Use Aggressive Empty Cache instead by @zhaochenyang20 in #3136
- [recipe] feat: support qwen2.5-32B DAPO training script on ASCEND NPU by @ZLiao097 in #3146
- [rollout] feat: add response token logprobs in agent loop output by @wuxibin89 in #3151
- [fsdp, trainer, tool] feat: add memory snapshot & visualization support for debugging GPU memory leaks by @zhaochenyang20 in #3099
- [sglang] fix: fall back to default FSDP1 by @zhaochenyang20 in #3156
- [sglang] fix: remove unused padding in SGLang rollout by @PopSoda2002 in #3138
- [doc] fix: add qwen3moe-30b script and fix error in qwen3-235b by @chenhaiq in #3174
- [misc] feat: Add L40S and A40 flop counts by @fjosw in #3177
- [megatron] feat: set_expandable_segments for megatron by @vermouth1992 in #3181
- [WIP]: Setting DAPO baseline in SGLang multi-turn RL by @zhaochenyang20 in #3175
- [Optimize] Safe tool parameter access standardization in SGLang rollout by @Zzhiter in #3196
- [misc] feat: Add RL-PLUS to awesome work list by @YihongDong in #3197
- [rollout] feat: use dummy load_format when init AsyncServer by @vermouth1992 in #3184
- [rollout, sglang] feat: Add sync mode for bash by @PopSoda2002 in #3186
- [rollout] fix: add missing extra_reward_info to AgentLoopOuput by @wuxibin89 in #3194
- [doc] fix: set use_dist_checkpointing to False for ref model in qwen3moe-30b script by @none0663 in #3198
- [env] fix: Improve License Check Hook Flexibility by @slimfrkha in #3202
- Revert "[rollout] feat: use dummy load_format when init AsyncServer" by @vermouth1992 in #3207
- [recipe] feat: Add Qwen3 30B MoE NPU recipe by @Shangwei-Li in #3189
- [perf] fix: fix profiler discrete mode unavailability by @tongtong0613 in #3188
- [docker] feat: update to vllm 0.10.0, mcore 0.13, transformers 4.55.4 by @ETOgaosion in #3192
- [data] fix: update parquet_files type check to support multi-file input by @looput in #3211
- [rollout] fix: apply copy_to_local before init hf config by @ZornWang in #3204
- [doc] fix: fix a documentation typo for nsys by @davidmlw in #3214
- [trainer] refactor: PPO config validation fast fail by @slimfrkha in #3187
- [megatron] refactor: refactor MegatronPPOActor by @vermouth1992 in #3206
- [env, sglang] feat: Bump new sglang version to fix vlm OOM by @PopSoda2002 in #3216
- [ci] fix: fix type convergence check by @ETOgaosion in #3219
- [rollout] fix: Restore the parameter 'limit_images' in RolloutConfig by @sty-yyj in #3217
- [BREAKING][vllm, fsdp] feat: add Rollout-Training Mismatch Fix -- Truncated importance sampling by @yaof20 in #2953
- [doc] fix: fix slack invitation link by @eric-haibin-lin in #3230
- [trainer] fix: Unified use of the def to() in Class DataProto by @A1waysBeenHere in #3227
- [fsdp, training_utils] Fix: LoRA w/ VLMs when Using Layered Summon by @kfallah in #3231
- [recipe] feat: Add InfiGUI-G1 recipe for MLLM GUI grounding by @Sirius-L1 in #3242
- [sglang] feat: add native sgl server by @ChangyiYang in #3090
- [megatron, model] feat: add MegatronEngine, MegatronEngineForCausalLM by @vermouth1992 in #3235
- [hardware] fix: Call synchronization when using the td.to("cpu") operation on NPU to avoid potential precision issues by @ji-huazhong in #3222
- [ckpt] fix: TypeError when save VL model ckpt by @Maxwell-Jia in #3268
- [recipe] fix: Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. by @ZLiao097 in #3252
- add gptoss grpo example script by @rich-junwang in #3212
- [worker] fix: Fix missing `rollout_log_probs` argument in policy loss functions by @kAIto47802 in #3274
- [data] fix: `None` has no attribute `get` when `extra_info` in Parquet is NaN by @Mighten in #3272
- [misc] fix: use uid for grouping in validation to avoid prompt confusion in multimodal tasks by @Maxwell-Jia in #3280
- [training_utils] fix: allow empty image_key/video_key in rl dataset by @HollowMan6 in #3281
- [hardware] fix: update source in dockerfile.rocm by @yushengsu-thu in #3284
- [fsdp] feat: add NPU fusion kernels for Qwen3 MoE by @Shangwei-Li in #3221
- [fsdp, model] feat: support FSDP model engine by @vermouth1992 in #3270
- [rollout] feat: Refactor agentloop multiturn by @plutoZZZZ in #3171
- [perf] feat: add npu silu &expand the scope of patch models by @zheliuyu in #3260
- [doc] fix: add rStar2-Agent as work using verl by @feifeibear in #3298
- [BREAKING][rollout] feat: Added asynchronous reward model calculation in agent loop by @echo-rain in #3152
- [ci, model] feat: add qwen3 CI testcase on ASCEND NPU by @tardis-key in #3300
- [single_controller, ray] fix: shut ray down after initializes it by @lantian7 in #3317
- [rollout] feat: deprecate all rollout sharding manager by @wuxibin89 in #3285
- [trainer] fix: `ray.state.available_resources_per_node` is deprecated by @HollowMan6 in #3313
- [trainer] fix: Correct off-by-one error in SFT loss mask slicing by @BlankCheng in #3287
- [doc] feat: Adding PACS to the Awesome work by @Geaming2002 in #3327
- [model] feat: polish model engine by @vermouth1992 in #3321
- [misc] feat: create issue template for verl by @techkang in #3330
- [doc] Update README.md, add related works by @The-Hierophant in #3331
- [recipe] fix: bugfix of refactor omissions by @baymax591 in #3328
- [training_utils] fix: Using a non-tuple sequence for multidimensional indexing is deprecated by @HollowMan6 in #3314
- [recipe] fix: (dapo_ray_trainer) use global_steps to determine is_last_step when resuming (gen_steps not restored) by @zpqiu in #3336
- [deployment, doc] feat: Add SkyPilot integration examples by @alex000kim in #3333
- [doc] fix: Update skypilot_examples.rst by @vermouth1992 in #3344
- [trainer] feat: support sft_trainer with model engine by @vermouth1992 in #3341
- [model] feat: support ByteDance Seed-OSS 36B model by @chenhaiq in #3347
- [vllm] fix: verl + vllm-ascend(version 0.9.1) running failed issue by @baymax591 in #3345
- [rollout, vllm, sglang] fix: allow user customization of `repetition_penalty` to avoid watchdog timeout during GRPO rollout by @Mighten in #3309
- [doc] fix: fix typo in skypilot_examples.rst by @alex000kim in #3368
- [trainer] feat: add CI for accuracy alignment of SFT trainer with model engine by @vermouth1992 in #3363
- [worker,sglang] refactor: deprecate fsdp/megatron reward model with server mode by @yyDing1 in #3352
- [training_utils] fix: stop using `math` naming under reward score by @HollowMan6 in #3378
- [misc] fix: set default value of ETP to 1 by @vermouth1992 in #3371
- [deployment] Fix deepseek671B grpo script by @HaochenYuan in #3383
- [trainer] fix: Fix ClearML logging by @Tialo in #3384
- [model, megatron] feat: Add glm air support and make new model directly use mbridge by @ETOgaosion in #3359
- [ci] fix: cpu unit test, etp config breaking change by @ETOgaosion in #3390
- [model] refactor: polishing FSDP model engine by @vermouth1992 in #3394
- [model] feat: polish megatron engine by @vermouth1992 in #3401
- [trainer] fix: avoid loading duplicated custom reward function to fix issue #3150 by @fshp971 in #3404
- [doc] fix: edit one step off policy readme with original work by @mnoukhov in #3414
- [ci] refactor: add ci test for refactored reward worker and add some args to GenRM config by @yyDing1 in #3385
- [vllm] fix: use VLLM_SLEEP_LEVEL=1 on ASCEND NPU by @Roseisrosie in #3355
- [rollout] chore: Add enable_prefix_caching into config by @wlf-darkmatter in #3395
- [rollout] fix: raise error if processing multimodal data without vlm processor by @techkang in #3370
- [recipe] fix: Add gts argument for recipe _dump_generations by @moehanabi in #3348
- [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 1 by @vermouth1992 in #2733
- [fsdp, recipe] feat: add grpo reward model example using HH-RLHF dataset by @ccclyu in #3417
- [model] feat: replace DataProto with TensorDict in engine by @vermouth1992 in #3422
- [tool] feat: support local gsm8k dataset in example/data_preprocess by @tardis-key in #3362
- [worker] refactor: move the implementation of rm to workers.roles and polish by @yyDing1 in #3423
- [doc] feat: add SimpleVLA-RL link in readme by @feifeibear in #3433
- [doc] fix: table column in document by @chenhaiq in #3430
- [worker, sglang] feat: support generative reward model (server mode) by @yyDing1 in #3441
- [trainer] feat: VL support freeze vision model by @maijia-cwh in #3178
- [trainer] fix: Loss calculations for grad accumulation steps by @puneeshkhanna in #3332
- [worker] fix: respect free_cache_engine flag by @HollowMan6 in #3442
- [ci] feat: move more tests to volcano engine by @vermouth1992 in #3455
- [sglang, tool] fix: fix text only bug by @nanjiangwill in #3448
- [trainer, fsdp, megatron] feat: Support one step off async rl on Ascend NPU by @ji-huazhong in #2924
- [trainer,rollout] fix: model weights will not be loaded when vllm_sleep_level=2 and using lora by @techkang in #3461
- [model] feat: Add `Apertus` by @EduardDurech in #3295
- [model] feat: add FSDP/Megatron critic worker with model engine by @vermouth1992 in #3439
- [megatron,recipe] feat: support Qwen3-30B (MoE) DAPO training on ASCEND NPU by @wlf-darkmatter in #3203
- [sglang, rollout] feat: enable token-in-token-out for SGLang engine by @nanjiangwill in #2759
- [model] feat: add qwen3 grpo 8b/32b script on ASCEND NPU by @tardis-key in #3310
- [ci] chore: add codeowner by @tardis-key in #3473
- [rollout] fix: make agent loop reward worker thread-safe by @wuxibin89 in #3454
- [perf, megatron] chore: bind NUMA by @conver334 in #3471
- [ray] refactor: Accelerate Tensor serialization by converting to np.ndarray by @baymax591 in #3425
- [1/N][rollout] feat: support vllm/sglang native http server by @wuxibin89 in #3456
- [perf] fix: Init some attrs earlier in Profiler by @moehanabi in #3482
- [perf, megatron] fix: bugfix if nvml can not import by @baymax591 in #3490
- [ray, single_controller] refactor: Accelerate ray.put with thread by @baymax591 in #3495
- [model] fix: fix device by @vermouth1992 in #3500
- [training_utils] refactor: extract checkpoint handler into a separate file for reuse by @vermouth1992 in #3505
- [recipe] feat: Add qwen2.5-7b DAPO NPU example script by @FightingZhen in #3501
- [model, ci] feat: add qwen3-8b ppo script on ASCEND NPU by @xvxuopop in #3502
- [data] feat: support customizable loss mask in multi-turn sft dataset by @vermouth1992 in #3507
- [model] fix: refactor qwen2vl patches & support no-image input for fsdp by @hiyouga in #3496
- [megatron] Add TIS support to megatron backend by @sharonyu-115 in #3513
- [model] fix: qwen2vl for transformers 4.52.* by @hiyouga in #3524
- [doc] fix: Update Qwen3-30B-A3B info in ascend_quick_start.rst by @tardis-key in #3514
- [doc] chore: Update ascend quick start document by @FightingZhen in #3527
- [doc] chore: Update owners for ascend_tutorial documents by @FightingZhen in #3528
- [worker] fix: get all `multi_modal_inputs` keys within a microbatch by @HollowMan6 in #3315
- [ci] feat: using local dataset to avoid network issue by @vermouth1992 in #3533
- [Megatron] fix: compatible to mcore0.15 by @ISEEKYAN in #3534
- [chore] fix typo by @1195343015 in #3535
- [ci] feat: fix more ci by @vermouth1992 in #3537
- [recipe] fix: Fix main_spin.py bugs by @NuoJohnChen in #3543
- [model] feat: support parameter generator for model engine by @vermouth1992 in #3529
- [megatron] chore: add a docker image with mcore0.15 and TE2.7 by @ISEEKYAN in #3540
- [ci] feat: update ci by @vermouth1992 in #3552
- [recipe] fix: spin fsdp_workers.py bugs by @NuoJohnChen in #3544
- [recipe] fix: init self.model_config in fsdp worker of one-step-off policy by @zlwang-cs in #3556
- [docker] feat: dockerfile rocm7 initial commit by @vickytsang in #3547
- [trainer,rollout] fix: ensure LoRA weights are loaded when vllm_sleep_level=2 and without using layered_summon by @piood in #3541
- [ci] fix: fix more ci by pin transformers version by @vermouth1992 in #3582
- [trainer] refactor: move rollout log to inheritable trainer by @ccclyu in #3576
- [CI] chore: Update e2e_ascend CI config by @FightingZhen in #3532
- [misc] feat: remove redundant default params by @techkang in #3577
- [megatron] fix: fix bug when holding empty parameters with custom pipeline layout by @HaochenYuan in #3565
- [ci] fix: fix e2e_sppo ci by @vermouth1992 in #3587
- [sglang] fix: Support SGLang>=0.5.2 by @EduardDurech in #3526
- [megatron] feat: use flash as default attention_backend by @ISEEKYAN in #3578
- [doc] fix: add faq doc to avoid vllm issue 22103 by @chenhaiq in #3595
- [misc] chore: Update CODEOWNERS by @vermouth1992 in #3594
- [megatron] fix: revert megatron actor refactor by @vermouth1992 in #3553
- [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 2 by @houminz in #3567
- [algo, perf] feat: Vectorize RLOO Advantage Estimator - 20x Speedup by @EduardDurech in #3555
- [CI] chore: reopen ppo test in e2e_ascend CI by @FightingZhen in #3588
- [trainer] fix: Import flash attn utils for Transformers higher than 4.55.0 by @A1waysBeenHere in #3596
- [recipe] feat: CollabLLM integration for multiturn training by @Wuyxin in #3574
- [doc] feat: add model engine doc by @vermouth1992 in #3611
- [ci] chore: Use local dataset and models in e2e_ascend CI by @FightingZhen in #3601
- [rollout] fix: remove code responsible for tool response duplication by @mgilmore-relace in #3604
- [doc] fix: fix doc by @vermouth1992 in #3614
- [worker] fix: correctly determine is_vlm_model if sp > 1 by @HollowMan6 in #3282
- [rollout, tool] feat: export rollout rewards to total rewards by @Tavish9 in #3563
- [ci] fix: use local models/configs/datasets to increase stability by @vermouth1992 in #3616
- [ci] fix: fix sanity ci by @vermouth1992 in #3626
- [doc] feat: Adding Table-R1 to the Awesome work by @FlowRays in #3627
- [ci] feat: upgrade sglang to 0.5.2 by @wuxibin89 in #3613
- [ci] feat: increase timeout of e2e_sft by @vermouth1992 in #3630
- [tool] feat: support load local datasets when preparing datasets by @ji-huazhong in #3621
- [CI] fix: changed the model used in the PPO test case to Qwen2.5-0.5B to avoid the huggingface download error by @ji-huazhong in #3631
- [recipe] fix: Fix a Typo in One_Step_Off_Policy and Add async of Generative Reward Model in Response Generation by @ZhichaoWang970201 in #3369
- [megatron] feat: add mindspeed engine and support sft by @ji-huazhong in #3599
- [rollout] refactor: Update rollout and reward configs to reuse vllm/sglang replicas by @yyDing1 in #3625
- [trainer] fix: Ref to #3596. More import fix for transformers version higher than 4.55.0 by @A1waysBeenHere in #3608
- [2/N][rollout] feat: support vllm/sglang DP+EP in server mode by @wuxibin89 in #3530
- [model] feat: add glm4v by @lambertwjh in #3291
- [algo, perf] feat: Vectorize GRPO Advantage Estimator - 13~26x Speedup by @CedricHwong in #3635
- [megatron, worker] fix: use `extract_multi_modal_inputs` method for handling `multi_modal_inputs` by @HollowMan6 in #3641
- [rollout,vllm] fix: Add LoRA Loading to Async vLLM by @kfallah in #3639
- [megatron, training_utils] fix: encoder pp is removed in mcore >= 0.14 by @HollowMan6 in #3640
- [recipe] feat: add multiturn scripts for vllm backend; fix progress bar in dapo by @jiaqiw09 in #3644
- [sglang] feat: adapt for sglang+verl by @lbk-sys in #3506
- [model] fix: stuck issue with mixed text-image data by @HollowMan6 in #3670
- [ci] fix: disable workflows with self-host machines to run on fork by @HollowMan6 in #3677
- [rollout] fix: qwen2_vl position_ids shape mismatch by @m-Just in #3653
- [model] feat: add qwen3vl by @hiyouga in #3681
- [ci] fix: merge pre-commit-full into pre-commit by @HollowMan6 in #3684
- [ci] fix: fix checkpoint converter ci by @vermouth1992 in #3685
- [model] fix: qwen3vl patch by @hiyouga in #3686
- [trainer] feat: Enabled fused adamw by @puneeshkhanna in #3692
- [worker] fix: support for vllm V0 deprecation version by @HollowMan6 in #3687
- [rollout,sglang] fix: get_tool_call_parser_type for gpt-oss models in sglang rollout by @HJSang in #3661
- [rollout] fix: add batch_data_id default value check in AsyncRolloutRequest by @pandengyao in #3657
- [rollout] fix: Add LoRA datatype based on rollout model type to the LoRA config by @mgilmore-relace in #3675
- [rollout] feat: support async mode for multimodal data inference by @xichengpro in #3702
- [worker] refactor: Add `kwargs` to checkpoint related functions in `BaseEngine` and its subclasses by @hongpeng-guo in #3662
- [worker] fix: create a new event loop if none exists by @ji-huazhong in #3703
- [recipe] fix: move all collabllm files into recipe directory by @chenhaiq in #3706
- [megatron, model] fix: VLMs using mbridge together with fused kernels by @HollowMan6 in #3700
- [data] fix: merge metrics from all workers in DataProto.concat() by @szrlee in #3699
- [misc] fix: model reassign to inner model in vllm patch file by @ccclyu in #3668
- [misc] fix: Allow HF model ID with `use_shm` by @EduardDurech in #3663
- [megatron] feat: add ascend megatron merge support by @jiaqiw09 in #3722
- [fsdp] fix: Handle dict type for per_tensor_param in LoRA weight sync by @pourion in #3712
- [rollout] feat: Add gpt-oss tool parser to enable agent loop training for gpt-oss models by @HJSang in #3705
- [rollout] feat: add default agent name for agent loop by @wuxibin89 in #3716
- [rollout] chore: Misc changes for extending internal compatibility by @pengwu22 in #3701
- [misc] feat: support build DataProto from TensordDict by @ji-huazhong in #3726
- [misc] feat: support offline generation with server mode by @vermouth1992 in #3732
- [misc] feat: prototype deprecate DataProto and replace with Tensordict: part 3 by @houminz in #3600
- [ci] feat: increase sft e2e time by @vermouth1992 in #3738
- [model] fix: qwen3vl training stuck with mixed text-image data by @HollowMan6 in #3734
- [model] fix: qwen3vl models shape mismatch error with SP by @HollowMan6 in #3735
- [fsdp,doc] refactor: rename warmup_style@FSDPOptimizerConfig -> lr_scheduler_type by @bxyang in #3739
- [BREAKING][rollout, trainer, algo] feat: comprehensive rollout importance sampling implementation by @szrlee in #3694
- [rollout] refactor: rename "clip" mode back to "mask" mode by @szrlee in #3750
- [sglang, recipe] feat: add SGLang as rollout engine for one-step-off-policy by @KAMiPan in #3531
- Add ARES and Revisual-R1, two awesome multimodal reasoning works using verl by @CSfufu in #3755
- Add Meta-Bandit-LLM, a long-horizon multiturn interactive awesome use case of verl by @sanxing-chen in #3756
- [trainer] feat: set interleave to False in dapo trainer by @jiaqiw09 in #3760
- [megatron] feat: support qwen3vl by @ISEEKYAN in #3763
- [recipe] fix: update readme for gmpo-trainer by @MzeroMiko in #3764
- [misc] feat: bump version to 0.6.0.dev by @vermouth1992 in #3768
New Contributors
- @NKcqx made their first contribution in #2685
- @HelloWorld686 made their first contribution in #2484
- @wizardlancet made their first contribution in #2726
- @Tingberer made their first contribution in #2577
- @MikeDean2367 made their first contribution in #2768
- @kibitzing made their first contribution in #2777
- @MzeroMiko made their first contribution in #2795
- @clearhanhui made their first contribution in #2805
- @panf2333 made their first contribution in #2849
- @chi2liu made their first contribution in #2827
- @wantbook-book made their first contribution in #2666
- @Qiao0124 made their first contribution in #2476
- @techkang made their first contribution in #2883
- @looput made their first contribution in #2353
- @EasonZhong668 made their first contribution in #2884
- @TomQunChao made their first contribution in #2808
- @ethen8181 made their first contribution in #2050
- @wlf-darkmatter made their first contribution in #2602
- @nariaki3551 made their first contribution in #2957
- @xylcbd made their first contribution in #2430
- @RasulAlakbarli made their first contribution in #2900
- @zdhNarsil made their first contribution in #2881
- @MrAta made their first contribution in #2868
- @liqiongyu made their first contribution in #2985
- @HaochenYuan made their first contribution in #3007
- @Maxwell-Jia made their first contribution in #2398
- @philippnormann made their first contribution in #3029
- @JingchengYang4 made their first contribution in #3036
- @codemayq made their first contribution in #3053
- @A1waysBeenHere made their first contribution in #3051
- @xxrjun made their first contribution in #3068
- @zlH518 made their first contribution in #3092
- @syt-nju made their first contribution in #3034
- @sahilpatelsp made their first contribution in #3119
- @Zzhiter made their first contribution in #3103
- @Tialo made their first contribution in #3117
- @gxy-gxy made their first contribution in #3133
- @binary-husky made their first contribution in #3134
- @KivenChen made their first contribution in #3141
- @ZLiao097 made their first contribution in #3146
- @fjosw made their first contribution in #3177
- @YihongDong made their first contribution in #3197
- @slimfrkha made their first contribution in #3202
- @Shangwei-Li made their first contribution in #3189
- @ZornWang made their first contribution in #3204
- @sty-yyj made their first contribution in #3217
- @yaof20 made their first contribution in #2953
- @kfallah made their first contribution in #3231
- @Sirius-L1 made their first contribution in #3242
- @ChangyiYang made their first contribution in #3090
- @rich-junwang made their first contribution in #3212
- @kAIto47802 made their first contribution in #3274
- @Mighten made their first contribution in #3272
- @echo-rain made their first contribution in #3152
- @lantian7 made their first contribution in #3317
- @BlankCheng made their first contribution in #3287
- @baymax591 made their first contribution in #3328
- @alex000kim made their first contribution in #3333
- @fshp971 made their first contribution in #3404
- @mnoukhov made their first contribution in #3414
- @Roseisrosie made their first contribution in #3355
- @moehanabi made their first contribution in #3348
- @maijia-cwh made their first contribution in #3178
- @puneeshkhanna made their first contribution in #3332
- @EduardDurech made their first contribution in #3295
- @xvxuopop made their first contribution in #3502
- @sharonyu-115 made their first contribution in #3513
- @1195343015 made their first contribution in #3535
- @NuoJohnChen made their first contribution in #3543
- @zlwang-cs made their first contribution in #3556
- @piood made their first contribution in #3541
- @houminz made their first contribution in #3567
- @Wuyxin made their first contribution in #3574
- @mgilmore-relace made their first contribution in #3604
- @FlowRays made their first contribution in #3627
- @ZhichaoWang970201 made their first contribution in #3369
- @lambertwjh made their first contribution in #3291
- @CedricHwong made their first contribution in #3635
- @jiaqiw09 made their first contribution in #3644
- @lbk-sys made their first contribution in #3506
- @m-Just made their first contribution in #3653
- @HJSang made their first contribution in #3661
- @pandengyao made their first contribution in #3657
- @szrlee made their first contribution in #3699
- @pourion made their first contribution in #3712
- @pengwu22 made their first contribution in #3701
- @bxyang made their first contribution in #3739
- @KAMiPan made their first contribution in #3531
- @CSfufu made their first contribution in #3755
- @sanxing-chen made their first contribution in #3756
Full Changelog: v0.5.0...v0.6.0