Commit 40b05cb

ZLiao097 authored and techkang committed
[recipe] fix: Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. (volcengine#3252)
### What does this PR do?

Remove redundant parameters to resolve errors in the script caused by the latest Verl main branch. Related issue: [issue](volcengine#3248)

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Design & Code Changes

Removed the two unnecessary parameters **dp_model_parallel_size** and **rollout_world_size** from the relevant files.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
1 parent d6abf2a commit 40b05cb
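
The description above names two rollout overrides that this commit removes. A minimal sketch for checking whether other launch scripts still pass them is shown here. The function name `find_removed_overrides` is hypothetical and not part of verl; the sketch assumes a `grep` that supports `--include` (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of verl): list lines in *.sh files under a
# directory that still pass the two rollout overrides removed by this commit.
find_removed_overrides() {
    local dir="${1:-.}"
    # -r: recurse into subdirectories, -n: print line numbers,
    # -E: extended regex so the alternation needs no escaping.
    grep -rnE --include='*.sh' \
        'rollout\.(dp_model_parallel_size|rollout_world_size)=' "$dir"
}
```

A possible invocation is `find_removed_overrides recipe/dapo || echo "no stale overrides found"`; `grep` exits with a non-zero status when nothing matches, so the fallback message only prints for a clean tree.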

2 files changed: 0 additions, 7 deletions
recipe/dapo/run_dapo_qwen2.5_32b_npu.sh

Lines changed: 0 additions & 5 deletions
@@ -56,8 +56,6 @@ actor_ppo_max_token_len=$(((max_prompt_length + max_response_length) / sp_size))
 infer_ppo_max_token_len=$(((max_prompt_length + max_response_length) / sp_size))
 offload=True
 gen_tp=4
-gen_dp=1
-gen_world_size=$((NNODES * 16)) # nnodes* npus_in_per_node
 enable_chunked_prefill=True
 
 ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
@@ -111,8 +109,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
 actor_rollout_ref.actor.ulysses_sequence_parallel_size=${sp_size} \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.90 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=${gen_tp} \
-+actor_rollout_ref.rollout.dp_model_parallel_size=${gen_dp} \
-+actor_rollout_ref.rollout.rollout_world_size=${gen_world_size} \
 actor_rollout_ref.rollout.enable_chunked_prefill=${enable_chunked_prefill} \
 actor_rollout_ref.rollout.max_num_batched_tokens=$((max_prompt_length + max_response_length)) \
 actor_rollout_ref.rollout.temperature=${temperature} \
@@ -126,7 +122,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
 actor_rollout_ref.ref.fsdp_config.param_offload=${offload} \
 actor_rollout_ref.ref.ulysses_sequence_parallel_size=${sp_size} \
 actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \
-+actor_rollout_ref.rollout.enable_expert_parallel=False \
 reward_model.reward_manager=dapo \
 reward_model.overlong_buffer.enable=${enable_overlong_buffer} \
 reward_model.overlong_buffer.len=${overlong_buffer_len} \

recipe/dapo/run_dapo_qwen3_moe_30b_base_npu_fsdp.sh

Lines changed: 0 additions & 2 deletions
@@ -58,7 +58,6 @@ offload=True
 recompute=True
 max_num_seqs=128
 gen_tp=2
-gen_world_size=$((NNODES * NPUS_PER_NODE)) # nnodes* npus_in_per_node
 
 
 ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
@@ -111,7 +110,6 @@ ray job submit --no-wait --runtime-env="${RUNTIME_ENV}" \
 actor_rollout_ref.actor.ulysses_sequence_parallel_size=${sp_size} \
 actor_rollout_ref.rollout.gpu_memory_utilization=0.8 \
 actor_rollout_ref.rollout.tensor_model_parallel_size=${gen_tp} \
-+actor_rollout_ref.rollout.rollout_world_size=${gen_world_size} \
 actor_rollout_ref.rollout.enable_chunked_prefill=True \
 actor_rollout_ref.rollout.temperature=${temperature} \
 actor_rollout_ref.rollout.top_p=${top_p} \
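
For a locally maintained launch script derived from these recipes, the same cleanup could be previewed with `sed`. This is a sketch only: `strip_removed_overrides` is a hypothetical name, the patterns are assumptions based on the lines deleted in the diff above, and the `enable_expert_parallel` override dropped from the first script is deliberately left alone because the PR description does not name it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of verl): print a launch script with the two
# removed rollout overrides and their now-unused helper variables dropped.
# The input file is not modified; output goes to stdout.
strip_removed_overrides() {
    sed -E \
        -e '/rollout\.(dp_model_parallel_size|rollout_world_size)=/d' \
        -e '/^[[:space:]]*gen_(dp|world_size)=/d' \
        "$1"
}
```

Something like `strip_removed_overrides my_run.sh | diff my_run.sh -` shows what would change before anything is overwritten. If a deleted override happened to be the last argument of the `ray job submit` command, the preceding line would keep a dangling trailing backslash, so the result is worth checking by hand.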
