
Commit f6e214a
committed: add qwen3-32b grpo script on npu
1 parent: ef43469

File tree

2 files changed, +53 -0 lines changed


docs/ascend_tutorial/ascend_quick_start.rst

Lines changed: 2 additions & 0 deletions
@@ -179,6 +179,8 @@ vllm & vllm-ascend
 +-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
 | GRPO      | Qwen2.5-VL-32B-instruct | 0.79%       | 0.568             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
 +-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
+| GRPO      | Qwen3-32B               | 0.64%       | 0.690             | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
++-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
 | DAPO      | Qwen2.5-7B-instruct     | 3.83%       | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
 +-----------+-------------------------+-------------+-------------------+-------------------+-------------------+--------------------------+
 | DAPO      | Qwen2.5-32B             | 3.42%       | pending           | FSDP              | vllm-ascend       | Atlas 200T A2 Box16      |
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
set -x

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.max_prompt_length=2048 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.shuffle=False \
    actor_rollout_ref.model.path=Qwen/Qwen3-32B \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=4 \
    actor_rollout_ref.actor.fsdp_config.mixed_precision.param_dtype=bf16 \
    actor_rollout_ref.actor.fsdp_config.mixed_precision.reduce_dtype=bf16 \
    actor_rollout_ref.actor.fsdp_config.mixed_precision.buffer_dtype=fp32 \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.7 \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.use_torch_compile=False \
    actor_rollout_ref.ref.use_torch_compile=False \
    actor_rollout_ref.rollout.enable_chunked_prefill=True \
    actor_rollout_ref.rollout.max_num_batched_tokens=32768 \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','tensorboard'] \
    trainer.project_name='verl_grpo_example_gsm8k_fsdp' \
    trainer.experiment_name='qwen3_32b_fsdp' \
    trainer.n_gpus_per_node=16 \
    trainer.nnodes=2 \
    trainer.resume_from_path=checkpoints/ \
    trainer.save_freq=500 \
    trainer.test_freq=50 \
    trainer.total_epochs=50 \
    trainer.device=npu $@
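The overrides above imply some parallelism arithmetic worth checking before launch: 2 nodes x 16 NPUs gives a world size of 32, which must be divisible by the rollout tensor-parallel size (4) and by the Ulysses sequence-parallel size (4), and `train_batch_size` (1024) must split evenly into PPO mini-batches (64). The snippet below is a minimal sanity-check sketch; it only restates numbers copied from the script, and the derived quantities (rollout engine count, mini-batches per step) are an assumption about how the workers are partitioned, not output of any verl API.

```shell
# Values copied from the training script above.
NNODES=2
N_GPUS_PER_NODE=16
TP=4          # actor_rollout_ref.rollout.tensor_model_parallel_size
SP=4          # actor_rollout_ref.actor.ulysses_sequence_parallel_size
TRAIN_BS=1024 # data.train_batch_size
MINI_BS=64    # actor_rollout_ref.actor.ppo_mini_batch_size

WORLD_SIZE=$((NNODES * N_GPUS_PER_NODE))

# World size must be divisible by both parallel sizes.
[ $((WORLD_SIZE % TP)) -eq 0 ] || echo "world size not divisible by TP"
[ $((WORLD_SIZE % SP)) -eq 0 ] || echo "world size not divisible by SP"

echo "world_size=$WORLD_SIZE"                        # 32 NPUs total
echo "rollout_engines=$((WORLD_SIZE / TP))"          # 8 TP groups
echo "mini_batches_per_step=$((TRAIN_BS / MINI_BS))" # 16 optimizer steps per batch
```

Extra Hydra-style overrides can be appended when invoking the script, since the final `$@` forwards the script's arguments to `main_ppo` (e.g. appending `trainer.total_epochs=1` for a smoke run).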
