
Qwen3-30B-A3B-Instruct-2507 fine-tuning error #8004

@Wu-dan

Description

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/cli/_megatron/sft.py", line 7, in
[rank2]: megatron_sft_main()
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/train/sft.py", line 87, in megatron_sft_main
[rank2]: return MegatronSft(args).main()
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/llm/base.py", line 49, in main
[rank2]: result = self.run()
[rank2]: ^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/train/sft.py", line 77, in run
[rank2]: self.trainer.train(train_dataset, val_dataset, data_collator)
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/trainers/base.py", line 1126, in train
[rank2]: pretrain(
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 666, in pretrain
[rank2]: model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/trainers/base.py", line 504, in setup_model_and_optimizer
[rank2]: model, optimizer, opt_param_scheduler = self._origin_setup_model_and_optimizer(
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 1094, in setup_model_and_optimizer
[rank2]: model = get_model(model_provider_func, model_type)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 885, in get_model
[rank2]: model = build_model()
[rank2]: ^^^^^^^^^^^^^
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 877, in build_model
[rank2]: model = model_provider_func(pre_process=pre_process, post_process=post_process)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/trainers/base.py", line 479, in new_model_provider_func
[rank2]: self.bridge.load_weights(model, args.model_dir)
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1425, in load_weights
[rank2]: list(self._convert([mg_model], state_dict, hf_prefix, True, 'Loading: '))
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1343, in _convert
[rank2]: res = self._set_layer_state(mg_layer, hf_state_dict, f'{self.hf_layers_prefix}.', layer_idx, to_mcore)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1240, in _set_layer_state
[rank2]: hf_state_dict.update(self._set_layer_mlp(mg_layer, hf_state_dict, layer_idx, to_mcore))
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1224, in _set_layer_mlp
[rank2]: hf_state_dict.update(self._set_moe_state(mg_mlp, hf_state_dict, f'{hf_mlp_prefix}.', layer_idx, to_mcore))
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 690, in _set_moe_state
[rank2]: self._set_mlp_state(mg_experts, hf_state_dict, 'experts.', layer_idx, to_mcore, ep_rank=ep_rank))
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 728, in _set_mlp_state
[rank2]: if isinstance(mg_mlp.linear_fc1, LoraParallelLinear):
[rank2]: ^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1962, in getattr
[rank2]: raise AttributeError(
[rank2]: AttributeError: 'SequentialMLP' object has no attribute 'linear_fc1'
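
For context on where the check goes wrong: with grouped GEMM disabled, megatron-core builds the experts as a SequentialMLP, which holds one MLP per local expert in a local_experts ModuleList rather than exposing linear_fc1/linear_fc2 directly the way the grouped-GEMM experts module does. The bridge's isinstance(mg_mlp.linear_fc1, LoraParallelLinear) check then falls through to nn.Module.__getattr__ and raises. A minimal stand-in sketch (hypothetical classes, not the megatron-core implementation):

```python
import torch.nn as nn

class ExpertMLP(nn.Module):
    """Stand-in for a single megatron expert MLP."""
    def __init__(self):
        super().__init__()
        self.linear_fc1 = nn.Linear(8, 16)
        self.linear_fc2 = nn.Linear(16, 8)

class SequentialMLPSketch(nn.Module):
    """Stand-in for megatron-core's SequentialMLP: the per-expert MLPs
    live in a ModuleList, so the module itself has no linear_fc1."""
    def __init__(self, num_local_experts=2):
        super().__init__()
        self.local_experts = nn.ModuleList(
            ExpertMLP() for _ in range(num_local_experts))

experts = SequentialMLPSketch()
print(hasattr(experts, 'linear_fc1'))  # False
experts.linear_fc1  # raises AttributeError, as in the traceback above
```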

How to Reproduce

Environment:
ms_swift 3.12.4
peft 0.18.1
flash_attn 2.8.3+cu12torch28cxx11abitrue
transformers 4.57.6
transformer_engine_torch 2.10.0
megatron-core 0.15.3

Run script:
export MEGATRON_LM_PATH='/data/home/user/Megatron-LM-core_r0.15.0'

PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
srun -n 1 megatron sft \
    --model '/data/home/user/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B-Instruct-2507' \
    --check_model false \
    --load_safetensors true \
    --save_safetensors true \
    --merge_lora false \
    --moe_grouped_gemm true \
    --dataset '/data/home/user/data.json' \
    --load_from_cache_file true \
    --no_gradient_accumulation_fusion true \
    --use_precision_aware_optimizer true \
    --optimizer_cpu_offload true \
    --optimizer_offload_fraction 0.7 \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --sequence_parallel true \
    --freeze_llm false \
    --freeze_vit true \
    --freeze_aligner true \
    --packing true \
    --split_dataset_ratio 0.01 \
    --expert_model_parallel_size 4 \
    --moe_permute_fusion true \
    --moe_grouped_gemm false \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 1e-3 \
    --micro_batch_size 1 \
    --global_batch_size 4 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-4 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-5 \
    --max_epochs 1 \
    --save megatron_output/Qwen/Qwen3-30B-A3B-Instruct-2507 \
    --eval_interval 200 \
    --save_interval 200 \
    --vit_gradient_checkpointing true \
    --max_length 3000 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --attention_backend flash
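
One thing worth noting in the script above: --moe_grouped_gemm is passed twice, first true and then false. Assuming the flags are parsed argparse-style (an assumption about swift's CLI, not verified here), the last occurrence wins, so the run effectively uses moe_grouped_gemm=false, which is the setting under which megatron-core builds the SequentialMLP that the traceback trips over:

```python
# Illustration of last-value-wins for repeated argparse options; whether
# swift's CLI behaves the same way is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--moe_grouped_gemm', type=lambda s: s.lower() == 'true')
args = parser.parse_args(['--moe_grouped_gemm', 'true',
                          '--moe_grouped_gemm', 'false'])
print(args.moe_grouped_gemm)  # False: the earlier 'true' is silently overridden
```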

Additional Information

No response
