
Qwen3-30B-A3B-Instruct-2507 fine-tuning error #8004

@Wu-dan

Description

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/cli/_megatron/sft.py", line 7, in
[rank2]: megatron_sft_main()
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/train/sft.py", line 87, in megatron_sft_main
[rank2]: return MegatronSft(args).main()
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/llm/base.py", line 49, in main
[rank2]: result = self.run()
[rank2]: ^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/train/sft.py", line 77, in run
[rank2]: self.trainer.train(train_dataset, val_dataset, data_collator)
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/trainers/base.py", line 1126, in train
[rank2]: pretrain(
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 666, in pretrain
[rank2]: model, optimizer, opt_param_scheduler = setup_model_and_optimizer(
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/trainers/base.py", line 504, in setup_model_and_optimizer
[rank2]: model, optimizer, opt_param_scheduler = self._origin_setup_model_and_optimizer(
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 1094, in setup_model_and_optimizer
[rank2]: model = get_model(model_provider_func, model_type)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 885, in get_model
[rank2]: model = build_model()
[rank2]: ^^^^^^^^^^^^^
[rank2]: File "/data/home/user/moe-llm/Megatron-LM-core_r0.15.0/megatron/training/training.py", line 877, in build_model
[rank2]: model = model_provider_func(pre_process=pre_process, post_process=post_process)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/trainers/base.py", line 479, in new_model_provider_func
[rank2]: self.bridge.load_weights(model, args.model_dir)
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1425, in load_weights
[rank2]: list(self._convert([mg_model], state_dict, hf_prefix, True, 'Loading: '))
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1343, in _convert
[rank2]: res = self._set_layer_state(mg_layer, hf_state_dict, f'{self.hf_layers_prefix}.', layer_idx, to_mcore)
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1240, in _set_layer_state
[rank2]: hf_state_dict.update(self._set_layer_mlp(mg_layer, hf_state_dict, layer_idx, to_mcore))
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 1224, in _set_layer_mlp
[rank2]: hf_state_dict.update(self._set_moe_state(mg_mlp, hf_state_dict, f'{hf_mlp_prefix}.', layer_idx, to_mcore))
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 690, in _set_moe_state
[rank2]: self._set_mlp_state(mg_experts, hf_state_dict, 'experts.', layer_idx, to_mcore, ep_rank=ep_rank))
[rank2]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/swift/megatron/model/gpt_bridge.py", line 728, in _set_mlp_state
[rank2]: if isinstance(mg_mlp.linear_fc1, LoraParallelLinear):
[rank2]: ^^^^^^^^^^^^^^^^^
[rank2]: File "/data/home/user/miniconda3/envs/py311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1962, in getattr
[rank2]: raise AttributeError(
[rank2]: AttributeError: 'SequentialMLP' object has no attribute 'linear_fc1'
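
For context on where the check goes wrong: with grouped GEMM disabled, megatron-core builds the experts as a SequentialMLP, which holds one MLP per local expert in a local_experts ModuleList rather than exposing linear_fc1/linear_fc2 directly the way the grouped-GEMM experts module does. The bridge's isinstance(mg_mlp.linear_fc1, LoraParallelLinear) check then falls through to nn.Module.__getattr__ and raises. A minimal stand-in sketch (hypothetical classes, not the megatron-core implementation):

```python
import torch.nn as nn

class ExpertMLP(nn.Module):
    """Stand-in for a single megatron expert MLP."""
    def __init__(self):
        super().__init__()
        self.linear_fc1 = nn.Linear(8, 16)
        self.linear_fc2 = nn.Linear(16, 8)

class SequentialMLPSketch(nn.Module):
    """Stand-in for megatron-core's SequentialMLP: the per-expert MLPs
    live in a ModuleList, so the module itself has no linear_fc1."""
    def __init__(self, num_local_experts=2):
        super().__init__()
        self.local_experts = nn.ModuleList(
            ExpertMLP() for _ in range(num_local_experts))

experts = SequentialMLPSketch()
print(hasattr(experts, 'linear_fc1'))  # False
experts.linear_fc1  # raises AttributeError, as in the traceback above
```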

How to Reproduce

Environment:
ms_swift 3.12.4
peft 0.18.1
flash_attn 2.8.3+cu12torch28cxx11abitrue
transformers 4.57.6
transformer_engine_torch 2.10.0
megatron-core 0.15.3

Run script:
export MEGATRON_LM_PATH='/data/home/user/Megatron-LM-core_r0.15.0'

PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" \
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
srun -n 1 megatron sft \
    --model '/data/home/user/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B-Instruct-2507' \
    --check_model false \
    --load_safetensors true \
    --save_safetensors true \
    --merge_lora false \
    --moe_grouped_gemm true \
    --dataset '/data/home/user/data.json' \
    --load_from_cache_file true \
    --no_gradient_accumulation_fusion true \
    --use_precision_aware_optimizer true \
    --optimizer_cpu_offload true \
    --optimizer_offload_fraction 0.7 \
    --train_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --sequence_parallel true \
    --freeze_llm false \
    --freeze_vit true \
    --freeze_aligner true \
    --packing true \
    --split_dataset_ratio 0.01 \
    --expert_model_parallel_size 4 \
    --moe_permute_fusion true \
    --moe_grouped_gemm false \
    --moe_shared_expert_overlap true \
    --moe_aux_loss_coeff 1e-3 \
    --micro_batch_size 1 \
    --global_batch_size 4 \
    --recompute_granularity full \
    --recompute_method uniform \
    --recompute_num_layers 1 \
    --finetune true \
    --cross_entropy_loss_fusion true \
    --lr 1e-4 \
    --lr_warmup_fraction 0.05 \
    --min_lr 1e-5 \
    --max_epochs 1 \
    --save megatron_output/Qwen/Qwen3-30B-A3B-Instruct-2507 \
    --eval_interval 200 \
    --save_interval 200 \
    --vit_gradient_checkpointing true \
    --max_length 3000 \
    --num_workers 8 \
    --dataset_num_proc 8 \
    --no_save_optim true \
    --no_save_rng true \
    --attention_backend flash
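
One thing worth noting in the script above: --moe_grouped_gemm is passed twice, first true and then false. Assuming the flags are parsed argparse-style (an assumption about swift's CLI, not verified here), the last occurrence wins, so the run effectively uses moe_grouped_gemm=false, which is the setting under which megatron-core builds the SequentialMLP that the traceback trips over:

```python
# Illustration of last-value-wins for repeated argparse options; whether
# swift's CLI behaves the same way is an assumption.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--moe_grouped_gemm', type=lambda s: s.lower() == 'true')
args = parser.parse_args(['--moe_grouped_gemm', 'true',
                          '--moe_grouped_gemm', 'false'])
print(args.moe_grouped_gemm)  # False: the earlier 'true' is silently overridden
```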

Additional Information

No response
