Description
Checklist
- I have searched existing issues, and this is a new bug report.
Bug Description
[rank31]: Traceback (most recent call last):
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/cli/sft.py", line 20, in <module>
[rank31]: sft_main()
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/pipelines/train/sft.py", line 346, in sft_main
[rank31]: return SwiftSft(args).main()
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/pipelines/train/sft.py", line 30, in __init__
[rank31]: self._prepare_model_tokenizer()
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/ray/base.py", line 169, in wrapper
[rank31]: return func(self, *args, **kwargs)
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/pipelines/train/sft.py", line 52, in _prepare_model_tokenizer
[rank31]: self.model, self.processor = args.get_model_processor(**kwargs)
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/arguments/base_args/base_args.py", line 328, in get_model_processor
[rank31]: return get_model_processor(**res)
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/model/register.py", line 607, in get_model_processor
[rank31]: return loader.load()
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/model/register.py", line 456, in load
[rank31]: model, processor = self._get_model_processor(model_dir, config)
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/model/register.py", line 447, in _get_model_processor
[rank31]: model = self.get_model(model_dir, config, processor, self.model_kwargs.copy())
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/model/models/baidu.py", line 98, in get_model
[rank31]: return super().get_model(model_dir, *args, **kwargs)
[rank31]: File "/home/kas/pretraining/ms-swift-3.12.4/swift/model/register.py", line 306, in get_model
[rank31]: model = auto_model_cls.from_pretrained(model_dir, config=config, trust_remote_code=True, **model_kwargs)
[rank31]: File "/usr/local/python3.10/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 607, in from_pretrained
[rank31]: raise ValueError(
[rank31]: ValueError: Unrecognized configuration class <class 'transformers_modules.PaddleOCR_hyphen_VL_hyphen_1_5.configuration_paddleocr_vl.PaddleOCRVLConfig'> for this kind of AutoModel: AutoModelForImageTextToText.
[rank31]: Model type should be one of AriaConfig, AyaVisionConfig, BlipConfig, Blip2Config, ChameleonConfig, Cohere2VisionConfig, DeepseekVLConfig, DeepseekVLHybridConfig, Emu3Config, EvollaConfig, Florence2Config, FuyuConfig, Gemma3Config, Gemma3nConfig, GitConfig, Glm4vConfig, Glm4vMoeConfig, GotOcr2Config, IdeficsConfig, Idefics2Config, Idefics3Config, InstructBlipConfig, InternVLConfig, JanusConfig, Kosmos2Config, Kosmos2_5Config, Lfm2VlConfig, Llama4Config, LlavaConfig, LlavaNextConfig, LlavaNextVideoConfig, LlavaOnevisionConfig, Mistral3Config, MllamaConfig, Ovis2Config, PaliGemmaConfig, PerceptionLMConfig, Pix2StructConfig, PixtralVisionConfig, Qwen2_5_VLConfig, Qwen2VLConfig, Qwen3VLConfig, Qwen3VLMoeConfig, ShieldGemma2Config, SmolVLMConfig, UdopConfig, VipLlavaConfig, VisionEncoderDecoderConfig.
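A possible first diagnostic, offered as a sketch rather than a confirmed fix: when `from_pretrained` is called with `trust_remote_code=True`, transformers only uses the checkpoint's custom code for a given Auto class if that class name appears under `auto_map` in the checkpoint's `config.json`; otherwise it falls back to its built-in mapping, which is where the `ValueError` above is raised. The helpers below (`remote_auto_classes` and `has_remote_class` are hypothetical names, not part of ms-swift or transformers) list which Auto classes a local checkpoint directory declares. Whether the PaddleOCR-VL-1.5 checkpoint declares `AutoModelForImageTextToText` is not known from this report.

```python
import json
from pathlib import Path


def remote_auto_classes(model_dir):
    """Return the sorted Auto* class names listed under `auto_map` in config.json.

    Assumes `model_dir` is a local checkpoint directory containing config.json.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # `auto_map` maps Auto class names to classes defined in the checkpoint's
    # remote code; a missing key means no remote classes are declared.
    return sorted(config.get("auto_map", {}))


def has_remote_class(model_dir, auto_class_name="AutoModelForImageTextToText"):
    """Report whether the checkpoint declares remote code for `auto_class_name`."""
    return auto_class_name in remote_auto_classes(model_dir)
```

Calling `remote_auto_classes(...)` on the directory passed as `--model` shows which Auto classes the checkpoint can be loaded through; if `AutoModelForImageTextToText` is absent, that would be consistent with the error in the traceback.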
How to Reproduce
Environment: NPU
Script:
MIN_PIXELS=$MIN_PIXELS \
MAX_PIXELS=$MAX_PIXELS \
torchrun \
--nproc_per_node $NPUS_PER_NODE \
--nnodes $NNODES \
--node_rank $NODE_RANK \
--master_addr $MASTER_ADDR \
--master_port $MASTER_PORT \
swift/cli/sft.py \
--model_type paddle_ocr_1_5 \
--model $BASE_MODEL_DIR \
--dataset $DATA_PATH \
$resume_param \
--train_type full \
--torch_dtype float16 \
--num_train_epochs 6 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--learning_rate 2e-5 \
--gradient_accumulation_steps 4 \
--save_steps 1000 \
--eval_strategy no \
--save_total_limit 2 \
--logging_steps 10 \
--max_length 5600 \
--output_dir /home/kas/pretraining/saved_models/layout/${MODEL_ID} \
--warmup_ratio 0.05 \
--lazy_tokenize true \
--ddp_backend hccl \
--freeze_aligner false \
--lr_scheduler_type=cosine \
--freeze_vit false \
--deepspeed zero2 \
--dataloader_num_workers 8 2>&1 | tee -a ${LOG_DIR}/rank${NODE_RANK}_train.log
Additional Information
No response