[INFO:swift] Global seed set to 42
[INFO:swift] attn_impl: flash_attn
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.
[INFO:swift] Setting image_max_token_num: 16384. You can adjust this hyperparameter through the environment variable: `IMAGE_MAX_TOKEN_NUM`.
[INFO:swift] Setting image_min_token_num: 4. You can adjust this hyperparameter through the environment variable: `IMAGE_MIN_TOKEN_NUM`.
[INFO:swift] Setting spatial_merge_size: 2. You can adjust this hyperparameter through the environment variable: `SPATIAL_MERGE_SIZE`.
[INFO:swift] Setting video_max_token_num: 768. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_TOKEN_NUM`.
[INFO:swift] Setting video_min_token_num: 128. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_TOKEN_NUM`.
[INFO:swift] model_kwargs: {'device_map': 'cuda:0', 'dtype': torch.bfloat16}
Traceback (most recent call last):
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\cli\deploy.py", line 5, in <module>
deploy_main()
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\deploy.py", line 239, in deploy_main
SwiftDeploy(args).main()
^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\deploy.py", line 53, in __init__
super().__init__(args)
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\infer.py", line 34, in __init__
model, self.template = prepare_model_template(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\utils.py", line 39, in prepare_model_template
model, processor = args.get_model_processor(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\arguments\base_args\base_args.py", line 327, in get_model_processor
return get_model_processor(**res)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 625, in get_model_processor
return loader.load()
^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 474, in load
model, processor = self._get_model_processor(model_dir, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 465, in _get_model_processor
model = self.get_model(model_dir, config, processor, self.model_kwargs.copy())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\models\qwen.py", line 1175, in get_model
return Qwen2VLLoader.get_model(self, model_dir, config, processor, model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\models\qwen.py", line 736, in get_model
model = super().get_model(model_dir, config, processor, model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 315, in get_model
model = auto_model_cls.from_pretrained(model_dir, config=config, trust_remote_code=True, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\patcher.py", line 388, in _new_from_pretrained
model = from_pretrained(cls, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 4166, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\models\qwen3_5\modeling_qwen3_5.py", line 1810, in __init__
super().__init__(config)
File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1299, in __init__
self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1915, in _check_and_adjust_attn_implementation
lazy_import_flash_attention(applicable_attn_implementation)
File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_flash_attention_utils.py", line 248, in lazy_import_flash_attention
_flash_fn, _flash_varlen_fn, _flash_with_kvcache_fn, _pad_fn, _unpad_fn = _lazy_imports(
^^^^^^^^^^^^^^
File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_flash_attention_utils.py", line 156, in _lazy_imports
from flash_attn import flash_attn_func, flash_attn_varlen_func, flash_attn_with_kvcache
File "F:\LLM\ms-swift\.venv\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "F:\LLM\ms-swift\.venv\Lib\site-packages\flash_attn\flash_attn_interface.py", line 15, in <module>
import flash_attn_2_cuda as flash_attn_gpu
ImportError: DLL load failed while importing flash_attn_2_cuda:
Checklist
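Bug Description
Using flash attention on Windows fails with the ImportError shown in the traceback above.
The cause has been identified.
This is a Windows dynamic-linking issue: torch must be imported before flash_attn_2_cuda. Otherwise the import fails with a message that the DLL cannot be found.
When torch is imported first, the error does not occur.
The fix is simple: in flash_attn_interface.py, add `import torch` before `import flash_attn_2_cuda`.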
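
The import-order dependency can be checked outside of ms-swift with a small script. The following is a minimal sketch, not part of flash_attn or swift: the helper name `import_in_order` is hypothetical, and the comment about why the order matters (importing torch makes its bundled DLLs available to the extension) is an assumption consistent with the behaviour described in this report.

```python
import importlib


def import_in_order(module_names):
    """Import modules in the given order.

    Returns a dict mapping each name to the imported module,
    or to the ImportError raised while importing it.
    """
    results = {}
    for name in module_names:
        try:
            results[name] = importlib.import_module(name)
        except ImportError as exc:
            # On Windows a missing dependent DLL surfaces as ImportError
            # ("DLL load failed while importing ...").
            results[name] = exc
    return results


if __name__ == "__main__":
    # torch goes first. Assumption: importing torch makes the DLLs that
    # flash_attn_2_cuda links against loadable, so the extension import
    # that follows can succeed.
    for name, result in import_in_order(["torch", "flash_attn_2_cuda"]).items():
        if isinstance(result, ImportError):
            print(f"{name}: failed ({result})")
        else:
            print(f"{name}: ok")
```

Running the script in a fresh interpreter with the list order swapped should reproduce the failure described here, which helps confirm that the problem is the import order and not a broken flash_attn build.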
How to Reproduce
Always reproducible when using flash attention on Windows.
Additional Information
No response