
DLL load failed while importing flash_attn_2_cuda #9365

@CharlinChen

Description


Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

Using flash attention on Windows fails with the following error:

[INFO:swift] Global seed set to 42
[INFO:swift] attn_impl: flash_attn
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.
[INFO:swift] Setting image_max_token_num: 16384. You can adjust this hyperparameter through the environment variable: `IMAGE_MAX_TOKEN_NUM`.
[INFO:swift] Setting image_min_token_num: 4. You can adjust this hyperparameter through the environment variable: `IMAGE_MIN_TOKEN_NUM`.
[INFO:swift] Setting spatial_merge_size: 2. You can adjust this hyperparameter through the environment variable: `SPATIAL_MERGE_SIZE`.
[INFO:swift] Setting video_max_token_num: 768. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_TOKEN_NUM`.
[INFO:swift] Setting video_min_token_num: 128. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_TOKEN_NUM`.
[INFO:swift] model_kwargs: {'device_map': 'cuda:0', 'dtype': torch.bfloat16}
Traceback (most recent call last):
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\cli\deploy.py", line 5, in <module>
    deploy_main()
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\deploy.py", line 239, in deploy_main
    SwiftDeploy(args).main()
    ^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\deploy.py", line 53, in __init__
    super().__init__(args)
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\infer.py", line 34, in __init__
    model, self.template = prepare_model_template(args)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\utils.py", line 39, in prepare_model_template
    model, processor = args.get_model_processor(**kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\arguments\base_args\base_args.py", line 327, in get_model_processor
    return get_model_processor(**res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 625, in get_model_processor
    return loader.load()
           ^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 474, in load
    model, processor = self._get_model_processor(model_dir, config)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 465, in _get_model_processor
    model = self.get_model(model_dir, config, processor, self.model_kwargs.copy())
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\models\qwen.py", line 1175, in get_model
    return Qwen2VLLoader.get_model(self, model_dir, config, processor, model_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\models\qwen.py", line 736, in get_model
    model = super().get_model(model_dir, config, processor, model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 315, in get_model
    model = auto_model_cls.from_pretrained(model_dir, config=config, trust_remote_code=True, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\patcher.py", line 388, in _new_from_pretrained
    model = from_pretrained(cls, *args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 4166, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\models\qwen3_5\modeling_qwen3_5.py", line 1810, in __init__
    super().__init__(config)
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1299, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1915, in _check_and_adjust_attn_implementation
    lazy_import_flash_attention(applicable_attn_implementation)
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_flash_attention_utils.py", line 248, in lazy_import_flash_attention
    _flash_fn, _flash_varlen_fn, _flash_with_kvcache_fn, _pad_fn, _unpad_fn = _lazy_imports(
                                                                              ^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_flash_attention_utils.py", line 156, in _lazy_imports
    from flash_attn import flash_attn_func, flash_attn_varlen_func, flash_attn_with_kvcache
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\flash_attn\flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: DLL load failed while importing flash_attn_2_cuda: 

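I have found the cause of this problem.
It is a Windows dynamic-linking issue: torch has to be imported before flash_attn_2_cuda, otherwise Python reports that the DLL cannot be found. The symptom looks like this: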

[screenshot]

If torch is imported first, the problem goes away:

[screenshot]
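To reproduce the comparison from the two screenshots without editing any files, a check along these lines can be used. This is a minimal sketch: check_import is a hypothetical helper (not part of ms-swift, transformers, or flash-attn) that runs each import attempt in a fresh Python interpreter, so that whatever importing torch sets up in one attempt cannot carry over into the other.

```python
import subprocess
import sys
from typing import Optional, Tuple


def check_import(module: str, preload: Optional[str] = None) -> Tuple[bool, str]:
    """Run `import <module>` in a fresh interpreter, optionally importing `preload` first.

    Returns (succeeded, last line of stderr). Each call uses its own process so
    that anything a previous import set up cannot influence the result.
    """
    statements = [f"import {preload}"] if preload else []
    statements.append(f"import {module}")
    result = subprocess.run(
        [sys.executable, "-c", "; ".join(statements)],
        capture_output=True,
        text=True,
    )
    # The last stderr line of a failed import is the exception message,
    # e.g. "ImportError: DLL load failed while importing flash_attn_2_cuda: ..."
    stderr_lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (stderr_lines[-1] if stderr_lines else "")


if __name__ == "__main__":
    # Mirrors the two screenshots: the extension on its own vs. torch imported first.
    for preload in (None, "torch"):
        ok, message = check_import("flash_attn_2_cuda", preload)
        label = "torch imported first" if preload else "flash_attn_2_cuda alone"
        print(f"{label}: {'OK' if ok else 'FAILED'} {message}".rstrip())
```

Run it with the same interpreter as the failing environment (for example the .venv Python from the traceback); if the diagnosis above applies, the first line should report FAILED and the second OK.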

The fix is simple: in flash_attn_interface.py, add import torch before import flash_attn_2_cuda:

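[screenshot] The change is at line 15 in the screenshot above. I'm filing this issue in the hope that it helps others who run into the same problem.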
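The one-line edit can also be scripted. The sketch below assumes the import appears in flash_attn_interface.py as a line starting with import flash_attn_2_cuda (as in the traceback); patch_flash_attn_interface is a hypothetical helper that inserts import torch directly above that line with the same indentation, keeps a .bak copy of the original, and does nothing if the line above is already import torch. Because it edits the installed package, reinstalling or upgrading flash-attn will undo the change.

```python
import importlib.util
import re
from pathlib import Path

# Matches the start of a line that imports the compiled extension, capturing its indentation.
_CUDA_IMPORT = re.compile(r"^(?P<indent>[ \t]*)import flash_attn_2_cuda\b", re.MULTILINE)


def patch_flash_attn_interface(path: Path) -> bool:
    """Insert `import torch` before `import flash_attn_2_cuda`.

    Returns True if the file was modified, False if there was no matching
    import or the line above it is already `import torch`.
    """
    text = path.read_text(encoding="utf-8")
    match = _CUDA_IMPORT.search(text)
    if match is None:
        return False
    before = text[: match.start()]
    previous_line = before.rstrip("\n").rsplit("\n", 1)[-1]
    if previous_line.strip() == "import torch":
        return False  # already patched; keep the edit idempotent
    # Keep an untouched copy next to the file before modifying it.
    path.with_name(path.name + ".bak").write_text(text, encoding="utf-8")
    patched = before + match.group("indent") + "import torch\n" + text[match.start():]
    path.write_text(patched, encoding="utf-8")
    return True


if __name__ == "__main__":
    # find_spec locates the top-level package without importing it,
    # so this works even while importing flash_attn itself fails.
    spec = importlib.util.find_spec("flash_attn")
    if spec is None or not spec.submodule_search_locations:
        print("flash_attn is not installed in this environment")
    else:
        package_dir = Path(next(iter(spec.submodule_search_locations)))
        target = package_dir / "flash_attn_interface.py"
        print(target, "patched" if patch_flash_attn_interface(target) else "unchanged")
```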

How to Reproduce

The error occurs every time flash attention is used on Windows.

Additional Information

No response

Metadata


Assignees

No one assigned

    Labels

    bug: Something isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
