DLL load failed while importing flash_attn_2_cuda

### Checklist

- [x] I have searched existing issues, and this is a new bug report.

### Bug Description

Using flash attention on Windows fails with the following error:

```
[INFO:swift] Global seed set to 42
[INFO:swift] attn_impl: flash_attn
[INFO:swift] Setting max_ratio: 200. You can adjust this hyperparameter through the environment variable: `MAX_RATIO`.
[INFO:swift] Setting frame_factor: 2. You can adjust this hyperparameter through the environment variable: `FRAME_FACTOR`.
[INFO:swift] Setting fps: 2.0. You can adjust this hyperparameter through the environment variable: `FPS`.
[INFO:swift] Setting fps_min_frames: 4. You can adjust this hyperparameter through the environment variable: `FPS_MIN_FRAMES`.
[INFO:swift] Setting fps_max_frames: 768. You can adjust this hyperparameter through the environment variable: `FPS_MAX_FRAMES`.
[INFO:swift] Setting image_max_token_num: 16384. You can adjust this hyperparameter through the environment variable: `IMAGE_MAX_TOKEN_NUM`.
[INFO:swift] Setting image_min_token_num: 4. You can adjust this hyperparameter through the environment variable: `IMAGE_MIN_TOKEN_NUM`.
[INFO:swift] Setting spatial_merge_size: 2. You can adjust this hyperparameter through the environment variable: `SPATIAL_MERGE_SIZE`.
[INFO:swift] Setting video_max_token_num: 768. You can adjust this hyperparameter through the environment variable: `VIDEO_MAX_TOKEN_NUM`.
[INFO:swift] Setting video_min_token_num: 128. You can adjust this hyperparameter through the environment variable: `VIDEO_MIN_TOKEN_NUM`.
[INFO:swift] model_kwargs: {'device_map': 'cuda:0', 'dtype': torch.bfloat16}
Traceback (most recent call last):
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\cli\deploy.py", line 5, in <module>
    deploy_main()
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\deploy.py", line 239, in deploy_main
    SwiftDeploy(args).main()
    ^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\deploy.py", line 53, in __init__
    super().__init__(args)
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\infer\infer.py", line 34, in __init__
    model, self.template = prepare_model_template(args)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\pipelines\utils.py", line 39, in prepare_model_template
    model, processor = args.get_model_processor(**kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\arguments\base_args\base_args.py", line 327, in get_model_processor
    return get_model_processor(**res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 625, in get_model_processor
    return loader.load()
           ^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 474, in load
    model, processor = self._get_model_processor(model_dir, config)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 465, in _get_model_processor
    model = self.get_model(model_dir, config, processor, self.model_kwargs.copy())
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\models\qwen.py", line 1175, in get_model
    return Qwen2VLLoader.get_model(self, model_dir, config, processor, model_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\models\qwen.py", line 736, in get_model
    model = super().get_model(model_dir, config, processor, model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\register.py", line 315, in get_model
    model = auto_model_cls.from_pretrained(model_dir, config=config, trust_remote_code=True, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\swift\model\patcher.py", line 388, in _new_from_pretrained
    model = from_pretrained(cls, *args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 4166, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\models\qwen3_5\modeling_qwen3_5.py", line 1810, in __init__
    super().__init__(config)
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1299, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_utils.py", line 1915, in _check_and_adjust_attn_implementation
    lazy_import_flash_attention(applicable_attn_implementation)
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_flash_attention_utils.py", line 248, in lazy_import_flash_attention
    _flash_fn, _flash_varlen_fn, _flash_with_kvcache_fn, _pad_fn, _unpad_fn = _lazy_imports(
                                                                              ^^^^^^^^^^^^^^
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\transformers\modeling_flash_attention_utils.py", line 156, in _lazy_imports
    from flash_attn import flash_attn_func, flash_attn_varlen_func, flash_attn_with_kvcache
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "F:\LLM\ms-swift\.venv\Lib\site-packages\flash_attn\flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: DLL load failed while importing flash_attn_2_cuda: 
```
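The cause has been identified.
It is a Windows dynamic-linking issue: `torch` must be imported before `flash_attn_2_cuda`. Otherwise the import fails with a "DLL not found" error, as shown here: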

<img width="1198" height="183" alt="Image" src="https://github.com/user-attachments/assets/69ccaf3f-e12b-471d-9e32-8c1a16a84d00" />


Importing `torch` first avoids the problem:


<img width="1195" height="163" alt="Image" src="https://github.com/user-attachments/assets/7d775397-759a-4787-b2fa-f32bd95fbdb5" />

The fix is simple: in `flash_attn_interface.py`, add `import torch` before `import flash_attn_2_cuda`:

<img width="1988" height="1030" alt="Image" src="https://github.com/user-attachments/assets/3bd7b94f-4302-4486-bc96-a07c9cef84c4" />

See line 15 in the screenshot above. I am filing this issue in the hope that it helps others who run into the same problem.
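If patching the installed `flash_attn_interface.py` is not an option, the same import order can be enforced from your own entry script. The following is only a minimal sketch: `import_after` is a hypothetical helper name (not part of flash_attn, transformers, or ms-swift), and it only helps when you control the script that triggers the import.

```python
import importlib


def import_after(prerequisite: str, target: str):
    """Import `prerequisite` first, then `target`, and return the target module.

    Hypothetical helper: on Windows, importing torch first is what made
    `import flash_attn_2_cuda` succeed in this report.
    """
    importlib.import_module(prerequisite)
    return importlib.import_module(target)


if __name__ == "__main__":
    try:
        # Same order as the patched flash_attn_interface.py: torch, then the extension.
        flash_attn_gpu = import_after("torch", "flash_attn_2_cuda")
        print("loaded", flash_attn_gpu.__name__)
    except (ImportError, OSError) as exc:
        # Raised when either package is missing or a DLL still cannot be resolved.
        print("import failed:", exc)
```

Because Python caches modules in `sys.modules`, any later `import flash_attn` in the same process reuses the already loaded extension.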

### How to Reproduce

Always reproducible when using flash attention on Windows.
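To check whether a given environment is affected, both import orders can be compared in fresh interpreter processes, so that DLLs loaded by one attempt cannot influence the other. This is a rough sketch rather than an official diagnostic; `SNIPPETS` and `try_import` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

import subprocess
import sys

# The two import orders described in this issue.
SNIPPETS = {
    "flash_attn_2_cuda alone": "import flash_attn_2_cuda",
    "torch first": "import torch; import flash_attn_2_cuda",
}


def try_import(code: str, timeout: float = 300.0) -> tuple[bool, str]:
    """Run `code` in a fresh interpreter; return (succeeded, last stderr line)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate localized, non-UTF-8 error messages
        timeout=timeout,
    )
    stderr_lines = proc.stderr.strip().splitlines()
    detail = stderr_lines[-1] if stderr_lines else ""
    return proc.returncode == 0, detail


if __name__ == "__main__":
    for label, code in SNIPPETS.items():
        ok, detail = try_import(code)
        print(f"{label}: {'OK' if ok else 'FAILED'} {detail}")
```

On an affected Windows setup, the first entry would be expected to fail with the `ImportError` from the traceback above while the second succeeds.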

### Additional Information

_No response_

DLL load failed while importing flash_attn_2_cuda #9365
