
Commit 3edeb00

【UnitTestFix No.19】Fix test_fused_dot_product_attention_op_static.py (#76405)
- Removed `list(REMOVE_ITEM TEST_OPS test_fused_dot_product_attention_op_static)` from `test/legacy_test/CMakeLists.txt`.
- On Windows, building with the following CMake command succeeds:
  `cmake .. -GNinja -DWITH_GPU=ON -DWITH_UNITY_BUILD=ON -DCUDA_ARCH_NAME=Auto -DWITH_TENSORRT=ON -DTENSORRT_ROOT=D:\TensorRT-10.13.3.9 -DWITH_TESTING=ON`
  but the test case then fails as follows:

```shell
Start testing: Nov 05 21:48 中国标准时间
----------------------------------------------------------
1488/2270 Testing: test_fused_dot_product_attention_op_static
1488/2270 Test: test_fused_dot_product_attention_op_static
Command: "D:/Program Files/CMake/bin/cmake.exe" "-E" "env" "PYTHONPATH=D:/Lenovo/Paddle/build/python" "D:/Users/Lenovo/AppData/Local/Programs/Python/Python310/python.exe" "D:/Lenovo/Paddle/tools/test_runner.py" "test_fused_dot_product_attention_op_static"
Directory: D:/Lenovo/Paddle/build/test/legacy_test
"test_fused_dot_product_attention_op_static" start time: Nov 05 21:48 中国标准时间
Output:
----------------------------------------------------------
test_fused_dot_product_attention_op_static failed
E
======================================================================
ERROR: test_static_op (test_fused_dot_product_attention_op_static.TestFusedDotProductAttentionStatic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Lenovo\Paddle\build\test\legacy_test\test_fused_dot_product_attention_op_static.py", line 74, in test_static_op
    out0 = fused_dot_product_attention(q, k, v, mask)
  File "D:\Lenovo\Paddle\build\python\paddle\incubate\nn\functional\fused_dot_product_attention.py", line 202, in fused_dot_product_attention
    out, _, _ = _C_ops.fused_dot_product_attention(
RuntimeError: (NotFound) The kernel `fused_dot_product_attention` is not registered.
  [Hint: Expected iter != kernels_.end(), but received iter == kernels_.end().] (at D:\Lenovo\Paddle\paddle\phi\core\kernel_factory.cc:276)

----------------------------------------------------------------------
Ran 1 test in 0.135s

FAILED (errors=1)

<end of output>
Test time = 4.86 sec
----------------------------------------------------------
Test Failed.
"test_fused_dot_product_attention_op_static" end time: Nov 05 21:48 中国标准时间
"test_fused_dot_product_attention_op_static" time elapsed: 00:00:04
----------------------------------------------------------
End testing: Nov 05 21:48 中国标准时间
```

- After adding `-DWITH_CUDNN_FRONTEND=ON`, the build fails with the following link error:

```shell
[1675/3580] Linking CXX shared library paddle\phi\phi.dll
FAILED: paddle/phi/phi.dll paddle/phi/phi.lib
C:\WINDOWS\system32\cmd.exe /C "cd . && "D:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --msvc-ver=1944 --intdir=paddle\phi\CMakeFiles\phi.dir --rc="D:\Windows Kits\10\bin\10.0.26100.0\x64\rc.exe" --mt="D:\Windows Kits\10\bin\10.0.26100.0\x64\mt.exe" --manifests -- "D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64\link.exe" /nologo @CMakeFiles\phi.rsp /out:paddle\phi\phi.dll /implib:paddle\phi\phi.lib /pdb:paddle\phi\phi.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4006 /ignore:4221 /NODEFAULTLIB:MSVCRT.LIB /INCREMENTAL:NO && cd ."
LINK: command "D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.44.35207\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\phi.rsp /out:paddle\phi\phi.dll /implib:paddle\phi\phi.lib /pdb:paddle\phi\phi.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4006 /ignore:4221 /NODEFAULTLIB:MSVCRT.LIB /INCREMENTAL:NO /MANIFEST:EMBED,ID=2" failed (exit code 1120) with the following output:
正在创建库 paddle\phi\phi.lib 和对象 paddle\phi\phi.exp
fused_dconv_drelu_dbn_kernel.cu.obj : error LNK2019: 无法解析的外部符号 "bool paddle_flags::FLAGS_cudnn_deterministic" (?FLAGS_cudnn_deterministic@paddle_flags@@3_NA),函数 "void __cdecl phi::fusion::FusedDconvDreluDbnKernel<struct phi::dtype::float16,class phi::GPUContext>(class phi::GPUContext const &,class phi::DenseTensor const &,class phi::DenseTensor const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class phi::DenseTensor const &,class phi::DenseTensor const &,class phi::DenseTensor const &,class phi::DenseTensor const &,class phi::DenseTensor const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class paddle::optional<class phi::DenseTensor> const &,class std::vector<int,class std::allocator<int> > const &,class std::vector<int,class std::allocator<int> > const &,class std::vector<int,class std::allocator<int> > const &,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,bool,bool,bool,bool,class phi::DenseTensor *,class phi::DenseTensor *,class phi::DenseTensor *,class phi::DenseTensor *,class phi::DenseTensor *,class phi::DenseTensor *,class phi::DenseTensor *)" (??$FusedDconvDreluDbnKernel@Ufloat16@dtype@phi@@VGPUContext@3@@Fusion@phi@@YAXAEBVGPUContext@1@AEBVDenseTensor@1@1AEBV?$optional@VDenseTensor@phi@@@paddle@@22221111122222AEBV?$vector@HV?$allocator@H@std@@@std@@33AEBV?$basic_string@DU?$char_traits@D@std@@v?$allocator@D@2@@7@H4_N555PEAV31@666666@Z) 中引用了该符号
fused_dot_product_attention_op.cu.obj : error LNK2001: 无法解析的外部符号 "bool paddle_flags::FLAGS_cudnn_deterministic" (?FLAGS_cudnn_deterministic@paddle_flags@@3_NA)
fused_scale_bias_add_relu_kernel.cu.obj : error LNK2001: 无法解析的外部符号 "bool paddle_flags::FLAGS_cudnn_deterministic" (?FLAGS_cudnn_deterministic@paddle_flags@@3_NA)
fused_scale_bias_relu_conv_bn_kernel.cu.obj : error LNK2001: 无法解析的外部符号 "bool paddle_flags::FLAGS_cudnn_deterministic" (?FLAGS_cudnn_deterministic@paddle_flags@@3_NA)
paddle\phi\phi.dll : fatal error LNK1120: 1 个无法解析的外部命令
[1682/3580] Building CXX object paddle\fluid\framework\ir\...s.dir\conv2d_trans_filter_dilations_nxn_to_1x1_pass.cc.obj
ninja: build stopped: subcommand failed.
```

In the following four GPU fused-operator kernel files:

- fused_dconv_drelu_dbn_kernel.cu
- fused_dot_product_attention_op.cu
- fused_scale_bias_add_relu_kernel.cu
- fused_scale_bias_relu_conv_bn_kernel.cu

`PHI_DECLARE_bool(cudnn_deterministic)` is changed to `COMMON_DECLARE_bool(cudnn_deterministic)`. This appears to be a change that was missed during the previous large-scale refactoring.

With the build now succeeding, running the test case fails as follows:

```shell
Start testing: Nov 13 18:02 中国标准时间
----------------------------------------------------------
1491/2280 Testing: test_fused_dot_product_attention_op_static
1491/2280 Test: test_fused_dot_product_attention_op_static
Command: "D:/Program Files/CMake/bin/cmake.exe" "-E" "env" "PYTHONPATH=D:/Lenovo/Paddle/build/python" "D:/Users/Lenovo/AppData/Local/Programs/Python/Python310/python.exe" "D:/Lenovo/Paddle/tools/test_runner.py" "test_fused_dot_product_attention_op_static"
Directory: D:/Lenovo/Paddle/build/test/legacy_test
"test_fused_dot_product_attention_op_static" start time: Nov 13 18:02 中国标准时间
Output:
----------------------------------------------------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1113 18:02:59.023836 18612 gpu_resources.cc:116] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 13.0, Runtime API Version: 13.0
test_fused_dot_product_attention_op_static failed
E
======================================================================
ERROR: test_static_op (test_fused_dot_product_attention_op_static.TestFusedDotProductAttentionStatic)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Lenovo\Paddle\build\test\legacy_test\test_fused_dot_product_attention_op_static.py", line 106, in test_static_op
    outs = fused_dot_product_attention(q, k, v, mask)
  File "D:\Lenovo\Paddle\build\python\paddle\incubate\nn\functional\fused_dot_product_attention.py", line 202, in fused_dot_product_attention
    out, _, _ = _C_ops.fused_dot_product_attention(
ValueError: (InvalidArgument) fused_dot_product_attention(): argument 'q' (position 0) must be Tensor, but got paddle.base.libpaddle.pir.Value (at D:\Lenovo\Paddle\paddle\fluid\pybind\eager_utils.cc:1531)

----------------------------------------------------------------------
Ran 1 test in 0.787s

FAILED (errors=1)

<end of output>
Test time = 5.86 sec
----------------------------------------------------------
Test Failed.
"test_fused_dot_product_attention_op_static" end time: Nov 13 18:03 中国标准时间
"test_fused_dot_product_attention_op_static" time elapsed: 00:00:05
----------------------------------------------------------
End testing: Nov 13 18:03 中国标准时间
```

To fix this:

- Added the `OldIrGuard` context manager to disable PIR (Program Intermediate Representation) mode.
- Restructured the static-graph test to use separate `main_program` and `startup_program` objects.
- Explicitly call `exe.run(startup_program)` for initialization.

**Effect:**

- **Works with both the old and the new IR**: ensures the test runs in legacy static-graph mode rather than the new PIR mode.
- **Fixes the test**: improves its stability and correctness.
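For context, the pattern the fixed test now follows is the usual legacy-IR static-graph flow: build the graph inside `OldIrGuard` with explicit `main_program`/`startup_program`, run the startup program once, then execute the main program. The snippet below is a minimal sketch of that pattern only, not the test itself; it substitutes a plain `paddle.matmul` for `fused_dot_product_attention` and runs on CPU, so the tensor names, shapes, and the `CPUPlace` choice are illustrative assumptions.

```python
import numpy as np

import paddle
from paddle.pir_utils import OldIrGuard  # same guard the fixed test imports

paddle.enable_static()

with OldIrGuard():
    # Build the graph in dedicated programs instead of the global defaults.
    main_program = paddle.static.Program()
    startup_program = paddle.static.Program()
    with paddle.static.program_guard(main_program, startup_program):
        x = paddle.static.data(name="x", shape=[4, 8], dtype="float32")
        y = paddle.static.data(name="y", shape=[8, 4], dtype="float32")
        # Stand-in for fused_dot_product_attention(q, k, v, mask).
        out = paddle.matmul(x, y)

    exe = paddle.static.Executor(paddle.CPUPlace())
    # Run the startup program explicitly before executing the main program.
    exe.run(startup_program)
    (result,) = exe.run(
        main_program,
        feed={
            "x": np.ones([4, 8], dtype="float32"),
            "y": np.ones([8, 4], dtype="float32"),
        },
        fetch_list=[out],
    )
    print(result.shape)  # (4, 4)
```

Under the legacy IR, `paddle.static.data` yields old-style graph variables rather than `pir.Value`s, which appears to be why the `must be Tensor, but got paddle.base.libpaddle.pir.Value` error from the log above no longer occurs.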
1 parent 49cb7d9 commit 3edeb00

6 files changed: 62 additions, 51 deletions

paddle/phi/kernels/fusion/gpu/fused_dconv_drelu_dbn_kernel.cu

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ limitations under the License. */
 #include "paddle/phi/kernels/funcs/batch_norm_utils.h"
 #include "paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h"
 
-PHI_DECLARE_bool(cudnn_deterministic);
+COMMON_DECLARE_bool(cudnn_deterministic);
 COMMON_DECLARE_bool(cudnn_exhaustive_search);
 
 namespace phi {
```

paddle/phi/kernels/fusion/gpu/fused_dot_product_attention_op.cu

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@
 #include "paddle/phi/kernels/expand_kernel.h"
 #include "paddle/phi/kernels/gpudnn/mha_cudnn_frontend.h"
 
-PHI_DECLARE_bool(cudnn_deterministic);
+COMMON_DECLARE_bool(cudnn_deterministic);
 
 namespace phi {
 namespace fusion {
```

paddle/phi/kernels/fusion/gpu/fused_scale_bias_add_relu_kernel.cu

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ limitations under the License. */
 #include "paddle/phi/kernels/funcs/batch_norm_utils.h"
 #include "paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h"
 
-PHI_DECLARE_bool(cudnn_deterministic);
+COMMON_DECLARE_bool(cudnn_deterministic);
 COMMON_DECLARE_bool(cudnn_exhaustive_search);
 
 namespace phi {
```

paddle/phi/kernels/fusion/gpu/fused_scale_bias_relu_conv_bn_kernel.cu

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@ limitations under the License. */
 #include "paddle/phi/kernels/funcs/batch_norm_utils.h"
 #include "paddle/phi/kernels/gpudnn/conv_cudnn_frontend.h"
 
-PHI_DECLARE_bool(cudnn_deterministic);
+COMMON_DECLARE_bool(cudnn_deterministic);
 COMMON_DECLARE_bool(cudnn_exhaustive_search);
 
 namespace phi {
```

test/legacy_test/CMakeLists.txt

Lines changed: 0 additions & 1 deletion
```diff
@@ -127,7 +127,6 @@ list(REMOVE_ITEM TEST_OPS test_audio_logmel_feature test_audio_mel_feature)
 list(REMOVE_ITEM TEST_OPS test_fused_gemm_epilogue_op)
 list(REMOVE_ITEM TEST_OPS test_fused_gemm_epilogue_grad_op)
 list(REMOVE_ITEM TEST_OPS test_fused_dot_product_attention_op)
-list(REMOVE_ITEM TEST_OPS test_fused_dot_product_attention_op_static)
 list(REMOVE_ITEM TEST_OPS test_fuse_dot_product_attention_pass)
 
 if(((NOT WITH_ROCM) AND (NOT WITH_GPU)) OR WIN32)
```

test/legacy_test/test_fused_dot_product_attention_op_static.py

Lines changed: 58 additions & 46 deletions
```diff
@@ -89,24 +89,32 @@ def test_static_op(self):
         paddle.seed(312)
 
         # call fused_dot_product_attention in static mode
-        with paddle.static.program_guard(paddle.static.Program()):
-            q = paddle.static.data(
-                name="q", shape=self.q_shape, dtype=self.dtype
-            )
-            k = paddle.static.data(
-                name="k", shape=self.kv_shape, dtype=self.dtype
-            )
-            v = paddle.static.data(
-                name="v", shape=self.kv_shape, dtype=self.dtype
-            )
-            mask = paddle.static.data(
-                name="mask", shape=self.mask_shape, dtype=self.dtype
-            )
-
-            outs = fused_dot_product_attention(q, k, v, mask)
+        # Use OldIrGuard to disable PIR mode for legacy static graph test
+        from paddle.pir_utils import OldIrGuard
+
+        with OldIrGuard():
+            main_program = paddle.static.Program()
+            startup_program = paddle.static.Program()
+            with paddle.static.program_guard(main_program, startup_program):
+                q = paddle.static.data(
+                    name="q", shape=self.q_shape, dtype=self.dtype
+                )
+                k = paddle.static.data(
+                    name="k", shape=self.kv_shape, dtype=self.dtype
+                )
+                v = paddle.static.data(
+                    name="v", shape=self.kv_shape, dtype=self.dtype
+                )
+                mask = paddle.static.data(
+                    name="mask", shape=self.mask_shape, dtype=self.dtype
+                )
+
+                outs = fused_dot_product_attention(q, k, v, mask)
 
             exe = paddle.static.Executor(self.place)
+            exe.run(startup_program)
             out_s = exe.run(
+                main_program,
                 feed={
                     "q": q_data.astype('float16'),
                     "k": k_data.astype('float16'),
@@ -117,37 +125,41 @@ def test_static_op(self):
             )
             np.testing.assert_allclose(out_s[0], out0)
 
-        # call cudnn_flash_attention in static mode
-        with paddle.static.program_guard(paddle.static.Program()):
-            q = paddle.static.data(
-                name="q", shape=self.q_shape, dtype=self.dtype
-            )
-            k = paddle.static.data(
-                name="k", shape=self.kv_shape, dtype=self.dtype
-            )
-            v = paddle.static.data(
-                name="v", shape=self.kv_shape, dtype=self.dtype
-            )
-            mask = paddle.static.data(
-                name="mask", shape=self.mask_shape, dtype=self.dtype
-            )
-
-            outs = cudnn_flash_attention(
-                q,
-                k,
-                v,
-                mask,
-                None,
-                None,
-                1.0,
-                0.0,
-                True,
-                None,
-                "post_scale_bias",
-            )
-
-            exe = paddle.static.Executor(self.place)
-            out_s = exe.run(
+            # call cudnn_flash_attention in static mode
+            main_program2 = paddle.static.Program()
+            startup_program2 = paddle.static.Program()
+            with paddle.static.program_guard(main_program2, startup_program2):
+                q = paddle.static.data(
+                    name="q", shape=self.q_shape, dtype=self.dtype
+                )
+                k = paddle.static.data(
+                    name="k", shape=self.kv_shape, dtype=self.dtype
+                )
+                v = paddle.static.data(
+                    name="v", shape=self.kv_shape, dtype=self.dtype
+                )
+                mask = paddle.static.data(
+                    name="mask", shape=self.mask_shape, dtype=self.dtype
+                )
+
+                outs = cudnn_flash_attention(
+                    q,
+                    k,
+                    v,
+                    mask,
+                    None,
+                    None,
+                    1.0,
+                    0.0,
+                    True,
+                    None,
+                    "post_scale_bias",
+                )
+
+            exe2 = paddle.static.Executor(self.place)
+            exe2.run(startup_program2)
+            out_s = exe2.run(
+                main_program2,
                 feed={
                     "q": q_data.astype('float16'),
                     "k": k_data.astype('float16'),
```
