[Accuracy diff No.90] Fix accuracy diff for paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm API #74149

hushenwei2000 · 2025-07-21T10:39:38Z

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

The existing fused_bias_dropout_residual_layer_norm operator produces all-zero gradients in the backward pass under one specific parameter configuration: cols_ == 1024, bias == nullptr, and dropout_rate == 0.

cols_ == 1024	bias == nullptr	dropout_rate == 0	Result
Yes	Yes	Yes	FAIL
Yes	Yes	No	PASS
Yes	No	Yes	PASS
Yes	No	No	PASS
No	Yes	Yes	PASS
No	Yes	No	PASS
No	No	Yes	PASS
No	No	No	PASS

Investigating the source code showed that the failing case is dispatched to the fast kernel, ln_bwd_fast_kernel_driver, under this condition:

// FILE: paddle/phi/kernels/fusion/gpu/fused_dropout_helper.h
    if (this->cols_ == 1024 && d_bias == nullptr && d_scale != nullptr &&
        d_layernorm_bias != nullptr && sizeof(T) <= 4) {
      can_call_1024_kernel = true;
    }
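The dispatch condition above can be modelled on the host to see which rows of the test matrix reach the fast kernel. This is only a sketch: CanCall1024Kernel and TakesFastPath are hypothetical names, pointers are modelled as booleans, and it assumes that d_bias is null exactly when the bias input is null and that both layer-norm gradients are requested.

```cpp
#include <cstddef>

// Hypothetical stand-in for the dispatch check quoted above.
// Pointer arguments are modelled as booleans: true means "non-null".
inline bool CanCall1024Kernel(int cols, bool has_d_bias, bool has_d_scale,
                              bool has_d_layernorm_bias,
                              std::size_t elem_size) {
  return cols == 1024 && !has_d_bias && has_d_scale &&
         has_d_layernorm_bias && elem_size <= 4;
}

// Assumption: d_bias is null exactly when the bias input is null, and the
// layer-norm scale and bias gradients are both requested.
// dropout_rate is deliberately not a parameter: the quoted check never
// looks at it, so it cannot influence which kernel is selected.
inline bool TakesFastPath(int cols, bool bias_is_null) {
  return CanCall1024Kernel(cols, /*has_d_bias=*/!bias_is_null,
                           /*has_d_scale=*/true,
                           /*has_d_layernorm_bias=*/true,
                           /*elem_size=*/sizeof(float));
}
```

Under these assumptions, both cols_ == 1024, bias == nullptr rows of the table take the fast path. Only the dropout_rate == 0 row fails, which points at the kernel body rather than at the dispatch itself.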

The fast kernel contains this line:

// FILE: paddle/phi/kernels/funcs/layer_norm_impl.cu.h
            dout[it][jt] = x[it][jt] * static_cast<T>(mask_vec[it][jt]) * factor;

When dropout_rate == 0, mask_vec is all zeros, so the computed dout is also all zeros.

The fix follows the non-fast-kernel implementation, which handles this case correctly by applying the mask multiplication only when dropout is enabled (that is, it skips multiplying by mask_vec when dropout is disabled):

// FILE: paddle/phi/kernels/fusion/gpu/fused_residual_dropout_bias.h
          if (HasDropout) {
            dx_vec[i] = out_vec[i] * static_cast<T>(mask_vec[i]) * factor;
          } else {
            dx_vec[i] = out_vec[i] * factor;
          }
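The effect of the guard can be checked with a small host-side model. The sketch below is not the kernel code: DropoutGrad is a hypothetical name, the vectors stand in for the per-thread registers, uint8_t is an assumed mask element type, and factor is taken as 1.0 for dropout_rate == 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of the per-element gradient. The real code
// is a vectorized CUDA kernel; uint8_t stands in for the mask element type.
template <bool HasDropout>
std::vector<float> DropoutGrad(const std::vector<float>& dout,
                               const std::vector<uint8_t>& mask,
                               float factor) {
  std::vector<float> dx(dout.size());
  for (std::size_t i = 0; i < dout.size(); ++i) {
    if (HasDropout) {
      // Dropout enabled: the mask decides which elements keep a gradient.
      dx[i] = dout[i] * static_cast<float>(mask[i]) * factor;
    } else {
      // Dropout disabled: the mask is never read, so an all-zero mask
      // cannot wipe out the gradient.
      dx[i] = dout[i] * factor;
    }
  }
  return dx;
}
```

With an all-zero mask, DropoutGrad<true> returns all zeros, which reproduces the reported symptom, while DropoutGrad<false> returns the incoming gradient scaled by factor.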

With the fix, all 8 test cases pass. All failures in PaddleAPITest are also resolved, as the log below shows.

2025-07-21 16:10:47.750734 Worker PID: 91054, Assigned GPU ID: 2
2025-07-21 16:10:47.853978 Worker PID: 91055, Assigned GPU ID: 3
2025-07-21 16:10:47.944813 Worker PID: 91056, Assigned GPU ID: 0
2025-07-21 16:10:47.853036 Worker PID: 91057, Assigned GPU ID: 7
2025-07-21 16:10:47.857670 Worker PID: 91058, Assigned GPU ID: 5
2025-07-21 16:10:47.624630 Worker PID: 91059, Assigned GPU ID: 1
2025-07-21 16:10:47.626300 GPU 1 91059 test begin: paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(Tensor([8, 128, 1024],"float32"), Tensor([8, 128, 1024],"float32"), None, Tensor([1024],"float32"), Tensor([1024],"float32"), 0.0, 1e-05, )
W0721 16:11:21.658066 91059 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8
[Pass] paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(Tensor([8, 128, 1024],"float32"), Tensor([8, 128, 1024],"float32"), None, Tensor([1024],"float32"), Tensor([1024],"float32"), 0.0, 1e-05, )
2025-07-21 16:10:47.755052 Worker PID: 91060, Assigned GPU ID: 6
2025-07-21 16:10:47.723834 Worker PID: 91061, Assigned GPU ID: 4
W0721 16:11:23.500622 91054 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8

Pcard-67164


paddle-bot · 2025-07-21T10:39:44Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

CLAassistant · 2025-07-21T10:39:47Z

All committers have signed the CLA.

hushenwei2000 · 2025-07-23T06:32:18Z

/re-run all-failed

hushenwei2000 · 2025-07-30T06:36:57Z

/re-run all-failed

fix(layer_norm_impl.cu.h/fused_ln_bwd_fast_kernel): fix the bug where the backward pass is all zeros when dropout is 0

b4c2957

fix(layer_norm_impl.cu.h/fused_ln_bwd_fast_kernel): code format

399efab

luotao1 mentioned this pull request Jul 22, 2025

[Open Source Task] Resolve all Paddle CPU/GPU Kernel accuracy issues #72667

Open

lshpku approved these changes Jul 31, 2025

wanghuancoder approved these changes Jul 31, 2025

wanghuancoder merged commit 28be650 into PaddlePaddle:develop Jul 31, 2025
92 of 94 checks passed
