@hushenwei2000 hushenwei2000 commented Jul 21, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

The existing fused_bias_dropout_residual_layer_norm operator produces all-zero gradients in the backward pass under one specific parameter configuration. Of the 8 combinations below, only the one where all three conditions hold fails:

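| cols_ == 1024 | bias == nullptr | dropout_rate == 0 | Result |
|---------------|-----------------|-------------------|--------|
| Yes           | Yes             | Yes               | FAIL   |
| Yes           | Yes             | No                | PASS   |
| Yes           | No              | Yes               | PASS   |
| Yes           | No              | No                | PASS   |
| No            | Yes             | Yes               | PASS   |
| No            | Yes             | No                | PASS   |
| No            | No              | Yes               | PASS   |
| No            | No              | No                | PASS   |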

Investigating the source code showed that the failing case invokes the fast kernel ln_bwd_fast_kernel_driver, which is selected by this condition:

// FILE: paddle/phi/kernels/fusion/gpu/fused_dropout_helper.h
    if (this->cols_ == 1024 && d_bias == nullptr && d_scale != nullptr &&
        d_layernorm_bias != nullptr && sizeof(T) <= 4) {
      can_call_1024_kernel = true;
    }
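To make the dispatch rule easier to reason about, here is a minimal standalone sketch of the condition quoted above. The function name `CanCall1024Kernel` and the use of a single element type `T` for all three pointers are assumptions made for illustration; the real helper is a class member and may use different types for the layer-norm parameters.

```cpp
#include <cstddef>

// Hypothetical standalone mirror of the dispatch condition quoted above.
// The pointers stand in for gradient buffers; only their null-ness matters.
// Assumption: one element type T for all buffers (the real code may differ).
template <typename T>
bool CanCall1024Kernel(int cols, const T* d_bias, const T* d_scale,
                       const T* d_layernorm_bias) {
  return cols == 1024 && d_bias == nullptr && d_scale != nullptr &&
         d_layernorm_bias != nullptr && sizeof(T) <= 4;
}
```

Note that dropout_rate does not appear in the condition. This is consistent with the table: with cols_ == 1024 and no bias, the fast path is taken whether or not dropout is enabled, and only the dropout_rate == 0 case fails.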

The fast kernel contains this line:

// FILE: paddle/phi/kernels/funcs/layer_norm_impl.cu.h
            dout[it][jt] = x[it][jt] * static_cast<T>(mask_vec[it][jt]) * factor;

When dropout_rate == 0, mask_vec is all zeros, so the multiplication makes the computed dout all zeros as well.

The fix follows the non-fast-kernel implementation, which handles this case correctly by applying the mask multiplication only when dropout is enabled (i.e., it skips multiplying by mask_vec when dropout is disabled):

// FILE: paddle/phi/kernels/fusion/gpu/fused_residual_dropout_bias.h
          if (HasDropout) {
            dx_vec[i] = out_vec[i] * static_cast<T>(mask_vec[i]) * factor;
          } else {
            dx_vec[i] = out_vec[i] * factor;
          }
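The difference between the two code paths can be sketched on the CPU. This is only an illustration of the element-wise logic, not the kernel itself: the function name `DropoutGrad`, the `uint8_t` mask type, and passing `factor` in as a plain argument are all assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the element-wise dropout backward step.
// HasDropout mirrors the compile-time flag in the quoted non-fast kernel.
// Assumption: the mask is stored as uint8_t and factor is supplied by the caller.
template <typename T, bool HasDropout>
std::vector<T> DropoutGrad(const std::vector<T>& dout,
                           const std::vector<uint8_t>& mask, T factor) {
  std::vector<T> dx(dout.size());
  for (std::size_t i = 0; i < dout.size(); ++i) {
    if (HasDropout) {
      // Masked path: an all-zero mask zeroes every gradient element.
      dx[i] = dout[i] * static_cast<T>(mask[i]) * factor;
    } else {
      // No-dropout path: the mask is never read.
      dx[i] = dout[i] * factor;
    }
  }
  return dx;
}
```

Calling `DropoutGrad<float, true>` with an all-zero mask returns all zeros, which matches the failing behavior described above, while `DropoutGrad<float, false>` passes the incoming gradient through scaled by `factor`.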

With this fix, all 8 configurations in the table pass. Additionally, all failures in PaddleAPITest are now resolved, as shown in the log below.

2025-07-21 16:10:47.750734 Worker PID: 91054, Assigned GPU ID: 2
2025-07-21 16:10:47.853978 Worker PID: 91055, Assigned GPU ID: 3
2025-07-21 16:10:47.944813 Worker PID: 91056, Assigned GPU ID: 0
2025-07-21 16:10:47.853036 Worker PID: 91057, Assigned GPU ID: 7
2025-07-21 16:10:47.857670 Worker PID: 91058, Assigned GPU ID: 5
2025-07-21 16:10:47.624630 Worker PID: 91059, Assigned GPU ID: 1
2025-07-21 16:10:47.626300 GPU 1 91059 test begin: paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(Tensor([8, 128, 1024],"float32"), Tensor([8, 128, 1024],"float32"), None, Tensor([1024],"float32"), Tensor([1024],"float32"), 0.0, 1e-05, )
W0721 16:11:21.658066 91059 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8
[Pass] paddle.incubate.nn.functional.fused_bias_dropout_residual_layer_norm(Tensor([8, 128, 1024],"float32"), Tensor([8, 128, 1024],"float32"), None, Tensor([1024],"float32"), Tensor([1024],"float32"), 0.0, 1e-05, )
2025-07-21 16:10:47.755052 Worker PID: 91060, Assigned GPU ID: 6
2025-07-21 16:10:47.723834 Worker PID: 91061, Assigned GPU ID: 4
W0721 16:11:23.500622 91054 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8

Pcard-67164

paddle-bot bot commented Jul 21, 2025

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

CLAassistant commented Jul 21, 2025

CLA assistant check
All committers have signed the CLA.

@hushenwei2000
Contributor Author

/re-run all-failed

@wanghuancoder wanghuancoder merged commit 28be650 into PaddlePaddle:develop Jul 31, 2025
92 of 94 checks passed