[PHI] Fix `paddle.cumsum` calculation speed by cangtianhuang · Pull Request #74442 · PaddlePaddle/Paddle

cangtianhuang · 2025-08-06T08:09:32Z

PR Category

Operator Mechanism

PR Types

Performance

Description

Fixes the performance regression that the precision fix in #74081 caused for some models: https://console.cloud.baidu-int.com/devops/icafe/issue/DLTP-92332/show

The fix consists of:

* Rolling back to the ThrustCumsumKernel fast path
* Adding fp16 and bf16 type support to ThrustCumsumKernel

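Earlier testing misjudged the numerical precision of the Thrust library. In new tests on the edge case of a very large 1D tensor (that is, a single giant row), Thrust performs perfectly, whereas BlockScanKernel runs with grid_size == 1, degenerates into serial execution, and becomes significantly slower.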
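To make the grid_size == 1 case concrete, here is a minimal Python sketch of a one-block-per-row launch model. The helper names `launch_shape` and `choose_path` are hypothetical, and the dispatch rule (use Thrust only when the whole tensor is a single row) is an assumption for illustration, not a copy of the kernel code in this PR.

```python
def launch_shape(shape, axis):
    """Return (rows, scan_len) for a scan along `axis`.

    Assumed model: every independent row is handled by its own block,
    so the number of rows plays the role of grid_size.
    """
    scan_len = shape[axis]
    numel = 1
    for dim in shape:
        numel *= dim
    rows = numel // scan_len if scan_len else 0
    return rows, scan_len


def choose_path(shape, axis):
    """Hypothetical dispatch: a single giant row goes to the library scan."""
    rows, _ = launch_shape(shape, axis)
    return "thrust" if rows == 1 else "block_scan"


# A 1D tensor with 2 billion elements is one row, so only one block would run.
print(launch_shape((2_000_000_000,), 0))
print(choose_path((2_000_000_000,), 0))
# A batched input has many rows, so the block-per-row kernel stays parallel.
print(choose_path((1024, 4096), 1))
```

Under this model the 1D case leaves all but one block idle, which is why a scan that parallelizes within a single sequence is the better fit there.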

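The following compares the precision (relative to torch) and the speed of the paddle.cumsum API when it runs through the BlockScanKernel branch and through the ThrustCumsumKernel branch, for element counts from 200 thousand to 2 billion: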

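The results show that for 1D tensors, both the precision and the speed of the Thrust library are significantly better than those of the current BlockScanKernel implementation. The current BlockScanKernel is designed mainly for multi-row data: each block processes a different row of data in parallel.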
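As a rough illustration of how a cumulative-sum result can be scored against a reference, the sketch below uses only the Python standard library. It is an assumption-laden stand-in: the PR compared against torch, while this example uses an exact rational-arithmetic reference, and the helper names `reference_cumsum` and `max_abs_error` are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def reference_cumsum(values):
    """Exact prefix sums via rational arithmetic, rounded once to float."""
    return [float(s) for s in accumulate(Fraction(v) for v in values)]


def max_abs_error(result, reference):
    """Largest absolute element-wise difference between two sequences."""
    return max((abs(a - b) for a, b in zip(result, reference)), default=0.0)


values = [0.1] * 1000
naive = list(accumulate(values))  # plain left-to-right float accumulation
print(max_abs_error(naive, reference_cumsum(values)))
```

The same metric can be applied to the output of any two scan implementations, as long as both are converted to a common dtype first.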

Pcard-85711

paddle-bot · 2025-08-06T08:09:39Z

Your PR has been submitted. Thanks for your contribution to the open source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

* fix ThrustCumsumKernel
* refine
* refine ThrustCumsumKernel
* fix
* update ThrustCumsumKernel
* fix logcumsumexp in ThrustCumsumKernel

cangtianhuang added 2 commits August 10, 2025 12:38

fix ThrustCumsumKernel

04d985f

Merge remote-tracking branch 'upstream/develop' into fix-cumsum

f164405

cangtianhuang force-pushed the fix-cumsum branch from 05a1ccc to f164405 Compare August 10, 2025 04:44

cangtianhuang changed the title ~~[PHI] Fix BlockPrefixCallbackOp~~ [PHI] Fix paddle.cumsum calculation speed Aug 10, 2025

cangtianhuang added 6 commits August 10, 2025 19:52

refine

3254d78

refine ThrustCumsumKernel

024a070

fix

3408ecc

update ThrustCumsumKernel

f132f3a

fix logcumsumexp in ThrustCumsumKernel

894e11c

Merge remote-tracking branch 'upstream/develop' into fix-cumsum

803ea2b

wanghuancoder approved these changes Aug 12, 2025

lshpku approved these changes Aug 12, 2025

lshpku merged commit 9db2cad into PaddlePaddle:develop Aug 12, 2025
68 of 69 checks passed

cangtianhuang deleted the fix-cumsum branch September 4, 2025 08:12

lshpku merged 8 commits into PaddlePaddle:develop from cangtianhuang:fix-cumsum
