Conversation

@cszdrg
Contributor

@cszdrg cszdrg commented Aug 11, 2025

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

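The original plan was to write a brand-new kernel for pairwise_distance that splits the backward gradient evenly, to align with torch.
After comparing with torch, I found that torch also calls p_norm directly; the only difference is that paddle's p_norm backward does not split the gradient evenly.
The existing test cases for paddle.nn.functional.normalize did not cover the infinity norm, so the problem was never exposed.
So this PR directly modifies the backward pass of p_norm for the case where p is infinity, adding the even split.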
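
To make the "even split" concrete, here is a minimal pure-Python sketch of the infinity-norm gradient on a flat list. It is not the Paddle or torch kernel: the function names and the `split_evenly` flag are hypothetical, chosen only for illustration. When several elements tie for the largest absolute value, the split variant divides the upstream gradient by the number of tied elements, while the non-split variant gives each tied element the full gradient.

```python
def inf_norm(x):
    """Infinity norm of a flat list: the largest absolute value."""
    return max(abs(v) for v in x)


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def inf_norm_grad(x, grad_out=1.0, split_evenly=True):
    """Gradient of inf_norm(x) with respect to each element of x.

    Hypothetical helper for illustration only.
    split_evenly=True  -> tied maxima share grad_out equally.
    split_evenly=False -> every tied maximum receives the full grad_out.
    """
    norm = inf_norm(x)
    is_max = [abs(v) == norm for v in x]
    # Number of elements that attain the maximum absolute value.
    count = sum(is_max) if split_evenly else 1
    return [grad_out * sign(v) / count if m else 0.0
            for v, m in zip(x, is_max)]


x = [3.0, -3.0, 1.0]  # two elements tie for the maximum |value|
print(inf_norm_grad(x, split_evenly=True))
print(inf_norm_grad(x, split_evenly=False))
```

With the split, the absolute values of the gradient entries sum to `grad_out`; without it, they sum to `grad_out` times the number of ties.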

All tests pass after running them.

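This PR makes the same change as #74197; performance test results for both are posted here.
This PR:
[Screenshot 2025-08-11 18 24 39]
PR #74197:
[Screenshot 2025-08-11 18 25 01]
Because the Eigen library first builds an expression and only then executes it, there is hardly any malloc overhead.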


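The Eigen library has bugs, so this PR is abandoned.

  1. float16 computation is not supported on the GPU; the computation has to be done in fp32 to reach normal speed.
  2. The sum, reshape, and broadcast operations on large tensors are problematic and trigger CUDA error 700 (illegal memory access).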

@paddle-bot

paddle-bot bot commented Aug 11, 2025

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@cszdrg
Contributor Author

cszdrg commented Aug 12, 2025

/re-run all-failed

@cszdrg cszdrg closed this Aug 12, 2025