implementation of broadcast add backward by reduce #34143
Conversation
Thanks for your contribution!
Force-pushed from 75c6a5f to ceed224
Since LastDim and Any go through the same code path, could we just remove the LastDim option? That would eliminate one branch.
This branch is necessary because the corresponding index computation is different; it has little impact on performance either way.
This seems to duplicate the function below. Could we reuse GetReduceDim instead?
The two functions do different things: this one computes the reduce dims from the broadcast input/output dims, whereas GetReduceDim already knows the reduce dims and computes the actual dims to reduce over.
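For intuition, deriving the reduce axes from broadcast shapes can be sketched like this (a hypothetical helper, not the function under review; it only follows standard right-aligned broadcasting rules):

```python
# Hypothetical sketch: given the input shape that was broadcast and the
# resulting output shape, find the axes that the backward pass must
# reduce-sum over to recover a gradient with the input's shape.
def broadcast_reduce_axes(in_shape, out_shape):
    # Left-pad the input shape with 1s so both shapes align on the right,
    # per standard broadcasting rules.
    pad = len(out_shape) - len(in_shape)
    padded = [1] * pad + list(in_shape)
    # An axis is reduced wherever the input had size 1 but the output did not.
    return [i for i, (a, b) in enumerate(zip(padded, out_shape))
            if a == 1 and b > 1]

print(broadcast_reduce_axes([3, 1], [2, 3, 4]))  # [0, 2]
```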
Other reduce-style ops won't need this function, right? How about putting it in elementwise_op_function.h for now? Including all of reduce_op.h pulls in a lot of unused code — could that hurt compile times?
OK.
Force-pushed from 2c253af to 3e5561a
Force-pushed from 3e5561a to 505bf46
The corresponding CUDA implementation can also be deleted, L136-L295.
Deleted.
These checks and the TensorCopy logic largely duplicate the checks in ElementwiseAddGradKernel and elementwise_add_grad. Could we specialize ElementwiseAddGradKernel for CUDADeviceContext to simplify the logic?
This feels like a bug in our inplace logic... let's handle it this way for now...
Force-pushed from cb9999c to a8c58e0
Xreki left a comment:
LGTM
PR types: Performance optimization
PR changes: OPs
Describe: Implement the backward of broadcast add using reduce, and optimize reduce performance when the innermost dimension is very small.
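To illustrate why reduce implements the backward of broadcast add, here is a small NumPy check (illustrative setup only — shapes and variable names are assumed, and this does not reproduce the PR's CUDA kernel): for out = x + y where y is broadcast from shape (3,) to (2, 3), the gradient w.r.t. y is the upstream gradient reduce-summed over the broadcast axis.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 3))
y = rng.random(3)                 # broadcast to (2, 3) in the forward add
grad_out = rng.random((2, 3))     # upstream gradient d(loss)/d(out)

grad_x = grad_out                 # same shape as x, no reduction needed
grad_y = grad_out.sum(axis=0)     # reduce-sum over the broadcast axis 0

# Finite-difference check on one element of y against the reduce result,
# using loss = sum(out * grad_out) so d(loss)/d(out) == grad_out.
eps = 1e-6
i = 1
y2 = y.copy()
y2[i] += eps
numeric = (((x + y2) * grad_out).sum() - ((x + y) * grad_out).sum()) / eps
assert abs(numeric - grad_y[i]) < 1e-4
```

The same identity generalizes: for every axis along which an input was broadcast, the backward pass sums the output gradient over that axis, which is exactly a reduce-sum kernel.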