Support Div and FloorDiv functor in elementwise system #33053
Xreki merged 29 commits into PaddlePaddle:develop
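Floor_div only handles the int32 and int64 data types. If the return value is also int32 or int64, the result is truncated automatically. I also set up a local test script and found that the two computations give the same results.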
import numpy as np
import paddle as pd
npx = np.array([9, -11, -8, 7])
npy = np.array([2, 3, 3, 2])
x = pd.to_tensor(npx, dtype='int32')
y = pd.to_tensor(npy, dtype='int32')
z2 = pd.divide(x, y)
z1 = pd.floor_divide(x, y)
result = pd.subtract(z1, z2) # result = [0, 0, 0, 0]
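The comparison so far shows that Paddle and PyTorch produce the same results for floor_div. Specifically, for a computation such as floor_div(-8, 3):
Paddle and PyTorch yield -2, rounding toward zero;
NumPy and TensorFlow yield -3, rounding toward negative infinity.
- It may be worth discussing further whether the floor_div computation rule in Paddle should be changed.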
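The two rounding conventions described above can be sketched in plain Python, without depending on any framework. The helper names `trunc_div` and `floor_div` are hypothetical and only illustrate the arithmetic; they are not Paddle, PyTorch, NumPy, or TensorFlow APIs. The inputs are the same values used in the test script above.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division that rounds toward zero (C/C++ style).

    Hypothetical helper for illustration only.
    """
    q = abs(a) // abs(b)
    # The quotient is negative only when the operands have opposite signs.
    return q if (a >= 0) == (b >= 0) else -q


def floor_div(a: int, b: int) -> int:
    """Integer division that rounds toward negative infinity.

    Python's built-in // operator already follows this rule.
    """
    return a // b


xs = [9, -11, -8, 7]
ys = [2, 3, 3, 2]

trunc_results = [trunc_div(a, b) for a, b in zip(xs, ys)]
floor_results = [floor_div(a, b) for a, b in zip(xs, ys)]

print(trunc_results)  # [4, -3, -2, 3]
print(floor_results)  # [4, -4, -3, 3]
```

The two conventions differ only when the operands have opposite signs and the division is inexact, which is why -11 / 3 and -8 / 3 are the only entries that disagree.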
There is a typo at L43; I will fix it in the next commit.
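This is the lazy approach: making the function a template, or adding the static keyword to the function header, prevents function-redefinition errors during compilation. The proper fix would be to create a new .cu file and put the function in it so that it is instantiated there.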
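I looked this up on Stack Overflow: this problem cannot be solved with #pragma once. The solutions I found come down to two options:
(1) create a separate .cu file to instantiate the function body;
(2) add the inline or static keyword.
I took the lazy route and solved it by adding the inline keyword.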
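Here, the vectorization length for float16 may be returned as 8, but you do not check whether the address satisfies the vec8 alignment requirement.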
I did overlook this. I will change it as suggested and push the fix in the next commit.
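The alignment concern raised above can be illustrated with a small Python sketch. It is not the code from this PR: the function name `pick_vec_size`, its parameters, and the assumption that the maximum vector width is a power of two are all hypothetical. The idea is simply that a vector width of N elements is safe only if the address is a multiple of N times the element size.

```python
def pick_vec_size(address: int, elem_bytes: int, max_vec: int = 8) -> int:
    """Return the largest vector width (in elements) the address allows.

    Hypothetical helper: max_vec is assumed to be a power of two, and a
    width is accepted only when the address is a multiple of the total
    vector size in bytes (width * elem_bytes).
    """
    vec = max_vec
    while vec > 1:
        if address % (vec * elem_bytes) == 0:
            return vec
        vec //= 2  # fall back to the next smaller width
    return 1


FLOAT16_BYTES = 2

# 16-byte aligned address: eight float16 values can be loaded at once.
print(pick_vec_size(0x1000, FLOAT16_BYTES))  # 8
# Only 8-byte aligned: fall back to four float16 values.
print(pick_vec_size(0x1008, FLOAT16_BYTES))  # 4
# Only 2-byte aligned: no vectorization is possible.
print(pick_vec_size(0x1002, FLOAT16_BYTES))  # 1
```

With several input and output tensors, the same check would presumably be applied to every pointer and the smallest resulting width used.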
Perhaps our AlignedVector could be changed to inherit from the Array class, defining an AlignedArray.
Paddle/paddle/fluid/framework/array.h
Lines 24 to 35 in 42c1297
chenwhql left a comment
lgtm for paddle enforce
PR types
Performance optimization
PR changes
OPs
Describe
Based on the new elementwise + broadcast system, this PR adds support for the following binary functors:
Div
Floor_div
The performance variation is below:

The explicit comparison of floor_div is below: