Support Ternary ops in elementwise and broadcast #33976
Xreki merged 5 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
```cpp
using InVecType = platform::CudaAlignedVector<InT, VecSize>;
using OutVecType = platform::CudaAlignedVector<OutT, VecSize>;
const InT *__restrict__ in_data[ET];
```
The name `ins` has already been assigned earlier to `const std::vector<const framework::Tensor *> &ins`, which refers to the vector of input tensors, so a different name is used for the raw pointer array here.
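A minimal sketch of the distinction this comment draws, assuming a wrapper layout along these lines (the member layout and enum values are illustrative, not the exact PR code):

```cpp
// ElementwiseType counts the inputs; kTernary is what this PR adds
// (values assumed for illustration).
enum ElementwiseType { kUnary = 1, kBinary = 2, kTernary = 3 };

template <ElementwiseType ET, int VecSize, typename InT, typename OutT>
struct ElementwiseDataWrapper {
  // Raw device pointers, one per input: named in_data because `ins`
  // already names the tensor vector at the call site.
  const InT *__restrict__ in_data[ET];
  OutT *out_data;
};

// Filled from const std::vector<const framework::Tensor *> &ins, e.g.:
//   for (int i = 0; i < ET; ++i) wrapper.in_data[i] = ins[i]->data<InT>();
```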
```cpp
// store
data.store_scalar(out, idx);
}
// load
```
The loop added here looked odd to me while I was adding support for ternary computation, because the original condition was:

```cpp
int remain = size - VecSize * tid;
remain = remain > 0 ? remain : 0;
// ...
if (remain >= VecSize) {
  VectorizedKernelImpl(data, func, tid);
} else {
  ScalarKernelImpl(data, func, tid * VecSize, remain);
}
```

This already singles out the thread `tid` that handles only the non-vectorizable tail of the data, along with the tail's starting position `tid * VecSize`. But because of the preceding `remain` computation and the `tid * VecSize` step, only a single thread actually enters `ScalarKernelImpl` and does the work. Once inside, that thread runs a for loop:

```cpp
for (int i = 0; i < remain; ++i) {
  int idx = start + i;
  data.load_scalar(ins, idx);
  out = func(ins);
  data.store_scalar(out, idx);
}
```

However, the scalar tail can also be computed with multiple threads, even though the gain is small. The current change uses multiple threads as far as possible:

```cpp
if (tid < tail_tid) {  // directly selects the threads numbered 0, 1, 2, ...
  ScalarKernelImpl<ET, DataWarpper, InT, OutT, Functor>(data, func, tid);
}
```

In addition, `DataWarpper` carries a pre-recorded starting point for the scalar computation, `scalar_cal_offset`, so the indexing below replaces `start + i` (see the sketch after this comment):

```cpp
args[i] = in_data[i][tid + scalar_cal_offset];
```
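Below is a minimal sketch of this multithreaded tail handling. The `TailArgs` struct, the `ScalarTailKernel` name, and the functor interface are assumptions for illustration, not the exact PR code:

```cpp
// Hypothetical argument pack: scalar_cal_offset is the first index of the
// non-vectorizable tail, i.e. size - size % VecSize.
template <int ET, typename InT, typename OutT>
struct TailArgs {
  const InT *in_data[ET];  // raw input pointers, one per input
  OutT *out_data;
  int scalar_cal_offset;
};

// tail_tid is the number of tail elements. Each thread with
// tid < tail_tid handles exactly one tail element, instead of a single
// thread looping over the whole tail.
template <int ET, typename InT, typename OutT, typename Functor>
__global__ void ScalarTailKernel(TailArgs<ET, InT, OutT> data, int tail_tid,
                                 Functor func) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < tail_tid) {
    InT args[ET];
#pragma unroll
    for (int i = 0; i < ET; ++i) {
      // tid + data.scalar_cal_offset replaces the old start + i indexing.
      args[i] = data.in_data[i][tid + data.scalar_cal_offset];
    }
    // Assumes the functor takes a pointer to the packed scalar args.
    data.out_data[tid + data.scalar_cal_offset] = func(args);
  }
}
```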
I just ran a local OP Benchmark case with x = [31, 129], y = [31, 129]; it passed the accuracy test as well.
```cpp
template <typename InT, typename OutT>
int GetVectorizedSize(const std::vector<const framework::Tensor *> &ins,
                      const std::vector<framework::Tensor *> &outs) {
int GetVectorizedSizeImpl(const std::vector<const framework::Tensor *> &ins,
```
`xxxImpl` is usually the concrete implementation of `xxx`; logically it would make more sense for it to be the one being called.

Could it be changed to something like `GetVectorizedSizeForTensors`?

I feel it could be changed to `GetVectorizedSizeForIO`.
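For reference, a sketch of the arrangement being suggested here, with a per-pointer alignment check as the `Impl` callee and a renamed entry point as the caller. The alignment rule and both signatures are assumptions for illustration, not the exact PR code:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Callee: per-pointer check, picking the widest CudaAlignedVector the
// address is aligned to (assumed rule).
template <typename T>
int GetVectorizedSizeImpl(const T *pointer) {
  uint64_t address = reinterpret_cast<uint64_t>(pointer);
  if (address % sizeof(platform::CudaAlignedVector<T, 4>) == 0) return 4;
  if (address % sizeof(platform::CudaAlignedVector<T, 2>) == 0) return 2;
  return 1;
}

// Caller: the renamed entry point takes the minimum vectorized size over
// all inputs and outputs.
template <typename InT, typename OutT>
int GetVectorizedSizeForIO(const std::vector<const framework::Tensor *> &ins,
                           const std::vector<framework::Tensor *> &outs) {
  int vec_size = 4;
  for (auto *in : ins) {
    vec_size = std::min(vec_size, GetVectorizedSizeImpl(in->data<InT>()));
  }
  for (auto *out : outs) {
    vec_size = std::min(vec_size, GetVectorizedSizeImpl(out->data<OutT>()));
  }
  return vec_size;
}
```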
```cpp
using OutVecType = CudaAlignedVector<OutT, VecSize>;
template <ElementwiseType ET, int VecSize, typename DataWarpper, typename InT,
          typename OutT, typename Functor>
__device__ inline void VectorizedKernelImpl(DataWarpper data, Functor func,
```
Is the `DataWarpper` template parameter added because other `DataWarpper` types will use this function later?

Let's unify this kind of class name: call them `ElementwiseArgsWrapper` and `BroadcastArgsWrapper`.

The `DataWarpper` template parameter is added purely to pass a data type for the formal parameter to use:

```cpp
template <typename DataWarpper, ...>
__device__ inline void VectorizedKernelImpl(DataWarpper data, ...)
```

The next commit will change it to `ElementwiseArgsWrapper`.
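In other words, the same kernel body can be instantiated with any wrapper that exposes the same load/store interface. A minimal sketch, assuming the wrapper provides `load_vector`/`store_vector` and that `CudaAlignedVector` exposes a `val[]` array (both assumptions about the interface, not the exact PR code):

```cpp
template <ElementwiseType ET, int VecSize, typename ArgsWrapper, typename InT,
          typename OutT, typename Functor>
__device__ inline void VectorizedKernelImpl(ArgsWrapper data, Functor func,
                                            int tid) {
  using InVecType = platform::CudaAlignedVector<InT, VecSize>;
  using OutVecType = platform::CudaAlignedVector<OutT, VecSize>;
  InVecType ins_vec[ET];
  OutVecType out_vec;
  data.load_vector(ins_vec, tid);  // assumed wrapper interface
#pragma unroll
  for (int i = 0; i < VecSize; ++i) {
    InT args[ET];
#pragma unroll
    for (int j = 0; j < ET; ++j) {
      args[j] = ins_vec[j].val[i];  // unpack one scalar lane per input
    }
    // Assumes the functor takes a pointer to the packed scalar args.
    out_vec.val[i] = func(args);
  }
  data.store_vector(out_vec, tid);  // assumed wrapper interface
}
```

Either `ElementwiseArgsWrapper` or `BroadcastArgsWrapper` can then be passed as `data`, as long as it provides the same interface.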
```cpp
int tid) {
using InVecType = CudaAlignedVector<InT, VecSize>;
using OutVecType = CudaAlignedVector<OutT, VecSize>;
template <ElementwiseType ET, int VecSize, typename ElementwiseWarpper,
```
Got it, will fix it in the next commit.
```cpp
__device__ inline void ScalarKernelImpl(
    ElementwiseDataWrapper<ET, VecSize, InT, OutT> data, Functor func,
    int start, int remain) {
template <ElementwiseType ET, typename ElementwiseWarpper, typename InT,
```
Got it, will fix it in the next commit.
LGTM
```cpp
using OutVecType = CudaAlignedVector<OutT, VecSize>;
template <ElementwiseType ET, int VecSize, typename ElementwiseWrapper,
          typename InT, typename OutT, typename Functor>
__device__ inline void VectorizedKernelImpl(ElementwiseWrapper data,
```
This can be removed; will delete it in the next commit.
PR types
Function optimization
PR changes
OPs
Describe
- Move the `GetVectorizedSize` function into `fast_divmod.h` to make it common.
- Move `fast_divmod.h` from "operator" to "platform", and change the subsequent code that calls the functions inside `fast_divmod.h`.