[Profiling] Update elementwise op#6229
Conversation
2a5fb74 to
ff4c825
Compare
18a50a2 to
cc210de
Compare
cc210de to
54f0962
Compare
reyoung
left a comment
There was a problem hiding this comment.
Excellent! There are some tiny enhancements and one typo.
| : ptr_(ptr), i_(0), j_(0), n_(n), post_(post) {} | ||
|
|
||
| MidWiseTransformIterator<T, platform::CPUPlace>& operator++() { | ||
| i_ = ++j_ / post_ % n_; |
There was a problem hiding this comment.
Please add parentheses here.
|
|
||
| bool operator!=( | ||
| const MidWiseTransformIterator<T, platform::CPUPlace>& rhs) const { | ||
| return (ptr_ + i_) &= &(*rhs); |
| typedef thrust::iterator_adaptor< | ||
| RowwiseTransformIterator<T, platform::GPUPlace>, const T*> | ||
| super_t; | ||
| __host__ __device__ RowwiseTransformIterator(const T* x, int n) |
There was a problem hiding this comment.
Please use HOSTDEVICE macro
| private: | ||
| unsigned int n_; | ||
| const T* begin_; | ||
| __host__ __device__ typename super_t::reference dereference() const { |
There was a problem hiding this comment.
Please use HOSTDEVICE macro
| public: | ||
| void Compute(const framework::ExecutionContext& ctx) const override { | ||
| ElementwiseCompute<EigenAddFunctor, Place, T>(ctx); | ||
| using Tensor = framework::Tensor; |
There was a problem hiding this comment.
Maybe we should implement all elemwise operators based on this method.
Not in hurry, but need an issue to record the following works.
There was a problem hiding this comment.
We can do these in next PR.
| template <typename Functor, typename T, typename Place> | ||
| struct TransformFunctor { | ||
| TransformFunctor(const framework::Tensor* x, const framework::Tensor* y, | ||
| framework::Tensor* z, const framework::ExecutionContext& ctx, |
There was a problem hiding this comment.
Better to use DeviceContext other than ExecutionContext
830dbaa to
1517889
Compare
1517889 to
9e244a8
Compare
| T* z_; | ||
| int64_t nx_; | ||
| const platform::DeviceContext& ctx_; | ||
| Functor func_; |
There was a problem hiding this comment.
The naming rules of data members for struct is different from class.
https://google.github.io/styleguide/cppguide.html#Variable_Names
There was a problem hiding this comment.
Done, I have replaced struct with class.
Fix #6166
"We found the performance of
Eigen::Tensor::broadcastis not good if theTensoris not just a scalar, because theEigen::Tensor::broadcasttry to implement a very generalbroadcastmethod. However, the most ofbroadcastoperators in the neural network isrowwiseorcolwise. The implementation ofrowwiseorcolwiseis much simpler and faster than a general implementation.The
Eigen::Tensor::broadcastis mainly implmeneted by this function." —— @reyoung