Refine cos-sim-op #6601
This way of writing may not be ideal, but in terms of speed it is 10 times faster than the previous Eigen-based implementation.
Config and Env
The experimental environment is the same as that described above; only the code is different.
… profiling/cosine_op_debug
paddle/operators/cos_sim_op.h
Outdated
        z_(z),
        cols_(static_cast<size_t>(cols)) {}

  inline HOSTDEVICE void operator()(size_t offset) const {
Renaming offset to i, and the index of the for loop below to j, may be clearer. Or use row_id and col_id.
Done. I have replaced offset with row_id.
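For readers without the diff at hand, the row-wise functor under review can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the struct name `CosSimRowFunctor`, the members `x_` and `y_`, and the host-side driver `CosSimRows` are hypothetical, and `HOSTDEVICE` is defined as an empty macro on the assumption that it only adds `__host__ __device__` qualifiers in a CUDA build. Only `z_`, `cols_`, `row_id` and `col_id` come from the thread above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption: in a CUDA build this macro would add host/device qualifiers.
// It is left empty here so the sketch compiles as plain C++.
#ifndef HOSTDEVICE
#define HOSTDEVICE
#endif

// Hypothetical functor: writes the cosine similarity of row `row_id`
// of x and y (both row-major, `cols_` columns) into z_[row_id].
template <typename T>
struct CosSimRowFunctor {
  CosSimRowFunctor(const T* x, const T* y, T* z, int cols)
      : x_(x), y_(y), z_(z), cols_(static_cast<size_t>(cols)) {}

  inline HOSTDEVICE void operator()(size_t row_id) const {
    const T* x = x_ + row_id * cols_;
    const T* y = y_ + row_id * cols_;
    T xx = 0, yy = 0, xy = 0;
    for (size_t col_id = 0; col_id < cols_; ++col_id) {
      xx += x[col_id] * x[col_id];
      yy += y[col_id] * y[col_id];
      xy += x[col_id] * y[col_id];
    }
    // No guard for zero-norm rows: such a row yields NaN in this sketch.
    z_[row_id] = xy / (std::sqrt(xx) * std::sqrt(yy));
  }

  const T* x_;
  const T* y_;
  T* z_;
  const size_t cols_;
};

// Hypothetical host-side driver: applies the functor to every row in turn.
template <typename T>
std::vector<T> CosSimRows(const std::vector<T>& x, const std::vector<T>& y,
                          int rows, int cols) {
  assert(rows > 0 && cols > 0);
  assert(x.size() == static_cast<size_t>(rows) * static_cast<size_t>(cols));
  assert(y.size() == x.size());
  std::vector<T> z(static_cast<size_t>(rows));
  CosSimRowFunctor<T> functor(x.data(), y.data(), z.data(), cols);
  for (size_t row_id = 0; row_id < z.size(); ++row_id) {
    functor(row_id);
  }
  return z;
}
```

Naming the two indices `row_id` and `col_id` makes it clear that each functor call handles one row and that the inner loop walks the columns of that row.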
typhoonzero
left a comment
LGTM++
The kernel block_size could come from a global configuration, since it is rarely changed.
Yes, I will change this in the next PR.
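One way the suggestion about block_size might look is a single shared constant plus a helper that derives the number of blocks from it. The sketch below is only an illustration of that idea: the namespace `sketch`, the constant `kDefaultBlockSize`, its value of 512, and the helper `NumBlocks` are all hypothetical and do not come from the PR or from the project's code.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical global default; the value 512 is illustrative only.
constexpr std::size_t kDefaultBlockSize = 512;

// Number of blocks needed to cover n elements (ceiling division).
inline std::size_t NumBlocks(std::size_t n,
                             std::size_t block_size = kDefaultBlockSize) {
  assert(block_size > 0);
  return (n + block_size - 1) / block_size;
}

}  // namespace sketch
```

With a shared default like this, individual kernels would not need to hard-code their own block size, and the value could be tuned in one place.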
Fix #6486
Experiments Env:
Code:
Total time of 1 Pass:
I found that the GPU runs a little slower than the CPU. @typhoonzero's statistics in this issue also show the GPU being a little slower.