add clip_by_norm fp16 kernel #35446
Conversation
Thanks for your contribution!

LGTM for TensorReduceFunctorImpl
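As a hedged illustration of why the reduce path matters for an fp16 kernel (this is a NumPy sketch, not the Paddle `TensorReduceFunctorImpl` code): the squared sum behind an L2 norm easily overflows fp16's range (max ≈ 65504), so the reduction is typically accumulated in fp32 even when inputs and outputs are fp16.

```python
import numpy as np

# 10000 fp16 elements of value 8.0; the true L2 norm is
# sqrt(10000 * 64) = 800, which fits comfortably in fp16.
x = np.full(10000, 8.0, dtype=np.float16)

# Summing the squares with an fp16 accumulator overflows to inf,
# because the running sum (640000) exceeds the fp16 maximum.
naive = np.sqrt(np.sum(x * x))  # inf

# Accumulating in fp32 and casting back recovers the correct norm.
accurate = np.float16(np.sqrt(np.sum(x.astype(np.float32) ** 2)))
```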
Aurelius84
left a comment
LGTM for DataType registrar.
chenwhql
left a comment
LGTM for const_cast
```cpp
// merge ids in selected rows first
math::scatter::MergeAdd<platform::CUDADeviceContext, platform::float16>
    merge_func;
SelectedRows* merged_input =
```
I don't quite understand why the temporary merged_input is placed in the scope here; it looks like it is only a temporary variable used inside the kernel.
Agreed, we will optimize this part of the code in a follow-up.
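For context on the merge step discussed above, here is a hypothetical NumPy sketch (`merge_rows` is an illustrative helper, not the Paddle `MergeAdd` functor): a SelectedRows gradient may list the same row id more than once, so rows with duplicate ids must be summed before a norm over the gradient is meaningful.

```python
import numpy as np

def merge_rows(ids, rows):
    # Sum rows that share an id, then return the ids in sorted order
    # alongside the merged dense rows.
    merged = {}
    for i, r in zip(ids, rows):
        merged[i] = merged.get(i, np.zeros_like(r)) + r
    keys = sorted(merged)
    return keys, np.stack([merged[k] for k in keys])

ids = [0, 2, 0]
rows = [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([3.0, 3.0])]
keys, merged = merge_rows(ids, rows)
# id 0 appears twice, so its rows are summed: [1, 1] + [3, 3] = [4, 4]
```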
* add clip_by_norm fp16 kernel
* add ut
PR types
New features
PR changes
OPs
Describe
add clip_by_norm fp16 kernel
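As a hedged sketch of the operation this PR extends to fp16 (a NumPy reference model, not the CUDA kernel itself), clip_by_norm rescales `x` so its L2 norm does not exceed `max_norm`: `out = x * max_norm / max(norm(x), max_norm)`. The norm is computed in fp32 here, matching the overflow concern for fp16 inputs.

```python
import numpy as np

def clip_by_norm(x, max_norm):
    # Accumulate the squared sum in fp32 to avoid fp16 overflow,
    # then rescale only when the norm exceeds max_norm.
    norm = np.sqrt(np.sum(np.square(x.astype(np.float32))))
    scale = max_norm / max(norm, max_norm)
    return (x.astype(np.float32) * scale).astype(x.dtype)

x = np.array([3.0, 4.0], dtype=np.float16)  # L2 norm = 5
y = clip_by_norm(x, 2.0)                    # rescaled to norm 2
```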