copy found_inf to cpu in advance to improve performance #34274
Thanks for your contribution!
Force-pushed from a7c2068 to dd7bbc3.
pangyoki left a comment:
LGTM
chenwhql left a comment:
LGTM for PADDLE_ENFORCE
LGTM for ShareDataWith. We discussed that a name like "x_" can be changed, because a trailing underscore usually looks like the name of a private class data member. We will change it in a future PR.
Refined in #34330, thanks.
…#34274)
* copy found_inf to cpu in advance to improve performance
* add npu test
* add npu test
* refine code
* refine memcpy op
* fix adam
PR types
Performance optimization
PR changes
Others
Describe
Since the adam op needs found_inf to be on the CPU, we copy it to the CPU once in advance to avoid repeated copies.
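The idea can be illustrated with a minimal, framework-free sketch. This is not the code from this PR: `DeviceFlag`, `adam_like_update`, `step_copy_per_param`, and `step_copy_once` are hypothetical names, the update rule is a plain gradient step standing in for the real adam kernel, and the device-to-host transfer is only simulated by a counter.

```python
class DeviceFlag:
    """Hypothetical stand-in for a one-element bool tensor on an accelerator."""

    def __init__(self, value):
        self._value = bool(value)
        self.copies_to_host = 0

    def to_host(self):
        # Each call models one device-to-host transfer.
        self.copies_to_host += 1
        return self._value


def adam_like_update(param, grad, skip_update, lr=0.1):
    # Simplified update (plain gradient step), standing in for the adam kernel.
    # When found_inf is set, the parameter is left unchanged.
    if skip_update:
        return param
    return param - lr * grad


def step_copy_per_param(params, grads, found_inf):
    # Baseline: every per-parameter update fetches the flag from the device.
    return [
        adam_like_update(p, g, found_inf.to_host())
        for p, g in zip(params, grads)
    ]


def step_copy_once(params, grads, found_inf):
    # Copy the flag to the host once, then reuse it for every parameter.
    skip = found_inf.to_host()
    return [adam_like_update(p, g, skip) for p, g in zip(params, grads)]
```

With three parameters, the baseline performs three simulated transfers while the hoisted version performs one, and both produce the same updated values.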