[PHI] Add uint8/int16 CUDA atomic mul/min/max and upgraded take/put_along_axis (input types) #74693
PR Category
Operator Mechanism
PR Types
Improvements
Description
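Adds CUDA atomic functions for the uint8 and int16 types to gpu_primitives, covering three operations: mul, min, and max.
The corresponding unit tests pass.
Unit tests for put_along_axis have been implemented and run. put_along_axis and take_along_axis were also upgraded in a backward-compatible way to support int16/uint8 inputs (besides the three ops above, add is supported as well, although the atomic uint8/int16 add operations already existed). The SFINAE-based workarounds for uint8/int16 in some files have been removed. Because the atomic primitives themselves cannot be exported for unit testing, I exposed a Python interface for them in Enigmatisms/atomic_playground and ran extensive comparison tests against host-side std::accumulation results. Pcard-89620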
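For readers unfamiliar with sub-word atomics, here is a minimal host-side C++ sketch of one common way to build an 8-bit atomic mul/min/max: retry a compare-and-swap on the 32-bit word that contains the byte. It is an illustration under stated assumptions, not the code in this PR: `std::atomic<std::uint32_t>` stands in for CUDA's `atomicCAS`, and every name below (`atomic_rmw_u8`, `MulU8`, `MinU8`, `MaxU8`, `concurrent_reduce`) is hypothetical. The last helper mirrors the idea of checking concurrent results against a sequential host-side reduction.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical binary ops on uint8; multiplication wraps modulo 256.
struct MulU8 {
  std::uint8_t operator()(std::uint8_t a, std::uint8_t b) const {
    return static_cast<std::uint8_t>(a * b);
  }
};
struct MinU8 {
  std::uint8_t operator()(std::uint8_t a, std::uint8_t b) const { return std::min(a, b); }
};
struct MaxU8 {
  std::uint8_t operator()(std::uint8_t a, std::uint8_t b) const { return std::max(a, b); }
};

// Hypothetical helper: atomically apply `op` to one byte lane (0..3) of a
// 32-bit word by retrying a compare-and-swap on the whole word.
// Returns the lane's previous value.
template <typename Op>
std::uint8_t atomic_rmw_u8(std::atomic<std::uint32_t>& word, unsigned lane,
                           std::uint8_t val, Op op) {
  const unsigned shift = lane * 8;
  const std::uint32_t mask = 0xFFu << shift;
  std::uint32_t assumed = word.load(std::memory_order_relaxed);
  for (;;) {
    const std::uint8_t cur = static_cast<std::uint8_t>((assumed & mask) >> shift);
    const std::uint8_t next = op(cur, val);
    // Splice the updated byte back in; the other three lanes are unchanged.
    const std::uint32_t desired =
        (assumed & ~mask) | (static_cast<std::uint32_t>(next) << shift);
    // On failure, compare_exchange_weak reloads `assumed` with the current word.
    if (word.compare_exchange_weak(assumed, desired, std::memory_order_relaxed)) {
      return cur;
    }
  }
}

// Hypothetical stress helper: several threads fold `vals` into one byte lane.
template <typename Op>
std::uint8_t concurrent_reduce(const std::vector<std::uint8_t>& vals,
                               std::uint8_t init, unsigned lane, Op op,
                               unsigned num_threads = 4) {
  std::atomic<std::uint32_t> word{static_cast<std::uint32_t>(init) << (lane * 8)};
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      // Each thread handles a strided slice of the input.
      for (std::size_t i = t; i < vals.size(); i += num_threads) {
        atomic_rmw_u8(word, lane, vals[i], op);
      }
    });
  }
  for (auto& w : workers) w.join();
  return static_cast<std::uint8_t>(word.load() >> (lane * 8));
}
```

Because mul (modulo 256), min, and max are commutative and associative, the concurrent result should equal a sequential `std::accumulate` over the same values with the same functor, whatever the thread interleaving. A 16-bit variant would follow the same pattern with a 0xFFFF mask and two lanes per word.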