Add support for CUMSUM and TRI for CUDA. #17584
Merged
25 commits
d138a03 Add support for CUMSUM and TRI for CUDA. (pwilkin)
67207d2 Minor optimizations. (pwilkin)
fab0029 Correct warp_prefix_inclusive_sum in float2 variant to return float2 (pwilkin)
51c40a5 Optimize TRI (pwilkin)
c30f565 Whitespace (pwilkin)
31b55fa Fix strides. (pwilkin)
d1ca1c2 Implement double loop (pwilkin)
5289b53 Whitespace (pwilkin)
f422ba8 Fix HIP compilation bugs (pwilkin)
df917cc Optimizations + big case performance tests (pwilkin)
76382d7 Implement using CUB with fallback to custom kernel (pwilkin)
01d4033 Remove error message. (pwilkin)
10a2ea9 Fixes from code review (pwilkin)
7a83b05 Comment out CPU-unsupported F16/BF16 cases to fix CI (pwilkin)
bbe3743 Fine, you win :P (pwilkin)
069413a Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS (pwilkin)
5aa7438 Vary warp-size based on physical warp size (pwilkin)
579eba6 Add GGML_UNUSED_VARS in tri as well (pwilkin)
08b3f2d Use constexpr and call prefix_inclusive with warp_size template param (pwilkin)
9cd0eff Update ggml/src/ggml-cuda/cumsum.cu (pwilkin)
9574264 Apply suggestions from code review (pwilkin)
efd619a Change to tid % warp_size (pwilkin)
86a0853 Fix strides; hardcode mask; add ggml_lane_mask_t (pwilkin)
de45c63 Missing renames, remove unused get_warp_mask(), explicit calls to ggm… (pwilkin)
8a7375c Too hasty... (pwilkin)