[CUB][device] Add a env-based overload of the device segmented reductions primitives #6674
srinivasyadav18 merged 11 commits into NVIDIA:main
Conversation
srinivasyadav18
left a comment
Thank you for trying to contribute to CCCL! Apologies for the very early review of the draft. I have some suggestions that might help you get better context. Feel free to ask any questions you have.
Co-authored-by: Srinivas Yadav <43375352+srinivasyadav18@users.noreply.github.com>
|
Hello @srinivasyadav18. Thanks a lot for the feedback :) I added several tests in the new file These include three tests to verify that the new API correctly accepts a stream and the two non- In the comment header for the new function, I linked to the I removed my example code because could not stay here long-term. I also didn’t add a test for passing an Regarding the complete boiler plate to extract useful info from Cheers! On a side note, looking at it seems to me that it is an incomplete duplicate of: since it has the same title and does not check correctness of the result. Perhaps it should be removed. |
srinivasyadav18
left a comment
Thanks a lot for working on the feedback. The example test file looks great now.
I left some minor comments, but everything else looks good to me.
srinivasyadav18
left a comment
These two comments are totally optional; feel free to look into them.
Please let me know if you have questions. Thanks!
|
/ok to test 05194d7 |
|
Just added another unit test. Let me know if you think it would be useful or not to extend the work to other segmented reductions. |
srinivasyadav18
left a comment
Thanks! The test looks good. It requires minor changes.
The PR is close to being merged. Once we have the changes and CI is green, we can ship it.
NaderAlAwar
left a comment
Great work @rbourgeois33! Left a comment regarding the API tests, will approve once that is addressed.
I would prefer to just have |
|
Hi @NaderAlAwar, @srinivasyadav18! I think we have resolved all comments/suggestions and that the PR is ready to be merged. Let me know if I should squash / sign my commits and/or update some licence headers (I had to for another contribution to the CUDALibrarySamples repo). Also, thanks for your help! It's super nice that external contributions to CCCL are encouraged like this. I learned a lot! |
|
/ok to test a20271b |
|
@rbourgeois33 Thank you for working on this.
Signing commits is optional (not required). We squash the commits during merge, so there is no need to squash in the PR itself. |
NaderAlAwar
left a comment
Great work @rbourgeois33! We still need to create the env based overloads for the other primitives.
🥳 CI Workflow Results
🟩 Finished in 1h 12m: Pass: 100%/93 | Total: 1d 06h | Max: 1h 01m | Hits: 99%/92903 |
Description
closes #6673
This draft PR implements an env-based overload of the device segmented reduction primitives.
@gevtushenko, following our discussion:
I have checked that the required determinism, if provided, must not be gpu_to_gpu. As far as I understand, this is equivalent to checking that it is run-to-run or not-guaranteed.
I made the checklist below of what this PR should verify before being merged. Let me know what I should add to the Sum example to tick the first box :).
Checklist
Sum.
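The determinism constraint mentioned in the description can be illustrated with a small compile-time sketch. Everything in it is hypothetical: the `determinism` enum and both helper functions are stand-ins invented for illustration and are not the CCCL types or the code in this PR. It only shows why "not gpu_to_gpu" and "run-to-run or not-guaranteed" are the same check when exactly three levels exist.

```cpp
// Hypothetical stand-in for the three determinism levels named in the
// description. This is NOT the CCCL type; the names are assumptions.
enum class determinism
{
  not_guaranteed,
  run_to_run,
  gpu_to_gpu
};

// First formulation: reject only gpu_to_gpu.
constexpr bool is_not_gpu_to_gpu(determinism d)
{
  return d != determinism::gpu_to_gpu;
}

// Second formulation: accept run_to_run or not_guaranteed.
constexpr bool is_run_to_run_or_not_guaranteed(determinism d)
{
  return d == determinism::run_to_run || d == determinism::not_guaranteed;
}

// With exactly three levels, the two formulations agree on every value.
static_assert(is_not_gpu_to_gpu(determinism::not_guaranteed)
                == is_run_to_run_or_not_guaranteed(determinism::not_guaranteed),
              "formulations must agree for not_guaranteed");
static_assert(is_not_gpu_to_gpu(determinism::run_to_run)
                == is_run_to_run_or_not_guaranteed(determinism::run_to_run),
              "formulations must agree for run_to_run");
static_assert(is_not_gpu_to_gpu(determinism::gpu_to_gpu)
                == is_run_to_run_or_not_guaranteed(determinism::gpu_to_gpu),
              "formulations must agree for gpu_to_gpu");
```

If a fourth level were ever added, the two formulations would diverge for it, so a real implementation would need to pick one deliberately.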