Skip to content

[CUB][device] Add a env-based overload of the device segmented reductions primitives#6674

Merged
srinivasyadav18 merged 11 commits intoNVIDIA:mainfrom
rbourgeois33:main
Dec 2, 2025
Merged

[CUB][device] Add a env-based overload of the device segmented reductions primitives#6674
srinivasyadav18 merged 11 commits intoNVIDIA:mainfrom
rbourgeois33:main

Conversation

@rbourgeois33
Copy link
Contributor

@rbourgeois33 rbourgeois33 commented Nov 18, 2025

Description

closes #6673

This Draft PR implements a env-based overload of the device segmented reductions primitives.

@gevtushenko, following our discussion:

Consider adding a static assert that if determinism is specified - it's run-to-run or not-guaranteed.

I have checked that the required determinism, if provided, must not be gpu_to_gpu. As far as I understand, it is equivalent to check that it's run-to-run or not-guaranteed.

I made this check list of what this PR should verify before being merged below. Let me know what I should add to the Sum example to tick the first box :).

Checklist

  • We have agreed on the way to implement the env-based device segmented reductions starting with the first one:Sum.
  • I have extended the overload to other relevant primitives.
  • The documentation is up to date with these changes.
  • If requirements are the same as the CUDALibrarySamples repo, I have updated the licence headers and signed my commits.
  • I have squashed commits

@rbourgeois33 rbourgeois33 requested a review from a team as a code owner November 18, 2025 15:15
@github-project-automation github-project-automation bot moved this to Todo in CCCL Nov 18, 2025
@copy-pr-bot
Copy link
Contributor

copy-pr-bot bot commented Nov 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Nov 18, 2025
@rbourgeois33 rbourgeois33 marked this pull request as draft November 18, 2025 15:18
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Review to In Progress in CCCL Nov 18, 2025
@rbourgeois33 rbourgeois33 changed the title [CUB][device][WIP] Add a env-based overload of the device segmented reductions primitives Draft: [CUB][device] Add a env-based overload of the device segmented reductions primitives Nov 18, 2025
Copy link
Contributor

@srinivasyadav18 srinivasyadav18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for trying to contribute to CCCL! Apologies for very early review on the draft. I have some suggestions which might help you to get better context. Feel free to ask any questions you have.

rbourgeois33 and others added 2 commits November 21, 2025 20:10
Co-authored-by: Srinivas Yadav <43375352+srinivasyadav18@users.noreply.github.com>
@rbourgeois33
Copy link
Contributor Author

rbourgeois33 commented Nov 21, 2025

Hello @srinivasyadav18 . Thanks a lot for the feedback :)

I added several tests in the new file cub/test/catch2_test_device_segmented_reduce_env_api.cu as you suggested.

These include three tests to verify that the new API correctly accepts a stream and the two non-gpu_to_gpu determinism arguments from the env.

In the comment header for the new function, I linked to the segmented-reduce-sum-env-stream example, since the determinism arguments are currently unused. I believe this is more useful for the documentation. Perhaps the fact that the determinism arguments are unused should be clearly stated.

I removed my example code because could not stay here long-term. I also didn’t add a tests for passing an mr argument since memory resources are still experimental.

Regarding the complete boiler plate to extract useful info from env, I think it's a great idea ! Less code duplicate is always better, in particular since the env extraction is quite verbose. I subscribed the PR and make changes accordingly in time.

Cheers !

On a side note, looking at

C2H_TEST("cub::DeviceReduce::Sum accepts stream", "[reduce][env]")

it seems to me that it is an incomplete duplicate of:

C2H_TEST("cub::DeviceReduce::Sum accepts stream", "[reduce][env]")
since it has the same title and does not check correctness of the result. Perhaps it should be removed.

Copy link
Contributor

@srinivasyadav18 srinivasyadav18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on the feedback. The example test file looks great now.
I left some minor comments. But rest all Looks Good To Me.

Copy link
Contributor

@srinivasyadav18 srinivasyadav18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two comments are totally optional, feel free to look into.
Please let me know if you have questions. Thanks!

@srinivasyadav18
Copy link
Contributor

/ok to test 05194d7

@github-actions

This comment has been minimized.

@rbourgeois33
Copy link
Contributor Author

Just added another unit test. Let me know if you think it would be useful or not to extend the work to other segmented reductions.

Copy link
Contributor

@srinivasyadav18 srinivasyadav18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The test looks good. Require minor changes.
The PR is coming close to merge. Once we have the changes and CI is green we can ship it.

Copy link
Contributor

@NaderAlAwar NaderAlAwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @rbourgeois33! Left a comment regarding the API tests, will approve once that is addressed.

@srinivasyadav18
Copy link
Contributor

Just added another unit test. Let me know if you think it would be useful or not to extend the work to other segmented reductions.

I would prefer to just have Sum() in the current PR and other overloads like Reduce() Min() Max() etc.. in the next one, as this would simplify the review process and also helps to reduce boilerplate code as this PR #6622 will be merged soon.

@rbourgeois33
Copy link
Contributor Author

Hi @NaderAlAwar , @srinivasyadav18 !

I think we have resolved all comments/suggestions and that the PR is ready to be merged.

Let me know if I should squash / sign my commits and/or update some licence headers (I had to for another contribution to the CUDALibrarySamples repo).

Also, thanks for your help ! It's super nice that external contribution to cccl are encouraged like this. I learned a lot !

@NaderAlAwar NaderAlAwar marked this pull request as ready for review December 2, 2025 19:03
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Dec 2, 2025
@NaderAlAwar
Copy link
Contributor

/ok to test a20271b

@srinivasyadav18
Copy link
Contributor

@rbourgeois33 Thank you for working on this.

Let me know if I should squash / sign my commits and/or update some licence headers

Signing commits is optional (not required). We squash the commits during merge, so not required to squash in the PR itself.

Copy link
Contributor

@NaderAlAwar NaderAlAwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @rbourgeois33! We still need to create the env based overloads for the other primitives.

@srinivasyadav18 srinivasyadav18 changed the title Draft: [CUB][device] Add a env-based overload of the device segmented reductions primitives [CUB][device] Add a env-based overload of the device segmented reductions primitives Dec 2, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 2, 2025

🥳 CI Workflow Results

🟩 Finished in 1h 12m: Pass: 100%/93 | Total: 1d 06h | Max: 1h 01m | Hits: 99%/92903

See results here.

@srinivasyadav18 srinivasyadav18 merged commit e2737a8 into NVIDIA:main Dec 2, 2025
106 of 107 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in CCCL Dec 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[FEA]: [CUB] Add a env-based overload of the device segmented reductions primitives

3 participants