cuda.compute: Allow multiple uses of the same function in single compilation #7072
Merged
shwina merged 2 commits into NVIDIA:main on Jan 5, 2026
Conversation
# Global counter to generate unique symbol names even when the same function
# is used multiple times (e.g., as both selectors in `three_way_partition`).
_wrapper_name_counter = itertools.count()
Contributor
Perhaps this global counter needs a lock to avoid a race condition in the free-threaded interpreter.
Contributor
Author
Fixed, although I will say that cuda.compute as a whole is probably not thread-safe today. The caching mechanisms, among other things, haven't taken thread safety into account thus far (#6422).
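The thread does not show the fix itself. As a rough sketch only, one way to guard a module-level counter is to take a lock around each `next()` call. The names `_wrapper_name_lock`, `next_wrapper_suffix`, and `collect_suffixes` are hypothetical and are not taken from the PR.

```python
import itertools
import threading

# Counter as shown in the diff above; the lock and helpers below are
# hypothetical names used only for illustration.
_wrapper_name_counter = itertools.count()
_wrapper_name_lock = threading.Lock()


def next_wrapper_suffix():
    """Return the next counter value while holding the lock."""
    with _wrapper_name_lock:
        return next(_wrapper_name_counter)


def collect_suffixes(n_threads=8, per_thread=1000):
    """Draw suffixes from several threads and return all of them."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [next_wrapper_suffix() for _ in range(per_thread)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `collect_suffixes()` and checking that the returned list has no duplicates is a quick sanity check. It says nothing about the thread safety of the rest of cuda.compute, which the comment above flags as an open issue.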
Contributor
🥳 CI Workflow Results
🟩 Finished in 1h 07m: Pass: 100%/48 | Total: 12h 27m | Max: 39m 44s
NaderAlAwar approved these changes on Jan 5, 2026
Description
Closes #6768
When linking together LTOIRs, care must be taken to ensure that we don't have multiple symbols with the same name. If we just used the name of a function as its corresponding symbol name, collisions would be very easy to run into. For example, we could have an algorithm with two input arguments - both iterators - and the corresponding iterator advance methods would both be named `advance`.

To avoid this, we were previously using the function `id` to compute a unique suffix for the corresponding symbol name. As seen in #6768, this is still problematic when the exact same function is used more than once in the same compilation (for example, as both selectors of a `three_way_partition`), because both uses then get the same suffix.

This PR makes it so that we use a simple counter for computing the suffix instead, guaranteeing a unique suffix for every symbol.
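The difference between the two naming schemes can be sketched in plain Python. This is an illustration under assumptions, not the cuda.compute implementation: the helpers `symbol_name_by_id` and `symbol_name_by_counter`, the name format, and the `select` function are all hypothetical.

```python
import itertools


def symbol_name_by_id(func):
    # Hypothetical stand-in for the previous scheme: the suffix comes from
    # id(func), so the same function object always maps to the same name.
    return f"{func.__name__}_{id(func)}"


# Hypothetical stand-in for the new scheme: a module-level counter.
_counter = itertools.count()


def symbol_name_by_counter(func):
    # Every call consumes a fresh counter value, so each use gets its own
    # name even when the function object is the same.
    return f"{func.__name__}_{next(_counter)}"


def select(x):
    return x > 0


# The same function used twice in one compilation, e.g. as both selectors.
id_names = [symbol_name_by_id(select), symbol_name_by_id(select)]
counter_names = [symbol_name_by_counter(select), symbol_name_by_counter(select)]

print(id_names[0] == id_names[1])            # the id-based names collide
print(counter_names[0] == counter_names[1])  # the counter-based names differ
```

The trade-off in this sketch is that counter-based names depend on call order and are not stable for a given function, whereas id-based names are stable but not unique per use.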
To ensure this doesn't inadvertently cause any performance regressions, I ran our existing Python benchmarks both before (`be`) and after (`af`) this PR. The results show no difference.

Comparison results for posterity: results.txt