Skip to content

[Codegen] Apply bounds to subgroup_id#23768

Open
krzysz00 wants to merge 1 commit intoiree-org:mainfrom
krzysz00:subgroup-id-bounds
Open

[Codegen] Apply bounds to subgroup_id#23768
krzysz00 wants to merge 1 commit intoiree-org:mainfrom
krzysz00:subgroup-id-bounds

Conversation

@krzysz00
Copy link
Contributor

@krzysz00 krzysz00 commented Mar 12, 2026

Since we're fixing to start using subgroup_id more often with PCF, apply bounds to it based on the subgroup size (or sizes) and the number of threads in the workgroup. Also extend subgroup_size handling to account for known subgroup sizes instead of giving up completely when there isn't a fixed choice made yet.

Also fix up some double-spaces in test attributes.

Since we're fixing to start using subgroup_id more often with PCF,
apply bounds to it based on the subgroup size (or sizes) and the
number of threads in the workgroup. Also extend subgroup_size handling
to account for known subgroup sizes instead of giving up completely
when there isn't a fixed choice made yet.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@krzysz00 krzysz00 force-pushed the subgroup-id-bounds branch from 2a4b225 to ad04bc3 Compare March 12, 2026 22:34
Copy link
Contributor

@amd-eochoalo amd-eochoalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, but you may want to wait for other reviewers. Just one comment about the for-loop.

Comment on lines +215 to +229
for (std::optional<int64_t> size : workgroupSizes) {
if (size) {
maxTotalThreads *= *size;
// Cap at the hardware thread-per-workgroup limit inside the loop
// to avoid overflow from multiplying per-dimension maximums.
if (target) {
maxTotalThreads =
std::min(maxTotalThreads,
static_cast<int64_t>(
target.getWgp().getMaxThreadCountPerWorkgroup()));
}
} else {
allSizesKnown = false;
break;
}
Copy link
Contributor

@amd-eochoalo amd-eochoalo Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this is a little bit more difficult to parse than it should be. Could we remove one level of nesting by moving the break earlier and then the else should not be necessary, right? Maybe changing the logic slightly too for calculating the maxTotalThreads when target is known and avoid overflow somehow or adding variables?

// pseudocode

Suggested change
for (std::optional<int64_t> size : workgroupSizes) {
if (size) {
maxTotalThreads *= *size;
// Cap at the hardware thread-per-workgroup limit inside the loop
// to avoid overflow from multiplying per-dimension maximums.
if (target) {
maxTotalThreads =
std::min(maxTotalThreads,
static_cast<int64_t>(
target.getWgp().getMaxThreadCountPerWorkgroup()));
}
} else {
allSizesKnown = false;
break;
}
// if target is available
std::optional<int64_t> maxThreadCountPerWg = static_cast<int64_t> (target.getWgp().getMaxThreadCountPerWorkgroup()));
for (std::optional<int64_t> size : workgroupSizes) {
if (!size) {
allSizesKnown = false;
break;
}
maxTotalThreads *= *size;
// Cap at the hardware thread-per-workgroup limit inside the loop
// to avoid overflow from multiplying per-dimension maximums.
if (maxThreadCountPerWg) {
maxTotalThreads = std::min(maxTotalThreads, maxThreadCountPerWg)
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants