[BUG]: ptxas error: Entry function with max regcount of 168 calls function with regcount of 254 on sm_120 (Blackwell) #7700

@zkhatami

Description

Type of Bug

Compile-time Error

Component

CUB

Describe the bug

For this test:

#include <cstdint>
#include <execution>
#include <numeric>
#include <vector>

int32_t
main()
{
  std::vector<double> in(1000);
  std::vector<double> out(1000);
  auto orr = std::inclusive_scan(std::execution::par, in.begin(), in.end(), out.begin());
  return 0;
}

We’re seeing a ptxas register mismatch error when compiling for sm_120 (Blackwell) that didn’t occur on earlier architectures (e.g. sm_80).
This appears to be related to changes introduced in commit 94bd6e4, specifically the new get_device_scan_launch_bounds() function and its warpspeed path.

ptxas error   : Entry function '_ZN3cub22_V_300400_SM_120_NVHPC6detail4scan16DeviceScanKernelINS2_10policy_hubIdddmN4cuda3std3__44plusIvEEE10Policy1000EPdSC_NS0_13ScanTileStateIdLb1EEES9_NS0_8NullTypeEmdLb0ESF_EEvT0_T1_NS2_23tile_state_kernel_arg_tIT2_T6_EEiT3_T4_T5_i' with max regcount of 168 calls function '_ZN3cub22_V_300400_SM_120_NVHPC6detail4scan10kernelBodyINS2_10policy_hubIdddmN4cuda3std3__44plusIvEEE10Policy100015WarpspeedPolicyEdddS9_NS0_8NullTypeELb0EEEvNS1_9warpspeed5SquadENSE_16SpecialRegistersERKNS2_16scanKernelParamsIT0_T1_T2_EET3_T4_' with regcount of 254

The new get_device_scan_launch_bounds() call selects 352 threads per block on sm_120, which caps the entry function at 168 registers per thread. The warpspeed path leads to a different policy selection and a higher thread count on sm_120, which lowers the registers available per thread. The kernelBody function it calls needs 254 registers, so ptxas rejects the call.
I found that a similar issue was previously discussed and addressed here: #902.
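
As a rough sanity check on the 168 figure, here is a minimal sketch of how a per-thread register cap could follow from a 352-thread launch bound. The constants are assumptions and do not come from this issue or from the CUB sources: 65536 32-bit registers available to one block, the warp count rounded up to a multiple of 4, and the per-thread count rounded down to a multiple of 8. The helper name max_regs_for_block is hypothetical.

```cpp
// Sketch only: estimates the per-thread register cap implied by a launch bound.
// All constants below are assumptions, not values taken from ptxas or CUB.
const int kRegsPerBlockLimit   = 65536; // assumed 32-bit registers usable by one block
const int kWarpSize            = 32;
const int kWarpAllocGranule    = 4;     // assumed: warps are rounded up to a multiple of 4
const int kRegAllocGranule     = 8;     // assumed: per-thread count is rounded down to a multiple of 8
const int kMaxRegsPerThread    = 255;   // architectural per-thread maximum

// Hypothetical helper: register cap per thread for a given block size.
inline int max_regs_for_block(int threads_per_block)
{
  // Number of warps needed for the block, rounded up.
  int warps = (threads_per_block + kWarpSize - 1) / kWarpSize;
  // Round the warp count up to the assumed allocation granularity.
  int padded_warps = (warps + kWarpAllocGranule - 1) / kWarpAllocGranule * kWarpAllocGranule;
  // Split the register budget evenly across the padded thread count.
  int regs = kRegsPerBlockLimit / (padded_warps * kWarpSize);
  // Round down to the assumed register allocation granularity.
  regs = regs / kRegAllocGranule * kRegAllocGranule;
  return regs < kMaxRegsPerThread ? regs : kMaxRegsPerThread;
}
```

Under these assumptions, max_regs_for_block(352) pads 11 warps to 12 (384 threads) and gives 168, which matches the ptxas message, while max_regs_for_block(256) gives 255, enough for the 254 registers the called function needs. This is only a model of the arithmetic, not a statement about how ptxas computes the limit.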

How to Reproduce

nvc++ -stdpar -fast --c++17  scan.cpp

Expected behavior

The test should compile without errors once the issue is fixed.

Metadata

Labels

No labels

Projects

Status

In Review
