With the Hopper architecture (compute capability 9.0), NVIDIA introduced thread block "clusters": groups of blocks in a grid which can access each other's shared memory. The cluster dimensions can be set either at compile time, using a __cluster_dims__(X,Y,Z) attribute on the kernel (e.g. __cluster_dims__(1,2,3)), or at run-time, as part of the launch parameters. We need to support the run-time setting in our launch_configuration_t class and in the launch config builder mechanism.
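To make the request more concrete, here is a rough sketch of what carrying optional cluster dimensions through a launch configuration and its builder might look like. All names here (dims3, launch_config_sketch, launch_config_builder_sketch, num_clusters) are hypothetical stand-ins, not the existing launch_configuration_t or builder API. The only CUDA-side fact assumed is that the grid dimensions must be a multiple of the cluster dimensions in each axis. At the runtime API level, a set cluster field would presumably be translated into a cudaLaunchAttributeClusterDimension attribute passed to cudaLaunchKernelEx.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for a 3D dimensions type.
struct dims3 {
    unsigned x = 1, y = 1, z = 1;
};

// Hypothetical launch configuration with an optional run-time cluster shape.
struct launch_config_sketch {
    dims3 grid;
    dims3 block;
    std::size_t dynamic_shared_memory = 0;
    std::optional<dims3> cluster;  // empty: no run-time clustering requested
};

// Number of clusters in the grid; only meaningful when clustering is set.
inline std::size_t num_clusters(const launch_config_sketch& cfg) {
    assert(cfg.cluster.has_value());
    const dims3& c = *cfg.cluster;
    return std::size_t(cfg.grid.x / c.x) * (cfg.grid.y / c.y) * (cfg.grid.z / c.z);
}

// Hypothetical builder: collects settings, validates them on build().
class launch_config_builder_sketch {
public:
    launch_config_builder_sketch& grid_dimensions(dims3 d)    { grid_ = d;    return *this; }
    launch_config_builder_sketch& block_dimensions(dims3 d)   { block_ = d;   return *this; }
    launch_config_builder_sketch& cluster_dimensions(dims3 d) { cluster_ = d; return *this; }

    launch_config_sketch build() const {
        if (cluster_) {
            const dims3& c = *cluster_;
            if (c.x == 0 || c.y == 0 || c.z == 0) {
                throw std::invalid_argument("cluster dimensions must be non-zero");
            }
            // The grid must be divisible into whole clusters along every axis.
            if (grid_.x % c.x != 0 || grid_.y % c.y != 0 || grid_.z % c.z != 0) {
                throw std::invalid_argument(
                    "grid dimensions must be a multiple of the cluster dimensions");
            }
        }
        return launch_config_sketch{grid_, block_, 0, cluster_};
    }

private:
    dims3 grid_;
    dims3 block_;
    std::optional<dims3> cluster_;
};
```

With this shape, a call such as launch_config_builder_sketch{}.grid_dimensions({4,4,6}).block_dimensions({128,1,1}).cluster_dimensions({1,2,3}).build() would yield a configuration whose cluster field is set, while omitting cluster_dimensions() leaves it empty and the launch would proceed as it does today. Using an optional rather than a default of (1,1,1) keeps "no clustering requested" distinguishable from "clusters of one block", which may matter for kernels that already carry a compile-time __cluster_dims__.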