Is this a duplicate?
Type of Bug
Silent Failure
Component
libcu++
Describe the bug
When using cuda::launch with dynamic shared memory, we allocate the correct size, but we don't take the alignment requirement of the element type into account.
In cuda::dynamic_shared_memory(config), we do:
{
  extern __shared__ unsigned char dyn_smem[];
  return *reinterpret_cast<T*>(dyn_smem);
}
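However, there is no guarantee that dyn_smem is suitably aligned for T. It is declared as an unsigned char array, and the language only requires such an array to be 1-byte aligned.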
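A minimal host-side C++ illustration of the problem follows. It is not the libcu++ code. It assumes a plain global byte array stands in for the dynamic shared memory region, and `is_aligned_for`, `region` and `base` are hypothetical names. The sketch shows that a byte pointer can be perfectly valid as `unsigned char*` and still be misaligned for a wider `T`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// alignof(unsigned char) is 1, so a byte array such as dyn_smem is only
// required to start on a 1-byte boundary.
static_assert(alignof(unsigned char) == 1, "byte arrays need no extra alignment");

// Hypothetical helper: true if p satisfies the alignment requirement of T.
template <class T>
bool is_aligned_for(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Host-side stand-in for a dynamic shared memory region (not CUDA code).
// It is aligned to 16 bytes so the offsets below are predictable.
alignas(16) unsigned char region[64];

// A byte pointer 4 bytes past a 16-byte boundary: fine for unsigned char,
// but reinterpret_cast<double*> on it would yield a misaligned double*.
unsigned char* const base = region + 4;
```

On a typical 64-bit target, `base` passes the check for `unsigned char` and `float` but fails it for `double`. That is the situation cuda::dynamic_shared_memory(config) can end up in when it reinterprets dyn_smem as `T`.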
How to Reproduce
no-repro
Expected behavior
cuda::dynamic_shared_memory(config) should return a view of memory whose address is properly aligned for T.
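One possible direction is sketched below in host C++. It is a hypothetical approach, not a proposed patch, and `padded_bytes` and `align_for` are invented names. The idea is to over-allocate the dynamic shared memory by `alignof(T) - 1` bytes at launch time, then round the base pointer up with `std::align` before handing out the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: bytes to request so that `count` objects of type T
// still fit when the region starts at the worst possible offset.
template <class T>
constexpr std::size_t padded_bytes(std::size_t count) {
  return count * sizeof(T) + alignof(T) - 1;
}

// Hypothetical helper: rounds `base` up to alignof(T) with std::align.
// Returns nullptr if `count` objects of T no longer fit in `bytes`.
template <class T>
T* align_for(void* base, std::size_t bytes, std::size_t count) {
  void* p = base;
  std::size_t space = bytes;
  if (std::align(alignof(T), count * sizeof(T), p, space) == nullptr) {
    return nullptr;
  }
  return static_cast<T*>(p);
}
```

With this scheme, a launch needing three `double`s would request `padded_bytes<double>(3)` bytes. If the region happens to start 4 bytes past a 16-byte boundary, `align_for<double>` moves the pointer forward 4 bytes to the next 8-byte boundary. The trade-off is up to `alignof(T) - 1` bytes of wasted dynamic shared memory per block.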
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response