@forrestjgq

On Turing and Ampere devices, an "invalid device symbol" error is reported at the marked line:

static __constant__ GpuData cData;
static GpuData cpuData;

void SetKernelsGpuData(GpuData* pData)
{
    cudaError_t status;
    status = cudaMemcpyToSymbol(cData, pData, sizeof(GpuData));
    RTERROR(status, "SetKernelsGpuData copy to cData failed"); // <-- error is reported here
    memcpy(&cpuData, pData, sizeof(GpuData));
}
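
One plausible reading of this error, consistent with the rest of this thread, is that the binary holds no device code the GPU can use, so the constant symbol cData cannot be resolved at runtime. The host-only sketch below spells out the general CUDA compatibility rule under that assumption: machine code (SASS) built for sm_XY runs on devices with the same major version and a minor version of at least Y, while embedded PTX for compute_XY can be JIT-compiled on any device of capability XY or higher. All function names and target lists here are hypothetical and are not taken from the AutoDock-GPU sources.

```cpp
#include <vector>

// Compute capabilities are written as integers, e.g. 75 for Turing and
// 86 for Ampere, matching the TARGETS convention used in this thread.

// True if machine code (SASS) built for `target` can run on `device`:
// same major version, and the device's minor version is not lower.
bool sassRunsOn(int target, int device)
{
    return target / 10 == device / 10 && target % 10 <= device % 10;
}

// True if PTX built for `target` can be JIT-compiled for `device`.
bool ptxRunsOn(int target, int device)
{
    return target <= device;
}

// Hypothetical helper: does a binary built with these SASS and PTX targets
// contain anything the given device can execute?
bool hasUsableDeviceCode(const std::vector<int>& sassTargets,
                         const std::vector<int>& ptxTargets,
                         int device)
{
    for (int t : sassTargets)
        if (sassRunsOn(t, device)) return true;
    for (int t : ptxTargets)
        if (ptxRunsOn(t, device)) return true;
    return false;
}
```

Under this rule, a binary carrying only SASS for 52, 60 and 61 and no PTX has nothing that a Turing (75) or Ampere (86) device can execute. Adding a matching target, or embedding PTX for an older target, changes the result.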

@syntesys87

Tried on a system with an RTX 4070 Ti and CUDA 11.8. Working!

@TitouanCh

Works for the RTX A4500.

@atillack
Member

atillack commented Jul 23, 2024

@forrestjgq Thank you for submitting your PR. While we won't go ahead and merge it - as adding to the static TARGETS list (and it existing in the first place) is not the best choice - it did inspire me to solve this particular pain point more generally.

To that end, I added PR #270 which by default will compile Cuda for every supported target of the installed Cuda version (larger than compute capability 50 as lower compute capabilities have a deprecation warning). In other words, Cuda 11 will compile up to compute capability 86, while Cuda 12 will go to compute capability 90 (and beyond if/when Nvidia adds it).

As with the current code, TARGETS can be used to override this at compile time; for example, make DEVICE=GPU TARGETS=86 will only produce code optimized for compute capability 86.
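
To make the effect of a TARGETS list concrete, here is a minimal sketch of how such a list could be expanded into nvcc -gencode flags, one per compute capability. This is only an illustration: the helper name gencodeFlags is hypothetical, and the project's Makefile may build its flags differently.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a list of compute capabilities (e.g. 86)
// into space-separated nvcc flags that request machine code for each one.
std::string gencodeFlags(const std::vector<int>& targets)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < targets.size(); ++i) {
        if (i > 0) out << ' ';
        out << "-gencode arch=compute_" << targets[i]
            << ",code=sm_" << targets[i];
    }
    return out.str();
}
```

With a single target of 86 this yields one flag, -gencode arch=compute_86,code=sm_86, so the resulting binary carries machine code for that architecture only.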

@atillack
Member

Newer versions of AD-GPU (>= v1.6) automatically detect and compile for the compute capabilities supported by the installed Cuda framework, so this PR is now outdated.

@atillack closed this Jan 30, 2025