Commit a112ce4
Lars op optimization with cudaLaunchCooperativeKernel method (#35652)
* A first attempt at using cudaLaunchCooperativeKernel
* fix bugs
* Completely replace the LARS CUDA kernel
* Fix bugs
* fix code according to comments
* fix code according to review comments
* add some function overloads
* relocate the power operation.
1 parent e427a0f commit a112ce4
1 file changed, +314 -77 lines changed