Commit a112ce4

Lars op optimization with cudaLaunchCooperativeKernel method (#35652)
* A leap of try for cudaLaunchCooperativeKernel
* fix bugs
* Totally replace the lar cuda kernel
* Fix bugs
* fix code according to comments
* fix codes according to review comments
* adding some function overload
* relocate the power operation.
1 parent e427a0f commit a112ce4

1 file changed: +314 -77 lines changed

