@GiggleLiu
Contributor

Fixes the following issue:
ArrogantGao/benchmark_tropical_tensornetwork#1

After the fix, there is a 20x speedup on high-dimensional arrays:

julia> @btime CUDA.@sync fill!($(CUDA.zeros(TropicalF32, fill(2, 20)...)), zero(TropicalF32));
  15.316 μs (57 allocations: 1.52 KiB)

@maleadt
Member

maleadt commented Apr 25, 2025

@vchuravy Any idea where this overhead comes from? Suboptimal launch configuration, or the kernel-side div?

@maleadt merged commit 55a943e into JuliaGPU:master on Apr 25, 2025
13 of 16 checks passed