Speed up fill for high dimensional arrays #591

GiggleLiu · 2025-04-25T06:58:49Z

Fixes the following issue: ArrogantGao/benchmark_tropical_tensornetwork#1

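After the fix, `fill!` is about 20x faster on high-dimensional arrays:

```julia
# Requires CUDA.jl, BenchmarkTools.jl (for @btime) and TropicalNumbers.jl (for TropicalF32)
julia> @btime CUDA.@sync fill!($(CUDA.zeros(TropicalF32, fill(2, 20)...)), zero(TropicalF32));
  15.316 μs (57 allocations: 1.52 KiB)
```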

maleadt · 2025-04-25T09:22:47Z

@vchuravy Any idea where this overhead comes from? Suboptimal launch configuration, or the kernel-side div?
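The question above names the kernel-side `div` as one candidate for the overhead. As a hedged, CPU-only sketch (the helpers `manual_ind2sub` and `fill_linear!` are hypothetical, and this is not the code changed in `construction.jl`), the block below shows why Cartesian indexing gets costly as dimensionality grows: turning one linear index into 20 subscripts takes one `divrem` per dimension. A plain fill never needs those subscripts, so it can iterate over linear indices and skip that work.

```julia
# Hypothetical illustration, not the PR's implementation.

"""
    manual_ind2sub(dims, lin)

Convert a one-based, column-major linear index `lin` into a tuple of
one-based subscripts for an array of size `dims`.
"""
function manual_ind2sub(dims::NTuple{N,Int}, lin::Int) where {N}
    r = lin - 1                        # zero-based linear offset
    idx = Vector{Int}(undef, N)
    for d in 1:N
        q, s = divrem(r, dims[d])      # one integer division per dimension
        idx[d] = s + 1                 # back to a one-based subscript
        r = q                          # carry the quotient to the next dimension
    end
    return Tuple(idx)
end

# A fill only writes every element, so it can walk linear indices
# and never compute per-dimension subscripts at all.
function fill_linear!(a::AbstractArray, val)
    for i in eachindex(IndexLinear(), a)
        @inbounds a[i] = val
    end
    return a
end

dims = ntuple(_ -> 2, 20)              # same shape as fill(2, 20) in the benchmark
a = zeros(Float32, dims)
fill_linear!(a, 1.0f0)

# Cross-check the manual conversion against Base's CartesianIndices.
println(manual_ind2sub(dims, 12345) == Tuple(CartesianIndices(dims)[12345]))  # prints true
```

On a GPU the same trade-off applies per thread: a Cartesian index costs roughly one division per dimension for every element, while a linear index costs none.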


GiggleLiu mentioned this pull request Apr 25, 2025 in ArrogantGao/benchmark_tropical_tensornetwork#1, "Slow fill kernel for generic element type" (Open)

maleadt approved these changes Apr 25, 2025


GiggleLiu added 2 commits April 25, 2025 11:24

Update construction.jl (a7deabc)

maleadt force-pushed the patch-1 branch from a1ea59d to a7deabc April 25, 2025 09:24

maleadt merged commit 55a943e into JuliaGPU:master Apr 25, 2025
13 of 16 checks passed
