Make `for iter::CartesianIndices` better vectorized for 1d/2d cases. #45338

N5N3 · 2022-05-17T13:37:30Z

Local benchmarks show that this makes non-@simd 1d/2d loops faster.

function sumcart_iter(A)
    s = zero(eltype(A))
    for I in CartesianIndices(A)
        @inbounds @fastmath s += A[I] # use @fastmath to enable simd
    end
    s
end

on master:

julia> A = view(rand(256*20),1:256*20);
julia> @btime sumcart_iter($A)
  4.657 μs (0 allocations: 0 bytes)
2539.6790676561955

julia> B = view(rand(256,20),1:256,1:20);
julia> @btime sumcart_iter($B)
  4.657 μs (0 allocations: 0 bytes)
2557.7341880811764

This PR

julia> @btime sumcart_iter($A)
  305.200 ns (0 allocations: 0 bytes)
2539.6790676562046

julia> @btime sumcart_iter($B)
  406.500 ns (0 allocations: 0 bytes)
2557.734188081178

johnnychen94 · 2022-05-17T14:00:45Z

base/multidimensional.jl

        rng = indices[1]
        I = state[1] + step(rng)
-        valid = __is_valid_range(I, rng) && state[1] != last(rng)
+        if N == 1


I'm not sure if I get the idea -- when will N != 1 here?

N is the dimension of the CartesianIndices:

If N > 1, the outermost dimension uses __is_valid_range to preserve the performance improvement introduced in #37829 (add StepRange support for CartesianIndices).

If N == 1, just use state[1] != last(rng), as __is_valid_range prevents vectorization.

But since you call __inc(state.I, iter.indices, Val(ndims(iter))) as you did in R404, and given the type annotations state::Tuple{Int}, indices::Tuple{OrdinalRangeInt}, this method at R420 would only be called when length(iter.indices) == ndims(iter) == 1, right?

It's also called at R435, where the last argument ndim is passed down unchanged.

BTW, all the test failures appear to come from tests on invalid states. Since we have

julia> iterate(1:2:typemax(Int), typemax(Int)-1)
(-9223372036854775808, -9223372036854775808)

I think it's OK to replace them.
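
To make the point about the two termination checks concrete, here is a minimal sketch. The names in_bounds, inc_checked and inc_simple are hypothetical stand-ins written for illustration, not the functions in base/multidimensional.jl. It assumes the bounds check amounts to a first/last comparison and shows that the two variants agree on every state that is an element of the range, and differ only on a state outside it.

```julia
# Hypothetical stand-in for the bounds check discussed above (not the Base code).
in_bounds(i::Int, rng::AbstractUnitRange{Int}) = first(rng) <= i <= last(rng)
function in_bounds(i::Int, rng::StepRange{Int,Int})
    lo, hi = step(rng) > 0 ? (first(rng), last(rng)) : (last(rng), first(rng))
    return lo <= i <= hi
end

# Increment with the extra bounds check (roughly the shape of the removed line).
function inc_checked(i::Int, rng)
    next = i + step(rng)
    return (in_bounds(next, rng) && i != last(rng), next)
end

# Increment with only the end-of-range comparison (the N == 1 case in this sketch).
inc_simple(i::Int, rng) = (i != last(rng), i + step(rng))

# For every state that is actually an element of `rng`, both variants agree.
agree_on_valid_states(rng) = all(inc_checked(i, rng) == inc_simple(i, rng) for i in rng)

agree_on_valid_states(1:5)
agree_on_valid_states(1:2:9)
agree_on_valid_states(10:-3:1)

# They differ only on a state outside the range, such as 6 for 1:5.
inc_checked(6, 1:5)  # first component is false
inc_simple(6, 1:5)   # first component is true

# Base range iteration also does not validate such a state,
# as in the REPL example above.
iterate(1:2:typemax(Int), typemax(Int) - 1)
```

Under these assumptions, dropping the bounds check changes behavior only for states that a normal for loop never produces, which is consistent with the failing tests being tests on invalid states.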

johnnychen94 · 2022-05-17T15:53:33Z

This looks good to me as long as the tests pass. Also pinging @vchuravy and @timholy, as this was originally written in #31011.

N5N3 · 2022-05-18T03:48:49Z

Some local BaseBenchmarks results:

          "index" => 7-element BenchmarkTools.BenchmarkGroup:
                  tags: ["sum", "simd"]
                  ("sumeach", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-85.57% => improvement)
                  ("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-87.51% => improvement)
                  ("sumcartesian", "1:100000") => TrialJudgement(-100.00% => improvement)
                  ("sumeach", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.62% => improvement)
                  ("sumcartesian", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.48% => improvement)
                  ("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}") => TrialJudgement(-84.04% => improvement)
                  ("sumcartesian", "100000:-1:1") => TrialJudgement(-100.00% => improvement)

N5N3 · 2022-06-18T07:16:08Z

With today's master (LLVM 14), this PR also helps vectorize 3d CartesianIndices (only to a very limited extent, though).

N5N3 · 2023-10-08T15:38:10Z

No longer needed after #51606.

jonas-schulze · 2023-10-13T06:49:48Z

It's a bit sad that (subjectively) so many PRs get forgotten about for so long.

N5N3 added the performance Must go faster label May 17, 2022

N5N3 requested a review from johnnychen94 May 17, 2022 13:38

johnnychen94 reviewed May 17, 2022

johnnychen94 approved these changes May 18, 2022

johnnychen94 reviewed May 18, 2022

N5N3 force-pushed the cart-auto-simd branch from d105525 to db66c7a Compare May 18, 2022 10:54

N5N3 and others added 3 commits June 18, 2022 15:15

Make for iter::CartesianIndices{1/2} better vectorized.

9a2c7d0

Add more comments.

9c6c02d

Co-Authored-By: Johnny Chen <[email protected]>

Remove invalid state test.

66b8e2d

(These were introduced for performance.)

N5N3 force-pushed the cart-auto-simd branch from db66c7a to 66b8e2d Compare June 18, 2022 07:15

N5N3 closed this Oct 8, 2023
