-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Make for iter::CartesianIndices better vectorized for 1d/2d cases.
#45338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| rng = indices[1] | ||
| I = state[1] + step(rng) | ||
| valid = __is_valid_range(I, rng) && state[1] != last(rng) | ||
| if N == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I get the idea -- when will N != 1 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
N is the dimension of the CartesianIndices:
- If
N > 1, the outermost dimension uses__is_valid_rangeto preserve the performance improvement introduced in add StepRange support for CartesianIndices #37829. - If
N == 1, just usestate[1] != last(rng), as__is_valid_rangepervents vectorization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But by calling __inc(state.I, iter.indices, Val(ndims(iter))) as in you did in R404, because of the type annotion state::Tuple{Int}, indices::Tuple{OrdinalRangeInt}, this method R420 would only be called when length(iter.indices) == ndims(iter) == 1, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also called at R435 as the last input ndim is passed deeper without any change.
BTW, all the test failures should be invalid state. Since we have
julia> iterate(1:2:typemax(Int), typemax(Int)-1)
(-9223372036854775808, -9223372036854775808)I think it's OK to replace them.
|
Some local "index" => 7-element BenchmarkTools.BenchmarkGroup:
tags: ["sum", "simd"]
("sumeach", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-85.57% => improvement)
("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{UnitRange{Int64}, UnitRange{Int64}}, false}") => TrialJudgement(-87.51% => improvement)
("sumcartesian", "1:100000") => TrialJudgement(-100.00% => improvement)
("sumeach", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.62% => improvement)
("sumcartesian", "SubArray{Int32, 2, BaseBenchmarks.ArrayBenchmarks.ArrayLS{Int32, 2}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, false}") => TrialJudgement(-87.48% => improvement)
("sumcartesian", "SubArray{Int32, 2, Matrix{Int32}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}}, true}") => TrialJudgement(-84.04% => improvement)
("sumcartesian", "100000:-1:1") => TrialJudgement(-100.00% => improvement) |
Co-Authored-By: Johnny Chen <[email protected]>
(These were introduced for performance.)
|
With today's master (LLVM14), this PR also helps vectorizing 3d |
|
No need after #51606. |
|
It's a bit sad that (subjectively) so many PRs get forgotten about for so long. |
Local benchmark shows that this makes non-
@simd1d/2d loop faster.on master:
This PR