
Conversation

@timholy (Member) commented Apr 27, 2017

I happened to notice that extrapolation is quite a lot slower than interpolation:

julia> using BenchmarkTools, Interpolations

julia> img = rand(3,3,3);

julia> itp = interpolate(img, BSpline(Linear()), OnGrid());

julia> etp = extrapolate(itp, Flat());

julia> @benchmark $itp[2.2,2.2,2.2] seconds=1
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     20.111 ns (0.00% GC)
  median time:      20.616 ns (0.00% GC)
  mean time:        20.820 ns (0.00% GC)
  maximum time:     38.106 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark $etp[2.2,2.2,2.2] seconds=1
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     63.526 ns (0.00% GC)
  median time:      63.929 ns (0.00% GC)
  mean time:        66.641 ns (1.99% GC)
  maximum time:     2.312 μs (96.59% GC)
  --------------
  samples:          10000
  evals/sample:     980

This slowness doesn't seem inevitable, because the amount of "work" (computation) performed by extrapolation is quite minor compared to the work done by interpolation:

julia> Interpolations.getindex_impl(typeof(etp))
quote  # /home/tim/.julia/v0.5/Interpolations/src/extrapolation/extrapolation.jl, line 53:
    @nexprs 3 (d->begin  # /home/tim/.julia/v0.5/Interpolations/src/extrapolation/extrapolation.jl, line 53:
                xs_d = xs[d]
            end) # /home/tim/.julia/v0.5/Interpolations/src/extrapolation/extrapolation.jl, line 54:
    begin 
        xs_1 = clamp(xs_1,lbound(etp,1),ubound(etp,1))
        xs_2 = clamp(xs_2,lbound(etp,2),ubound(etp,2))
        xs_3 = clamp(xs_3,lbound(etp,3),ubound(etp,3))
    end # /home/tim/.julia/v0.5/Interpolations/src/extrapolation/extrapolation.jl, line 55:
    etp.itp[xs_1,xs_2,xs_3]
end

Interpolation also involves a whole second round of clamps and throws in a bunch of floor/round and arithmetic operations to boot (16 multiplies and 16 adds for linear interpolation in 3 dimensions). So one might suspect this could be fixed through a careful look at the generated code.
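
For scale, here is a minimal sketch of the arithmetic that 3-d linear interpolation performs, written as nested 1-d lerps. This is illustrative only, not Interpolations' generated code: Interpolations emits a tensor-product expansion instead, so its operation count differs from this formulation's 7 multiplies and 14 adds, and bounds handling is omitted here.

lerp(a, b, t) = a + t*(b - a)    # 1 multiply, 2 adds

# Trilinear interpolation as 7 nested lerps: 4 along x, 2 along y, 1 along z.
# Assumes 1 <= x < size(A,1), etc.; no bounds checks for brevity.
function trilinear(A, x, y, z)
    ix, iy, iz = floor(Int, x), floor(Int, y), floor(Int, z)
    fx, fy, fz = x - ix, y - iy, z - iz
    c00 = lerp(A[ix, iy,   iz  ], A[ix+1, iy,   iz  ], fx)
    c10 = lerp(A[ix, iy+1, iz  ], A[ix+1, iy+1, iz  ], fx)
    c01 = lerp(A[ix, iy,   iz+1], A[ix+1, iy,   iz+1], fx)
    c11 = lerp(A[ix, iy+1, iz+1], A[ix+1, iy+1, iz+1], fx)
    c0  = lerp(c00, c10, fy)
    c1  = lerp(c01, c11, fy)
    lerp(c0, c1, fz)
end

Either way, that is substantially more arithmetic than the three clamps extrapolation adds on top.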

The first commit here is almost trivial: it adds forced inlining to circumvent the splatting penalty (sketched in isolation after the final benchmark below). This leads to an approximately 20 ns reduction in this test case. The second commit is more complicated: size and indices, when called with a dimension argument, involve a branch that looks something like this:

size(A, d) = d <= ndims(A) ? size(A)[d] : 1
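
To make the contrast concrete, here is an illustrative pair of helpers (hypothetical, not the actual PR diff) for the 3-d case, using Base's indices (renamed axes in later Julia versions):

# Per-dimension style: indices(A, d) re-checks d <= ndims(A) internally,
# so each bound lookup pays a branch -- two per dimension, six in 3-d.
function clamp3_percall(A, x1, x2, x3)
    x1 = clamp(x1, first(indices(A, 1)), last(indices(A, 1)))
    x2 = clamp(x2, first(indices(A, 2)), last(indices(A, 2)))
    x3 = clamp(x3, first(indices(A, 3)), last(indices(A, 3)))
    x1, x2, x3
end

# Hoisted style: fetch the whole tuple once; inds[1], inds[2], inds[3] are
# branch-free tuple accesses because the indices are compile-time constants.
function clamp3_hoisted(A, x1, x2, x3)
    inds = indices(A)
    x1 = clamp(x1, first(inds[1]), last(inds[1]))
    x2 = clamp(x2, first(inds[2]), last(inds[2]))
    x3 = clamp(x3, first(inds[3]), last(inds[3]))
    x1, x2, x3
end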

Our generated code used the dimension-specific constructs and triggered two branches per dimension (one for lbound and one for ubound), six branches total for this example. Such branches are not present if you start from the whole tuple, as in sz = size(A), so I changed our generated code to the hoisted style (with indices rather than size). This gave another roughly 20 ns boost, so the result with this PR is

julia> @benchmark $etp[2.2,2.2,2.2] seconds=1
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     26.879 ns (0.00% GC)
  median time:      27.188 ns (0.00% GC)
  mean time:        27.413 ns (0.00% GC)
  maximum time:     46.544 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     995

which is a much more reasonable overhead compared to interpolation.
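
As an aside, the splatting penalty that the first commit works around can be sketched in isolation. These helpers are hypothetical, purely illustrative; whether the penalty actually bites depends on the call and the Julia version, but it is plausibly the source of the 16-byte, 1-allocation figure in the etp benchmark at the top:

inner(x, y, z) = x + y + z

# Splatting through a function boundary: without inlining, the compiler
# may materialize the argument tuple, costing an allocation per call.
outer(args...) = inner(args...)

# Forcing inlining lets the compiler see through the splat and elide the tuple.
@inline outer_inlined(args...) = inner(args...)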

@sglyon (Member) commented Apr 27, 2017

This looks fantastic. Great optimizing work.

I don't have time right now, but I will take a closer look at this before the end of the week (probably Friday).

If others look/review this and approve the PR, feel free to merge without me.

@sglyon (Member) left a comment

Looks good!

@timholy merged commit 70e4128 into master on Apr 28, 2017
@timholy deleted the teh/faster_etp branch on April 28, 2017 13:49