Conversation
Codecov Report
Patch coverage:

@@           Coverage Diff           @@
##           master      #74   +/-  ##
==========================================
- Coverage   91.98%   85.77%   -6.21%
==========================================
  Files           2        2
  Lines         212      218       +6
==========================================
- Hits          195      187       -8
- Misses         17       31      +14

☔ View full report in Codecov by Sentry.
@andreasnoack playing around a bit, I discovered that things are dramatically faster if you presort instead of repeatedly calling partialsort!
When you timed, did you compare to current master or #76?
The released versions -- I did
Benchmark Report for /home/runner/work/Loess.jl/Loess.jl
[Collapsed CI output: job properties, results table (a ratio greater than the threshold flags a regression), benchmark group list, and Julia versioninfo for target and baseline]
andreasnoack left a comment
Ran the timings in #76 (comment). Looks like the two versions are comparable in the random case but that this PR is faster when there are ties so let's go with this one. Might be worth leaving a comment in the code with a link to this issue in case somebody in the future compares the implementation to the paper and thinks there is a potential for speedup via partialsort!.
src/kd.jl (outdated)

```diff
-    mid = (length(perm) + 1) ÷ 2
-    @debug "Candidate median index and median value" mid xs[perm[mid], j]
+    mid = (length(xjs) + 1) ÷ 2
+    @debug "Candidate median index and median value" mid xs[mid, j]
```
Should this be xjs[mid] instead of xs[mid, j] and likewise below?
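For context, here is a tiny self-contained sketch of the concern (the matrix values and the setup are invented for illustration; only the names xjs, xs, mid, and j come from the diff): once xjs holds a sorted copy of column j, indexing the original matrix at the sorted position picks up an unrelated row.

```julia
# Hypothetical setup mirroring the diff: `xjs` is a sorted copy of
# column `j` of `xs`, and `mid` is the candidate median position.
xs = [3.0 10.0;
      1.0 20.0;
      2.0 30.0]
j = 1
xjs = sort(xs[:, j])            # [1.0, 2.0, 3.0]
mid = (length(xjs) + 1) ÷ 2     # 2

xjs[mid]                        # 2.0 -- the actual candidate median
xs[mid, j]                      # 1.0 -- row `mid` of the *unsorted* matrix
```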
done!
@andreasnoack if you have no further concerns, I'll merge this in about an hour |
Building on #73 (comment) ("I tried doing an initial equality check to short-circuit the linear search. For small n, this seems to improve performance (data from #73)") and further discussion in this PR, I tried just pre-sorting all values ahead of time. If we assume that sort is O(n log n) (and with radix sort, which I believe is the default for floats in recent releases, that's a loose upper bound!), then the sort penalty for the entire array isn't that high. I think part of the motivation for the original piecewise partialsort! was that n_piecewise << n_total, so with a bunch of runs you would still come out ahead. That didn't turn out to be the case. I don't know whether this is because partialsort! uses a lower-performing algorithm, or because of the need to pass through the array multiple times (and thus do multiple sorting passes), or some mixture of the two. It doesn't matter: pre-sorting seems to greatly speed things up for datasets with ties. I also benchmarked random data (so hopefully very few ties) and saw no performance penalty from this approach.

Setup
all done in a clean temporary environment
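To make the comparison concrete, here is a minimal sketch of the two strategies, independent of Loess.jl's kd-tree code (the function names, array size, and choice of order statistics are all invented for illustration): computing many order statistics either by calling partialsort once per statistic, or by sorting once up front and indexing.

```julia
using Random

# One partialsort per order statistic: each call is O(n) on average,
# but rescans the full array every time.
function stats_via_partialsort(xs, ks)
    [partialsort(xs, k) for k in ks]
end

# Pre-sorting: a single O(n log n) sort (radix sort for floats in
# recent Julia releases), after which each statistic is an O(1) lookup.
function stats_via_presort(xs, ks)
    sorted = sort(xs)
    [sorted[k] for k in ks]
end

xs = rand(MersenneTwister(0), 10_000)
ks = 1:100:10_000
@assert stats_via_partialsort(xs, ks) == stats_via_presort(xs, ks)
```

With many statistics requested, as in a deep kd-tree, the single sort amortizes much better; @btime from BenchmarkTools.jl is the usual way to quantify the difference.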
Plotting the big dataset: plots for 0.5.4 and this PR (collapsed)

Plotting the sinusoid with lots of ties: plots for 0.5.4, 0.6.1, and this PR (collapsed)

version info (collapsed)

Benchmarks on 1000 elements: collapsed details for 0.5.4, 0.6.1, and this PR

Entire dataset: collapsed details for 0.5.4 and this PR (0.6.1: N/A)

Benchmarks on random data (i.e. without many ties): collapsed details for 0.5.4, 0.6.1, and this PR