@jrevels
Member

@jrevels jrevels commented Jul 29, 2018

Reviving #27659 because I'd really like to see it in 0.7 if possible.

I redid the measurements from the original PR (#26594):

SLP Disabled

sysimg time:

Sysimage built. Summary:
Total ───────  99.166878 seconds
Base: ───────  26.047203 seconds 26.266%
Stdlibs: ────  73.118250 seconds 73.7325%

StaticArrays test times (I just manually put @time in front of all the relevant includes in StaticArrays' test/runtests.jl file):

@testinf      |    3      3
  1.909640 seconds (6.44 M allocations: 317.704 MiB, 4.13% gc time)
SVector       |   53     53
  1.181462 seconds (2.14 M allocations: 112.380 MiB, 1.80% gc time)
MVector       |   52     52
  0.671240 seconds (656.04 k allocations: 35.176 MiB, 1.41% gc time)
SMatrix       |   63     63
  1.455128 seconds (2.18 M allocations: 109.772 MiB, 1.69% gc time)
MMatrix       |   66     66
  1.115946 seconds (1.45 M allocations: 72.580 MiB, 1.19% gc time)
SArray        |   87     87
  4.106915 seconds (8.05 M allocations: 397.618 MiB, 3.13% gc time)
MArray        |  100    100
  3.734096 seconds (5.51 M allocations: 267.131 MiB, 1.21% gc time)
FieldVector   |   29     29
  0.812507 seconds (1.48 M allocations: 74.886 MiB, 2.14% gc time)
Scalar        |    9      9
  0.520652 seconds (1.70 M allocations: 86.460 MiB, 3.63% gc time)
SUnitRange    |   10     10
  0.096026 seconds (57.95 k allocations: 3.075 MiB)
SizedArray    |   50       1     51
  1.628736 seconds (2.69 M allocations: 135.817 MiB, 1.64% gc time)
SDiagonal     |   74     74
  9.101920 seconds (32.24 M allocations: 1.499 GiB, 5.96% gc time)
Custom types  |    2      2
  0.047831 seconds (44.94 k allocations: 2.494 MiB)
Core definitions and constructors |   79       2     81
  1.447924 seconds (995.12 k allocations: 52.970 MiB, 1.25% gc time)
AbstractArray interface |   50     50
  1.454582 seconds (1.97 M allocations: 96.524 MiB, 1.74% gc time)
Indexing      |   72     72
  2.989505 seconds (4.61 M allocations: 232.423 MiB, 2.41% gc time)
Map, reduce, mapreduce, broadcast |   67     67
  6.372919 seconds (19.25 M allocations: 956.073 MiB, 4.71% gc time)
Array math    |  141    141
  2.316072 seconds (6.94 M allocations: 353.018 MiB, 3.91% gc time)
Broadcast     |   83      12     95
  6.504455 seconds (13.22 M allocations: 691.648 MiB, 3.01% gc time)
Linear algebra |  159    159
 12.165429 seconds (28.08 M allocations: 1.363 GiB, 4.20% gc time)
Matrix multiplication |   66     66
 17.774278 seconds (38.39 M allocations: 1.783 GiB, 4.73% gc time)
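
The per-file timings above come from prefixing each include with @time. A minimal sketch of that approach follows. The file names are hypothetical stand-ins written to a temporary directory, not StaticArrays' actual test files, and `timed_include` is an illustrative helper rather than part of any package.

```julia
# Hypothetical stand-ins for test files; the real runtests.jl includes
# StaticArrays' own test files, which are not listed in this thread.
dir = mktempdir()
files = ["vector_tests.jl", "matrix_tests.jl"]
for f in files
    write(joinpath(dir, f), "sum(rand(1000))\n")
end

# Illustrative helper: print the file name, then time its include with @time.
# @time passes through the value of the expression it wraps.
function timed_include(path)
    println(basename(path))
    @time include(path)
end

results = [timed_include(joinpath(dir, f)) for f in files]
```

Because each file is included only once per session, timings taken this way include compilation time, which is likely the quantity of interest when comparing compiler passes.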

SLP Enabled

sysimg time:

Sysimage built. Summary:
Total ───────  89.579786 seconds
Base: ───────  23.448816 seconds 26.1765%
Stdlibs: ────  66.129673 seconds 73.8221%

StaticArrays test times:

@testinf      |    3      3
  1.812174 seconds (6.44 M allocations: 317.704 MiB, 4.21% gc time)
SVector       |   53     53
  1.187204 seconds (2.14 M allocations: 112.379 MiB, 1.88% gc time)
MVector       |   52     52
  0.652456 seconds (656.04 k allocations: 35.176 MiB, 1.45% gc time)
SMatrix       |   63     63
  1.275547 seconds (2.18 M allocations: 109.772 MiB, 1.87% gc time)
MMatrix       |   66     66
  1.118905 seconds (1.45 M allocations: 72.611 MiB, 1.19% gc time)
SArray        |   87     87
  4.486199 seconds (8.05 M allocations: 397.271 MiB, 2.88% gc time)
MArray        |  100    100
  4.108803 seconds (5.51 M allocations: 267.131 MiB, 1.15% gc time)
FieldVector   |   29     29
  0.859377 seconds (1.48 M allocations: 74.886 MiB, 2.12% gc time)
Scalar        |    9      9
  0.562872 seconds (1.70 M allocations: 86.463 MiB, 3.37% gc time)
SUnitRange    |   10     10
  0.100489 seconds (57.95 k allocations: 3.075 MiB)
SizedArray    |   50       1     51
  1.745068 seconds (2.69 M allocations: 135.817 MiB, 1.62% gc time)
SDiagonal     |   74     74
  9.985475 seconds (32.24 M allocations: 1.499 GiB, 6.32% gc time)
Custom types  |    2      2
  0.055124 seconds (44.94 k allocations: 2.494 MiB)
Core definitions and constructors |   79       2     81
  1.575715 seconds (995.11 k allocations: 52.969 MiB, 1.55% gc time)
AbstractArray interface |   50     50
  1.553793 seconds (1.97 M allocations: 96.518 MiB, 1.98% gc time)
Indexing      |   72     72
  3.239480 seconds (4.61 M allocations: 232.406 MiB, 2.56% gc time)
Map, reduce, mapreduce, broadcast |   67     67
  7.154500 seconds (19.25 M allocations: 956.065 MiB, 5.53% gc time)
Array math    |  141    141
  2.557693 seconds (6.94 M allocations: 352.989 MiB, 3.65% gc time)
Broadcast     |   83      12     95
  7.150276 seconds (13.22 M allocations: 691.824 MiB, 2.88% gc time)
Linear algebra |  159    159
 13.607684 seconds (28.08 M allocations: 1.363 GiB, 4.11% gc time)
Matrix multiplication |   66     66
 20.916517 seconds (38.39 M allocations: 1.783 GiB, 4.47% gc time)
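
To make the two runs easier to compare, the relative change for a few of the larger entries can be computed directly from the numbers above. This is only a sketch of the arithmetic; the selection of rows and the `relchange` helper are choices made for this illustration.

```julia
# Wall-clock times in seconds, copied from the two runs above:
# (label, SLP disabled, SLP enabled)
timings = [
    ("sysimg total",          99.166878, 89.579786),
    ("SDiagonal",              9.101920,  9.985475),
    ("Linear algebra",        12.165429, 13.607684),
    ("Matrix multiplication", 17.774278, 20.916517),
]

# Relative change in percent; negative means the SLP-enabled run was faster.
relchange(old, new) = 100 * (new - old) / old

for (label, disabled, enabled) in timings
    println(rpad(label, 24), round(relchange(disabled, enabled); digits = 1), " %")
end
```

On these numbers the sysimg build is roughly 10% faster with SLP enabled, while the longest StaticArrays test files run somewhat slower. Each configuration was measured once, so small differences may be noise.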

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@jrevels
Member Author

jrevels commented Jul 30, 2018

I'll investigate the perf regressions and report back. It's totally possible they're real; if so, do we still want to merge this?

@andreasnoack I'm seeing a LinearAlgebra-y error on Windows in this PR. I think it's unrelated, but I just want to check with you.

@andreasnoack
Member

The test error looks like an unlucky draw. Let's see if #28355 can get rid of it.

@Keno
Member

Keno commented Jul 30, 2018

The skipmissing ones look somewhat interesting. Most of the other ones are among the usual candidates for flakiness.

@jrevels
Member Author

jrevels commented Jul 30, 2018

I am seeing some reproducible regressions here, but they're smaller on my machine than reported by nanosoldier (max ~10% in the skipmissing case).

That still seems worth merging to me; I'll do so tomorrow morning unless there are any objections.
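
A regression like the skipmissing one can be checked locally by running the same micro-benchmark on a build with and without the change. The sketch below is a hypothetical stand-in, not the benchmark definition nanosoldier runs: the array size, the fraction of missing values, and the `min_time` helper are all assumptions made for illustration.

```julia
# Hypothetical micro-benchmark; not the BaseBenchmarks definition.
# Returns the minimum elapsed time in seconds over several samples.
function min_time(f, x; samples = 20)
    f(x)  # warm-up call so compilation is not included in the timings
    minimum([@elapsed(f(x)) for _ in 1:samples])
end

# One million Float64 values with every tenth entry replaced by missing.
x = Vector{Union{Float64, Missing}}(rand(10^6))
x[1:10:end] .= missing

t = min_time(v -> sum(skipmissing(v)), x)
println("sum(skipmissing(x)): ", round(t * 1e3; digits = 3), " ms")
```

Taking the minimum over repeated samples reduces the influence of scheduling noise, which matters when the difference being measured is on the order of 10%.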

@Keno
Member

Keno commented Jul 30, 2018

> I am seeing some reproducible regressions here, but they're smaller on my machine than reported by nanosoldier (max ~10% in the skipmissing case).

It's quite possible that this varies over architectural features. What architecture did you test with?

@jrevels
Member Author

jrevels commented Jul 30, 2018

Platform Info:
  OS: macOS (x86_64-apple-darwin17.5.0)
  CPU: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
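
For anyone comparing results across machines, the same kind of report can be produced as follows. This is a minimal sketch using only the standard library; the exact fields printed may differ between Julia versions.

```julia
using InteractiveUtils  # stdlib; provides versioninfo()

# Prints the Julia version followed by a "Platform Info" block like the one above.
versioninfo()

# LLVM's name for the host CPU, the value shown in parentheses on the LLVM line.
println("CPU target: ", Sys.CPU_NAME)
```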

@jrevels
Member Author

jrevels commented Jul 31, 2018

Shall I merge this? I'm not sure the (potential) regressions on nanosoldier are worth holding this back, though others are welcome to test more if they want.

@Keno
Member

Keno commented Jul 31, 2018

I'm ok with this, but rc1 just went, so this'll be in rc2. Hold for a few hours since we'll be branching release-0.7 shortly.

@jrevels
Member Author

jrevels commented Jul 31, 2018

> Hold for a few hours since we'll be branching release-0.7 shortly.

Okay, feel free to merge whenever. Otherwise I'll just merge tomorrow.

@Keno Keno merged commit 0ef8826 into master Aug 1, 2018
@ararslan ararslan deleted the jr/reviveslpvect branch August 1, 2018 00:56

7 participants