Reorganize the test suite to reduce code duplication #45
@chriselrod This should make it much easier to change the matrix sizes we test. We just edit the contents of https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/dpa/coverage/test/matmul.jl and/or https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/dpa/coverage/test/matmul_coverage.jl.
Octavian.jl/test/matmul_coverage.jl Lines 1 to 7 in 46d21b2
Codecov Report
@@             Coverage Diff             @@
##           master      #45       +/-   ##
===========================================
+ Coverage   12.23%   40.97%   +28.73%
===========================================
  Files          12       12
  Lines         719      715        -4
===========================================
+ Hits           88      293      +205
+ Misses        631      422      -209
Something else to consider, related to your Preferences.jl PR: the L3 cache sizes we read are those of large server CPUs, where the L3 is shared among all of the cores. But we only have access to 2 or so of those cores, and (similarly) only a correspondingly small slice of the total L3. That means our blocking sizes here should be smaller than they are, which would decrease the size of the matrices needed to hit all code paths.
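To make the point above concrete, here is a minimal sketch of scaling a shared L3 size by the fraction of cores a process can actually use. The function name `effective_l3_bytes` and the even-split assumption are hypothetical; this is not an existing function in Octavian.jl, Hwloc.jl, or VectorizationBase.jl, and real hardware may not divide the cache evenly.

```julia
# Hypothetical helper (an assumption for illustration, not a library API).
# Estimates the share of a shared L3 cache available to this process,
# assuming the cache is split evenly among all physical cores.
function effective_l3_bytes(total_l3_bytes::Integer,
                            total_cores::Integer,
                            available_cores::Integer)
    total_cores > 0 || throw(ArgumentError("total_cores must be positive"))
    # Never report fewer than one core or more than the machine has.
    usable = clamp(available_cores, 1, total_cores)
    return (total_l3_bytes * usable) ÷ total_cores
end

# Example values: a 14417920-byte L3 shared by 10 cores, 2 of them usable.
effective_l3_bytes(14_417_920, 10, 2)
```

Blocking sizes derived from the smaller figure would be correspondingly smaller, which is the effect described above.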
Oh interesting. So the L3 cache size that we get from Hwloc.jl/VectorizationBase.jl is for the entire CPU? So maybe the logic should be something like this:
Does this look right? Couple questions:
julia> using CpuId
julia> cpucores()
10
Worth trying.
Example of where it doesn't (VectorizationBase uses Hwloc but, worse than that, precompiles one static value; changing it would require re-precompiling VectorizationBase):
# > taskset -c 0,1 julia
julia> using Hwloc
[ Info: Precompiling Hwloc [0e44f5e4-bd66-52a0-8798-143a42290a1d]
julia> Hwloc.topology_load()
D0: L0 P0 Machine
D1: L0 P0 Package
D2: L0 P-1 L3Cache Cache{size=14417920,depth=3,linesize=64,associativity=11,type=Unified}
D3: L0 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L0 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L0 P0 Core
D6: L0 P0 PU
D6: L1 P10 PU
D3: L1 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L1 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L1 P1 Core
D6: L2 P1 PU
D6: L3 P11 PU
D3: L2 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L2 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L2 P2 Core
D6: L4 P2 PU
D6: L5 P12 PU
D3: L3 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L3 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L3 P3 Core
D6: L6 P3 PU
D6: L7 P13 PU
D3: L4 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L4 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L4 P4 Core
D6: L8 P4 PU
D6: L9 P14 PU
D3: L5 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L5 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L5 P8 Core
D6: L10 P5 PU
D6: L11 P15 PU
D3: L6 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L6 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L6 P9 Core
D6: L12 P6 PU
D6: L13 P16 PU
D3: L7 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L7 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L7 P10 Core
D6: L14 P7 PU
D6: L15 P17 PU
D3: L8 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L8 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L8 P11 Core
D6: L16 P8 PU
D6: L17 P18 PU
D3: L9 P-1 L2Cache Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
D4: L9 P-1 L1Cache Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
D5: L9 P12 Core
D6: L18 P9 PU
D6: L19 P19 PU
julia> run(`taskset -p $(getpid())`)
pid 193828's current affinity mask: 3
Process(`taskset -p 193828`, ProcessExited(0))
We're pinned to cores 0 and 1, but Hwloc still shows 10 cores. But it's probably correct if we're running in a virtual machine that's only given a few cores.
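One possible way to recover the number of usable CPUs from the affinity mask printed above is to count its set bits. This is only a sketch: `cpus_in_affinity_mask` is a hypothetical helper, it assumes the mask is given as plain hexadecimal (as in the `taskset -p` output above), and it counts logical CPUs rather than physical cores.

```julia
# Hypothetical helper (an assumption for illustration, not a library API).
# Counts the logical CPUs enabled in a hexadecimal affinity mask string,
# such as the "3" printed by `taskset -p`.
function cpus_in_affinity_mask(mask::AbstractString)
    # The mask has no "0x" prefix; BigInt avoids overflow on machines
    # with more than 64 logical CPUs.
    bits = parse(BigInt, strip(mask); base = 16)
    return count_ones(bits)
end

# Mask 3 is binary 11, i.e. logical CPUs 0 and 1 are enabled.
cpus_in_affinity_mask("3")
```

The result could be compared against the core count that Hwloc reports to decide whether the process sees only part of the machine.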