
Conversation

@DilumAluthge
Member

No description provided.

@DilumAluthge
Member Author

DilumAluthge commented Jan 20, 2021

@chriselrod This should make it much easier to change the matrix sizes we test. We just edit the contents of https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/dpa/coverage/test/matmul.jl and/or https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/dpa/coverage/test/matmul_coverage.jl.

test/matmul.jl: (code coverage is disabled, so we can test really big matrices)

n_values = [200, 300, 400]
k_values = [200, 300, 400]
m_values = [200, 300, 400]
testset_name_suffix = "(main)"
include("_matmul.jl")

test/matmul_coverage.jl: (code coverage is enabled)

n_values = [20, 100]
k_values = [20, 100]
m_values = [20, 100]
testset_name_suffix = "(coverage)"
include("_matmul.jl")

@codecov

codecov bot commented Jan 20, 2021

Codecov Report

Merging #45 (c6574ee) into master (174fe57) will increase coverage by 28.73%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           master      #45       +/-   ##
===========================================
+ Coverage   12.23%   40.97%   +28.73%     
===========================================
  Files          12       12               
  Lines         719      715        -4     
===========================================
+ Hits           88      293      +205     
+ Misses        631      422      -209     
Impacted Files            Coverage Δ
src/staticfloats.jl       36.66% <0.00%> (+3.33%) ⬆️
src/macrokernels.jl       15.70% <0.00%> (+9.42%) ⬆️
src/utils.jl              79.31% <0.00%> (+10.34%) ⬆️
src/memory_buffer.jl      13.79% <0.00%> (+10.34%) ⬆️
src/global_constants.jl   85.71% <0.00%> (+14.28%) ⬆️
src/matmul.jl             41.97% <0.00%> (+35.09%) ⬆️
src/integerdivision.jl    66.66% <0.00%> (+44.44%) ⬆️
src/funcptrs.jl           50.00% <0.00%> (+45.34%) ⬆️
src/types.jl              75.00% <0.00%> (+75.00%) ⬆️
src/block_sizes.jl        77.19% <0.00%> (+77.19%) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@chriselrod
Collaborator

Something else to consider, related to your Preferences.jl PR, is that the L3 cache readings we have are those for large server CPUs, shared among all of their cores. But we only have access to about 2 of those cores, and (similarly) only a correspondingly small slice of the total L3.

That means our blocking sizes here should be smaller than they currently are, which would in turn decrease the size of the matrices needed to hit all the code paths.
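
As a rough back-of-the-envelope illustration (not Octavian's actual logic), using the 14,417,920-byte L3 and the 10 physical cores reported by Hwloc/CpuId later in this thread, and assuming we are given 2 of those cores:

# Hedged illustration only: effective L3 slice if we own 2 of 10 cores
l3_total   = 14_417_920   # bytes, from the Hwloc topology dump below
core_share = 2 / 10       # cores we can use / physical cores on the chip
l3_slice   = core_share * l3_total
@show l3_slice            # ≈ 2.88e6 bytes, far less than the full 14 MB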

@DilumAluthge
Member Author

Oh interesting. So the L3 cache size that we get from Hwloc.jl/VectorizationBase.jl is for the entire CPU?

So maybe the logic should be something like this:

  1. Use Hwloc.jl/VectorizationBase.jl to get the size of the L3 cache (shared among all cores)
  2. Use ??? to get the total number of cores on this machine
  3. VectorizationBase.NUM_CORES is the number of cores that we actually have access to (is this correct?)
  4. Divide (VectorizationBase.NUM_CORES) by (the answer from step 2) to compute the proportion of the cores that we have access to
  5. Multiply (the answer from step 4) by (the answer from step 1) to figure out how much L3 cache we actually have access to (a rough sketch is below, after the questions).

Does this look right?

A couple of questions:

  • Am I right in step 3 when I say that VectorizationBase.NUM_CORES is the number of cores that we actually have access to (not the total number of cores)?
  • How do we figure out step 2 (the total number of cores on the machine)?
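
If it helps, here is a minimal sketch of steps 1–5 in Julia, assuming cpucores() from CpuId.jl for step 2 (as suggested in the reply below) and hard-coding a placeholder L3 size for step 1, since the exact Hwloc.jl/VectorizationBase.jl accessor depends on the version in use:

using CpuId                  # provides cpucores()
import VectorizationBase

# (1) total shared L3 size in bytes; placeholder value here, in practice obtained
#     from Hwloc.jl / VectorizationBase.jl (exact accessor depends on the version)
total_l3_bytes = 14_417_920

total_cores = cpucores()                      # (2) physical cores on the whole machine
our_cores   = VectorizationBase.NUM_CORES     # (3) cores we (hopefully) have access to

core_fraction = our_cores / total_cores       # (4) proportion of the cores that are ours
our_l3_bytes  = core_fraction * total_l3_bytes  # (5) L3 we can reasonably block for

@show our_l3_bytes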

@DilumAluthge DilumAluthge marked this pull request as ready for review January 20, 2021 13:36
@DilumAluthge DilumAluthge merged commit 7a66dd4 into master Jan 20, 2021
@DilumAluthge DilumAluthge deleted the dpa/coverage branch January 20, 2021 13:52
@chriselrod
Collaborator

Regarding how to get the total number of cores on the machine (your step 2), maybe CpuId.jl can help:

julia> using CpuId

julia> cpucores()
10

Worth trying.

As for whether VectorizationBase.NUM_CORES is the number of cores we actually have access to: not in general, but it seems to be on GitHub CI.

Here is an example of where it isn't (VectorizationBase uses Hwloc but, worse than that, it precompiles a single static value; changing it would require re-precompiling VectorizationBase):

# > taskset -c 0,1 julia

julia> using Hwloc
[ Info: Precompiling Hwloc [0e44f5e4-bd66-52a0-8798-143a42290a1d]

julia> Hwloc.topology_load()
D0: L0 P0 Machine
    D1: L0 P0 Package
        D2: L0 P-1 L3Cache  Cache{size=14417920,depth=3,linesize=64,associativity=11,type=Unified}
            D3: L0 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L0 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L0 P0 Core
                        D6: L0 P0 PU
                        D6: L1 P10 PU
            D3: L1 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L1 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L1 P1 Core
                        D6: L2 P1 PU
                        D6: L3 P11 PU
            D3: L2 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L2 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L2 P2 Core
                        D6: L4 P2 PU
                        D6: L5 P12 PU
            D3: L3 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L3 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L3 P3 Core
                        D6: L6 P3 PU
                        D6: L7 P13 PU
            D3: L4 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L4 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L4 P4 Core
                        D6: L8 P4 PU
                        D6: L9 P14 PU
            D3: L5 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L5 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L5 P8 Core
                        D6: L10 P5 PU
                        D6: L11 P15 PU
            D3: L6 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L6 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L6 P9 Core
                        D6: L12 P6 PU
                        D6: L13 P16 PU
            D3: L7 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L7 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L7 P10 Core
                        D6: L14 P7 PU
                        D6: L15 P17 PU
            D3: L8 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L8 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L8 P11 Core
                        D6: L16 P8 PU
                        D6: L17 P18 PU
            D3: L9 P-1 L2Cache  Cache{size=1048576,depth=2,linesize=64,associativity=16,type=Unified}
                D4: L9 P-1 L1Cache  Cache{size=32768,depth=1,linesize=64,associativity=8,type=Data}
                    D5: L9 P12 Core
                        D6: L18 P9 PU
                        D6: L19 P19 PU


julia> run(`taskset -p $(getpid())`)
pid 193828's current affinity mask: 3
Process(`taskset -p 193828`, ProcessExited(0))

We're pinned to cores 0 and 1, but Hwloc still shows 10 cores.

But it's probably correct if we're running in a virtual machine that has only been given a few cores.
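
As an aside, here is a hypothetical Linux-only sketch (not something VectorizationBase or Hwloc does today) of counting the CPUs in the current process's affinity mask by parsing /proc/self/status; under the taskset -c 0,1 pinning above it would return 2:

# Hypothetical helper, Linux-only: count the CPUs this process is actually allowed
# to run on by reading the Cpus_allowed_list field of /proc/self/status.
function affinity_cpu_count()
    for line in eachline("/proc/self/status")
        startswith(line, "Cpus_allowed_list:") || continue
        list = strip(split(line, ':'; limit = 2)[2])
        # entries look like "0-1" or "0,2,4-7"
        return sum(split(list, ',')) do r
            lo_hi = parse.(Int, split(r, '-'))
            length(lo_hi) == 1 ? 1 : lo_hi[2] - lo_hi[1] + 1
        end
    end
    return Sys.CPU_THREADS   # fallback if the field is not present
end

affinity_cpu_count()   # returns 2 under `taskset -c 0,1`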
