@klauspost commented Nov 15, 2022

Master (old) vs. this PR (new) on Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz:

$ go test -test.run=None -bench=Parallel_8x8x05M -cpu=1,2,4,8,16,32,64,128
...
benchmark                         old ns/op     new ns/op     delta
BenchmarkParallel_8x8x05M         1101343       408522        -62.91%
BenchmarkParallel_8x8x05M-2       541933        197688        -63.52%
BenchmarkParallel_8x8x05M-4       278581        93755         -66.35%
BenchmarkParallel_8x8x05M-8       138552        57332         -58.62%
BenchmarkParallel_8x8x05M-16      73836         41346         -44.00%
BenchmarkParallel_8x8x05M-32      69233         35899         -48.15%
BenchmarkParallel_8x8x05M-64      89550         38715         -56.77%
BenchmarkParallel_8x8x05M-128     96317         48993         -49.13%

benchmark                         old MB/s      new MB/s      speedup
BenchmarkParallel_8x8x05M         7616.70       20534.04      2.70x
BenchmarkParallel_8x8x05M-2       15479.04      42433.58      2.74x
BenchmarkParallel_8x8x05M-4       30111.92      89473.49      2.97x
BenchmarkParallel_8x8x05M-8       60544.66      146315.58     2.42x
BenchmarkParallel_8x8x05M-16      113610.76     202887.60     1.79x
BenchmarkParallel_8x8x05M-32      121164.99     233673.15     1.93x
BenchmarkParallel_8x8x05M-64      93675.25      216675.22     2.31x
BenchmarkParallel_8x8x05M-128     87093.32      171221.14     1.97x

AVX512 by itself was actually slower than the current AVX2 code, so it was removed.

@klauspost marked this pull request as ready for review November 16, 2022 11:15