Replies: 8 comments 1 reply
Is a 1.5% performance improvement statistically significant (i.e. can we trust that the finding is accurate)? I don't know how Fishbench works.
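One way to approach this question is to compare the difference in mean nps against its standard error (a Welch-style t statistic). The sketch below assumes the per-run nps values of each build have been saved one per line in two text files; that file layout and the helper name `welch_summary` are assumptions for illustration, not something Fishbench is known to produce.

```shell
#!/bin/bash
# Hypothetical helper: compare two samples of nps values (one number per line).
# Usage: welch_summary BASELINE_FILE CANDIDATE_FILE
welch_summary() {
    awk '
        NF == 0   { next }                           # skip blank lines
        FNR == NR { a[++na] = $1; sa += $1; next }   # values from the first file
                  { b[++nb] = $1; sb += $1 }         # values from the second file
        END {
            ma = sa / na; mb = sb / nb               # sample means
            for (i = 1; i <= na; i++) va += (a[i] - ma) ^ 2
            for (i = 1; i <= nb; i++) vb += (b[i] - mb) ^ 2
            va /= (na - 1); vb /= (nb - 1)           # unbiased sample variances
            se = sqrt(va / na + vb / nb)             # standard error of the difference
            if (se == 0) { print "no variance; t undefined"; exit }
            printf "speedup_pct=%.2f t=%.2f\n", 100 * (mb - ma) / ma, (mb - ma) / se
        }
    ' "$1" "$2"
}
```

With a couple of hundred runs per build, a |t| well above 2 would suggest the difference is unlikely to be run-to-run noise alone. This says nothing about systematic effects such as thermal throttling or background load, which need to be controlled separately.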
Thanks. I propose we focus on profile-build, since that's how we create the release binaries and test on fishtest.
I ran the new speedtest on my system for all the compilers I have available:
This suggests that the differences are generally small and that gcc is doing fine. The corresponding best run is:
Script used for testing:
#!/bin/bash
echo "Compiling gcc"
for comp in g++-9 g++-10 g++-11 g++-12 g++-13
do
    make -j profile-build CXX=$comp COMP=gcc >& out.compile.$comp
    mv stockfish stockfish.$comp
done

echo "Compiling clang"
for comp in clang++-11 clang++-12 clang++-13 clang++-14 clang++-15 clang++-16 clang++-17 clang++-18 clang++-19 clang++-20
do
    make -j profile-build CXX=$comp COMP=clang >& out.compile.$comp
    mv stockfish stockfish.$comp
done

echo "Verify node counts: "
for comp in g++-9 g++-10 g++-11 g++-12 g++-13 clang++-11 clang++-12 clang++-13 clang++-14 clang++-15 clang++-16 clang++-17 clang++-18 clang++-19 clang++-20
do
    nodes=$(grep "Nodes searched" out.compile.$comp | awk '{print $NF}')
    printf "%20s : %10s\n" $comp $nodes
done

echo "Running speedtests: "
for comp in g++-9 g++-10 g++-11 g++-12 g++-13 clang++-11 clang++-12 clang++-13 clang++-14 clang++-15 clang++-16 clang++-17 clang++-18 clang++-19 clang++-20
do
    for iter in $(seq 1 3)
    do
        ./stockfish.$comp speedtest >& out.speedtest.$comp.$iter
    done
done

echo "Best results speedtests (nps): "
for comp in g++-9 g++-10 g++-11 g++-12 g++-13 clang++-11 clang++-12 clang++-13 clang++-14 clang++-15 clang++-16 clang++-17 clang++-18 clang++-19 clang++-20
do
    bestnps=$(grep "Nodes/second" out.speedtest.$comp.* | sort -n -k3 | tail -n 1 | awk '{print $NF}')
    printf "%20s : %10s\n" $comp $bestnps
done
(With clang I need a small Makefile feature to get this working; I might PR it separately.)
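To turn the best-run numbers into relative differences, something like the following could be used. It is only a sketch: the helper names `best_nps` and `rel_pct` are made up for illustration, and it assumes each speedtest log contains a `Nodes/second` line whose last field is the value, as the script above already relies on.

```shell
#!/bin/bash
# Hypothetical helper: print the highest "Nodes/second" value found in the given log files.
best_nps() {
    # -h suppresses file name prefixes, so the value is always the last field
    grep -h "Nodes/second" "$@" |
        awk '{ if ($NF + 0 > best) best = $NF + 0 } END { print best }'
}

# Hypothetical helper: relative difference of candidate vs. baseline, in percent.
# Usage: rel_pct BASELINE_NPS CANDIDATE_NPS
rel_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 100 * (b - a) / a }'
}
```

For example, `rel_pct "$(best_nps out.speedtest.g++-13.*)" "$(best_nps out.speedtest.clang++-20.*)"` would print how much faster (positive) or slower (negative) the clang++-20 build's best run was relative to g++-13.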
Clang 20.1.1 vs. g++ 15.0.1: only a 0.8% speedup in the run I did (arbitrary choice of thread count). I used profile-build for both:
By the way, Clang 20 unfortunately has a slight regression when compiling with VNNI for certain CPU architectures: it ends up using a SIMD instruction with dependencies. Normally this is fine, but in a loop (say, for NNUE) it ends up slower than using the non-dependent, lower-throughput instructions.
Latest update: GCC 15.2.0 vs. Clang 21.1.1, AVX512 profile build. So about a 3.6% speedup for Clang.
This is in reference to #5951 (comment)
Comparing Clang 20.1.1 to GCC 14.2.0 on an i9-7980XE under Windows, tested using Fishbench. Results are for 200 tests of each version.
AVX512 non-profile:
AVX512 profile:
BMI2 non-profile:
BMI2 profile:
So Clang shows a speedup of about 7% for non-profile builds but only about 1.5% for profile builds. Hopefully we can collect a few more submissions from other machines. It seems GCC benefits from profile builds much more than Clang does.