Skip to content

Many test failures when compiled with -march=haswell -ftree-vectorize #732

@bartoldeman

Description

@bartoldeman

Description

Tested on master, compiling LAPACK with

cp make.inc.example make.inc
make FC="gfortran-11.3.0" make FFLAGS="-O2 -frecursive -march=haswell -ftree-vectorize" -j 32

brings about many test failures:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1296387         1301    (0.100%)        0       (0.000%)
DOUBLE PRECISION        1304985         1274    (0.098%)        0       (0.000%)
COMPLEX                 748111          1240    (0.166%)        0       (0.000%)
COMPLEX16               761754          561     (0.074%)        0       (0.000%)

--> ALL PRECISIONS      4111237         4376    (0.106%)        0       (0.000%)

There are three categories of failures here:

  1. A bug in GCC, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107254. Fortunately a fix is provided there and it can be trivially backported.
  2. Two minor test failures in REAL and COMPLEX, see stest_rfp: STFSM test is very sensitive to small numeric errors in BLAS STRSM  #679
  3. The tests insisting that when e.g. CGEEV is called to provide eigenvectors, it will give the exact same eigenvalues (no numerical tolerance) as when it does not provide eigenvectors.

Applying the patch to GCC, tests go down to:

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1310427         1       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 762151          254     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4155541         749     (0.018%)        0       (0.000%)

Increasing the tolerance for 2) using

diff --git a/TESTING/ctest_rfp.in b/TESTING/ctest_rfp.in
index d6988f2a7..82e147535 100644
--- a/TESTING/ctest_rfp.in
+++ b/TESTING/ctest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing COMPLEX LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits
diff --git a/TESTING/stest_rfp.in b/TESTING/stest_rfp.in
index 08dbf83e6..9b082b7df 100644
--- a/TESTING/stest_rfp.in
+++ b/TESTING/stest_rfp.in
@@ -5,5 +5,5 @@ Data file for testing REAL LAPACK linear equation routines RFP format
 1 2 15                         Values of NRHS (number of right hand sides)
 9                              Number of matrix types (list types on next line if 0 < NTYPES <  9)
 1 2 3 4 5 6 7 8 9              Matrix Types
-30.0                           Threshold value of test ratio
+42.0                           Threshold value of test ratio
 T                              Put T to test the error exits

we get

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING directory
SUMMARY                 nb test run     numerical error         other error  
================        ===========     =================       ================  
REAL                    1318203         0       (0.000%)        0       (0.000%)
DOUBLE PRECISION        1319025         0       (0.000%)        0       (0.000%)
COMPLEX                 769927          253     (0.033%)        0       (0.000%)
COMPLEX16               763938          494     (0.065%)        0       (0.000%)

--> ALL PRECISIONS      4171093         747     (0.018%)        0       (0.000%)

The third one is more complex to analyze. The way I understand it is that gfortran has some freedom in how to apply FMA's to expressions such as a*b+c*d (ie. fma(a,b,c*d) or fma(c,d,a*b)) and these give slightly different results.
The loop bounds for computing and not computing eigenvectors for various loops are different, hence it is often the case that a loop with eigenvectors is vectorized, and without isn't, or for different loop iterations, and the vectorized use of FMA isn't identical to the unvectorized use of FMA. And with complex numbers of course there are even more permutations possible. Adding parentheses forces the compiler to use one variety.

Applying this patch:

diff --git a/SRC/clahqr.f b/SRC/clahqr.f
index dbd848e2f..260764ef9 100644
--- a/SRC/clahqr.f
+++ b/SRC/clahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = CONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = CONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*CONJG( V2 )
    90       CONTINUE
diff --git a/SRC/claqr5.f b/SRC/claqr5.f
index 0a01cc226..a19800b1e 100644
--- a/SRC/claqr5.f
+++ b/SRC/claqr5.f
@@ -616,7 +616,7 @@
                T3 = T1*CONJG( V( 3, M ) )
                DO 70 J = JTOP, MIN( KBOT, K+3 )
                   REFSUM = H( J, K+1 ) + V( 2, M )*H( J, K+2 )
-     $                     + V( 3, M )*H( J, K+3 )
+     $                     + ( V( 3, M )*H( J, K+3 ) )
                   H( J, K+1 ) = H( J, K+1 ) - REFSUM*T1
                   H( J, K+2 ) = H( J, K+2 ) - REFSUM*T2
                   H( J, K+3 ) = H( J, K+3 ) - REFSUM*T3
diff --git a/SRC/zlahqr.f b/SRC/zlahqr.f
index 9413f20cc..c632f578a 100644
--- a/SRC/zlahqr.f
+++ b/SRC/zlahqr.f
@@ -484,7 +484,7 @@
 *           in columns K to I2.
 *
             DO 80 J = K, I2
-               SUM = DCONJG( T1 )*H( K, J ) + T2*H( K+1, J )
+               SUM = DCONJG( T1 )*H( K, J ) + ( T2*H( K+1, J ) )
                H( K, J ) = H( K, J ) - SUM
                H( K+1, J ) = H( K+1, J ) - SUM*V2
    80       CONTINUE
@@ -493,7 +493,7 @@
 *           matrix in rows I1 to min(K+2,I).
 *
             DO 90 J = I1, MIN( K+2, I )
-               SUM = T1*H( J, K ) + T2*H( J, K+1 )
+               SUM = T1*H( J, K ) + ( T2*H( J, K+1 ) )
                H( J, K ) = H( J, K ) - SUM
                H( J, K+1 ) = H( J, K+1 ) - SUM*DCONJG( V2 )
    90       CONTINUE
diff --git a/SRC/zlaqr5.f b/SRC/zlaqr5.f
index 4fa5ee5b0..656a7ab37 100644
--- a/SRC/zlaqr5.f
+++ b/SRC/zlaqr5.f
@@ -446,7 +446,7 @@
                T2 = T1*V( 2, M22 )
                DO 40 J = K+1, JBOT
                   REFSUM = H( K+1, J ) +
-     $                     DCONJG( V( 2, M22 ) )*H( K+2, J )
+     $                     ( DCONJG( V( 2, M22 ) )*H( K+2, J ) )
                   H( K+1, J ) = H( K+1, J ) - REFSUM*T1
                   H( K+2, J ) = H( K+2, J ) - REFSUM*T2
    40          CONTINUE

makes the test failures go to 0.

But it looks quite fragile, in that it takes trial and error to see which expressions need braces.

Would it be ok instead to replace test 5, e.g.

lapack/TESTING/EIG/cdrvev.f

Lines 799 to 805 in 28f7e83

*
* Do Test (5)
*
DO 150 J = 1, N
IF( W( J ).NE.W1( J ) )
$ RESULT( 5 ) = ULPINV
150 CONTINUE

by a test that uses tolerances, like the one used below, or would that go against some specifications?

*> (8) | W(Z computed) - W(Z not computed) | / ( |W| ulp )

lapack/TESTING/EIG/cchkhs.f

Lines 825 to 835 in 28f7e83

*
* Do Test 8: | W3 - W1 | / ( max(|W1|,|W3|) ulp )
*
TEMP1 = ZERO
TEMP2 = ZERO
DO 130 J = 1, N
TEMP1 = MAX( TEMP1, ABS( W1( J ) ), ABS( W3( J ) ) )
TEMP2 = MAX( TEMP2, ABS( W1( J )-W3( J ) ) )
130 CONTINUE
*
RESULT( 8 ) = TEMP2 / MAX( UNFL, ULP*MAX( TEMP1, TEMP2 ) )

Checklist

  • I've included a minimal example to reproduce the issue
  • I'd be willing to make a PR to solve this issue

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions