
Conversation

@patrickscholz (Contributor) commented Jan 28, 2025

  • fix some issues where pointers were associated although the target vector/array was not allocated yet, which triggers an error with the extended GNU compile options; this led to a couple of additions in src/associate_mesh_ass.h and src/associate_part_ass.h (see also the combined Fortran sketch after this list), e.g.
if (allocated(partit%remList_elem2D)) then
 ...
  • make the juwels compiler settings work with Stages/2025
  • add additional debug compiler options for GNU to src/CMakeLists.txt (a sketch of typical flags follows after this list)
  • The additional compiler flags occasionally revealed the problem that communication arrays were not recognized as contiguous in memory, so the compiler wanted to make a temporary copy of them before communication; this led to a couple of additions in gen_halo_exchange.F90 (also covered in the sketch after this list), e.g.
! --> old
ELSE
   call MPI_SEND( arr2D, myDim_nod2D, MPI_DOUBLE_PRECISION, 0, 2, MPI_COMM_FESOM, MPIerr )
ENDIF
! --> new
ELSE
   call MPI_SEND( arr2D(1:myDim_nod2D), myDim_nod2D, MPI_DOUBLE_PRECISION, 0, 2, MPI_COMM_FESOM, MPIerr )
ENDIF
  • in ice_thermo_oce.F90, fix an issue with the initialisation of the variable rsf when using linfs; otherwise it leads to problems when all arrays are initialised with NaNs by the corresponding GNU compiler flag

  • fix an index issue in oce_spp.F90 where brine rejection was written into the bottom topography, which also led to NaNs in the bottom topography and triggered the NaN checker

  • Mystery issue checked with LLview on Juwels:
    (mesh A0_40, 11.5M vertices, 69 levels, runs on 4800 CPUs on Juwels)
    [screenshot: LLview memory usage per compute node]
    Occasionally one compute node on Juwels seems to require about 4x more memory than any of the other compute nodes. This issue is not consistently reproducible. I assume it is somehow related to the I/O system, which might be the reason for the OOM (out of memory) errors that Vasco encountered with his setup on Juwels. We have to keep an eye on this, and also on what happens on other machines.
    I think this is how the RAM usage should look if everything works as it should ...
    [screenshot: LLview memory usage evenly distributed across compute nodes]

  • Improve Juwels environment file

  • fix and test juwels GNU and Intel compiler flags for hopefully optimal performance
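
The two Fortran patterns from the list above, guarding pointer association with allocated() and handing an explicit array section to MPI, combined in a minimal, self-contained sketch. Only partit%remList_elem2D, arr2D, and myDim_nod2D follow the snippets above; everything else (the derived type, remPtr_elem2D, the sizes, MPI_COMM_WORLD instead of MPI_COMM_FESOM) is made up for illustration and does not reflect FESOM2's real data structures.

! Minimal sketch (illustrative only): guard pointer association with
! allocated() and pass an explicit, contiguous array section to MPI_SEND.
program assoc_and_send_sketch
   use mpi
   implicit none

   type t_partit
      integer, allocatable :: remList_elem2D(:)
   end type t_partit

   type(t_partit), target    :: partit
   integer, pointer          :: remPtr_elem2D(:) => null()
   real(kind=8), allocatable :: arr2D(:)
   integer :: myDim_nod2D, mype, npes, MPIerr

   call MPI_INIT(MPIerr)
   call MPI_COMM_RANK(MPI_COMM_WORLD, mype, MPIerr)
   call MPI_COMM_SIZE(MPI_COMM_WORLD, npes, MPIerr)

   ! (1) Associate the pointer only if the target array has been allocated;
   !     with the extended GNU checks an unguarded association of an
   !     unallocated array aborts at runtime.
   if (allocated(partit%remList_elem2D)) then
      remPtr_elem2D => partit%remList_elem2D
   end if

   ! (2) Send the explicit section arr2D(1:myDim_nod2D) instead of the bare
   !     array, mirroring the gen_halo_exchange.F90 change above, so that a
   !     well-defined contiguous block is handed to MPI.
   myDim_nod2D = 10
   allocate(arr2D(myDim_nod2D + 5))
   arr2D = real(mype, kind=8)
   if (mype == 1) then
      call MPI_SEND(arr2D(1:myDim_nod2D), myDim_nod2D, MPI_DOUBLE_PRECISION, &
                    0, 2, MPI_COMM_WORLD, MPIerr)
   else if (mype == 0 .and. npes > 1) then
      call MPI_RECV(arr2D(1:myDim_nod2D), myDim_nod2D, MPI_DOUBLE_PRECISION, &
                    1, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE, MPIerr)
   end if

   call MPI_FINALIZE(MPIerr)
end program assoc_and_send_sketch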
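
For the additional GNU debug options, a typical set of gfortran flags would look like the sketch below. These are standard gfortran options, not necessarily the exact ones added to src/CMakeLists.txt in this PR, and the FESOM_DEBUG guard variable is hypothetical. -finit-real=nan is the flag that makes uninitialised variables such as rsf show up as NaNs, and -fcheck=all is what catches accesses to unallocated arrays.

# Sketch of typical gfortran debug flags (FESOM_DEBUG is a hypothetical
# guard; the exact flags added in this PR may differ):
if(${CMAKE_Fortran_COMPILER_ID} STREQUAL GNU AND FESOM_DEBUG)
   target_compile_options(${PROJECT_NAME} PRIVATE
      -g -fbacktrace                        # symbols and backtrace on abort
      -fcheck=all                           # bounds, pointer, allocation checks
      -ffpe-trap=invalid,zero,overflow      # trap floating point exceptions
      -finit-real=nan -finit-integer=-9999  # poison uninitialised variables
      -Wall)
endif()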

scholz6 and others added 17 commits January 28, 2025 12:28
…if variables are already allocated before an array pointer into that variable is associated, otherwise the extended GNU compiler options trigger an error
…ly where the GNU compiler complained about not recognizing contiguous arrays for the MPI communication
@JanStreffing (Collaborator)

FYI, @ufukozkan and I recently also put together a Stages/2025 environment with the Intel icx compiler and ParaStationMPI on Juwels that works. This was in esm_tools, but it would be easy to add the resulting env.sh to fesom2.

@patrickscholz (Contributor, Author)

@JanStreffing: I tried this as well, but I had problems resolving some MPI dependencies, which led to a compiler error in FESOM2. Did you try to compile FESOM2.6 with this?

patrickscholz marked this pull request as ready for review February 3, 2025 11:11
@JanStreffing (Collaborator)

Yes 2.6.5

@JanStreffing (Collaborator)

Here is what we came up with as an environment file. Obviously some things here are not needed for FESOM and are there for other parts of AWI-CM3:

#!/usr/bin/bash
# ENVIRONMENT used in test_960_v5_checking_for_oasis_compute_18500101-18500101.run
# Use this file to source the environment in your
# preprocessing or postprocessing scripts

module purge
module load Stages/2025
module load Intel/2024.2.0
module load ParaStationMPI/5.10.0-1
module load CMake/3.29.3
module load Python/3.12.3
module load imkl/2024.2.0
module load Perl/5.38.2
module load Perl-bundle-CPAN/5.38.2
module load git/2.45.1
module load libaec FFTW cURL netCDF netCDF-Fortran ecCodes CDO NCO
module list

export LC_ALL=en_US.UTF-8
export TMPDIR=/tmp
export FC=mpifort
export F77=mpifort
export MPIFC=mpifort
export FCFLAGS=-free
export CC=mpicc
export CXX=mpic++
export MPIROOT="$($FC -show | perl -lne 'm{ -I(.*?)/include } and print $1')"
export MPI_LIB="$($FC -show |sed -e 's/^[^ ]*//' -e 's/-[I][^ ]*//g')"
export AEC_ROOT=$EBROOTLIBAEC
export SZIPROOT=$EBROOTLIBAEC
export HDF5ROOT=$EBROOTHDF5
export HDF5_ROOT=$EBROOTHDF5
export NETCDFROOT=$EBROOTNETCDF
export NETCDFFROOT=$EBROOTNETCDFMINFORTRAN
export ECCODESROOT=$EBROOTECCODES
export HDF5_C_INCLUDE_DIRECTORIES=$HDF5_ROOT/include
export NETCDF_Fortran_INCLUDE_DIRECTORIES=$NETCDFFROOT/include
export NETCDF_C_INCLUDE_DIRECTORIES=$NETCDFROOT/include
export NETCDF_CXX_INCLUDE_DIRECTORIES=$NETCDFROOT/include
export OASIS3MCT_FC_LIB="-L$NETCDFFROOT/lib -lnetcdff"
export PERL5LIB=/p/project/chhb19/HPC_libraries/perl5/lib/perl5
export PERL5_PATH=$PERL5LIB
export PERL5OPT=-Mwarnings=FATAL,uninitialized
export MKL_CBWR=AUTO,STRICT
export LD_RUN_PATH=$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/p/scratch/cesmtst/oezkan1/runtime/awicm3-v3.3//test_960_v5_checking_for_oasis/run_18500101-18500101/work//lib/fesom/
export OIFS_FFIXED=""
export GRIB_SAMPLES_PATH="$ECCODESROOT/share/eccodes/ifs_samples/grib1_mlgrib2/"
export DR_HOOK_IGNORE_SIGNALS=-1
export OMP_SCHEDULE=STATIC
export OMP_STACKSIZE=128M
export MAIN_LDFLAGS=-openmp
export USER=oezkan1
export FESOM_USE_CPLNG="active"
export ECE_CPL_NEMO_LIM="false"
export ECE_CPL_FESOM_FESIM="true"
export ECE_AWI_CPL_FESOM="true"
export ENVIRONMENT_SET_BY_ESMTOOLS=TRUE

unset SLURM_DISTRIBUTION
unset SLURM_NTASKS
unset SLURM_NPROCS
unset SLURM_ARBITRARY_NODELIST

@patrickscholz (Contributor, Author) commented Feb 3, 2025

@JanStreffing: You are right, compiling with Intel on JUWELS works, but only when you prescribe the compiler environment variables:
export FC=mpifort
export F77=mpifort
export MPIFC=mpifort
export FCFLAGS=-free
export CC=mpicc
export CXX=mpic++
... I always assumed that they would be set automatically by loading the module environment.
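
For reference, a minimal sketch of the ordering that matters here (assuming a plain out-of-source CMake build; the actual FESOM2 build may go through a wrapper script, and the module list is shortened):

module purge
module load Stages/2025 Intel/2024.2.0 ParaStationMPI/5.10.0-1 CMake/3.29.3

# the compiler variables are not set by the modules and must be exported by hand
export FC=mpifort F77=mpifort MPIFC=mpifort FCFLAGS=-free
export CC=mpicc CXX=mpic++

cmake -S . -B build
cmake --build build -j 8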

@patrickscholz (Contributor, Author) commented Feb 3, 2025

@JanStreffing, @dsidoren: I just made a small test run on juwels for the GCC and Intel compilers (core2 mesh, 192 CPUs, simulating 1 month, with I/O and restart writing). It turns out that a GNU-compiled FESOM2 is faster on JUWELS by a factor of 1.3 than an Intel-compiled FESOM2.

Total runtime (1 month, core2):
- GCC compiler: 533.52 sec
- Intel compiler: 723.8 sec

... not sure if that holds for large meshes as well!

@patrickscholz (Contributor, Author)

@JanStreffing, @dsidoren: Also tried the large AO40 mesh from Vasco (11.5M vertices, 69 levels, 4800 CPUs, 100 compute nodes, simulated 100 steps with one mean I/O write at the end):

GCC Compiler: 150.7 sec
Intel Compiler: 250.4 sec

GCC speedup by a factor of 1.66!!!

@JanStreffing (Collaborator)

Can you try once with ParaStationMPI-mt?

@patrickscholz (Contributor, Author) commented Feb 3, 2025

@JanStreffing: I just noticed that, I think, we don't have any optimization enabled for Intel on Juwels ...

if(${CMAKE_Fortran_COMPILER_ID} STREQUAL  Intel )
   target_compile_options(${PROJECT_NAME} PRIVATE -r8 -i4 -fp-model precise -no-prec-div -no-prec-sqrt -fimf-use-svml -ip -init=zero -no-wrap-margin -fpe0) # add -fpe0 for RAPS environment
   if(${FESOM_PLATFORM_STRATEGY} STREQUAL  levante.dkrz.de )
      target_compile_options(${PROJECT_NAME} PRIVATE -march=core-avx2 -mtune=core-avx2)
   elseif(${FESOM_PLATFORM_STRATEGY} STREQUAL leo-dcgp )
      target_compile_options(${PROJECT_NAME} PRIVATE -O3 -xCORE-AVX512 -qopt-zmm-usage=high -align array64byte -ipo)
   elseif(${FESOM_PLATFORM_STRATEGY} STREQUAL mn5-gpp )
      target_compile_options(${PROJECT_NAME} PRIVATE -O3 -xCORE-AVX512 -qopt-zmm-usage=high -align array64byte -ipo)
   elseif(${FESOM_PLATFORM_STRATEGY} STREQUAL  albedo)
      target_compile_options(${PROJECT_NAME} PRIVATE -march=core-avx2 -O3 -ip -fPIC -qopt-malloc-options=2 -qopt-prefetch=5 -unroll-aggressive) # -g -traceback -check) #NEC mpi option
   elseif(${FESOM_PLATFORM_STRATEGY} STREQUAL atosecmwf )
      target_compile_options(${PROJECT_NAME} PRIVATE -march=core-avx2 -mtune=core-avx2)
   else()
      target_compile_options(${PROJECT_NAME} PRIVATE -xHost)
   endif()
   

@patrickscholz (Contributor, Author)

AO40 mesh, 4800 CPUs, runtime for 100 steps:
- GCC/OpenMPI (-O2, ...): 150.7 sec
- Intel/ParaStationMPI (-xHost): 250.4 sec
- Intel/ParaStationMPI-mt (-xHost): 187.4 sec
- Intel/ParaStationMPI (-O3 -xCORE-AVX512 ...): 249.8 sec
- Intel/ParaStationMPI-mt (-O3 -xCORE-AVX512 ...): 264.3 sec

... really weird behavior, I need to play around a bit more!

@patrickscholz (Contributor, Author) commented Feb 5, 2025

@JanStreffing, @dsidoren, @suvarchal

  • Core2 mesh, 192 CPUs @ juwels, simulated 1 month with one mean I/O write

| Compiler / MPI | Options | Runtime [sec] (core2, 1 month, 192 CPUs @ juwels) |
|---|---|---|
| GCC/OpenMPI | (none) | 390 |
| GCC/OpenMPI | -O2 | 192 |
| GCC/OpenMPI | -O3 -march=skylake-avx512 -mtune=skylake-avx512 -mprefer-vector-width=512 -falign-loops=64 -falign-functions=64 -falign-jumps=64 (ChatGPT recommendation) | 173 |
| Intel/ParaStationMPI | (none) | 309 |
| Intel/ParaStationMPI | -O2 | 114 |
| Intel/ParaStationMPI | -O3 | 115 |
| Intel/ParaStationMPI | -O3 -xCORE-AVX512 | 117 |
| Intel/ParaStationMPI | -O3 -xCORE-AVX512 -qopt-zmm-usage=high -align array64byte | 119 |
| Intel/ParaStationMPI | -O2 -xCORE-AVX2 | 114 |
| Intel/ParaStationMPI | -O3 -xCORE-AVX2 | 112 |
| Intel/ParaStationMPI | -O3 -xCORE-AVX2 -qopt-streaming-stores=always | 126 |
| Intel/ParaStationMPI | -O3 -xCORE-AVX2 -qopt-prefetch=5 | 116 |
| Intel/ParaStationMPI | -O3 -xCORE-AVX2 -funroll-loops | 113 |
| Intel/ParaStationMPI-mt | -O3 -xCORE-AVX2 | 113 |

- Summary for Juwels performance: Intel/ParaStationMPI with -O3 -xCORE-AVX2 is the fastest option

- PS: It looks like we had no -Ox optimization activated for Levante so far. I changed that with this pull request!

- PPS: asynchronous multithreading doesn't work on juwels either
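
Based on this, a juwels branch in the Intel block of src/CMakeLists.txt quoted further up could look roughly like the fragment below; the platform strategy name "juwels" is an assumption, only the flag choice follows from the timing table above.

   # hypothetical addition to the existing elseif chain for Intel
   elseif(${FESOM_PLATFORM_STRATEGY} STREQUAL juwels )
      # fastest combination in the core2 and AO40 tests above
      target_compile_options(${PROJECT_NAME} PRIVATE -O3 -xCORE-AVX2)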

@JanStreffing (Collaborator)

Good work. @ufukozkan, maybe you can try this on juwels with AWI-CM3 v3.3?

patrickscholz merged commit ec14775 into main Feb 5, 2025
4 checks passed