Workbench edit4juwels gnu compiler #676
Conversation
…if variables are already allocated before an array pointer into that variable is associated; otherwise the GNU extended compiler options trigger an error (a minimal pointer-association sketch follows below the commit list)
…erwise the NaN debug checker is triggered
…by imposing a limiter on kml
…m/FESOM/fesom2 into workbench_edit4juwels_GNUcompiler
…acer gradients are computed
…ly where the GNU compiler complained about not recognizing contiguous arrays for the MPI communication (see the contiguous-buffer sketch below the commit list)
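Regarding the pointer-association fix in the first commit above: gfortran's runtime checking options (for example -fcheck=all) can flag pointer association with a target that has not been allocated yet. A minimal sketch of the corrected ordering, with purely illustrative names and sizes (tracer, trarr, nl, nod2D are stand-ins, not FESOM2's actual identifiers):

```fortran
program pointer_after_alloc
   implicit none
   integer, parameter :: nl = 48, nod2D = 100    ! illustrative sizes only
   real(kind=8), allocatable, target :: tracer(:,:)
   real(kind=8), pointer             :: trarr(:)

   allocate(tracer(nl, nod2D))   ! allocate the target first ...
   tracer = 0.0d0
   trarr => tracer(:, 1)         ! ... and only then associate the pointer slice
end program pointer_after_alloc
```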
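And for the contiguous-array complaint mentioned in the last commit: one common way to satisfy gfortran's stricter interface checking is to pack a strided array section into a contiguous buffer before passing it to MPI. A self-contained sketch under assumed, illustrative names (field, sendbuf, and the stand-in MPI_Allreduce call are not FESOM2's actual halo-exchange code):

```fortran
program contiguous_mpi_buffer
   use mpi
   implicit none
   integer, parameter :: nl = 48, nn = 10      ! illustrative sizes only
   real(kind=8)              :: field(nl, nn)
   real(kind=8), allocatable :: sendbuf(:)
   integer :: ierr

   call MPI_Init(ierr)
   field = 1.0d0

   ! field(1, :) is a strided (non-contiguous) section; copy it into a
   ! contiguous buffer before handing it to the MPI library
   allocate(sendbuf(nn))
   sendbuf = field(1, :)

   ! stand-in collective, only to show the contiguous buffer being used
   call MPI_Allreduce(MPI_IN_PLACE, sendbuf, nn, MPI_DOUBLE_PRECISION, &
                      MPI_SUM, MPI_COMM_WORLD, ierr)

   call MPI_Finalize(ierr)
end program contiguous_mpi_buffer
```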
FYI @ufukozkan and I recently also made a Stage 2025 with the Intel icx compiler and ParaStationMPI on Juwels that works. This was in esm_tools, but it would be easy to add the resulting env.sh into fesom2.
@JanStreffing: I tried this as well, but I had problems resolving some MPI dependencies, which led to a compiler error in FESOM2. Did you try to compile FESOM2.6 with this?
Yes, 2.6.5
Here is what we came up with as an environment file. Obviously some things in here are not needed for FESOM and are there for other parts of AWI-CM3:
@JanStreffing: You are right, compiling with Intel on JUWELS works, but only when you prescribe the compiler environment variables:
@JanStreffing @dsidoren: I just made a small test run on Juwels with the GCC and Intel compilers (CORE2 mesh, 192 CPUs, 1 month simulated, with I/O and restart writing). It turns out that a GNU-compiled FESOM2 is faster on JUWELS by a factor of 1.3 than an Intel-compiled FESOM2. Total runtime (1 month, CORE2): ... Not sure if that holds for large meshes as well!
…m/FESOM/fesom2 into workbench_edit4juwels_GNUcompiler
@JanStreffing, @dsidoren: I also tried the large AO40 mesh from Vasco (11.5M vertices, 69 levels, 4800 CPUs, 100 compute nodes, 100 steps simulated with one mean I/O write at the end). GCC compiler: 150.7 sec; GCC speed-up by a factor of 1.66!
Can you try once with ParaStationMPI-mt?
@JanStreffing: I just noticed that, I think, we have no optimization enabled for Intel on Juwels ...
AO40 mesh, 4800 CPUs, runtime for 100 steps: ... Really weird behaviour, I need to play around a bit more!
@JanStreffing, @dsidoren, @suvarchal
- Summary of Juwels performance: Intel/ParaStationMPI with -O3 -xCORE-AVX2 is the fastest option.
- PS: It looks like we so far had no -Ox optimization activated for Levante. I changed that with this pull request!
- PPS: Asynchronous multithreading doesn't work on Juwels either.
Good work. Maybe @ufukozkan you can try this on Juwels with AWI-CM3 v3.3?
For example:
In ice_thermo_oce.F90, fix an issue with the initialisation of the variable rsf when using linfs, which otherwise leads to problems when all arrays are initialised with NaNs by the GNU compiler flag (see the initialisation sketch below).
Fix an index issue in oce_spp.F90 where brine rejection was written into the bottom topography, which also led to NaNs in the bottom topography and triggered the NaN checker.
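A minimal sketch of the initialisation pattern behind the ice_thermo_oce.F90 fix, assuming gfortran's -finit-real=nan is what fills uninitialised reals with NaNs; the switch linfs is kept, but the array shape and values are purely illustrative, not FESOM2's actual code:

```fortran
program init_before_use
   implicit none
   logical, parameter :: linfs = .true.     ! illustrative switch value
   integer, parameter :: nn = 100
   real(kind=8), allocatable :: rsf(:)

   allocate(rsf(nn))
   rsf = 0.0d0          ! explicit zero initialisation, valid in every branch
   if (.not. linfs) then
      rsf = 1.0d-3      ! hypothetical update applied only without linfs
   end if
   ! downstream code can now use rsf without tripping the NaN checker
end program init_before_use
```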
Mystery issue checked with LLview on Juwels:
(mesh AO40, 11.5M vertices, 69 levels, running on 4800 CPUs on Juwels)
Occasionally on Juwels, one compute node seems to require four times more memory than any of the other compute nodes. This issue is not consistently reproducible. I assume it is somehow related to the I/O system, which might be the reason for the OOM (out of memory) errors that Vasco encountered with his setup on Juwels. We have to keep an eye on this, and also on what happens on other machines.
I think this is how the RAM usage should look if everything works as it should ...
Improve Juwels environment file
Fix and test Juwels GNU and Intel compiler flags for hopefully optimal performance