@sebastianbeyer
Collaborator

@sebastianbeyer sebastianbeyer commented May 19, 2025

Adds a new option BUILD_MESHPARTITIONER to the main fesom build that will build the mesh partitioner as well.

This makes it easier to build it together with the model in the ifs-bundle, so that we can add some automation to generate missing partitioning files.

I would like to keep this as the only way to build the partitioner, so that there is only one central CMakeLists.txt, but I left the current option in place because I don't know if anything (ESM-Tools...? @mandresm) relies on it being built this way.

@sebastianbeyer sebastianbeyer added the enhancement New feature or request label May 19, 2025
@mandresm
Collaborator

ESM-Tools can be changed, and the changes would only be in the configuration files, not in the backend, so ESM-Tools should not block this at all. I think it is a good idea to have it like this :)

In any case, I don't think anyone is using the mesh partitioning through ESM-Tools.

@pgierz
Member

pgierz commented May 19, 2025

In any case, I don't think anyone is using the mesh partitioning through ESM-Tools.

I can second that. We have a fesom_mesh_part configuration, but nobody uses it. At least from the HPC side, the recommended mesh partitioning workflow now goes through the containerized mesh tools...

@trackow
Contributor

trackow commented May 19, 2025

Haven't tried this in detail yet, but this is a very good idea! Thanks for the initiative @sebastianbeyer

@JanStreffing JanStreffing self-requested a review May 19, 2025 14:55
@JanStreffing
Collaborator

JanStreffing commented May 19, 2025

I tried using an esm_tools-generated script (/work/ab0246/a270092/model_codes/comp-fesom-2.6_script.sh), modified to:

pushd fesom-2.6
mkdir -p build; cd build; cmake -DBUILD_MESHPARTITIONER=True -DCMAKE_INSTALL_PREFIX=../ ..;   make install -j `nproc --all`
popd

and get:

[ 72%] Building Fortran object src/CMakeFiles/meshpartitioner.dir/oce_modules.F90.o
[ 72%] Built target m2gmetis
[ 72%] Built target cmpfillin
/work/ab0246/a270092/model_codes/fesom-2.6/src/gen_halo_exchange.F90(2566): error #7013: This module file was not generated by any release of this compiler.   [O_PARAM]
use MOD_MESH
----^
/work/ab0246/a270092/model_codes/fesom-2.6/src/gen_halo_exchange.F90(2570): error #6457: This derived type name has not been declared.   [T_PARTIT]
type(t_partit), intent(inout), target :: partit
-----^
/work/ab0246/a270092/model_codes/fesom-2.6/src/gen_halo_exchange.F90(2575): error #6158: The structure-name is invalid or is missing.
integer              :: req(partit%npes-1)
----------------------------^
/work/ab0246/a270092/model_codes/fesom-2.6/src/associate_part_def.h(3): error #6457: This derived type name has not been declared.   [COM_STRUCT]
  type(com_struct), pointer     :: com_nod2D
-------^
/work/ab0246/a270092/model_codes/fesom-2.6/src/associate_part_def.h(4): error #6457: This derived type name has not been declared.   [COM_STRUCT]
  type(com_struct), pointer     :: com_elem2D
-------^
/work/ab0246/a270092/model_codes/fesom-2.6/src/associate_part_def.h(5): error #6457: This derived type name has not been declared.   [COM_STRUCT]
  type(com_struct), pointer     :: com_elem2D_full
-------^
/work/ab0246/a270092/model_codes/fesom-2.6/src/associate_part_ass.h(1): internal error: Please visit 'http://www.intel.com/software/products/support' for assistance.
MPI_COMM_FESOM  => partit%MPI_COMM_FESOM
^
[ Aborting due to internal error. ]
compilation aborted for /work/ab0246/a270092/model_codes/fesom-2.6/src/gen_halo_exchange.F90 (code 1)
make[2]: *** [src/CMakeFiles/fesom.dir/build.make:491: src/CMakeFiles/fesom.dir/gen_halo_exchange.F90.o] Error 1
make[2]: *** Waiting for unfinished jobs....

Using the configure.sh script that comes with the source code, the option works for me.

@sebastianbeyer
Collaborator Author

error #7013: This module file was not generated by any release of this compiler.

Are you sure you are doing a clean build?
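A clean build here means starting from an empty build directory, so that no .mod files left over from an earlier compiler are picked up. A minimal sketch, assuming the out-of-source fesom-2.6/build layout used in the script above; the helper name fresh_build_dir is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: start over from an empty build directory.
# Removing the whole tree also removes stale Fortran .mod files; Intel's
# error #7013 typically indicates such a file was written by a different compiler.
set -euo pipefail

fresh_build_dir() {
    local dir="${1:-}"
    # Refuse an empty argument so we never run rm -rf on ""
    if [ -z "$dir" ]; then
        echo "fresh_build_dir: no directory given" >&2
        return 1
    fi
    rm -rf -- "$dir"
    mkdir -p -- "$dir"
}
```

For example, run fresh_build_dir fesom-2.6/build, then repeat the same cmake and make install line as before.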

@JanStreffing
Collaborator

error #7013: This module file was not generated by any release of this compiler.

are you sure you are doing a clean build?

You are correct, I thought I did, but now that I tried it again it works.

@JanStreffing
Collaborator

Until now, the linked binary has been installed to fesom-2.6/bin/fesom_ini.x, and the existing job_ini scripts copy it from there. If you would like to keep the install location at fesom-2.6/build/bin/meshpartitioner, I would ask that the job_ini scripts in the work folder be modified accordingly. Alternatively, we can once again link from fesom-2.6/build/bin/meshpartitioner to fesom-2.6/bin/fesom_ini.x and keep the job_ini scripts as they are.
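If the link option is chosen, it could look roughly like the sketch below, assuming the two paths named above (fesom-2.6/build/bin/meshpartitioner and fesom-2.6/bin/fesom_ini.x). The helper link_partitioner is hypothetical and not part of the PR. A relative symbolic link keeps working if the whole tree is moved, and cp follows symbolic links by default when copying a single file, so job_ini scripts that copy fesom_ini.x would still get the real binary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the historical fesom_ini.x name that the job_ini
# scripts copy from, pointing it at the newly built meshpartitioner binary.
set -euo pipefail

link_partitioner() {
    local root="${1:?usage: link_partitioner <fesom-root>}"
    mkdir -p -- "$root/bin"
    # Relative target, resolved from $root/bin, so the link survives
    # moving the whole source tree.
    ln -sfn ../build/bin/meshpartitioner "$root/bin/fesom_ini.x"
}
```

Usage would be link_partitioner fesom-2.6 after make install has produced the binary.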

@sebastianbeyer
Collaborator Author

I don't have a strong opinion on the name of the binary... meshpartitioner explains a bit better than fesom_ini.x what it does, but it's also not a great name xD

@sebastianbeyer
Collaborator Author

I saw that some of the job_ini scripts call the mesh partitioner with MPI, but with only a single task:
srun --mpi=pmi2 --ntasks=1 ./fesom_ini.x
Does the partitioner support MPI? I did not see anything MPI-related in the code (but I did not check everywhere). If it does, then why run it with only a single task?

@JanStreffing
Copy link
Collaborator

I saw in the job_ini scripts that in some the mesh partitioner is called with mpi, but with only a single task: srun --mpi=pmi2 --ntasks=1 ./fesom_ini.x Does the partitioner support mpi? I did not see anything mpi related in the code (but I did not check all the places). If so, then why run it with a single task only?

I don't know. @dsidoren, @patrickscholz?

@sebastianbeyer
Collaborator Author

Also, should building the partitioner be on or off by default?

@JanStreffing
Collaborator

Is it CI tested? If so, it can be on.

@koldunovn
Member

koldunovn commented May 20, 2025

I saw in the job_ini scripts that in some the mesh partitioner is called with mpi, but with only a single task: srun --mpi=pmi2 --ntasks=1 ./fesom_ini.x Does the partitioner support mpi? I did not see anything mpi related in the code (but I did not check all the places). If so, then why run it with a single task only?

It did long time ago, but now we only run it as a single core job, I think :)
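Given that the partitioner is currently run as a single-core job, a job script only needs one task. A sketch of a wrapper that uses the srun line quoted earlier when SLURM is available and otherwise starts the binary directly; the helper name run_partitioner and the fallback are assumptions, not what the existing job_ini scripts do:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the mesh partitioner as a single-task job.
set -euo pipefail

run_partitioner() {
    local exe="${1:-./fesom_ini.x}"
    if command -v srun >/dev/null 2>&1; then
        # Under SLURM: one MPI task, as in the job_ini scripts
        srun --mpi=pmi2 --ntasks=1 "$exe"
    else
        # No srun available: assumes the binary can be started directly
        "$exe"
    fi
}
```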

@JanStreffing JanStreffing marked this pull request as ready for review May 20, 2025 10:09
@JanStreffing
Collaborator

Is it CI tested? If so, it can be on.

It is: https://github.com/FESOM/fesom2/actions/runs/15120009102
IMO it can be on by default then.

@sebastianbeyer
Collaborator Author

It did long time ago, but now we only run it as a single core job, I think :)

But does it work with more? It would be nice to have it run faster for large meshes, I guess.

@koldunovn
Member

I had no problems even with the largest meshes, and the most time-consuming part is writing the output, which I think is still serial. I would consider it nice to have, but not critical.

@sebastianbeyer
Collaborator Author

Okay, this is very weird! Since you set the default to building the mesh partitioner as well, @JanStreffing, the build with OpenMP fails with "internal compiler error: Segmentation fault". This does not happen on my Mac (different gcc version), but I can reproduce it with the Docker build. However, if I just run the compile command (./configure.sh) a second time, it completes :/

Sebastian Beyer and others added 2 commits May 26, 2025 21:33
In this block, set CMAKE_Fortran_MODULE_DIRECTORY to something different
from the 'normal' fesom module dir, to avoid problems in a parallel
build where some module files are written and read at the same
time (for modules that are shared between fesom main and the mesh
partitioner). It would probably be better to just link fesom as a library
to the mesh partitioner as well, but that currently implies other issues.
@JanStreffing
Collaborator

Looks good. Shall I press the button?

@sebastianbeyer
Collaborator Author

Thanks for fixing the typo! @suvarchal also tried something here: #707. I currently don't see how that is better than this, but I'd like to understand the difference first before deciding :)

@sebastianbeyer sebastianbeyer merged commit ab846d0 into main Jun 3, 2025
7 checks passed
@sebastianbeyer sebastianbeyer deleted the feature/build_partitioner branch June 3, 2025 09:57
@JanStreffing JanStreffing mentioned this pull request Jun 3, 2025
7 participants