@boegel boegel commented Oct 22, 2020

The OpenMPI 4.0.3 we have as part of foss/2020a no longer supports openib as a BTL, because it was built with support for UCX (see easybuilders/easybuild-easyblocks#2188).

The changes in vsc/mympirun/mpi/openmpi.py make sure that --mca pml ucx is used with OpenMPI 4.x if ompi_info reports ucx as supported PML.
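A rough sketch of that check, not the actual code in vsc/mympirun/mpi/openmpi.py: the helper names `pml_components` and `ucx_mca_args` are made up for illustration, and the parsing assumes ompi_info prints one component per line in the form "MCA pml: ucx (...)".

```python
import re


def pml_components(ompi_info_output):
    """Return the set of PML component names listed in ompi_info output."""
    # Assumption: components appear as lines like
    #   "MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.3)"
    pattern = re.compile(r"^\s*MCA pml:\s*(\S+)", re.MULTILINE)
    return set(pattern.findall(ompi_info_output))


def ucx_mca_args(ompi_version, ompi_info_output):
    """Return extra mpirun options: select the UCX PML only for OpenMPI >= 4 with ucx available."""
    major = int(ompi_version.split(".")[0])
    if major >= 4 and "ucx" in pml_components(ompi_info_output):
        return ["--mca", "pml", "ucx"]
    return []


# Hand-written sample resembling ompi_info output (not captured from a real system).
SAMPLE = """\
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.3)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.3)
"""

print(ucx_mca_args("4.0.3", SAMPLE))
```

With this sample, the 4.0.3 case yields the `--mca pml ucx` options, while passing "3.1.4" or an output without a ucx PML line yields an empty list, so older installations keep their current behaviour.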

The other changes are trivial style cleanups.

WIP, because the tests should be enhanced to cover this change

_mpirun_version = staticmethod(lambda ver: version_in_range(ver, '4', None))

def use_ucx_pml(self):
    """Determine whether or not to use the UCX Point-to-Point Messaging Layer (PML)."""
Member

why is this limited to openmpi 4? any openmpi with ucx support should probably use it

Member Author

Several reasons:

  • OpenMPI 4 (used in 2020a toolchains) is a natural cutoff, since it's a new major version (2019b and older use OpenMPI <=3)
  • "OpenMPI supports UCX starting from version 3.0, but it’s recommended to use version 4.0 or higher due to stability and performance improvements." (see https://openucx.github.io/ucx/running.html)
  • The OpenMPI 3.1.4 modules we have in foss/2019b have UCX support, but rely on the ucx RPM we have installed in the OS (while in foss/2020a we install UCX as a proper dep for OpenMPI 4.0.3, so we're in control)
  • We've been using openib rather than ucx for a while now in foss/2019b, which mostly works fine, so why change that...
  • mympirun is currently broken with foss/2020a, because the OpenMPI in there no longer supports openib (see easybuilders/easybuild-easyblocks#2188, "configure OpenMPI 4.x with --without-verbs when using UCX"); this PR fixes that by using ucx instead

We can still consider also using the ucx PML with OpenMPI 3.x, but then we need to reinstall the OpenMPI 3 modules with a proper UCX dep (so we're in control), and do more thorough testing.
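To make the cutoff concrete, here is a small sketch of a version-range check. It is not the `version_in_range` helper used in the diff; `parse_version` and `in_version_range` are hypothetical names, and the inclusive-lower, exclusive-upper semantics are an assumption.

```python
def parse_version(text):
    """Turn '4.0.3' into (4, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def in_version_range(version, lower=None, upper=None):
    """True if lower <= version < upper; a bound of None is open-ended (assumed semantics)."""
    ver = parse_version(version)
    if lower is not None and ver < parse_version(lower):
        return False
    if upper is not None and ver >= parse_version(upper):
        return False
    return True


# OpenMPI versions taken from the toolchains mentioned above.
for toolchain, ompi in [("foss/2019b", "3.1.4"), ("foss/2020a", "4.0.3")]:
    print(toolchain, ompi, in_version_range(ompi, "4", None))
```

With a lower bound of '4' and no upper bound, 3.1.4 falls outside the range and 4.0.3 falls inside it. Extending the UCX PML to OpenMPI 3.x would then amount to lowering that bound, once the 3.x modules have been reinstalled and tested.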

Member

i thought we couldn't build openib support due to missing headers, and that it was thus also not available on openmpi 3.
can you also add the 2nd item in a comment or in the docstring?

Member Author

OpenMPI 3 was being configured with --without-verbs on RHEL8 because of a bug in EasyBuild, which is now fixed (see easybuilders/easybuild-framework#3477). We still need to reinstall OpenMPI 3 on doduo though; that's on our TODO list before we give pilot users access.

@boegel boegel changed the title from "use UCX as PML with OpenMPI 4.x and newer, if ompi_info reports it as supported (WIP)" to "use UCX as PML with OpenMPI 4.x and newer, if ompi_info reports it as supported" on Oct 23, 2020
@stdweird stdweird merged commit cfb3dc4 into hpcugent:master Oct 23, 2020
@boegel boegel deleted the ucx_openmpi4 branch October 23, 2020 12:33
@boegel boegel mentioned this pull request Nov 6, 2020