use UCX as PML with OpenMPI 4.x and newer, if ompi_info reports it as supported #171
    _mpirun_version = staticmethod(lambda ver: version_in_range(ver, '4', None))

    def use_ucx_pml(self):
        """Determine whether or not to use the UCX Point-to-Point Messaging Layer (PML)."""
why is this limited to openmpi 4? any openmpi with ucx support should probably use
Several reasons:
- OpenMPI 4 (used in `2020a` toolchains) is a natural cutoff, since it's a new major version (`2019b` and older use OpenMPI <=3)
- "OpenMPI supports UCX starting from version 3.0, but it’s recommended to use version 4.0 or higher due to stability and performance improvements." (see https://openucx.github.io/ucx/running.html)
- The OpenMPI 3.1.4 modules we have in `foss/2019b` have UCX support, but rely on the `ucx` RPM we have installed in the OS (while in `foss/2020a` we install `UCX` as a proper dep for OpenMPI 4.0.3, so we're in control)
- We've been using `openib` rather than `ucx` for a while now in `foss/2019b`, which mostly works fine, so why change that...

`mympirun` is currently broken with `foss/2020a`, because the OpenMPI in there no longer supports `openib` (see "configure OpenMPI 4.x with --without-verbs when using UCX", easybuilders/easybuild-easyblocks#2188); this PR fixes that by using `ucx` instead.
We can still consider also using the ucx PML with OpenMPI 3.x, but then we need to reinstall the OpenMPI 3 modules with a proper UCX dep (so we're in control), and do more thorough testing.
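To make the cutoff concrete, here is a minimal sketch of a lower-bound-only version check like the `version_in_range(ver, '4', None)` call in the diff. The helper names `parse_version` and `in_range_sketch` are hypothetical, and the semantics (inclusive lower bound, exclusive upper bound, `None` meaning unbounded) are an assumption, not necessarily what mympirun's own helper does.

```python
def parse_version(ver):
    """Turn a dotted version string like '4.0.3' into a tuple of ints: (4, 0, 3)."""
    return tuple(int(part) for part in ver.split('.'))


def in_range_sketch(ver, lower, upper):
    """Hypothetical stand-in for version_in_range.

    Assumed semantics: lower <= ver < upper, where None means 'no bound'.
    """
    parsed = parse_version(ver)
    # Tuple comparison is element-wise, so (3, 1, 4) < (4,) and (4, 0, 3) >= (4,)
    if lower is not None and parsed < parse_version(lower):
        return False
    if upper is not None and parsed >= parse_version(upper):
        return False
    return True
```

Under these assumptions, OpenMPI 4.0.3 (`foss/2020a`) passes the check with bounds `'4'` and `None`, while OpenMPI 3.1.4 (`foss/2019b`) does not.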
i thought we couldn't build openib due to missing headers, and that it was thus also not available on openmpi 3.
can you also add the 2nd item in a comment or in docstring?
OpenMPI 3 was being configured with --without-verbs on RHEL8 because of a bug in EasyBuild which is now fixed (see easybuilders/easybuild-framework#3477), but we still need to reinstall OpenMPI 3 on doduo; that's on our TODO list before we give pilot users access.
…because there's no /dev/infiniband
The OpenMPI 4.0.3 we have as a part of `foss/2020a` no longer supports `openib` as BTL, because it was built with support for UCX (see easybuilders/easybuild-easyblocks#2188).

The changes in `vsc/mympirun/mpi/openmpi.py` make sure that `--mca pml ucx` is used with OpenMPI 4.x if `ompi_info` reports `ucx` as supported PML.

The other changes are trivial style cleanups.
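
The decision described above could be sketched roughly as follows. This is an illustration, not the code in this PR: the function names `ucx_pml_supported` and `pml_options` are hypothetical, and it assumes `ompi_info` lists each component on a line of the form `MCA pml: ucx (...)`. The output is passed in as a string so the sketch does not need an OpenMPI installation.

```python
import re


def ucx_pml_supported(ompi_info_output):
    """Return True if the given ompi_info output lists a 'ucx' PML component.

    Assumes one component per line, e.g. "MCA pml: ucx (MCA v2.1.0, ...)".
    """
    return re.search(r'^\s*MCA pml: ucx\b', ompi_info_output, re.M) is not None


def pml_options(ompi_version, ompi_info_output):
    """Return extra mpirun options (hypothetical helper): use the UCX PML
    only for OpenMPI 4.x and newer, and only if ompi_info reports it."""
    major = int(ompi_version.split('.')[0])
    if major >= 4 and ucx_pml_supported(ompi_info_output):
        return ['--mca', 'pml', 'ucx']
    return []
```

Checking both conditions means an OpenMPI 4.x build without UCX support falls back to its default PML rather than failing on an unavailable component.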
WIP, because the tests should be enhanced to cover this change