@casparvl
Contributor

@casparvl casparvl commented Feb 27, 2023

This PR does two things:

  1. From v2022.05.001 onwards, ELPA's configure script complains if CPP is not set and exits with a non-zero status. This PR sets the CPP environment variable to cpp for GCC- and Intel-based toolchains. For other toolchains, it raises an error asking the user to extend the EasyBlock so that it points to the correct C preprocessor.
  2. Implement NVIDIA GPU support. This involves adding a combination of configure flags: --enable-nvidia-gpu, --with-cuda-path, --with-cuda-sdk-path, --with-NVIDIA-GPU-compute-capability and --enable-nvidia-sm80-gpu.
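
To illustrate the first point, here is a minimal sketch of a per-toolchain-family lookup for CPP. This is not the code from the EasyBlock: the dictionary, the helper name `select_cpp`, the family strings and the use of `ValueError` (instead of EasyBuild's own error type) are all assumptions made for the example.

```python
# Hypothetical mapping from toolchain family to C preprocessor command.
# Names and keys are illustrative; they are not taken from the EasyBlock.
CPP_BY_TOOLCHAIN_FAMILY = {
    "GCC": "cpp",
    "Intel": "cpp",
}


def select_cpp(toolchain_family):
    """Return the value to export as CPP for the given toolchain family."""
    try:
        return CPP_BY_TOOLCHAIN_FAMILY[toolchain_family]
    except KeyError:
        # Unknown family: fail loudly so the mapping gets extended
        # rather than silently guessing a preprocessor.
        raise ValueError(
            "Don't know which C preprocessor to use for toolchain family %r; "
            "please extend CPP_BY_TOOLCHAIN_FAMILY" % toolchain_family
        )


if __name__ == "__main__":
    print({"CPP": select_cpp("GCC")})
```

Keeping the mapping in one place means supporting another toolchain family is a one-line change.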

Some comments regarding the last flag, --enable-nvidia-sm80-gpu: newer versions of ELPA have a dedicated kernel implemented for sm80. It's a bit strange, but --with-NVIDIA-GPU-compute-capability='sm_80' on its own does not seem to enable it. The configure script just prints an info message saying:

configure: You specified --with-NVIDIA-GPU-compute-capability=sm_80, but you did not --enable-nvidia-sm80-gpu. I will thus use the standard Nvidia GPU kernels (which is of course ok, just a info...)

My understanding is that configure will build the default kernel with the correct optimization (i.e. using -arch sm_80), but will not use the dedicated code for the sm_80 kernel. It will only do the latter if we add --enable-nvidia-sm80-gpu.
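
The flag logic described above could look roughly like the sketch below. The flag names come from this PR; the function name, the `cuda_root` parameter and the assumption that both path flags point at the same CUDA installation directory are illustrative only and may differ from the actual EasyBlock.

```python
def elpa_gpu_configure_flags(cuda_cc, cuda_root):
    """Build ELPA configure flags for one CUDA compute capability such as '8.0'.

    Hypothetical helper: cuda_root is assumed to be the CUDA installation prefix.
    """
    major, minor = (int(part) for part in cuda_cc.split("."))
    flags = [
        "--enable-nvidia-gpu",
        # Assumption: both path flags point at the same CUDA root.
        "--with-cuda-path=%s" % cuda_root,
        "--with-cuda-sdk-path=%s" % cuda_root,
        "--with-NVIDIA-GPU-compute-capability=sm_%d%d" % (major, minor),
    ]
    # Compare as integer tuples rather than strings, so that '10.0' is
    # correctly treated as newer than '8.0'.
    if (major, minor) >= (8, 0):
        # Only this flag switches on the dedicated sm80 kernel.
        flags.append("--enable-nvidia-sm80-gpu")
    return flags


if __name__ == "__main__":
    print(" ".join(elpa_gpu_configure_flags("8.0", "/path/to/cuda")))
```

With this shape, a capability of 8.6 also gets the sm80 kernel, while 7.0 gets only the standard NVIDIA GPU kernels.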

An EasyConfig using the new cuda support can be found in

Caspar van Leeuwen added 8 commits February 24, 2023 13:30
…Config, automatically enable nvidia GPU support
…ake this neater and read it from the compiler definition from easybuild-framework. Still discussing that on EB slack
…preprocessor environment variable) in a way that can easily be extended or modified per toolchain family
…compute capability has been specified. Also, now check if the cuda_cc is larger or equal to 8.0, since you probably also want to build the 8.0 optimized kernel if you have 8.5 capability in your system
@casparvl
Contributor Author

Test report by @casparvl

Overview of tested easyconfigs (in order)

  • SUCCESS ELPA-2021.11.001-foss-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/174fdb5e47ab55c906ac5b5a787c164e for a full test report.

@casparvl
Contributor Author

Test report by @casparvl

Overview of tested easyconfigs (in order)

  • SUCCESS ELPA-2021.05.001-intel-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/981c692125367c980c865939aa10e994 for a full test report.

@casparvl
Contributor Author

Test report by @casparvl

Overview of tested easyconfigs (in order)

  • SUCCESS ELPA-2021.11.001-intel-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/3ce8fad9a325f9b0e0f758ede76020eb for a full test report.

Comment on lines +200 to +203
# ELPA's --with-NVIDIA-GPU-compute-capability only accepts a single architecture
if len(cuda_cc) != 1:
    raise EasyBuildError('ELPA currently only supports specifying one CUDA architecture when '
                         'building. You specified cuda-compute-capabilities: %s', cuda_cc)
Member

Should this be a hard error, or should we pick one (e.g. the lowest) and emit a warning? I think that's how this is handled in a few other easyblocks.

Contributor Author

I don't think we should assume we know what is best for the user here, so I'd personally prefer an error. That allows the user to pick by simply passing a single compute capability on the command line. If I know that only a single one is supported, I could decide to pick the highest and only run on nodes that support that architecture.

Contributor Author

@casparvl casparvl Feb 28, 2023

(or, obviously, I could decide to pick the lowest and be able to run on any GPU node in that system. Just meant to say that both are valid choices :))

Contributor

There's something to be said for both approaches,
but imho that discussion should not block this PR.
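
For reference, the two options discussed in this thread (hard error versus picking the lowest capability with a warning) can be sketched side by side. The function name `pick_cuda_cc` and its `policy` parameter are hypothetical, and `ValueError` stands in for EasyBuild's own error type; this is not what the PR implements, which always raises an error.

```python
import warnings


def pick_cuda_cc(cuda_ccs, policy="error"):
    """Reduce a list of compute capabilities (e.g. ['7.0', '8.0']) to one.

    Hypothetical helper: policy is either 'error' or 'lowest'.
    """
    if not cuda_ccs:
        raise ValueError("No CUDA compute capability specified")
    if len(cuda_ccs) == 1:
        return cuda_ccs[0]
    if policy == "error":
        # Let the user choose explicitly by passing a single capability.
        raise ValueError(
            "Only one CUDA compute capability is supported, got: %s" % cuda_ccs
        )
    if policy == "lowest":
        # Sort numerically on (major, minor), not on the raw strings.
        lowest = min(cuda_ccs, key=lambda cc: tuple(int(p) for p in cc.split(".")))
        warnings.warn(
            "Multiple compute capabilities given (%s), using lowest: %s"
            % (cuda_ccs, lowest)
        )
        return lowest
    raise ValueError("Unknown policy: %r" % policy)
```

The 'lowest' policy yields a build that runs on every GPU node in a mixed system, whereas the 'error' policy leaves the trade-off to the user.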

@casparvl
Contributor Author

Test report by @casparvl

Overview of tested easyconfigs (in order)

  • SUCCESS ELPA-2021.05.001-foss-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/de86850b895e30297b812c5715d7f355 for a full test report.

Contributor

@smoors smoors left a comment

lgtm

@smoors smoors added the bug fix label Feb 28, 2023
@smoors smoors added this to the next release (4.7.1?) milestone Feb 28, 2023
@smoors
Contributor

smoors commented Feb 28, 2023

Going in, thanks @casparvl!

@smoors smoors merged commit b415481 into easybuilders:develop Feb 28, 2023
@boegel boegel added the update label Mar 1, 2023
@boegel boegel changed the title from "Implement CUDA support in the ELPA EasyBlock & fix CPP configure issue on newer ELPA versions" to "implement CUDA support in the ELPA EasyBlock & fix CPP configure issue on newer ELPA versions" Mar 1, 2023
4 participants