implement CUDA support in the ELPA EasyBlock & fix CPP configure issue on newer ELPA versions #2898

casparvl · 2023-02-27T14:48:17Z

This PR does two things:

1. From v2022.05.001 onwards, configure complains if CPP is not set and exits with a non-zero status. This PR sets the CPP environment variable to cpp for GCC- and Intel-based toolchains. For other toolchains, it raises an error asking the user to extend the EasyBlock so that it points to the correct C preprocessor.
2. Implement NVIDIA GPU support. This involves adding a combination of configure flags: --enable-nvidia-gpu, --with-cuda-path, --with-cuda-sdk-path, --with-NVIDIA-GPU-compute-capability and --enable-nvidia-sm80-gpu.
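The CPP selection described in the first point can be sketched roughly as follows. This is only an illustration of the idea, not the EasyBlock's actual code: the names `CPP_BY_COMPILER_FAMILY`, `select_cpp` and `set_cpp_env`, and the family strings `'GCC'` and `'Intel'`, are assumptions made for this example.

```python
import os

# Hypothetical mapping from compiler family to C preprocessor command.
# Extending support to another toolchain means adding an entry here.
CPP_BY_COMPILER_FAMILY = {
    'GCC': 'cpp',
    'Intel': 'cpp',
}


def select_cpp(compiler_family):
    """Return the C preprocessor for a compiler family, or raise if unknown."""
    try:
        return CPP_BY_COMPILER_FAMILY[compiler_family]
    except KeyError:
        raise ValueError(
            "No C preprocessor known for compiler family %r; "
            "please extend CPP_BY_COMPILER_FAMILY" % compiler_family
        )


def set_cpp_env(compiler_family, env=os.environ):
    """Set CPP in the given environment mapping before running configure."""
    env['CPP'] = select_cpp(compiler_family)
    return env
```

Keeping the mapping in one place means an unsupported toolchain fails early with a clear message instead of failing later inside configure.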

Some comments regarding the last flag: newer versions of ELPA have a dedicated kernel implemented for sm80. It's a bit strange that --with-NVIDIA-GPU-compute-capability='sm_80' on its own does not enable that kernel, but it doesn't. Configure just prints an info message saying:

configure: You specified --with-NVIDIA-GPU-compute-capability=sm_80, but you did not --enable-nvidia-sm80-gpu. I will thus use the standard Nvidia GPU kernels (which is of course ok, just a info...)

My understanding is it will build the default kernel with the correct optimization (i.e. using -arch sm_80), but not use the dedicated code for the sm_80 kernel. It will only do the latter if we add --enable-nvidia-sm80-gpu.
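A minimal sketch of how the flags listed above might be assembled, following the behaviour described in this PR (exactly one compute capability, `sm_` prepended, dedicated sm80 kernel enabled for 8.0 and up). The function name `elpa_cuda_configure_opts` and its parameters are hypothetical, and pointing both path options at the same CUDA prefix is an assumption of this example, not something the PR text states.

```python
def elpa_cuda_configure_opts(cuda_root, cuda_cc):
    """Build ELPA configure options for a single CUDA compute capability.

    cuda_root: CUDA installation prefix (hypothetical parameter)
    cuda_cc:   list of compute capabilities as strings, e.g. ['8.0']
    """
    # ELPA's --with-NVIDIA-GPU-compute-capability accepts a single architecture
    if len(cuda_cc) != 1:
        raise ValueError("ELPA only accepts one CUDA architecture, got: %s" % cuda_cc)

    major, minor = (int(part) for part in cuda_cc[0].split('.'))

    opts = [
        '--enable-nvidia-gpu',
        # Assumption: both path options point at the same CUDA prefix
        '--with-cuda-path=%s' % cuda_root,
        '--with-cuda-sdk-path=%s' % cuda_root,
        # '8.0' becomes 'sm_80'
        '--with-NVIDIA-GPU-compute-capability=sm_%d%d' % (major, minor),
    ]

    # The dedicated sm80 kernel is only built when explicitly enabled
    if (major, minor) >= (8, 0):
        opts.append('--enable-nvidia-sm80-gpu')

    return opts
```

Comparing `(major, minor)` as a tuple rather than as a float or string keeps the `>= 8.0` check correct for two-digit major versions.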

An EasyConfig using the new CUDA support can be found in:

{math}[foss/2022a] ELPA v2022.05.001 easybuild-easyconfigs#17436

casparvl · 2023-02-28T11:47:06Z

Test report by @casparvl

Overview of tested easyconfigs (in order)

SUCCESS ELPA-2021.11.001-foss-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/174fdb5e47ab55c906ac5b5a787c164e for a full test report.

casparvl · 2023-02-28T11:56:17Z

Test report by @casparvl

Overview of tested easyconfigs (in order)

SUCCESS ELPA-2021.05.001-intel-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/981c692125367c980c865939aa10e994 for a full test report.

casparvl · 2023-02-28T12:00:04Z

Test report by @casparvl

Overview of tested easyconfigs (in order)

SUCCESS ELPA-2021.11.001-intel-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/3ce8fad9a325f9b0e0f758ede76020eb for a full test report.

jfgrimm · 2023-02-28T12:13:22Z

easybuild/easyblocks/e/elpa.py

+            # ELPA's --with-NVIDIA-GPU-compute-capability only accepts a single architecture
+            if len(cuda_cc) != 1:
+                raise EasyBuildError('ELPA currently only supports specifying one CUDA architecture when '
+                                     'building. You specified cuda-compute-capabilities: %s', cuda_cc)


Should this be a hard error, or should we pick one (e.g. the lowest) and throw a warning? I think that's how this is handled in a few other easyblocks.

casparvl · 2023-02-28

I don't think we should assume we know what is best for the user here, so I'd personally prefer an error. That lets the user choose by simply passing a single compute capability on the command line. If I know that only a single one is supported, I could decide to pick the highest and only run on nodes that support that architecture.

casparvl · 2023-02-28

(Or, obviously, I could decide to pick the lowest and be able to run on any GPU node in that system. Just meant to say that both are valid choices :))

smoors · 2023-02-28

There's something to be said for both approaches, but imho that discussion should not block this PR.
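For comparison, the two policies discussed in this thread could be sketched as below. This is a hypothetical helper written for illustration (`pick_cuda_cc` and its `strict` flag do not come from the EasyBlock); the merged code uses the hard-error behaviour shown in the diff above.

```python
import warnings


def pick_cuda_cc(cuda_cc, strict=True):
    """Reduce a list of compute capabilities (e.g. ['7.0', '8.0']) to one.

    strict=True  -> raise if more than one is given (the behaviour in this PR)
    strict=False -> pick the lowest and warn (the alternative raised in review)
    """
    if not cuda_cc:
        raise ValueError("No CUDA compute capability specified")
    if len(cuda_cc) == 1:
        return cuda_cc[0]
    if strict:
        raise ValueError(
            "ELPA currently only supports specifying one CUDA architecture when "
            "building. You specified cuda-compute-capabilities: %s" % cuda_cc
        )
    # Compare numerically, so that '10.0' sorts above '9.0'
    lowest = min(cuda_cc, key=lambda cc: tuple(int(p) for p in cc.split('.')))
    warnings.warn("Multiple CUDA compute capabilities given, using lowest: %s" % lowest)
    return lowest
```

With `strict=False` the resulting build runs on every GPU node but may miss architecture-specific kernels; with `strict=True` the user makes that trade-off explicitly.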

casparvl · 2023-02-28T12:22:34Z

Test report by @casparvl

Overview of tested easyconfigs (in order)

SUCCESS ELPA-2021.05.001-foss-2021b.eb

Build succeeded for 1 out of 1 (1 easyconfigs in total)
gcn1.local.snellius.surf.nl - Linux Rocky Linux 8.7, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz, 4 x NVIDIA NVIDIA A100-SXM4-40GB, 515.86.01, Python 3.6.8
See https://gist.github.com/de86850b895e30297b812c5715d7f355 for a full test report.

smoors

lgtm

smoors · 2023-02-28T19:37:17Z

Going in, thanks @casparvl!

Caspar van Leeuwen added 8 commits February 24, 2023 13:30

Added GPU support to ELPA EasyBlock. If CUDA is specified in the EasyConfig, automatically enable nvidia GPU support

91cee10

Currently, ELPA only supports passing one architecture to --with-NVIDIA-GPU-compute-capability=VALUE

ca77752

Make sure to set CPP, as newer ELPA configures require it

4865037

Set CPP env var to cpp. This is valid for GCC, but we might make to make this neater and read it from the compiler definition from easybuild-framework. Still discussing that on EB slack

6642b40

Fixed typo, added option to enable dedicated sm80 kernel, set CPP (C preprocessor environment variable) in a way that can easily be extended or modified per toolchain family

c687f90

Clarify comment

515e57d

Pull logic inside the if statement that checks if only a single cuda compute capability has been specified. Also, now check if the cuda_cc is larger or equal to 8.0, since you probably also want to build the 8.0 optimized kernel if you have 8.5 capability in your system

1f86084

Make sure sm_ gets prepended in the argument for --with-NVIDIA-GPU-compute-capability

c3e7701

casparvl mentioned this pull request Feb 27, 2023

{math}[foss/2022a] ELPA v2022.05.001 easybuilders/easybuild-easyconfigs#17436

Merged

1 task

smoors reviewed Feb 27, 2023

Reordered if-else statement to reduce indentation. Clarified error message that is raised

8b154ab

jfgrimm reviewed Feb 28, 2023

Moved up from .. import to maintain alphabetical order

40bc117

smoors approved these changes Feb 28, 2023

smoors added the bug fix label Feb 28, 2023

smoors added this to the next release (4.7.1?) milestone Feb 28, 2023

smoors merged commit b415481 into easybuilders:develop Feb 28, 2023

boegel added the update label Mar 1, 2023

boegel changed the title ~~Implement CUDA support in the ELPA EasyBlock & fix CPP configure issue on newer ELPA versions~~ implement CUDA support in the ELPA EasyBlock & fix CPP configure issue on newer ELPA versions Mar 1, 2023
