{lib}[GCCcore/12.2.0] UCC-CUDA v1.1.0 + add patch for UCC 1.1.0 for multiple component paths #17255
…patches: UCC-CUDA-1.1.0_cuda_12_mem_ops.patch
|
I backported the patch that deals with the CUDA 12 MEM OPS changes. It builds, but unfortunately it just crashes when used. |
|
I tried to isolate the segfault I saw, but it turns out the segfault is present as soon as you set UCC_COMPONENT_PATH to anything. So the backported patch might actually just work, if we can sort out what's causing ucc_info to segfault:
$ ml UCC/1.1.0-GCCcore-12.2.0
$ export UCC_COMPONENT_PATH=$EBROOTUCC/lib/ucc/
$ ucc_info -v
[vera-c2:4015973:0:4015973] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid:4015973) ====
0 0x0000000000012ce0 __funlockfile() :0
1 0x00000000000ccc75 __strlen_avx2() :0
2 0x0000000000019026 ucc_str_concat() /dev/shm/UCC/1.1.0/GCCcore-12.2.0/ucc-1.1.0/src/utils/ucc_string.c:123
3 0x0000000000009ff0 ucc_check_config_file() /dev/shm/UCC/1.1.0/GCCcore-12.2.0/ucc-1.1.0/src/core/ucc_constructor.c:105
4 0x0000000000009ff0 ucc_constructor() /dev/shm/UCC/1.1.0/GCCcore-12.2.0/ucc-1.1.0/src/core/ucc_constructor.c:136
5 0x00000000000098b8 ucc_lib_config_read() /dev/shm/UCC/1.1.0/GCCcore-12.2.0/ucc-1.1.0/src/core/ucc_lib.c:368
6 0x00000000004011a5 main() /dev/shm/UCC/1.1.0/GCCcore-12.2.0/ucc-1.1.0/tools/info/ucc_info.c:131
7 0x000000000003acf3 __libc_start_main() ???:0
8 0x00000000004013ae _start() ???:0
=================================
Segmentation fault (core dumped)
Ugh. While trying to see what changes they made in ucc_constructor to tell where it might go wrong, I found that they have since completely removed the option to specify UCC_COMPONENT_PATH, and it's now hardcoded. |
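For illustration: frames 1 and 2 of the backtrace (strlen called from ucc_str_concat, faulting at address nil) are consistent with a NULL string reaching the concatenation, although the trace alone does not say which argument was NULL or why. A minimal sketch of a NULL-guarded concatenation follows. The helper name `concat_or_null` is hypothetical; this is not UCC's `ucc_str_concat`, and not the fix that went into this PR.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (NOT UCC's ucc_str_concat): joins two strings into a
 * newly allocated buffer. Returns NULL instead of crashing when either
 * input is NULL, since strlen(NULL) is undefined behavior and typically
 * segfaults as in the backtrace above. The caller frees the result. */
char *concat_or_null(const char *a, const char *b)
{
    if (a == NULL || b == NULL) {
        return NULL;
    }

    size_t len_a = strlen(a);
    size_t len_b = strlen(b);
    char  *out   = malloc(len_a + len_b + 1);
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, a, len_a);
    memcpy(out + len_a, b, len_b + 1); /* also copies the terminating NUL */
    return out;
}
```

A caller would check the return value for NULL before using the joined path and report a configuration error rather than dereferencing it.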
|
I found the new code, without the broken UCC_COMPONENT_PATH, much easier to understand. The logic isn't as convoluted or entangled with the install path and the rest; instead, we'd just need to hijack:
status = ucc_sys_path_join(lib_path, UCC_MODULE_SUBDIR,
                           &ucc_global_config.component_path);
Though even then, the way we do it, symlinking all default-built components into the new path in UCC-CUDA, is also a bit ugly (how would you handle a third component thing?) |
|
So what we really would like is for the component path to support any type of string that glob would accept. |
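As a rough sketch of that idea, a glob-style component path could be expanded with POSIX `glob()`. The function name `count_component_matches` and the example pattern are hypothetical; this is not the code from the patch, only an illustration of what "any string that glob would accept" could mean in practice.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical sketch (NOT the patch from this PR): expand a glob-style
 * component path and return the number of matching files, or -1 on a glob
 * error. If out is non-NULL, each match is printed on its own line. */
int count_component_matches(const char *pattern, FILE *out)
{
    glob_t matches;
    int    count;
    int    rc = glob(pattern, 0, NULL, &matches);

    if (rc == 0) {
        count = (int)matches.gl_pathc;
        if (out != NULL) {
            for (size_t i = 0; i < matches.gl_pathc; i++) {
                fprintf(out, "%s\n", matches.gl_pathv[i]);
            }
        }
    } else if (rc == GLOB_NOMATCH) {
        count = 0;  /* a valid pattern that matched nothing */
    } else {
        count = -1; /* GLOB_NOSPACE or GLOB_ABORTED */
    }

    globfree(&matches);
    return count;
}
```

With a pattern such as `/some/prefix/lib/ucc/*.so` (a made-up path), each match would be a candidate component to load, so a separately installed package could contribute components without symlinking into a single directory.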
|
It wasn't worth fixing the old behavior, because it
This new patch will hopefully keep working going forward (with minimal updates). As a nice bonus, since I no longer set UCC_COMPONENT_PATH to anything, if someone forgets to rebuild UCC and so lacks this patch, at least it won't completely break UCC. |
|
Test report by @Micket |
|
Test report by @Micket |
|
@boegelbot please test @ generoso |
|
@boegel: Request for testing this PR well received on login1 PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1500928368 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegel |
|
Test report by @boegelbot |
|
@boegelbot please test @ jsc-zen2 |
|
@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster PR test command '
Test results coming soon (I hope)... - notification for comment with ID 1500937078 processed Message to humans: this is just bookkeeping information for me, |
|
Test report by @boegelbot |
boegel
left a comment
lgtm
|
Going in, thanks @Micket! |
(created using eb --new-pr)