@Flamefire
Contributor

@Flamefire Flamefire commented Aug 8, 2023

(created using eb --new-pr)

Same as #18490, but for the CUDA version. Only a single easyconfig is changed because testing the CUDA version takes a long time.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusml20 - Linux RHEL 7.6, POWER, 8335-GTX (power9le), 6 x NVIDIA Tesla V100-SXM2-32GB, 440.64.00, Python 2.7.5
See https://gist.github.com/Flamefire/62b26f0d25529ca1b359ca275b03b05e for a full test report.

@Flamefire
Contributor Author

Flamefire commented Aug 9, 2023

Test report by @Flamefire
~~FAILED~~ SUCCESS
Build succeeded for 0 out of 1 (1 easyconfigs in total)
taurusi8018 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/785fea163a066ee7621194c5cba657f9 for a full test report.

I traced the failure in test_cpp_extensions_jit to a driver issue: our driver supports CUDA 11.4, but this easyconfig uses CUDA 11.5.
Installing CUDAcompat/11.6 resolves the issue, so I consider the run a success.
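
For illustration, the mismatch described above can be expressed as a plain version comparison. This is a minimal sketch and not part of the PR: the helper names `parse_version` and `toolkit_newer_than_driver` are hypothetical, and the version strings are simply the ones mentioned in this thread. It only flags a toolkit that is newer than the CUDA version the driver supports, which is the situation in which a compatibility package such as CUDAcompat may be needed.

```python
def parse_version(version):
    """Turn a dotted version string such as '11.5.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def toolkit_newer_than_driver(driver_cuda, toolkit_cuda):
    """Return True if the toolkit's major.minor version exceeds the driver's.

    Hypothetical helper: it compares version numbers only and does not
    query the driver or the installed toolkit.
    """
    return parse_version(toolkit_cuda)[:2] > parse_version(driver_cuda)[:2]


# Values from this thread: driver supports CUDA 11.4, toolkit is CUDA 11.5.2.
print(toolkit_newer_than_driver("11.4", "11.5.2"))  # prints True
```

With a driver that supports CUDA 11.6 or later, the same check returns False for the 11.5.2 toolkit.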

@boegel boegel changed the title from "Fix PyTorch-1.12.1-foss-2021b (CUDA) on POWER" to "add patches to fix PyTorch-1.12.1 w/ foss/2021b + CUDA v11.5.2 on POWER" on Aug 15, 2023
@Flamefire
Contributor Author

@boegel Remaining issue found and resolved. So this is good to go from my side.

@Flamefire
Contributor Author

Test report by @Flamefire
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
taurusi8012 - Linux CentOS Linux 7.9.2009, x86_64, AMD EPYC 7352 24-Core Processor (zen2), 8 x NVIDIA NVIDIA A100-SXM4-40GB, 470.57.02, Python 2.7.5
See https://gist.github.com/Flamefire/b1ba6a20f88f1b2251f1bc548dcde428 for a full test report.

@branfosj
Member

Test report by @branfosj
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#2983
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0208u09a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), 1 x NVIDIA NVIDIA A100-SXM4-40GB, 520.61.05, Python 3.6.8
See https://gist.github.com/branfosj/71dadbe33659eb03cdea532eed45dbe0 for a full test report.

@branfosj branfosj added this to the next release (4.8.1?) milestone Aug 24, 2023
@branfosj
Member

Going in, thanks @Flamefire!

@branfosj branfosj merged commit afb7521 into easybuilders:develop Aug 24, 2023
@Flamefire Flamefire deleted the 20230808143615_new_pr_PyTorch1121 branch August 24, 2023 13:06