{ai,lib}[GCCcore/12.2.0,foss/2022b] PyTorch v2.1.2, NCCL v2.18.3 w/ CUDA 12.0.0 #20520
Test report by @SebastianAchilles
Test report by @SebastianAchilles
Test report by @SebastianAchilles
That first one failed with
I see that every now and then in various different tests especially
I'll do a larger repeated run for both PRs over the weekend, so I'll have the results to compare on Tuesday (Monday is a public holiday here).
Updated software
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
Test report by @Flamefire
f27d797 to a9a5a6b
Test report by @akesandgren
github_account = 'NVIDIA'
source_urls = [GITHUB_SOURCE]
sources = ['v%(version)s-1.tar.gz']
patches = ['NCCL-2.16.2_fix-cpuid.patch']
Doesn't this one also need NCCL-2.18.3_fix-cudaMemcpyAsync.patch, like NCCL-2.18.3-GCCcore-12.3.0-CUDA-12.1.1.eb?
Makes sense I guess, added
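For reference, a minimal sketch of how the relevant easyconfig lines might look once the suggested patch is added. Both patch names come from the comments above; their order in the list is an assumption, and the checksums a real easyconfig would also need are left out here.

```python
# Sketch of the NCCL easyconfig fragment after the review suggestion.
# Easyconfigs use Python syntax; source_urls = [GITHUB_SOURCE] is omitted
# because GITHUB_SOURCE is a constant provided by EasyBuild, not plain Python.
github_account = 'NVIDIA'
sources = ['v%(version)s-1.tar.gz']  # %(version)s is expanded by EasyBuild
patches = [
    'NCCL-2.16.2_fix-cpuid.patch',            # already present in the diff
    'NCCL-2.18.3_fix-cudaMemcpyAsync.patch',  # added per the review (order assumed)
]
```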
Test report by @akesandgren
akesandgren left a comment
LGTM
Going in, thanks @Flamefire!
(created using eb --new-pr)
This is meant as an alternative to #20155, using a newer NCCL version, as the older one currently included in foss/2022b doesn't seem to work with PyTorch 2.1.2.
Update: Seems #20155 works now, so putting this one on hold.
Requires: