Reproducer for slow performance with pinned memory #519
base: main
Conversation
msimberg left a comment
Looks good to me, but I'm a bit biased. Importantly, thank you @gppezzi for adding this!
How does this work in terms of the reference? The performance currently does not match the reference, so will the test fail? Or pass with xfail? I don't know enough about reframe to judge how to best handle something like this.
Pull request overview
Adds a new ReFrame check plus a small CUDA+MPI microbenchmark meant to reproduce slow intranode performance when using pinned host memory on Alps (VCUE-1292).
Changes:
- Introduces a CUDA+MPI reproducer (intranode_pinned_host_comm.cpp) that times MPI Send/Recv using selectable memory types (host/pinned_host/device).
- Adds a minimal CMake build for the reproducer.
- Adds a ReFrame regression test (MPIIntranodePinned) to build and run the reproducer and extract a timing metric (a rough sketch of such a test follows the file table below).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| checks/prgenv/cuda/src/intranode_pinned_host_comm.cpp | New MPI/CUDA reproducer program that allocates the selected memory type and measures intranode Send/Recv timings. |
| checks/prgenv/cuda/src/CMakeLists.txt | New CMake build definition for the reproducer executable and required MPI/CUDA runtime links. |
| checks/prgenv/cuda/cuda_mpi_intranode_pinned.py | New ReFrame test that builds/runs the reproducer and reports a performance value. |
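For readers unfamiliar with ReFrame, a check like the one described above roughly takes the following shape. This is a minimal, hypothetical sketch and not the contents of cuda_mpi_intranode_pinned.py from this PR: the system/environment selectors, executable arguments, task layout, and output pattern are all assumptions.

```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class MPIIntranodePinned(rfm.RegressionTest):
    # Hypothetical configuration; the actual test in this PR may differ.
    valid_systems = ['*']
    valid_prog_environs = ['+cuda']   # assumes a CUDA-capable environment
    build_system = 'CMake'
    sourcesdir = 'src'
    executable = 'intranode_pinned_host_comm'
    executable_opts = ['pinned_host', '134217728']  # memory type, message size (assumed CLI)
    num_tasks = 2
    num_tasks_per_node = 2            # intranode: both ranks on the same node

    @sanity_function
    def found_timing(self):
        return sn.assert_found(r'time:', self.stdout)

    @performance_function('s')
    def send_recv_time(self):
        # Assumes the output format shown in the timings quoted below.
        return sn.extractsingle(r'time:\s*(\S+)', self.stdout, 1, float)
```

The open question in the discussion below is what reference value, if any, to attach to a metric like send_recv_time.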
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
```
-N1 -n2 + host 134217728 = 0: [1:4] time: 0.00298300
-N2 -n2 + host 134217728 = 0: [1:4] time: 0.00592534
-N1 -n2 + pinned_host 134217728 = 0: [1:4] time: 0.0157693
-N2 -n2 + pinned_host 134217728 = 0: [1:4] time: 0.00559603
-N1 -n2 + host 134217728 = 0: [1:4] time: 0.0112828
-N2 -n2 + host 134217728 = 0: [1:4] time: 0.0109703
-N1 -n2 + pinned_host 134217728 = 0: [1:4] time: 0.0113354
-N2 -n2 + pinned_host 134217728 = 0: [1:4] time: 0.00553771
```
so not sure what the reference should be.
cscs-ci run alps-daint-uenv;MY_UENV=prgenv-gnu/25.11:v1
@jpcoles suggested that we keep this as a failure, to remind us to ping the vendor about the issue. If we close the case, then we can disable the test...
Is this a case where xfail would be useful? There's absolutely nothing we can do currently to fix this test, so having it report failures on reframe pipelines all the time is just noise IMO. We ideally want failures to report something that we actually need to act on. If we mark it xfail it's allowed to fail without creating noise, but if the issue is fixed the test should become xpass and cause a pipeline to fail, so we know when the behaviour has changed. If/when that happens, we can then require that the test passes to make sure it doesn't regress. Does that sound reasonable? I'm not 100% sure this is how xfail works in reframe, but it's how I expect it to work at least.
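To make the proposed semantics concrete, here is how strict xfail behaves in pytest; this is pytest's API rather than ReFrame's, and the timing constants and reference threshold below are illustrative assumptions.

```python
import pytest

# Numbers loosely based on the 128 MiB timings quoted above; the reference
# threshold is an arbitrary stand-in for whatever the check would use.
PINNED_HOST_TIME_S = 0.0157
REFERENCE_TIME_S = 0.006


@pytest.mark.xfail(reason="VCUE-1292: slow intranode Send/Recv with pinned host memory",
                   strict=True)
def test_pinned_host_send_recv_time():
    # Fails today and is reported quietly as xfail. If the vendor issue is
    # fixed and the timing drops below the reference, this becomes an
    # unexpected pass (xpass); with strict=True that fails the run, flagging
    # that the behaviour has changed.
    assert PINNED_HOST_TIME_S <= REFERENCE_TIME_S
```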