
Use less GPU memory in test_managed_alloc_driver_undersubscribe.#188

Merged
bdice merged 2 commits into NVIDIA:main from bdice:test-h100
Apr 8, 2025
Conversation

@bdice
Contributor

@bdice bdice commented Apr 2, 2025

Resolves #184.

numba.cuda.tests.cudadrv.test_managed_alloc.TestManagedAlloc.test_managed_alloc_driver_undersubscribe allocates 50% of the H100's memory (40 GB) as managed memory, which causes the test process to be killed.

As far as I can see, there is no specific reason the under-subscribed test needs to allocate that much. Reducing the allocation to 10% of GPU memory fixes the issue.
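To illustrate the sizes involved (this is a sketch, not the actual numba-cuda test code; the function name is hypothetical), the under-subscription test only needs a managed allocation that fits comfortably within GPU memory, so any fraction well below 1.0 exercises the same code path:

```python
# Illustrative sketch (not the actual test): compute a managed allocation
# size as a fraction of total GPU memory. On an 80 GB H100, 50% means a
# 40 GB managed allocation, which got the test process killed; 10% (8 GB)
# still under-subscribes the GPU.

def managed_alloc_size(total_gpu_mem_bytes: int, fraction: float) -> int:
    """Return an allocation size equal to a fraction of total GPU memory."""
    return int(total_gpu_mem_bytes * fraction)

total = 80 * 1024**3                          # 80 GB H100
old_size = managed_alloc_size(total, 0.5)     # previous test behavior
new_size = managed_alloc_size(total, 0.1)     # after this PR
print(old_size // 1024**3, new_size // 1024**3)  # prints "40 8"
```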

@bdice bdice marked this pull request as draft April 2, 2025 02:48
@isVoid
Contributor

isVoid commented Apr 7, 2025

Root cause:
numba.cuda.tests.cudadrv.test_managed_alloc.TestManagedAlloc.test_managed_alloc_driver_undersubscribe allocates 50% of the H100's memory (40 GB) as managed memory, which causes the test process to be killed.

As far as I can see, there is no specific reason the under-subscribed test needs to allocate that much. Reducing the allocation to 10% of GPU memory fixes the issue.

@bdice bdice marked this pull request as ready for review April 7, 2025 23:49
@bdice
Contributor Author

bdice commented Apr 7, 2025

Thanks for the fix @isVoid. Since I created this PR, GitHub does not allow me to approve it, but I think your changes are correct and I would approve them -- feel free to approve on my behalf, if you'd like. There are currently no required reviews on this repository, so this can be merged; I'll leave the timing to your discretion.

edit: I updated the description based on your comment.

@bdice bdice changed the title from Test h100 GPUs. to Test H100 GPUs. Apr 7, 2025
@bdice bdice changed the title from Test H100 GPUs. to Use less GPU memory in test_managed_alloc_driver_undersubscribe. Apr 7, 2025
@kkraus14
Contributor

kkraus14 commented Apr 8, 2025

I do not have power here (yet 😉), but the fix LGTM as well

@bdice
Contributor Author

bdice commented Apr 8, 2025

I’ll go ahead and merge. It seems we have consensus on this small fix.

@bdice bdice merged commit b32dfdb into NVIDIA:main Apr 8, 2025
35 checks passed
jiel-nv pushed a commit to jiel-nv/numba-cuda that referenced this pull request Apr 10, 2025
…DIA#188)

* Test h100 GPUs.

* limit the size of memory allocated

---------

Co-authored-by: isVoid <isVoid@users.noreply.github.com>
gmarkall added a commit to gmarkall/numba-cuda that referenced this pull request Apr 22, 2025
- Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels (NVIDIA#155)
- reinstate test (NVIDIA#226)
- Restore PR NVIDIA#185 (Stop Certain Driver API Discovery for "v2") (NVIDIA#223)
- Report NVRTC builtin operation failures to the user (NVIDIA#196)
- Add Module Setup and Teardown Callback to Linkable Code Interface (NVIDIA#145)
- Test CUDA 12.8. (NVIDIA#187)
- Ensure RTC Bindings Clamp to the Maximum Supported CC (NVIDIA#189)
- Migrate code style to ruff (NVIDIA#170)
- Use less GPU memory in test_managed_alloc_driver_undersubscribe. (NVIDIA#188)
- Update workflows to always use proxy cache. (NVIDIA#191)
@gmarkall gmarkall mentioned this pull request Apr 22, 2025
gmarkall added a commit that referenced this pull request Apr 22, 2025
- Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels (#155)
- reinstate test (#226)
- Restore PR #185 (Stop Certain Driver API Discovery for "v2") (#223)
- Report NVRTC builtin operation failures to the user (#196)
- Add Module Setup and Teardown Callback to Linkable Code Interface (#145)
- Test CUDA 12.8. (#187)
- Ensure RTC Bindings Clamp to the Maximum Supported CC (#189)
- Migrate code style to ruff (#170)
- Use less GPU memory in test_managed_alloc_driver_undersubscribe. (#188)
- Update workflows to always use proxy cache. (#191)


Development

Successfully merging this pull request may close these issues.

[BUG] Tests fail with H100 GPU

3 participants