Use less GPU memory in test_managed_alloc_driver_undersubscribe.#188
bdice merged 2 commits into NVIDIA:main
Root cause: the test allocates 50% of managed H100 memory (40GB), which causes the test process to be killed. AFAIS, there's no specific reason why we need to allocate that much for the under-subscribed test. Reducing to 10% of GPU memory fixes the issue.
Thanks for the fix @isVoid. Since I created this PR, GitHub does not allow me to approve it, but I think your changes are correct and I would approve it -- feel free to approve it on my behalf, if you'd like. Currently there are no required reviews on this repository, so this can be merged. I'll leave it to your discretion on when to merge. Edit: I updated the description based on your comment.
I do not have power here (yet 😉), but the fix LGTM as well.
I’ll go ahead and merge. It seems we have consensus on this small fix. |
…DIA#188)
* Test h100 GPUs.
* limit the size of memory allocated
---------
Co-authored-by: isVoid <isVoid@users.noreply.github.com>
- Locate nvvm, libdevice, nvrtc, and cudart from nvidia-*-cu12 wheels (NVIDIA#155)
- reinstate test (NVIDIA#226)
- Restore PR NVIDIA#185 (Stop Certain Driver API Discovery for "v2") (NVIDIA#223)
- Report NVRTC builtin operation failures to the user (NVIDIA#196)
- Add Module Setup and Teardown Callback to Linkable Code Interface (NVIDIA#145)
- Test CUDA 12.8. (NVIDIA#187)
- Ensure RTC Bindings Clamp to the Maximum Supported CC (NVIDIA#189)
- Migrate code style to ruff (NVIDIA#170)
- Use less GPU memory in test_managed_alloc_driver_undersubscribe. (NVIDIA#188)
- Update workflows to always use proxy cache. (NVIDIA#191)
Resolves #184.
The test test_managed_alloc_driver_undersubscribe (numba.cuda.tests.cudadrv.test_managed_alloc.TestManagedAlloc.test_managed_alloc_driver_undersubscribe) allocates 50% of managed H100 memory (40GB), which causes the test process to be killed. AFAIS, there's no specific reason why we need to allocate that much for the under-subscribed test. Reducing to 10% of GPU memory fixes the issue.
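
To make the size difference concrete, here is a minimal sketch of the arithmetic behind the change. It is not the test's actual code: the helper name `managed_alloc_bytes`, the percentage parameter, and the 80GB total are assumptions for illustration (80GB is inferred from the description, where 50% corresponds to 40GB).

```python
GB = 10**9  # decimal gigabyte, used only for this illustration


def managed_alloc_bytes(total_gpu_bytes: int, percent: int) -> int:
    """Return the byte count equal to `percent` of total GPU memory.

    Hypothetical helper: integer arithmetic avoids float rounding.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100] for an under-subscribed test")
    return total_gpu_bytes * percent // 100


# Assumed capacity, consistent with "50% of memory is 40GB" in the description.
total = 80 * GB

before = managed_alloc_bytes(total, 50)  # allocation size before the fix
after = managed_alloc_bytes(total, 10)   # allocation size after the fix

print(f"before: {before // GB} GB, after: {after // GB} GB")
# prints "before: 40 GB, after: 8 GB"
```

Both sizes stay below the device's total memory, so the test remains under-subscribed; only the amount the test process has to touch shrinks.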