Ensure FIL CPU can be run without an available GPU #6373
rapids-bot[bot] merged 2 commits into rapidsai:branch-25.04
Due to an upstream change, CPU FIL was touching the CUDA context in a way that required a GPU to be present in order for it to be used. Until CPU FIL is actually included in the cuML CPU build (which will avoid this problem for anyone using the cuML CPU package), this change ensures that CPU FIL can still be run even if no GPU is available.
I'm not sure exactly how we want to go about testing this change (if at all). We only run cuML CPU tests in a GPU-less environment (which makes sense), so until FIL is included in CPU builds, this would be the only thing that required a test of GPU cuML in a GPU-less environment. Because of this, I would recommend that we just work toward including FIL in the cuML CPU build and not worry about this for now.
So the cause is that someone upstream changed something and that led to the breakage? Was the change a bug/mistake, or was it within the "API spec"? Basically, how could we have noticed this? Maybe that helps figure out a way to test this that doesn't need running GPU cuML tests in a CPU-only environment.
I'm afraid until we get FIL into the CPU build, we'd have to run GPU FIL in a GPU-less environment to catch something like this. In a CPU build, this whole behavior would be suppressed at compile time. In a GPU build, we're always going to need the code paths that do GPUish things, and the only comprehensive way to ensure they are not touched when we are using the CPU is to actually run them without a GPU and see what breaks. Regardless of where the error originates, I don't see a way around that. Even if we started to do something clever like intercepting calls to the CUDA API, I imagine that would require a lot of work and still leave me less confident than if we simply ran tests with

I think the best long-term solution here is just to get FIL into the CPU build so we have a way around this problem altogether. When I originally designed FIL CPU, running the GPU build in a GPU-less environment was "out of scope," so it's largely chance that this was supported at all.
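For illustration only, here is a minimal sketch of what "run it without a GPU and see what breaks" could look like on a machine that does have a GPU. It assumes that setting the standard `CUDA_VISIBLE_DEVICES` environment variable to an empty string in a child process is an acceptable approximation of a GPU-less environment; that only hides devices from the CUDA runtime and is not the same as a machine with no driver installed. The helper name `run_without_gpu` is hypothetical and is not part of cuML or of this PR. The actual CPU FIL check would be supplied as the code string and is not shown here.

```python
import os
import subprocess
import sys


def run_without_gpu(code, timeout=60):
    """Run a Python snippet in a child process with all CUDA devices hidden.

    Hypothetical helper: it approximates a GPU-less environment by setting
    CUDA_VISIBLE_DEVICES to an empty string for the child process only.
    """
    env = dict(os.environ)
    # An empty value hides every device from the CUDA runtime in the child.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Placeholder snippet; a real check would import the GPU build of cuML
    # and run CPU FIL inference here.
    result = run_without_gpu("print('ran with no visible CUDA devices')")
    print(result.returncode, result.stdout.strip())
```

A non-zero `returncode` (or a CUDA error in `result.stderr`) from the child would indicate that the CPU code path still touched the GPU. The parent process's environment is left unchanged, so the rest of a GPU test suite could keep running normally.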
/merge