Ensure FIL CPU can be run without an available GPU by wphicks · Pull Request #6373 · rapidsai/cuml

wphicks · 2025-02-26T20:03:38Z

Due to an upstream change, CPU FIL was touching the CUDA context in a way that required a GPU to be present in order for it to be used. Until CPU FIL is actually included in the cuML CPU build (which will avoid this problem for anyone using the cuML CPU package), this change ensures that CPU FIL can still be run even if no GPU is available.

wphicks · 2025-02-26T20:06:11Z

I'm not sure exactly how we want to go about testing this change (if at all). We only run cuML CPU tests in a GPU-less environment (which makes sense), so until FIL is included in CPU builds, this would be the only thing that required a test of GPU cuML in a GPU-less environment. Because of this, I would recommend that we just work toward including FIL in the cuML CPU build and not worry about this for now.

betatim · 2025-02-27T13:29:48Z

So the cause is that someone upstream changed something, and that led to the breakage? Was the change a bug/mistake, or was it within the "API spec"?

Basically, how could we have noticed this? Maybe that helps figure out a way to test this that doesn't need running GPU cuML tests in a CPU-only environment.

wphicks · 2025-03-03T14:56:05Z

I'm afraid until we get FIL into the CPU build, we'd have to run GPU FIL in a GPU-less environment to catch something like this. In a CPU build, this whole behavior would be suppressed at compile time. In a GPU build, we're always going to need the code paths that do GPUish things, and the only comprehensive way to ensure they are not touched when we are using the CPU is to actually run them without a GPU and see what breaks. Regardless of where the error originates, I don't see a way around that. Even if we started to do something clever like intercepting calls to the CUDA API, I imagine that would require a lot of work and still leave me less confident than if we simply ran tests with CUDA_VISIBLE_DEVICES=''.
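For illustration, running a check with `CUDA_VISIBLE_DEVICES=''` could be sketched roughly as below. This is a minimal sketch and not part of this PR: `run_without_gpu` is a hypothetical helper name, and the snippet passed to it is a placeholder rather than a real FIL call. It uses a child process because the variable has to be set before anything initializes CUDA in that process.

```python
import os
import subprocess
import sys


def run_without_gpu(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process that cannot see any CUDA device.

    Hypothetical helper for illustration only; it is not part of cuML.
    """
    env = dict(os.environ)
    # An empty value hides every device from the CUDA runtime in the child.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Placeholder snippet: a real check would import the GPU build and run
    # CPU inference here, then fail if the child exits with a CUDA error.
    snippet = "import os; print(repr(os.environ['CUDA_VISIBLE_DEVICES']))"
    result = run_without_gpu(snippet)
    print(result.returncode, result.stdout.strip())
```

A non-zero `returncode` (or a CUDA error in `result.stderr`) from the child would then indicate that the CPU path touched the device somewhere.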

I think the best long-term solution here is just to get FIL into the CPU build so we have a way around this problem altogether. When I originally designed FIL CPU, running the GPU build in a GPU-less environment was "out of scope," so it's largely chance that this was supported at all.

wphicks · 2025-03-03T14:57:06Z

/merge

wphicks requested a review from a team as a code owner February 26, 2025 20:03

wphicks requested review from betatim and dantegd February 26, 2025 20:03

github-actions bot added the "Cython / Python" label Feb 26, 2025

Merge branch 'branch-25.04' into bug-cpu_fil_without_gpu

2c4ece5

wphicks added the "bug", "non-breaking", and "cuml-cpu" labels Feb 26, 2025

viclafargue approved these changes Feb 27, 2025

betatim mentioned this pull request Feb 27, 2025

Cuda error in test_rf_classification_seed when cudf.pandas is enabled #6372

Closed

rapids-bot Bot merged commit 87ecca0 into rapidsai:branch-25.04 Mar 3, 2025

wphicks mentioned this pull request Mar 3, 2025

[BUG] CPU Experimental FIL throws CUDA Error when run with no available CUDA devices #6134

Closed

rapids-bot[bot] merged 2 commits into rapidsai:branch-25.04 from wphicks:bug-cpu_fil_without_gpu

3 participants