@VibhuJawa
Collaborator

@VibhuJawa VibhuJawa commented Nov 23, 2021

Add support and tests for cuML and XGBoost

This PR addresses #309

TODO:

  • Ensure an efficient path for non-distributed models and allow single-GPU XGBoost training to succeed

     For more context, see issue rapidsai/cuml#4406.
    
     We were previously training on the client, which has two problems:
       a. It is very inefficient and can be problematic on multi-node clusters and heterogeneous setups.
       b. Training non-distributed XGBoost models on Dask collections is not supported.
    
  • Test for single-GPU cuML model

  • Test for multi-GPU cuML model

  • Test for single-GPU XGBoost model

  • Test for multi-GPU XGBoost model

  • Update GPU-CI environment with the right libraries
    See PR: https://github.com/rapidsai/dask-build-environment/pull/16/files

Follow-up work to enable predict with multi-GPU cuML models:

@codecov-commenter

codecov-commenter commented Nov 23, 2021

Codecov Report

Merging #330 (2195a06) into main (96524ee) will increase coverage by 0.01%.
The diff coverage is 88.23%.

@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
+ Coverage   95.69%   95.70%   +0.01%     
==========================================
  Files          65       65              
  Lines        2854     2863       +9     
  Branches      534      536       +2     
==========================================
+ Hits         2731     2740       +9     
+ Misses         75       74       -1     
- Partials       48       49       +1     
Impacted Files | Coverage Δ
dask_sql/physical/rel/custom/create_model.py | 93.22% <88.23%> (-2.94%) ⬇️
dask_sql/physical/utils/sort.py | 90.62% <0.00%> (+7.29%) ⬆️

@VibhuJawa VibhuJawa changed the title [WIP]Add tests with cuML and XGBoost [WIP]Add support and tests for cuML and XGBoost Nov 24, 2021
@VibhuJawa
Collaborator Author

CC: @charlesbluca, how do I go about adding XGBoost and cuML to the gpuCI we have set up?

@charlesbluca
Collaborator

You can open a PR adding these packages to the environment created in this dockerfile:

https://github.com/rapidsai/dask-build-environment/blob/main/dask_sql.Dockerfile

@VibhuJawa VibhuJawa changed the title [WIP]Add support and tests for cuML and XGBoost [REVIEW]Add support and tests for cuML and XGBoost Nov 30, 2021
@VibhuJawa VibhuJawa marked this pull request as ready for review November 30, 2021 19:01
@VibhuJawa
Collaborator Author

@charlesbluca, this is now ready for review.

This PR is dependent on https://github.com/rapidsai/dask-build-environment/pull/16/files

Comment on lines +179 to +183
X_d = X.repartition(npartitions=1).to_delayed()
if y is not None:
    y_d = y.repartition(npartitions=1).to_delayed()
else:
    y_d = None
Collaborator Author

For more context, see issue rapidsai/cuml#4406.

We were previously training on the client, which has two problems:

a. It is very inefficient and can be problematic on multi-node clusters and heterogeneous setups.
b. Training non-distributed XGBoost models on Dask collections is not supported.

Collaborator

@charlesbluca charlesbluca left a comment

Thanks for the work here @VibhuJawa 😄 A couple of comments. I'm not the most knowledgeable on Dask ML stuff, so I would feel good getting a second review on this:

Collaborator

@ChrisJar ChrisJar left a comment

LGTM!

@charlesbluca charlesbluca added the blocked Blocked by work in another pull request label Dec 2, 2021
Collaborator

@charlesbluca charlesbluca left a comment

Missed this, but generally things look good here! Happy to merge this once cuML / XGBoost are successfully added to gpuCI and tests are passing.

@VibhuJawa
Collaborator Author

@GPUtester rerun tests .

Collaborator

@charlesbluca charlesbluca left a comment

Think we need to mark these as fixtures to expose them to tests?

@charlesbluca
Collaborator

rerun tests

@charlesbluca
Collaborator

rerun tests

@charlesbluca charlesbluca removed the blocked Blocked by work in another pull request label Dec 6, 2021
@charlesbluca charlesbluca merged commit 1f48686 into dask-contrib:main Dec 6, 2021
4 participants