
[FEA] Implement kfold. #7163

Closed
trivialfis wants to merge 12 commits into rapidsai:branch-25.10 from trivialfis:fea-kfold-rebase

Conversation

@trivialfis
Member

@trivialfis trivialfis commented Aug 30, 2025

Replaces #5940.

Closes #7088

@copy-pr-bot

copy-pr-bot Bot commented Aug 30, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions Bot added the Cython / Python (Cython or Python issue) label Aug 30, 2025
@trivialfis trivialfis added the 2 - In Progress (Currently a work in progress), feature request (New feature or request), and New Algorithm (For tracking new algorithms that will be added to our existing collection) labels Aug 30, 2025
@trivialfis trivialfis added the non-breaking (Non-breaking change) label Sep 30, 2025
@trivialfis trivialfis marked this pull request as ready for review September 30, 2025 03:31
@trivialfis trivialfis requested a review from a team as a code owner September 30, 2025 03:31
@trivialfis trivialfis requested a review from jcrist September 30, 2025 03:31
@trivialfis trivialfis changed the title [wip][FEA] Implement kfold. [FEA] Implement kfold. Sep 30, 2025
@trivialfis trivialfis added the 3 - Ready for Review (Ready for review by team) label and removed the 2 - In Progress (Currently a work in progress) label Sep 30, 2025
Member

@betatim betatim left a comment

Thanks for working on this. I left a few review comments and suggestions.

Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/tests/test_stratified_kfold.py Outdated

assert tr_idx.shape[0] + te_idx.shape[0] == n_samples
fold_size = X.shape[0] // n_splits
assert te_idx.shape[0] == fold_size or te_idx.shape[0] == fold_size + 1
Member

I'd make this more precise, maybe by hard-coding the expected size of the folds (with a short comment explaining why). I think this is worth it because the off-by-one error here is one of the important things to check for.

Member Author

We have a parametrized test, so it is probably not very nice to have a hardcoded expectation for every case.

Member

That is true. Maybe then we should have an if statement for the size check. From memory, it is the first n_samples % n_splits folds that should have a size of fold_size + 1.
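
A minimal sketch of the size check suggested above, assuming the rule documented for scikit-learn's KFold: the first n_samples % n_splits folds have size n_samples // n_splits + 1 and the rest have size n_samples // n_splits. The helper names `expected_fold_sizes` and `check_fold_sizes` are hypothetical and are not taken from the PR; plain Python lists stand in for the index arrays the real test receives.

```python
def expected_fold_sizes(n_samples, n_splits):
    """Return the expected test-fold sizes, in fold order.

    Assumes the scikit-learn KFold rule: the first
    ``n_samples % n_splits`` folds get one extra sample.
    """
    fold_size, remainder = divmod(n_samples, n_splits)
    return [
        fold_size + 1 if fold < remainder else fold_size
        for fold in range(n_splits)
    ]


def check_fold_sizes(test_index_sets, n_samples):
    """Assert that every test fold has exactly the expected size.

    ``test_index_sets`` is a sequence of test-index collections, one per
    fold, in the order the splitter produced them (hypothetical input).
    """
    expected = expected_fold_sizes(n_samples, len(test_index_sets))
    for fold, (te_idx, size) in enumerate(zip(test_index_sets, expected)):
        assert len(te_idx) == size, (
            f"fold {fold}: got {len(te_idx)} test samples, expected {size}"
        )
```

Because the expected sizes are computed from n_samples and n_splits, the same check works for every case of a parametrized test without hardcoding a value per case, while still catching an off-by-one that the looser `fold_size or fold_size + 1` assertion would let through.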

Comment thread python/cuml/tests/test_stratified_kfold.py
Comment thread python/cuml/tests/test_stratified_kfold.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py Outdated
@csadorf
Contributor

csadorf commented Sep 30, 2025

This should be targeting branch-25.12.

I think we can consider it for 25.10, but I'd like the comments by @betatim to be addressed.

- remove custom random state handling.
- remove type hints.
- update doc strings.
@trivialfis trivialfis changed the base branch from branch-25.10 to branch-25.12 September 30, 2025 18:20
@trivialfis trivialfis requested review from a team as code owners September 30, 2025 18:20
@trivialfis trivialfis requested a review from bdice September 30, 2025 18:20
@trivialfis trivialfis changed the base branch from branch-25.12 to branch-25.10 September 30, 2025 18:21
@csadorf csadorf removed the request for review from a team September 30, 2025 18:24
@csadorf csadorf removed request for a team and bdice September 30, 2025 18:24
@trivialfis
Member Author

I think we can consider it for 25.10, but I'd like the comments by @betatim to be addressed.

Thank you. Kept the targeted branch unchanged.

@betatim Thank you for the detailed comments. I have addressed most of the issues. I kept the existing train_test_split and stratified kfold unchanged. But in the future, they might need some cleanup to meet the new cuml conventions.

Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py
Comment thread python/cuml/cuml/model_selection/_split.py Outdated
Comment thread python/cuml/cuml/model_selection/_split.py
@trivialfis
Member Author

Noted. I mainly used the existing stratified k-fold as a reference. I will use the sklearn kfold instead.

@betatim
Member

betatim commented Oct 1, 2025

@betatim Thank you for the detailed comments. I have addressed most of the issues. I kept the existing train_test_split and stratified kfold unchanged. But in the future, they might need some cleanup to meet the new cuml conventions.

Thanks for being efficient and nice with addressing them! I think it is a good idea to leave cleanup and making things more uniform to a separate PR. A PR that both adds or changes things and does cleanup is trickier to review, because you need to double-check whether each change is just a "moved things around" change or a new addition.

What do you have in mind with "the new cuml conventions"?

Comment thread python/cuml/cuml/model_selection/_split.py Outdated
@trivialfis
Member Author

trivialfis commented Oct 1, 2025

What do you have in mind with "the new cuml conventions"?

I wrote this PR a long time ago. I just used the existing stratified kfold as an example. But it did not meet the current review standard. ;-)

@csadorf
Contributor

csadorf commented Oct 1, 2025

Please re-open this for branch-25.12.

@csadorf csadorf closed this Oct 1, 2025
@trivialfis trivialfis mentioned this pull request Oct 2, 2025
rapids-bot Bot pushed a commit that referenced this pull request Oct 31, 2025
#7163 rebased onto branch-25.12.

Closes #7088

Continuing the previous PR:
- I have confirmed that the docstring of the base class's `get_n_splits` is correctly inherited by `KFold`.
- I did not add `StratifiedKFold` to the Sphinx docs because it did not build successfully. As previously suggested, changes to the stratified k-fold should not be bundled into this PR.
- Added a more precise test for the fold size.

*todos*
- [x] #7163 (comment)

Authors:
  - Jiaming Yuan (https://github.com/trivialfis)
  - Divye Gala (https://github.com/divyegala)
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #7296
vardhan30016 pushed a commit to vardhan30016/cuml that referenced this pull request Nov 7, 2025

Labels

3 - Ready for Review (Ready for review by team), Cython / Python (Cython or Python issue), feature request (New feature or request), New Algorithm (For tracking new algorithms that will be added to our existing collection), non-breaking (Non-breaking change)

Development

Successfully merging this pull request may close these issues.

[FEA] KFold for cuML & cuDF

5 participants