[FEA] Implement kfold.#7163
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
76d4500 to
c172ca3
Compare
betatim
left a comment
There was a problem hiding this comment.
Thanks for working on this. I left a few review comments and suggestions
|
|
||
| assert tr_idx.shape[0] + te_idx.shape[0] == n_samples | ||
| fold_size = X.shape[0] // n_splits | ||
| assert te_idx.shape[0] == fold_size or te_idx.shape[0] == fold_size + 1 |
There was a problem hiding this comment.
I'd make this more precise, maybe by hard coding (with a short comment explaining why) the expected size of the folds. I think this is worth it because the off by one error here is one of the important things to check for
There was a problem hiding this comment.
We have a parametrized test, probably not very nice to have a hardcoded expectation for every case.
There was a problem hiding this comment.
That is true. Maybe then we should have an if statement for the size check. From memory it is the first n_samples % n_splits folds that should have a size of fold_size + 1
|
I think we can consider it for 25.10, but I'd like the comments by @betatim to be addressed. |
- remove custom random state handling. - remove type hints. - update doc strings.
Thank you. Kept the targeted branch unchanged. @betatim Thank you for the detailed comments. I have addressed most of the issues. I kept the existing train_test_split and stratified kfold unchanged. But in the future, they might need some cleanup to meet the new cuml conventions. |
|
Noted. I mainly used the existing stratified k-fold as a reference. I will use the sklearn kfold instead. |
Thanks for being efficient and nice with addressing them! I think it is a good idea to leave clean up/making things more uniform to a separate PR. Having a PR that changes/adds new things and does clean up is trickier to review because you need to double check if this is just a "moved things around" change or a new addition. What do you have in mind with "the new cuml conventions"? |
I wrote this PR a long time ago. I just used the existing stratified kfold as an example. But it did not meet the current review standard. ;-) |
|
Please re-open this for branch-25.12. |
#7163 rebased onto branch-25.12. Closes #7088 Continuing the previous PR: - I have confirmed that the document from the base class `get_n_splits` is correctly inherited for the `KFold`. - I did not add the `StratifiedKFold` to the Sphinx doc as it did not build successfully. And as previously suggested, changing the stratified kfold should not be bundled into this PR. - Added a more precise test for the fold size. *todos* - [x] #7163 (comment) Authors: - Jiaming Yuan (https://github.com/trivialfis) - Divye Gala (https://github.com/divyegala) - Simon Adorf (https://github.com/csadorf) Approvers: - Simon Adorf (https://github.com/csadorf) URL: #7296
rapidsai#7163 rebased onto branch-25.12. Closes rapidsai#7088 Continuing the previous PR: - I have confirmed that the document from the base class `get_n_splits` is correctly inherited for the `KFold`. - I did not add the `StratifiedKFold` to the Sphinx doc as it did not build successfully. And as previously suggested, changing the stratified kfold should not be bundled into this PR. - Added a more precise test for the fold size. *todos* - [x] rapidsai#7163 (comment) Authors: - Jiaming Yuan (https://github.com/trivialfis) - Divye Gala (https://github.com/divyegala) - Simon Adorf (https://github.com/csadorf) Approvers: - Simon Adorf (https://github.com/csadorf) URL: rapidsai#7296
Replaces #5940 .
Closes #7088