Add Out-of-Bag (OOB) Score Support to RandomForest #7401

Merged: rapids-bot[bot] merged 41 commits into rapidsai:main from csadorf:fea/support-rf-oob-scores on Nov 3, 2025

Contributor

@csadorf csadorf commented Oct 28, 2025

Summary

Implements out-of-bag (OOB) scoring for RandomForestClassifier and RandomForestRegressor, enabling users to estimate model performance without requiring a separate validation set.

Closes #7395

Changes

C++ Layer

  • Modified fit() functions to accept optional bootstrap_masks parameter for storing per-tree bootstrap sample indicators
  • Updated RandomForest::fit() to capture and store bootstrap masks when oob_score=True
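
To illustrate what a per-tree bootstrap sample indicator records, here is a minimal pure-Python sketch. It is not the C++ implementation: the function name `bootstrap_mask`, the 0/1 list layout, and the use of Python's `random` module are all assumptions made for illustration, and the PR does not describe the actual memory layout of `bootstrap_masks`.

```python
import random


def bootstrap_mask(n_samples, rng):
    """Hypothetical helper: 1 if the row was drawn for this tree, else 0."""
    mask = [0] * n_samples
    for _ in range(n_samples):
        # Draw n_samples rows with replacement; mark every row that is hit.
        mask[rng.randrange(n_samples)] = 1
    return mask


rng = random.Random(0)
n_samples, n_trees = 10, 4

# One mask per tree, mirroring "per-tree bootstrap sample indicators".
masks = [bootstrap_mask(n_samples, rng) for _ in range(n_trees)]

# Rows with a 0 were never drawn for that tree, so they are out-of-bag for it.
oob_rows_per_tree = [
    [i for i, drawn in enumerate(mask) if drawn == 0] for mask in masks
]
```

Each tree can later be evaluated only on its own out-of-bag rows, which is why the masks need to be kept after fitting.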

Python Layer

  • Added oob_score parameter (boolean only) to Random Forest estimators
  • Implemented _compute_oob_score() method that leverages FIL's predict_per_tree() for efficient OOB predictions
  • Added oob_score_ and oob_decision_function_ (or oob_prediction_) attributes
  • Validates that oob_score is boolean (custom scorer functions not supported)
  • Added proper attribute transfer for pickle and CPU interop
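
The aggregation step behind an OOB decision function can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the body of `_compute_oob_score()`: the function name, the nested-list inputs, and the choice to return `None` for rows that were in-bag for every tree are hypothetical. It assumes per-tree class probabilities are already available (in the PR these come from FIL's `predict_per_tree()`).

```python
def oob_decision_function(per_tree_proba, masks):
    """Average class probabilities over the trees for which a row is out-of-bag.

    per_tree_proba[t][i] is the class-probability list tree t gives row i.
    masks[t][i] is 1 if row i was in tree t's bootstrap sample, else 0.
    """
    n_trees = len(per_tree_proba)
    n_samples = len(masks[0])
    n_classes = len(per_tree_proba[0][0])

    result = []
    for i in range(n_samples):
        totals = [0.0] * n_classes
        n_oob_trees = 0
        for t in range(n_trees):
            if masks[t][i] == 0:  # only trees that never saw row i may vote
                for c in range(n_classes):
                    totals[c] += per_tree_proba[t][i][c]
                n_oob_trees += 1
        # A row drawn by every tree has no OOB estimate (assumption: None).
        result.append([v / n_oob_trees for v in totals] if n_oob_trees else None)
    return result


# Two trees, three rows, two classes.
per_tree_proba = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
]
masks = [
    [0, 1, 1],
    [0, 0, 1],
]
oob_proba = oob_decision_function(per_tree_proba, masks)
```

Row 0 is out-of-bag for both trees, row 1 only for the second tree, and row 2 for neither, so the three rows exercise each case.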

Metrics

  • Classifier: Uses accuracy score on OOB predictions
  • Regressor: Uses R² score on OOB predictions
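
For reference, the two metrics named above can be written out in a few lines of plain Python. These are the standard definitions of accuracy and the coefficient of determination, not code from this PR; the function names are illustrative, and how rows without any OOB prediction are treated before scoring is not stated in the PR.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Classifier case: three of four OOB predictions are correct.
acc = accuracy([0, 1, 1, 0], [0, 1, 0, 0])

# Regressor case: always predicting the mean of y_true gives R² = 0.
r2_mean = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```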

Limitations

  • Custom scorer functions (callable oob_score) are not supported - only boolean values accepted
  • Multi-output targets not supported for OOB scoring
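
A boolean-only check of the kind described in the first limitation might look like the sketch below. The helper name `check_oob_score`, the `ValueError` exception type, and the message text are assumptions; the PR does not state which error the estimators actually raise.

```python
def check_oob_score(oob_score):
    """Hypothetical validator: accept only True/False, reject callables."""
    if not isinstance(oob_score, bool):
        # Assumption: a ValueError is raised; the real error type may differ.
        raise ValueError(
            "oob_score must be a boolean; custom scorer functions "
            f"are not supported (got {type(oob_score).__name__})"
        )
    return oob_score
```

With this check, `check_oob_score(True)` passes through unchanged, while passing a callable scorer fails up front rather than partway through fitting.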

Testing

Added comprehensive tests covering:

  • Binary and multi-class classification OOB scoring
  • Regression OOB scoring
  • Error handling for invalid configurations
  • Comparison with scikit-learn baseline

@copy-pr-bot

copy-pr-bot Bot commented Oct 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

@github-actions github-actions Bot added the "Cython / Python" (Cython or Python issue) and "CUDA/C++" labels Oct 28, 2025
@csadorf csadorf added the "improvement" (Improvement / enhancement to an existing function) and "non-breaking" (Non-breaking change) labels Oct 28, 2025
@csadorf
Contributor Author

csadorf commented Oct 28, 2025

/ok to test ba0c33a

@csadorf csadorf force-pushed the fea/support-rf-oob-scores branch from 83224e2 to bb46ce7 Compare October 30, 2025 01:15
@csadorf
Contributor Author

csadorf commented Oct 30, 2025

/ok to test 25ab340

@csadorf
Contributor Author

csadorf commented Oct 30, 2025

/ok to test c349e02

@csadorf csadorf force-pushed the fea/support-rf-oob-scores branch from c349e02 to 4dcfd8c Compare October 30, 2025 15:36
@csadorf
Contributor Author

csadorf commented Oct 30, 2025

/ok to test 4dcfd8c

Contributor

@viclafargue viclafargue left a comment

Thanks for working on this! LGTM, just some minor suggestions for the Python portion.

Comment thread python/cuml/cuml/ensemble/randomforest_common.pyx Outdated
Comment thread python/cuml/tests/test_sklearn_import_export.py Outdated
Member

@jcrist jcrist left a comment

LGTM, nice work! Left two small nits, but otherwise :shipit:!

Comment thread python/cuml/cuml/accel/_wrappers/sklearn/ensemble.py Outdated
Comment thread python/cuml/cuml/ensemble/randomforest_common.pyx Outdated
@csadorf
Contributor Author

csadorf commented Nov 3, 2025

/merge

@rapids-bot rapids-bot Bot merged commit d5cfd77 into rapidsai:main Nov 3, 2025
106 checks passed
@csadorf csadorf deleted the fea/support-rf-oob-scores branch November 3, 2025 23:41
vardhan30016 pushed a commit to vardhan30016/cuml that referenced this pull request Nov 7, 2025
Authors:
  - Simon Adorf (https://github.com/csadorf)

Approvers:
  - Victor Lafargue (https://github.com/viclafargue)
  - Divye Gala (https://github.com/divyegala)
  - Jim Crist-Harif (https://github.com/jcrist)

URL: rapidsai#7401
Labels

  • CUDA/C++
  • Cython / Python (Cython or Python issue)
  • improvement (Improvement / enhancement to an existing function)
  • non-breaking (Non-breaking change)

Development

Successfully merging this pull request may close these issues.

[FEA] Add support for Out-of-Bag (OOB) Scores for cuML Random Forest

5 participants