@ravjot07
Contributor

[ENH] Implement online probabilistic regression meta-algorithms

This PR implements two online probabilistic regression meta-algorithms as specified in issue #464:

  1. OnlineBatchMixture: Fits a separate copy of the regressor on each batch and returns a Mixture distribution whose weights are proportional to the batch sizes
  2. OnlineBootstrapRemember: Maintains a bootstrap sample of the fit data, pools it with each new batch, and returns a Mixture distribution

Closes #464

Changes

New Files

  • skpro/regression/online/_batch_mixture.py: Implementation of OnlineBatchMixture algorithm
  • skpro/regression/online/_bootstrap_remember.py: Implementation of OnlineBootstrapRemember algorithm

Modified Files

  • skpro/regression/online/__init__.py: Added exports for OnlineBatchMixture and OnlineBootstrapRemember

Implementation Details

OnlineBatchMixture

  • Fits a separate copy of the regressor on each batch of data
  • Returns a Mixture distribution with weights proportional to the number of samples in each batch
  • Supports a min_batch_size parameter that sets the minimum batch size
  • Supports ignore_small_batches parameter:
    • If True: Batches smaller than min_batch_size are ignored
    • If False: Small batches are accumulated until min_batch_size is reached
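The batching rules above can be sketched without any skpro dependency. The helper below is hypothetical (BatchBuffer, add, and weights are illustrative names, not part of this PR) and only tracks row counts. It also assumes that, in accumulate mode, a large batch is pooled with any rows still pending; the actual implementation may handle that case differently.

```python
from dataclasses import dataclass, field


@dataclass
class BatchBuffer:
    """Hypothetical helper: decides which batches become mixture components."""

    min_batch_size: int = 1
    ignore_small_batches: bool = False
    batch_sizes: list = field(default_factory=list)  # row counts of accepted batches
    pending: int = 0  # rows accumulated from batches that were too small

    def add(self, n_rows: int) -> bool:
        """Register a batch; return True if it yields a new component."""
        if self.ignore_small_batches:
            if n_rows < self.min_batch_size:
                return False  # small batch is dropped
            self.batch_sizes.append(n_rows)
            return True
        # Accumulate mode: keep pooling rows until the threshold is reached.
        self.pending += n_rows
        if self.pending < self.min_batch_size:
            return False
        self.batch_sizes.append(self.pending)
        self.pending = 0
        return True

    def weights(self) -> list:
        """Mixture weights proportional to the accepted batch sizes."""
        total = sum(self.batch_sizes)
        if total == 0:
            return []
        return [n / total for n in self.batch_sizes]


buf = BatchBuffer(min_batch_size=10, ignore_small_batches=False)
for n in (4, 7, 30):
    buf.add(n)
print(buf.batch_sizes, buf.weights())
```

With ignore_small_batches=False, the batches of 4 and 7 rows are merged into one component of 11 rows, so the two components are weighted 11/41 and 30/41.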

OnlineBootstrapRemember

  • Remembers a bootstrap sample of size n_remember drawn from the data passed to fit
  • At each update, pools the remembered sample with the new batch
  • Bootstraps a new remembered sample from the pooled data, so the remembered sample stays at size n_remember
  • Returns a Mixture distribution combining:
    • Predictions from the regressor fitted on the remembered sample only
    • Predictions from the regressor fitted on the pooled data (remembered + new batch)
  • Mixture weights are proportional to sample sizes (remembered vs new batch)
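The update step described above can be illustrated with plain NumPy. This is a minimal sketch under assumptions, not the code in this PR: update_remembered and component_weights are hypothetical names, the data is assumed to be NumPy arrays, and the bootstrap is taken to be uniform sampling with replacement.

```python
import numpy as np


def update_remembered(X_mem, y_mem, X_new, y_new, n_remember, rng):
    """Pool remembered and new rows, then bootstrap n_remember rows from the pool."""
    X_pool = np.concatenate([X_mem, X_new], axis=0)
    y_pool = np.concatenate([y_mem, y_new], axis=0)
    # Row indices drawn uniformly with replacement (assumed bootstrap scheme).
    idx = rng.integers(0, len(X_pool), size=n_remember)
    return X_pool, y_pool, X_pool[idx], y_pool[idx]


def component_weights(n_remembered, n_new):
    """Weights proportional to the remembered and new-batch sample sizes."""
    total = n_remembered + n_new
    return n_remembered / total, n_new / total


rng = np.random.default_rng(42)
X_mem, y_mem = np.zeros((5, 2)), np.zeros(5)  # remembered sample
X_new, y_new = np.ones((3, 2)), np.ones(3)    # incoming batch
X_pool, y_pool, X_mem, y_mem = update_remembered(
    X_mem, y_mem, X_new, y_new, n_remember=5, rng=rng
)
print(X_pool.shape, X_mem.shape, component_weights(5, 3))
```

The pooled arrays would be used to fit one component and the resampled arrays carried forward as the new remembered sample, which keeps memory bounded at n_remember rows regardless of how many batches arrive.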

Example Usage

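from skpro.regression.online import OnlineBatchMixture, OnlineBootstrapRemember
from skpro.regression.residual import ResidualDouble
from sklearn.linear_model import LinearRegression

# The X_* and y_* names below are placeholders for the caller's own
# feature and target data; they are not defined in this snippet.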

# OnlineBatchMixture
base_reg = ResidualDouble(LinearRegression())
online_batch = OnlineBatchMixture(
    estimator=base_reg,
    min_batch_size=10,
    ignore_small_batches=False
)
online_batch.fit(X_train_batch1, y_train_batch1)
online_batch.update(X_train_batch2, y_train_batch2)
y_pred_proba = online_batch.predict_proba(X_test)  # Returns Mixture distribution

# OnlineBootstrapRemember
online_bootstrap = OnlineBootstrapRemember(
    estimator=base_reg,
    n_remember=100,
    random_state=42
)
online_bootstrap.fit(X_train, y_train)
online_bootstrap.update(X_new_batch, y_new_batch)
y_pred_proba = online_bootstrap.predict_proba(X_test)  # Returns Mixture distribution

@ravjot07 ravjot07 force-pushed the feat/OPR-meta-algorithms branch from 8e99965 to 5516094 Compare November 23, 2025 08:13

Development

Successfully merging this pull request may close these issues.

[ENH] online probabilistic regression meta-algorithms wishlist
