option to re-use cv splits during tuning #545

Merged
nasaul merged 4 commits into Nixtla:main from janrth:enhancement/reuse_backtest_splits
Jan 8, 2026

Conversation

@janrth
Contributor

@janrth janrth commented Dec 31, 2025

At the moment the CV splits are recreated from scratch in each tuning trial, which can be slow for large datasets.
This PR adds an option to reuse CV splits across tuning trials.

When enabled, the splits are computed once in auto.py and passed into the optimization objective, where they are reused for all tuning trials. The default behavior is unchanged: if the option is disabled, CV splits are still generated inside each trial as before.

The original behavior is intentionally kept because this introduces a RAM vs CPU trade-off. Reusing splits can significantly reduce runtime when running many trials, but it keeps all train/validation splits in memory for the duration of the tuning run, which may increase peak memory usage on very large datasets.
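The pattern in miniature (an illustrative sketch with a toy split generator and a toy model, not the library's actual internals): with reuse enabled, the splits are materialized once into a list that every trial iterates over, which is exactly what holds them in memory for the whole run.

import optuna
import pandas as pd

def make_splits(df: pd.DataFrame, n_windows: int, h: int):
    # Hypothetical stand-in for the library's split generation: each
    # window's last `h` rows form the validation set.
    for i in range(n_windows, 0, -1):
        cutoff = len(df) - i * h
        yield df.iloc[:cutoff], df.iloc[cutoff:cutoff + h]

def make_objective(df: pd.DataFrame, n_windows: int, h: int, reuse_cv_splits: bool):
    # With reuse enabled, materialize the splits once here; every trial
    # then iterates over the same frames instead of regenerating them.
    cached = list(make_splits(df, n_windows, h)) if reuse_cv_splits else None

    def objective(trial: optuna.Trial) -> float:
        splits = cached if cached is not None else list(make_splits(df, n_windows, h))
        alpha = trial.suggest_float('alpha', 0.1, 0.9)
        errors = []
        for train, valid in splits:
            # Toy model: forecast the exponentially weighted mean of the train target.
            pred = train['y'].ewm(alpha=alpha).mean().iloc[-1]
            errors.append(((valid['y'] - pred) ** 2).mean())
        return float(sum(errors) / len(errors))

    return objective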

To ensure correctness, a test was added that uses a deterministic model and verifies that predictions are identical whether CV splits are reused or recomputed in each trial.
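For reference, a minimal sketch of such an equivalence check, assuming the AutoMLForecast(..., reuse_cv_splits=...) API shown in the example below (the PR's actual test may differ):

import pandas as pd
from sklearn.linear_model import LinearRegression
from mlforecast.auto import AutoMLForecast, AutoModel
from utilsforecast.data import generate_series

series = generate_series(n_series=5, freq='D', min_length=200)
# A deterministic model with a fixed (empty) search space: predictions
# must match exactly whether splits are reused or recomputed.
lr = AutoModel(model=LinearRegression(), config=lambda trial: {})

def tuned_predictions(reuse: bool) -> pd.DataFrame:
    auto_mlf = AutoMLForecast(
        models={'lr': lr},
        freq='D',
        season_length=7,
        init_config=lambda trial: {'lags': [1, 7]},
        reuse_cv_splits=reuse,
    ).fit(series, n_windows=2, h=7, num_samples=2)
    return auto_mlf.predict(7)

pd.testing.assert_frame_equal(tuned_predictions(True), tuned_predictions(False))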

When tested on the example below, I saw a speed-up of 1.14x when reusing the CV splits compared to the current implementation:

import time

import lightgbm as lgb
import optuna
from mlforecast.auto import AutoMLForecast, AutoModel
from mlforecast.lag_transforms import ExponentiallyWeightedMean, RollingMean
from utilsforecast.data import generate_series

series = generate_series(n_series=20000, freq='D', min_length=722, max_length=725, equal_ends=True)

horizon = 24  # assumed value; the original snippet uses `horizon` without defining it


# Setting up model
def my_lgb_config(trial: optuna.Trial):
    return {
        'learning_rate': 0.05,
        'verbosity': -1,
        'num_leaves': trial.suggest_int('num_leaves', 127, 128, log=True),
    }

def my_fit_config(trial: optuna.Trial):
    return {'dropna': False}

def my_init_config(trial: optuna.Trial):
    return {
        'lags': [24 * i for i in range(1, 3)],  # this won't be tuned
        'lag_transforms': {
            1: [
                ExponentiallyWeightedMean(alpha=0.3),
                RollingMean(window_size=7, min_samples=1),
                RollingMean(window_size=21, min_samples=1),
                RollingMean(window_size=28, min_samples=1),
                RollingMean(window_size=50, min_samples=1),
            ],
        },
    }

my_lgb = AutoModel(
    model=lgb.LGBMRegressor(),
    config=my_lgb_config,
)

# -------------------------------
# Fitting model WITHOUT reusing CV splits
# -------------------------------
t0 = time.perf_counter()

auto_mlf_no_reuse = AutoMLForecast(
    models={"my_lgb": my_lgb},
    freq="D",
    season_length=24,
    fit_config=my_fit_config,
    init_config=my_init_config,
    reuse_cv_splits=False,
).fit(
    series,
    n_windows=15,
    h=horizon,
    num_samples=10,
)

t_no_reuse = time.perf_counter() - t0
print(f"Fit time without CV split reuse: {t_no_reuse:.2f} seconds")

# -------------------------------
# Fitting model WITH reusing CV splits
# -------------------------------
t0 = time.perf_counter()

auto_mlf_reuse = AutoMLForecast(
    models={"my_lgb": my_lgb},
    freq="D",
    season_length=24,
    fit_config=my_fit_config,
    init_config=my_init_config,
    reuse_cv_splits=True,   # <-- important
).fit(
    series,
    n_windows=15,
    h=horizon,
    num_samples=10,
)

t_reuse = time.perf_counter() - t0
print(f"Fit time with CV split reuse: {t_reuse:.2f} seconds")

# -------------------------------
# Comparison
# -------------------------------
speedup = t_no_reuse / t_reuse
delta = t_no_reuse - t_reuse

print(f"Speedup: {speedup:.2f}×")
print(f"Time saved: {delta:.2f} seconds ({delta / t_no_reuse * 100:.1f}%)")

Description

Solves #538

Checklist:

  • This PR has a meaningful title and a clear description.
  • The tests pass.
  • All linting tasks pass.
  • The notebooks are clean.

@janrth
Contributor Author

janrth commented Dec 31, 2025

@nasaul Here is an implementation that reuses the CV splits to increase tuning speed.

@codspeed-hq

codspeed-hq Bot commented Dec 31, 2025

Merging this PR will not alter performance

Summary

✅ 12 untouched benchmarks


Comparing janrth:enhancement/reuse_backtest_splits (c617ffc) with main (48f7d03)

Open in CodSpeed

Contributor

@nasaul nasaul left a comment


Good implementation @janrth! Just minor comments to solve before merging.

Comment thread mlforecast/auto.py
Comment thread mlforecast/optimization.py Outdated
Contributor

@nasaul nasaul left a comment


LGTM

@nasaul nasaul merged commit 7526c56 into Nixtla:main Jan 8, 2026
21 checks passed