option to re-use cv splits during tuning #545

Merged
nasaul merged 4 commits into Nixtla:main from janrth:enhancement/reuse_backtest_splits
Jan 8, 2026

Conversation

@janrth
Contributor

@janrth janrth commented Dec 31, 2025

At the moment the CV splits are recreated from scratch in each tuning trial, which can be slow for large datasets.
This PR adds an option to reuse CV splits across tuning trials.

When enabled, the splits are computed once in auto.py and passed into the optimization objective, where they are reused for all tuning trials. The default behavior is unchanged: if the option is disabled, CV splits are still generated inside each trial as before.

The original behavior is intentionally kept because this introduces a RAM vs CPU trade-off. Reusing splits can significantly reduce runtime when running many trials, but it keeps all train/validation splits in memory for the duration of the tuning run, which may increase peak memory usage on very large datasets.
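The pattern in miniature (an illustrative sketch with a toy split generator and a toy model, not the library's actual internals): with reuse enabled, the splits are materialized once into a list that every trial iterates over, which is exactly what holds them in memory for the whole run.

import optuna
import pandas as pd

def make_splits(df: pd.DataFrame, n_windows: int, h: int):
    # Hypothetical stand-in for the library's split generation: each
    # window's last `h` rows form the validation set.
    for i in range(n_windows, 0, -1):
        cutoff = len(df) - i * h
        yield df.iloc[:cutoff], df.iloc[cutoff:cutoff + h]

def make_objective(df: pd.DataFrame, n_windows: int, h: int, reuse_cv_splits: bool):
    # With reuse enabled, materialize the splits once here; every trial
    # then iterates over the same frames instead of regenerating them.
    cached = list(make_splits(df, n_windows, h)) if reuse_cv_splits else None

    def objective(trial: optuna.Trial) -> float:
        splits = cached if cached is not None else list(make_splits(df, n_windows, h))
        alpha = trial.suggest_float('alpha', 0.1, 0.9)
        errors = []
        for train, valid in splits:
            # Toy model: forecast the exponentially weighted mean of the train target.
            pred = train['y'].ewm(alpha=alpha).mean().iloc[-1]
            errors.append(((valid['y'] - pred) ** 2).mean())
        return float(sum(errors) / len(errors))

    return objective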

To ensure correctness, a test was added that uses a deterministic model and verifies that predictions are identical whether CV splits are reused or recomputed in each trial.
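For reference, a minimal sketch of such an equivalence check, assuming the AutoMLForecast(..., reuse_cv_splits=...) API shown in the example below (the PR's actual test may differ):

import pandas as pd
from sklearn.linear_model import LinearRegression
from mlforecast.auto import AutoMLForecast, AutoModel
from utilsforecast.data import generate_series

series = generate_series(n_series=5, freq='D', min_length=200)
# A deterministic model with a fixed (empty) search space: predictions
# must match exactly whether splits are reused or recomputed.
lr = AutoModel(model=LinearRegression(), config=lambda trial: {})

def tuned_predictions(reuse: bool) -> pd.DataFrame:
    auto_mlf = AutoMLForecast(
        models={'lr': lr},
        freq='D',
        season_length=7,
        init_config=lambda trial: {'lags': [1, 7]},
        reuse_cv_splits=reuse,
    ).fit(series, n_windows=2, h=7, num_samples=2)
    return auto_mlf.predict(7)

pd.testing.assert_frame_equal(tuned_predictions(True), tuned_predictions(False))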

When tested on the example below, I saw a speed-up of 1.14x when reusing the CV splits compared to the current implementation:

import time

import lightgbm as lgb
import optuna
from mlforecast.auto import AutoMLForecast, AutoModel
from mlforecast.lag_transforms import ExponentiallyWeightedMean, RollingMean
from utilsforecast.data import generate_series

series = generate_series(n_series=20000, freq='D', min_length=722, max_length=725, equal_ends=True)

horizon = 24  # assumed value; the original snippet uses `horizon` without defining it


# Setting up model
def my_lgb_config(trial: optuna.Trial):
    return {
        'learning_rate': 0.05,
        'verbosity': -1,
        'num_leaves': trial.suggest_int('num_leaves', 127, 128, log=True),
    }

def my_fit_config(trial: optuna.Trial):
    return {'dropna': False}

def my_init_config(trial: optuna.Trial):
    return {
        'lags': [24 * i for i in range(1, 3)],  # this won't be tuned
        'lag_transforms': {
            1: [
                ExponentiallyWeightedMean(alpha=0.3),
                RollingMean(window_size=7, min_samples=1),
                RollingMean(window_size=21, min_samples=1),
                RollingMean(window_size=28, min_samples=1),
                RollingMean(window_size=50, min_samples=1),
            ],
        },
    }

my_lgb = AutoModel(
    model=lgb.LGBMRegressor(),
    config=my_lgb_config,
)

# -------------------------------
# Fitting model WITHOUT reusing CV splits
# -------------------------------
t0 = time.perf_counter()

auto_mlf_no_reuse = AutoMLForecast(
    models={"my_lgb": my_lgb},
    freq="D",
    season_length=24,
    fit_config=my_fit_config,
    init_config=my_init_config,
    reuse_cv_splits=False,
).fit(
    series,
    n_windows=15,
    h=horizon,
    num_samples=10,
)

t_no_reuse = time.perf_counter() - t0
print(f"Fit time without CV split reuse: {t_no_reuse:.2f} seconds")

# -------------------------------
# Fitting model WITH reusing CV splits
# -------------------------------
t0 = time.perf_counter()

auto_mlf_reuse = AutoMLForecast(
    models={"my_lgb": my_lgb},
    freq="D",
    season_length=24,
    fit_config=my_fit_config,
    init_config=my_init_config,
    reuse_cv_splits=True,   # <-- important
).fit(
    series,
    n_windows=15,
    h=horizon,
    num_samples=10,
)

t_reuse = time.perf_counter() - t0
print(f"Fit time with CV split reuse: {t_reuse:.2f} seconds")

# -------------------------------
# Comparison
# -------------------------------
speedup = t_no_reuse / t_reuse
delta = t_no_reuse - t_reuse

print(f"Speedup: {speedup:.2f}×")
print(f"Time saved: {delta:.2f} seconds ({delta / t_no_reuse * 100:.1f}%)")

Description

Solves #538

Checklist:

  • This PR has a meaningful title and a clear description.
  • The tests pass.
  • All linting tasks pass.
  • The notebooks are clean.

@janrth
Contributor Author

janrth commented Dec 31, 2025

@nasaul Here is an implementation that reuses the CV splits to increase tuning speed.

@codspeed-hq

codspeed-hq Bot commented Dec 31, 2025

Merging this PR will not alter performance

Summary

✅ 12 untouched benchmarks


Comparing janrth:enhancement/reuse_backtest_splits (c617ffc) with main (48f7d03)

Open in CodSpeed

Contributor

@nasaul nasaul left a comment


Good implementation @janrth! Just minor comments to solve before merging.

Comment thread mlforecast/auto.py
Comment thread mlforecast/optimization.py Outdated
Contributor

@nasaul nasaul left a comment


LGTM

@nasaul nasaul merged commit 7526c56 into Nixtla:main Jan 8, 2026
21 checks passed