Getting only TIMEOUT for PredefinedSplit #1274

@mereldawu

Describe the bug

When passing a PredefinedSplit as the resampling strategy, every trial ends in TIMEOUT, even on a small dataset. With the default configuration, auto-sklearn produces successful trials within a couple of seconds.

To Reproduce

This is the minimal code I can come up with, based on the example here.

import pandas as pd
import numpy as np
import autosklearn.classification  # needed for AutoSklearnClassifier below
import autosklearn.metrics
from sklearn.model_selection import PredefinedSplit, train_test_split

# Using credit card public dataset to demonstrate the problem
df = pd.read_csv("https://raw.githubusercontent.com/irenebenedetto/default-of-credit-card-clients/master/dataset/credit_cards_dataset.csv")
X_train, X_test = train_test_split(
    df, test_size=0.2, random_state=42
)
y_train = X_train.pop(X_train.columns[-1])

# Use an arbitrary column to create the validation set; the split is meaningless and only demonstrates the problem
resampling_strategy = PredefinedSplit(
    test_fold=np.where(X_train.to_numpy()[:, 4] < np.mean(X_train.to_numpy()[:, 4]))[0]
)

autosk = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,
    per_run_time_limit=200,
    tmp_folder="./tmp/autosklearn",
    disable_evaluator_output=False,
    resampling_strategy=resampling_strategy,
    metric=autosklearn.metrics.accuracy,
    delete_tmp_folder_after_terminate=False,
    seed=42
)

autosk.fit(X_train, y_train)

By commenting out the resampling_strategy line, the trials run successfully.

I've also tried increasing both time_left_for_this_task and per_run_time_limit to 6000, but still got only TIMEOUT.

I also ran the example code as-is, and it generated successful trials.

I'm not sure whether the issue is the dataset, how I'm using PredefinedSplit, or something else.
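
One thing that may be worth checking is how scikit-learn interprets test_fold. As far as I understand it, PredefinedSplit expects one fold label per sample (with -1 meaning "never in the test set"), and every distinct non-negative value becomes its own split. The sketch below compares an index array built like the one in my code with a per-row label array. The `feature` array is a made-up stand-in for the column I used, not the real data, and this is only an illustration of the scikit-learn side, not a confirmed diagnosis of the timeouts.

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit

# Hypothetical stand-in for one feature column (not the credit card data)
feature = np.array([1.0, 5.0, 2.0, 8.0, 3.0, 9.0])
mask = feature < feature.mean()  # rows intended for the validation set

# Same construction as in the reproduction: np.where(mask)[0] returns the
# positions of the selected rows, so each position is read as a fold label
index_based = PredefinedSplit(test_fold=np.where(mask)[0])

# One label per row: 0 = validation fold, -1 = always in the training set
label_based = PredefinedSplit(test_fold=np.where(mask, 0, -1))

n_index_splits = index_based.get_n_splits()
n_label_splits = label_based.get_n_splits()
print(n_index_splits)  # 3 (one split per selected row position)
print(n_label_splits)  # 1 (a single train/validation split)

# The label-based split yields exactly one (train, validation) pair
train_idx, valid_idx = next(label_based.split())
```

If the same pattern holds for the full training set, the index-based version would define a very large number of folds over an array shorter than the data, which could plausibly be related to the timeouts.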

Expected behavior

Generate multiple successful trials.

Actual behavior, stacktrace or logfile

Result from sprint_statistics():
auto-sklearn results:
Dataset name: 1e6334d4-3831-11ec-9a9c-0255ac100090
Metric: accuracy
Number of target algorithm runs: 12
Number of successful target algorithm runs: 0
Number of crashed target algorithm runs: 0
Number of target algorithms that exceeded the time limit: 12
Number of target algorithms that exceeded the memory limit: 0

Logfile uploaded.

Environment and installation:

Please give details about your installation:

  • OS: Ubuntu 20.04.2 LTS (Focal Fossa) - a pod inside Kubeflow cluster
  • Is your installation in a virtual environment or conda environment: Normal python in a Kubeflow notebook
  • Python version: 3.7.1
  • Auto-sklearn version: 0.13.0
