Closed
Dear Developers,
First of all, thank you for your work and the really interesting autosklearn package.
In AutoSklearnRegressor (and possibly AutoSklearnClassifier too), when memory_limit is low enough to force autosklearn to decimate (subsample) the training set, a group-based resampling strategy such as GroupKFold fails. The groups argument, a vector holding one group index per example in the training set, is not decimated accordingly. In essence, the following line fails:
    if np.shape(self.resampling_strategy_args['groups'])[0] != y.shape[0]:
because y.shape[0] refers to the decimated training set, while np.shape(self.resampling_strategy_args['groups'])[0] refers to the original (non-decimated) training set.
As a consequence, for large training sets this problem occurs practically every time, which prevents the use of group-based resampling strategies.
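
To illustrate the mismatch and one possible remedy, here is a minimal NumPy sketch. It is not autosklearn's internal code: the helper subsample_with_groups, the toy data, and the sizes are all hypothetical. The idea is simply that whatever row indices are used to decimate X and y should also be applied to groups.

```python
import numpy as np


def subsample_with_groups(X, y, groups, n_keep, seed=0):
    """Hypothetical helper: keep the same randomly chosen rows of X, y and groups."""
    rng = np.random.default_rng(seed)
    # Draw n_keep distinct row indices and keep them in their original order.
    idx = np.sort(rng.choice(len(y), size=n_keep, replace=False))
    return X[idx], y[idx], groups[idx]


# Toy data (assumption): 1000 examples spread over 10 groups.
X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.arange(1000, dtype=float)
groups = np.repeat(np.arange(10), 100)

# Decimating only the targets reproduces the length mismatch from the report.
y_small = y[:300]
mismatch = np.shape(groups)[0] != y_small.shape[0]

# Decimating all three arrays with the same indices keeps them aligned.
X_s, y_s, g_s = subsample_with_groups(X, y, groups, n_keep=300)
consistent = np.shape(g_s)[0] == y_s.shape[0]
```

With the arrays aligned this way, the length check quoted above would compare two arrays of equal length.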