Closed
Description
Hey @rasbt
I'm struggling to find the best features and hyperparameters using SequentialFeatureSelector and GridSearchCV.
For each setting in my param_grid, I would like to find the best combination of features.
Code to reproduce:
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.model_selection import GridSearchCV
import pandas as pd
import numpy as np
RANDOM_SEED = 42
boston = load_boston()
data, y = load_boston(return_X_y=True)
X = pd.DataFrame(data, columns=boston.feature_names)
rf = RandomForestRegressor(random_state=RANDOM_SEED)
# I want to log-transform my target
clf = TransformedTargetRegressor(regressor=rf,
                                 func=np.log1p,
                                 inverse_func=np.expm1)
sfs = SFS(clf,
          k_features=(1, X.shape[1]),
          forward=False,
          floating=True,
          scoring='neg_mean_absolute_error',
          verbose=1,
          n_jobs=-1,
          cv=3)
pipe_clf = Pipeline(steps=[('sfs', sfs),
                           ('clf', clf)])
param_grid = [{
    'sfs__estimator__regressor__max_depth': [1, 3]
}]
GDCV = GridSearchCV(estimator=pipe_clf, param_grid=param_grid, cv=3,
                    n_jobs=-1, scoring='neg_mean_absolute_error',
                    return_train_score=True,
                    verbose=True, refit=True)
# GDCV.fit(X, y) fails, but that is not my main problem here
GDCV.fit(X.values, y) # OK
cv_result = pd.DataFrame(GDCV.cv_results_)
cv_result.sort_values('rank_test_score')
The max_depth values don't change mean_train_score or mean_test_score, and I don't get the best k_features from the sfs step.
What am I missing?
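One thing that may be worth checking, sketched here under several assumptions: the key 'sfs__estimator__regressor__max_depth' only addresses the estimator nested inside the sfs step, while the final clf step has its own parameter path, 'clf__regressor__max_depth'. The sketch below lists both paths, tunes them together, and reads the selected features from the refitted pipeline. It is not a confirmed explanation of the behaviour above. It swaps in scikit-learn's own SequentialFeatureSelector as a stand-in for mlxtend's SFS (no floating search, fixed number of features) and uses synthetic data from make_regression, since load_boston has been removed from recent scikit-learn releases. The helper make_model is hypothetical.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
# Assumption: scikit-learn's selector stands in for mlxtend's SFS here.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

RANDOM_SEED = 42

# Synthetic data (assumption), kept small so the search runs quickly.
X, y = make_regression(n_samples=80, n_features=5, n_informative=3,
                       noise=5.0, random_state=RANDOM_SEED)
y = np.abs(y)  # keep the target non-negative so log1p is defined


def make_model():
    """Hypothetical helper: a fresh log-target random forest."""
    rf = RandomForestRegressor(n_estimators=5, random_state=RANDOM_SEED)
    return TransformedTargetRegressor(regressor=rf,
                                      func=np.log1p,
                                      inverse_func=np.expm1)


pipe = Pipeline(steps=[
    ('sfs', SequentialFeatureSelector(make_model(),
                                      n_features_to_select=2,
                                      direction='backward',
                                      scoring='neg_mean_absolute_error',
                                      cv=3)),
    ('clf', make_model()),
])

# The selector's estimator and the final regressor are separate parameter paths.
selector_key = 'sfs__estimator__regressor__max_depth'
final_key = 'clf__regressor__max_depth'
params = pipe.get_params()
print(selector_key in params, final_key in params)

# Tune both paths; each grid point sets max_depth in both places.
param_grid = [{selector_key: [1, 3], final_key: [1, 3]}]
search = GridSearchCV(pipe, param_grid=param_grid, cv=3,
                      scoring='neg_mean_absolute_error',
                      return_train_score=True, refit=True)
search.fit(X, y)

# Selected feature indices come from the refitted 'sfs' step.
best_sfs = search.best_estimator_.named_steps['sfs']
selected = best_sfs.get_support(indices=True)
print(search.best_params_)
print(selected)
```

With mlxtend's selector in the pipeline instead, the refitted step should expose the chosen subset through its k_feature_idx_ and k_score_ attributes, reached the same way via GDCV.best_estimator_.named_steps['sfs'].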
