
GridSearchCV & SequentialFeatureSelector to find best params & features #511

@armgilles


Hey @rasbt

I'm struggling to find the best features and the best hyperparameters at the same time using SequentialFeatureSelector and GridSearchCV.

For each parameter setting in my param_grid, I would like to find the best combination of features.

Code to reproduce:

from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
from sklearn.ensemble import RandomForestRegressor
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.model_selection import GridSearchCV
import pandas as pd
import numpy as np

RANDOM_SEED = 42

boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = boston.target
rf = RandomForestRegressor(random_state=RANDOM_SEED)

# Fit on log1p(y) and map predictions back to the original scale with expm1
clf = TransformedTargetRegressor(regressor=rf,
                                 func=np.log1p,
                                 inverse_func=np.expm1)

sfs = SFS(clf, 
          k_features=(1, X.shape[1]),
          forward=False, 
          floating=True, 
          scoring='neg_mean_absolute_error',
          verbose=1,
          n_jobs=-1,
          cv=3)

pipe_clf = Pipeline(steps=[('sfs', sfs),
                           ('clf', clf)])

param_grid = [{
    'sfs__estimator__regressor__max_depth' : [1, 3]
}]

GDCV = GridSearchCV(estimator=pipe_clf, param_grid=param_grid, cv=3,
                    n_jobs=-1, scoring='neg_mean_absolute_error',
                    return_train_score=True,
                    verbose=True, refit=True)

# GDCV.fit(X, y)  # fails with the DataFrame, but that is not my main problem here
GDCV.fit(X.values, y) # OK

cv_result = pd.DataFrame(GDCV.cv_results_)
cv_result.sort_values('rank_test_score')
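In the pipeline above, the same `clf` object is passed both to SFS and as the final `clf` step. Assuming GridSearchCV's standard behaviour of cloning the pipeline and then applying each candidate's parameters, those become two independent copies, so `sfs__estimator__regressor__max_depth` never reaches the final step, which keeps the default `max_depth=None`. That would account for at least part of what the results show. The sketch below uses scikit-learn's `SelectFromModel` purely as a stand-in for mlxtend's SFS, since any meta-estimator with an `estimator` parameter should be cloned the same way; `sel` is a hypothetical step name.

```python
import numpy as np
from sklearn.base import clone
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline

clf = TransformedTargetRegressor(regressor=RandomForestRegressor(random_state=42),
                                 func=np.log1p, inverse_func=np.expm1)

# Same pattern as the issue: one object inside the selector and as the final step.
# SelectFromModel is only a stand-in for mlxtend's SFS here.
pipe = Pipeline(steps=[('sel', SelectFromModel(clf)), ('clf', clf)])

# Roughly what GridSearchCV does per candidate: clone first, then set_params.
candidate = clone(pipe)
candidate.set_params(sel__estimator__regressor__max_depth=1)

sel_est = candidate.named_steps['sel'].estimator
final_est = candidate.named_steps['clf']
print(sel_est is final_est)         # False: two independent copies
print(sel_est.regressor.max_depth)  # 1
print(final_est.regressor.max_depth)  # None: the final model keeps the default
```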

[Screenshot: cv_result sorted by rank_test_score]

Changing max_depth changes neither mean_train_score nor mean_test_score, and the results don't tell me which k_features (feature subset) SFS found to be best.

What am I missing?
