Description
get_feature_names_out is an important component for interpreting scikit-learn Pipeline objects. A get_feature_names_out call on a Pipeline only works if the method is implemented by every step in the pipeline except the last one (i.e. the model); in practice this usually means calling it on the pipeline with the final model sliced off.
Scikit-learn recently implemented get_feature_names_out for all Transformers in their 1.1 release (Source).
I think it makes sense to also implement get_feature_names_out for all scikit-lego Transformers that are not models and are not TrainOnly. This leaves most objects in sklego.preprocessing.
- sklego.preprocessing.ColumnCapper
- sklego.preprocessing.DictMapper
- sklego.preprocessing.IdentityTransformer
- sklego.preprocessing.IntervalEncoder
- sklego.preprocessing.OutlierRemover (TrainOnly)
- sklego.preprocessing.PandasTypeSelector
- sklego.preprocessing.ColumnSelector
- sklego.preprocessing.ColumnDropper
- sklego.preprocessing.PatsyTransformer
- sklego.preprocessing.OrthogonalTransformer
- sklego.preprocessing.InformationFilter
- sklego.preprocessing.RandomAdder (TrainOnly)
- sklego.preprocessing.RepeatingBasisFunction
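As a rough idea of what such an implementation could look like, here is a sketch for a hypothetical `ToyIndexDropper` that drops columns by position. This is not sklego code and not necessarily how any of the transformers above should do it; the class name, the `drop` parameter and the `x0, x1, ...` fallback names are assumptions made for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ToyIndexDropper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that drops columns by position (not sklego code)."""

    def __init__(self, drop=()):
        self.drop = drop

    def fit(self, X, y=None):
        X = np.asarray(X)
        dropped = set(self.drop)
        self.n_features_in_ = X.shape[1]
        # Remember which column positions survive the transform.
        self.keep_ = [i for i in range(X.shape[1]) if i not in dropped]
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]

    def get_feature_names_out(self, input_features=None):
        # Fall back to generic names when the caller does not supply any.
        if input_features is None:
            input_features = [f"x{i}" for i in range(self.n_features_in_)]
        names = np.asarray(input_features, dtype=object)
        if len(names) != self.n_features_in_:
            raise ValueError("input_features does not match the fitted number of columns")
        # The output names are the input names of the surviving columns.
        return names[self.keep_]


dropper = ToyIndexDropper(drop=(1,)).fit(np.zeros((2, 3)))
kept_names = dropper.get_feature_names_out(["a", "b", "c"])
```

The main point is that the names returned must line up one-to-one with the columns that `transform` produces.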
Additionally, we should test whether get_feature_names_out works correctly with a Pipeline that contains transformers inheriting from TrainOnlyTransformerMixin, like RandomAdder.
@koaning and I recently discussed implementing get_feature_names_out for sklego.meta and ended up implementing this method for EstimatorTransformer (PR #539). Objects in sklego.decomposition and sklego.mixture do not seem to require an implementation of get_feature_names_out, because they appear to be used mostly as the last step in a pipeline or wrapped in an EstimatorTransformer.
Since this is such a systematic issue, we can consider adding some additional requirements for people contributing to sklego.preprocessing. That is, make sure to implement get_feature_names_out for any new preprocessor that is not a train-time only Transformer.