The estimators in cuml.linear_model have a normalize option that defaults to False. If set to True, the data passed to fit is "normalized" as if a StandardScaler were applied to it before fitting.
This differs from standard sklearn behavior, where none of these models has a normalize option. Some of them used to, but that option behaved differently from what we do in cuml.
Additionally, it's unclear how this should be used in practice: to make predictions you'd need to save the values used for normalization (the column standard deviations) so you can reuse them at predict time. This is why sklearn opted to separate normalization from the model itself, so you can chain a StandardScaler and an ElasticNet in a Pipeline and have everything work properly and follow best practices.
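For reference, here is a minimal sketch of the Pipeline approach described above, written against scikit-learn. The synthetic data, the column scales, and the alpha value are assumptions chosen for illustration only; nothing here is taken from cuml's implementation.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption): three columns on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([2.0, 0.3, 0.01]) + rng.normal(scale=0.1, size=200)

X_train, X_test = X[:150], X[150:]
y_train = y[:150]

# Scaling is a separate pipeline step rather than a model option.
model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.1))
model.fit(X_train, y_train)

# The fitted scaler keeps the per-column statistics learned from the training data...
scaler = model.named_steps["standardscaler"]
print("means:", scaler.mean_)
print("scales:", scaler.scale_)

# ...and the pipeline reapplies them automatically at predict time.
preds = model.predict(X_test)
```

Because the scaler is a fitted step of the pipeline, the normalization statistics are stored and reused without any extra bookkeeping by the caller.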
Further, some of our solvers only support the normalize option in certain configurations; in others we just raise an error saying "not supported".
I propose we drop the normalize option entirely. Normalization is better handled by an external StandardScaler when needed, and dropping the option reduces the complexity of our solvers and lets us delete some code.