Deprecate and remove normalize option in linear models #7400

@jcrist

The estimators in cuml.linear_model have a normalize option that defaults to False. If set to True, the data passed to fit is "normalized" as if a StandardScaler had been applied to it before fitting.

This differs from the standard sklearn behavior, where none of these models have a normalize option. Some of them used to, but it had different behavior than what we do in cuml.

Additionally, it's unclear how this should be used in practice: to make predictions you'd need to save the values used for normalization (the column standard deviations) so you can reuse them at predict time. This is why sklearn opted to separate normalization from the model itself, so you can chain a StandardScaler and an ElasticNet in a Pipeline and have everything work properly and follow best practices.
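For illustration, here is a minimal sketch of that Pipeline pattern. It uses the scikit-learn classes; the data, the `alpha` value, and the variable names are made up for the example, and I'm assuming the cuml equivalents can be swapped in the same way.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with columns on very different scales (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([2.0, 0.3, 0.01]) + rng.normal(scale=0.1, size=200)

# The scaler is fit together with the model and stored inside the pipeline.
model = make_pipeline(StandardScaler(), ElasticNet(alpha=0.01))
model.fit(X, y)

# predict() reapplies the stored means and standard deviations automatically.
preds = model.predict(X[:5])

# The fitted normalization statistics stay available for inspection.
scaler = model.named_steps["standardscaler"]
fitted_means = scaler.mean_
fitted_stds = scaler.scale_
```

Because the scaler lives in the pipeline, the statistics computed at fit time are saved and reused at predict time without the model needing a normalize option of its own.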

Further, some of our solvers only support the normalize option in certain configurations; in others we just raise a "not supported" error.

I propose we drop the normalize option entirely. Normalization can be better handled by an external StandardScaler when needed, and dropping the option reduces the complexity of our solvers and lets us delete some code.
