Merged
18 commits
2db2683
DOC Rework LinearRegression documentation
virchan Sep 14, 2025
50c19fe
Merge branch 'branch-25.10' into issues/1447/linear_regression
virchan Sep 14, 2025
33bda50
Merge branch 'branch-25.10' into issues/1447/linear_regression
virchan Sep 17, 2025
4636bbc
Fix whitespace with pre-commit
virchan Sep 17, 2025
9d6023a
Merge branch 'branch-25.10' into issues/1447/linear_regression
virchan Sep 22, 2025
7fe082f
Merge branch 'branch-25.10' into issues/1447/linear_regression
virchan Sep 22, 2025
1a14d73
Merge branch 'branch-25.10' into issues/1447/linear_regression
virchan Sep 29, 2025
4e311e1
Merge branch 'branch-25.10' into issues/1447/linear_regression
virchan Oct 2, 2025
8f48cd7
Merge branch 'branch-25.12' into issues/1447/linear_regression
virchan Oct 2, 2025
20b3b53
Merge branch 'main' into issues/1447/linear_regression
virchan Nov 5, 2025
7ca93fa
Merge remote-tracking branch 'origin/main' into issues/1447/linear_re…
csadorf Nov 14, 2025
6061ecf
Merge branch 'main' into issues/1447/linear_regression
virchan Nov 18, 2025
6d2c52e
fix upstream conflicts
virchan Nov 18, 2025
dd6f46c
Merge branch 'main' into issues/1447/linear_regression
virchan Nov 19, 2025
b1bad08
Add `__cuda_array_interface__` to `LinearRegression` docstring
virchan Nov 19, 2025
3bf5518
Merge branch 'main' into issues/1447/linear_regression
virchan Nov 20, 2025
1c2f22b
Merge branch 'main' into issues/1447/linear_regression
virchan Nov 26, 2025
c7c7d79
Merge branch 'main' into issues/1447/linear_regression
virchan Nov 28, 2025
3 changes: 3 additions & 0 deletions docs/source/conf.py
@@ -197,8 +197,11 @@

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
"cudf": ("https://docs.rapids.ai/api/cudf/stable/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"python": ("https://docs.python.org/3", None),
"scipy": ("https://docs.scipy.org/doc/scipy", None),
"sklearn": ("https://scikit-learn.org/stable/", None),
}

# Config numpydoc
56 changes: 35 additions & 21 deletions python/cuml/cuml/linear_model/linear_regression.pyx
@@ -98,6 +98,11 @@ class LinearRegression(Base,
LinearRegression is a simple machine learning model where the response y is
modelled by a linear combination of the predictors in X.

cuML's LinearRegression expects either a :class:`cudf.DataFrame` or a
:class:`numpy.ndarray` (matrix), and offers two algorithms: Singular Value
Decomposition (SVD) and Eigendecomposition (Eig) to fit a linear model.
SVD is more numerically stable, but Eig (the default) is much faster.
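The two solver families described above can be sketched with plain NumPy. This is a hedged illustration of the underlying linear algebra only, not cuML code; the data and variable names are hypothetical.

```python
import numpy as np

# Tiny synthetic regression problem (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.01, size=100)

# "Eig"-style solve: eigendecomposition of the covariance matrix X^T X,
# then apply the inverse via the eigenbasis.
w, V = np.linalg.eigh(X.T @ X)
coef_eig = V @ ((V.T @ (X.T @ y)) / w)

# "SVD"-style solve: numerically more stable; np.linalg.lstsq uses an SVD.
coef_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

# On a well-conditioned problem both recover ~[1.5, -2.0, 0.5].
print(np.allclose(coef_eig, coef_svd, atol=1e-6))
```

On well-conditioned inputs the two agree to high precision; the SVD route matters when `X^T X` is nearly singular.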

Examples
--------

@@ -138,33 +143,42 @@ class LinearRegression(Base,
algorithm : {'auto', 'svd', 'eig', 'qr', 'svd-qr', 'svd-jacobi'}, (default = 'auto')
Choose an algorithm:

* 'auto' - 'eig', or 'svd' if y multi-target or X has only one column
* 'svd' - alias for svd-jacobi
* 'eig' - use an eigendecomposition of the covariance matrix
* 'qr' - use QR decomposition algorithm and solve `Rx = Q^T y`
* 'svd-qr' - compute SVD decomposition using QR algorithm
* 'svd-jacobi' - compute SVD decomposition using Jacobi iterations
* ``'auto'`` - ``'eig'``, or ``'svd'`` if y is multi-target or X has only one column
* ``'svd'`` - alias for svd-jacobi
* ``'eig'`` - use an eigendecomposition of the covariance matrix
* ``'qr'`` - use QR decomposition algorithm and solve `Rx = Q^T y`
* ``'svd-qr'`` - compute SVD decomposition using QR algorithm
* ``'svd-jacobi'`` - compute SVD decomposition using Jacobi iterations

Among these algorithms, only 'svd-jacobi' supports the case when the
Among these algorithms, only ``'svd-jacobi'`` supports the case when the
number of features is larger than the sample size; this algorithm
is force-selected automatically in such a case.

For the broad range of inputs, 'eig' and 'qr' are usually the fastest,
followed by 'svd-jacobi' and then 'svd-qr'. In theory, SVD-based
algorithms are more stable.
For a broad range of inputs, ``'eig'`` and ``'qr'`` are usually the fastest,
followed by ``'svd-jacobi'`` and then ``'svd-qr'``. In theory, SVD-based
algorithms are more numerically stable.
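The ``'qr'`` option described in the list above solves the triangular system `Rx = Q^T y` after factoring X. A minimal NumPy sketch of that idea (an illustration, not cuML internals; the data is hypothetical):

```python
import numpy as np

# Hypothetical data for a small least-squares problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([3.0, -1.0]) + rng.normal(scale=0.01, size=50)

Q, R = np.linalg.qr(X)                 # X = Q R, Q has orthonormal columns
coef_qr = np.linalg.solve(R, Q.T @ y)  # solve the triangular system R x = Q^T y

# Agrees with the SVD-based reference solution.
coef_svd, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(coef_qr, coef_svd))
```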
fit_intercept : boolean (default = True)
If True, LinearRegression tries to correct for the global mean of y.
If False, the model expects that you have centered the data.
copy_X : bool, default=True
If True, cuml will copy X when needed to avoid mutating the input array.
If you're ok with X being overwritten, setting to False may avoid a copy,
reducing memory usage for certain algorithms.
copy_X : boolean, default=True
If True, it is guaranteed that a copy of X is created, leaving the
original X unchanged. However, if set to False, X may be modified
directly, which would reduce the memory usage of the estimator.

.. versionchanged:: 23.08
Starting from version 23.08, the new ``copy_X`` parameter defaults
to ``True``, ensuring a copy of X is created after passing it to
``fit()``, preventing any changes to the input, but with increased
memory usage. This represents a change in behavior from previous
versions. With ``copy_X=False`` a copy might still be created if
necessary.

normalize : boolean, default=False

.. deprecated:: 25.12
``normalize`` is deprecated and will be removed in 26.02. When
needed, please use a ``StandardScaler`` to normalize your data
before passing to ``fit``.
needed, please use a :class:`sklearn.preprocessing.StandardScaler`
to normalize your data before passing to ``fit``.
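The scaling the deprecation note recommends amounts to per-column centering and unit variance. A minimal NumPy equivalent of that preprocessing step (hypothetical data; in practice use :class:`sklearn.preprocessing.StandardScaler` as the note says):

```python
import numpy as np

# Hypothetical feature matrix with very different column scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Standardize: zero mean and unit variance per column.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # [1, 1]
```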

handle : cuml.Handle
Specifies the cuml.handle that holds internal CUDA state for
@@ -194,9 +208,9 @@ class LinearRegression(Base,
-----
LinearRegression suffers from multicollinearity (when columns are
correlated with each other), and variance explosions from outliers.
Consider using Ridge Regression to fix the multicollinearity problem, and
consider maybe first DBSCAN to remove the outliers, or statistical analysis
to filter possible outliers.
Consider using :class:`Ridge` to address the multicollinearity problem,
and consider first applying :class:`DBSCAN` or a statistical analysis
to detect and remove outliers.
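The multicollinearity problem the note describes can be seen directly: with two nearly collinear columns, OLS coefficients explode while a ridge penalty keeps them bounded. A NumPy sketch of that effect using the closed-form ridge solution (hypothetical data; not cuML code):

```python
import numpy as np

# Two nearly identical (collinear) predictors.
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + rng.normal(scale=1e-6, size=200)])
y = X.sum(axis=1) + rng.normal(scale=0.1, size=200)

# Plain OLS: individual coefficients are ill-determined and blow up.
coef_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge closed form: (X^T X + alpha I)^{-1} X^T y shrinks them back.
alpha = 1.0
coef_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ y)

print(np.abs(coef_ridge).max() < np.abs(coef_ols).max())
```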

**Applications of LinearRegression**

@@ -206,8 +220,8 @@ class LinearRegression(Base,
tasks. This model should be first tried if the machine learning problem
is a regression task (predicting a continuous variable).

For additional information, see `scikitlearn's OLS documentation
<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html>`__.
For additional information, see scikit-learn's documentation for
:class:`sklearn.linear_model.LinearRegression`.

For an additional example see `the OLS notebook
<https://github.com/rapidsai/cuml/blob/main/notebooks/linear_regression_demo.ipynb>`__.