Accelerate linear model predict on C-ordered inputs#7329
Merged
rapids-bot[bot] merged 1 commit into rapidsai:branch-25.12 on Oct 13, 2025
Conversation
This started out as a cleanup PR, but moved to a performance improvement after some benchmarking.

`LinearRegression`, `ElasticNet`, `Lasso`, and `Ridge` all share the same `predict` method. This calculates `X.dot(coef.T) + intercept`. Previously we used a function from `libcuml` to compute the single-target case, and `cupy` to handle the multi-target case.

After some benchmarking, I no longer think using `libcuml` at all here is worth it. It's simpler to always take the `cupy` path, and `cupy` already handles dispatching to cublas appropriately for disparate layouts (C vs F).

For F-ordered inputs we see roughly the same performance as before. For C-ordered inputs, we see anything from mild speedups (150 µs now, vs 200 µs before) on small data, up to a 10x speedup on larger data (0.75 ms now vs 8.4 ms before). Presumably this is due to avoiding the unnecessary copies previously made to force a uniform F order.
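The shared computation can be sketched as below. This is a minimal illustration (shapes and variable names are made up for the example, and numpy is used so it runs without a GPU; `cupy` mirrors these `dot` semantics on device):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative shapes: 8 samples, 5 features, 3 targets
X_c = np.ascontiguousarray(rng.standard_normal((8, 5), dtype=np.float32))  # C order
X_f = np.asfortranarray(X_c)                                               # F order
coef = rng.standard_normal((3, 5), dtype=np.float32)        # (n_targets, n_features)
intercept = rng.standard_normal(3, dtype=np.float32)

# The shared predict computation: X.dot(coef.T) + intercept.
# No copy to a uniform layout is needed; the underlying BLAS call
# accepts either memory order.
pred_c = X_c.dot(coef.T) + intercept
pred_f = X_f.dot(coef.T) + intercept

assert pred_c.shape == (8, 3)
assert np.allclose(pred_c, pred_f)
```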
Member
Author
bench.py:

```python
from itertools import product
from time import perf_counter

import cupy as cp

from cuml import LinearRegression
from cuml.datasets import make_regression

N_FEATURES = [100, 1000]
N_SAMPLES = [1000, 10_000, 100_000]
DTYPES = ["float32", "float64"]
ORDERS = ["C", "F"]
N_RUNS = 5

for order, dtype in product(ORDERS, DTYPES):
    print(f"{order = }, {dtype = }")
    for n_features, n_samples in product(N_FEATURES, N_SAMPLES):
        X, y = make_regression(
            n_samples,
            n_features,
            dtype="float" if dtype == "float32" else "double",
            random_state=42,
        )
        model = LinearRegression().fit(X, y)
        X = cp.asarray(X, order=order)
        # Warmup
        for _ in range(N_RUNS):
            model.predict(X)
        start = perf_counter()
        for _ in range(N_RUNS):
            model.predict(X)
        duration = (perf_counter() - start) / N_RUNS
        print(f"- {X.shape}: {duration * 1e3:.3f} ms")
```

[Benchmark result tables: Before this PR / After this PR]
viclafargue approved these changes Oct 13, 2025
Contributor
I've run slightly more extended benchmarks with larger data: I don't observe any significant regressions on F-ordered data, and I see a major speed-up on C-ordered data. I think this is a good improvement.
csadorf approved these changes Oct 13, 2025
Member
Author
/merge
rapids-bot bot pushed a commit that referenced this pull request on Oct 13, 2025
- Release GIL
- Simple `__init__` following sklearn conventions
- Only warn on single input if solver set explicitly, otherwise if `auto` and only 1 column default to `svd` automatically without warning
- General readability cleanups

On top of #7329 (relies on some changes there). Part of #7317.

Authors:
- Jim Crist-Harif (https://github.com/jcrist)

Approvers:
- Simon Adorf (https://github.com/csadorf)
- Victor Lafargue (https://github.com/viclafargue)

URL: #7330
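The solver-fallback rule described in that commit message could look roughly like the sketch below. This is a hypothetical illustration of the stated behavior, not cuml's actual code; the function name, the `n_features` parameter, and the `"eig"` default are all assumptions:

```python
import warnings


def resolve_solver(solver: str, n_features: int) -> str:
    """Hypothetical sketch: pick the effective solver for a linear model.

    With a single column, only "svd" is supported. If the user left the
    solver at "auto", fall back silently; if they set it explicitly,
    warn before falling back.
    """
    if n_features == 1:
        if solver not in ("auto", "svd"):
            warnings.warn(
                f"solver {solver!r} does not support a single column; "
                "falling back to 'svd'"
            )
        return "svd"
    # Assumed multi-column default; the real default may differ.
    return "eig" if solver == "auto" else solver


assert resolve_solver("auto", 1) == "svd"   # silent fallback
assert resolve_solver("auto", 5) == "eig"
```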