Support dual_gap_ on ElasticNet & Lasso #6714

jcrist wants to merge 3 commits into rapidsai:branch-25.06

Conversation
csadorf left a comment
LGTM overall, but it seems like CI is picking up some real regressions.
Also, can you open an issue to track the follow-up work needed to avoid the extra computation for the dual gap?
This is to improve sklearn compatibility. Both `ElasticNet` and `Lasso` in sklearn compute the dual gap as part of the coordinate descent solver, and store it as `dual_gap_` on the fit model. For now, we compute the `dual_gap_` from the final fit state and store it the same way. Two notes:

- Our CD solver doesn't use the dual gap as part of its stopping criteria, while sklearn's does. This means that in practice we stop fitting earlier in the optimization, resulting in a larger dual gap for the same tolerance. We could (and maybe should) update our solver to better match sklearn's.
- Our CD solver computes much of what's needed for calculating the dual gap as part of the fit, so doing this with cupy after the fit repeats work. However, for problems of a meaningful size, computing the dual gap afterwards is negligible compared to the cost of the fit. I think doing the easier thing and leaving things in Python makes sense for now; a rough sketch of that post-fit computation follows below.

Fixes #6467.
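For reference, here's roughly what that post-fit computation looks like. This is a minimal sketch, not the merged code: the helper name `compute_dual_gap`, the penalty scaling, and the final `n_samples` normalization are assumptions based on sklearn's `enet_path` conventions.

```python
import cupy as cp


def compute_dual_gap(X, y, w, alpha, l1_ratio):
    """Sklearn-style elastic-net dual gap, evaluated at the final iterate."""
    n_samples = X.shape[0]
    # sklearn scales the l1/l2 penalties by n_samples inside its solver
    l1_reg = alpha * l1_ratio * n_samples
    l2_reg = alpha * (1.0 - l1_ratio) * n_samples

    R = y - X @ w                        # residual at the final coefficients
    XtA = X.T @ R - l2_reg * w
    dual_norm_XtA = cp.abs(XtA).max()

    R_norm2 = R @ R
    if dual_norm_XtA > l1_reg:
        # rescale the dual point back into the feasible set
        const = l1_reg / dual_norm_XtA
        gap = 0.5 * R_norm2 * (1.0 + const ** 2)
    else:
        const = 1.0
        gap = R_norm2
    gap += (l1_reg * cp.abs(w).sum() - const * (R @ y)
            + 0.5 * l2_reg * (1.0 + const ** 2) * (w @ w))
    # sklearn's enet_path reports the gap normalized by n_samples (assumed)
    return float(gap) / n_samples
```

The whole computation is a handful of BLAS-level cupy calls on the already-fit coefficients, which is why it's cheap relative to the fit itself for reasonably sized problems. After fit, the attribute is then available the same way as in sklearn, via `model.dual_gap_`.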
I believe I've fixed the regressions. I was trying to benchmark to see the effect this would have, but ran into errors on larger problems (see #6736). A few general observations:
Anyway, here's the benchmark:

```python
from cuml.datasets import make_regression
from cuml.linear_model import ElasticNet
from time import perf_counter


def bench(N, M, n_runs=5):
    model = ElasticNet()
    X, y = make_regression(n_samples=N, n_features=M, random_state=42)
    times = []
    for _ in range(n_runs):
        start = perf_counter()
        model.fit(X, y)
        stop = perf_counter()
        times.append(stop - start)
    duration = sum(times) / n_runs
    print(f"shape = ({N}, {M}): {duration:.2} s")


for N in [100_000, 1_000_000]:
    for M in [500, 2000]:
        bench(N, M)
```

And the results:

Right now I feel like merging this still makes sense, but we might want to prioritize a follow-up later to fix the bug and move the computation into the solver itself.
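As a reference for that follow-up, here's a minimal, self-contained sketch of what moving the dual-gap check into the solver could look like, following the structure of sklearn's coordinate descent. This is plain numpy, not the cuML solver; `enet_cd` and everything in it is illustrative.

```python
import numpy as np


def enet_cd(X, y, l1_reg, l2_reg, tol=1e-4, max_iter=1000):
    """Naive elastic-net coordinate descent with a dual-gap stopping check."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    R = y - X @ w                       # residual, kept up to date
    col_norms = (X ** 2).sum(axis=0)
    tol = tol * (y @ y)                 # sklearn scales tol by ||y||^2
    gap = tol + 1.0

    for it in range(max_iter):
        w_max = d_w_max = 0.0
        for j in range(n_features):
            w_j_old = w[j]
            if w_j_old != 0.0:
                R += w_j_old * X[:, j]  # remove feature j's contribution
            rho = X[:, j] @ R
            # soft-thresholding update for the elastic-net penalty
            w[j] = (np.sign(rho) * max(abs(rho) - l1_reg, 0.0)
                    / (col_norms[j] + l2_reg))
            if w[j] != 0.0:
                R -= w[j] * X[:, j]
            d_w_max = max(d_w_max, abs(w[j] - w_j_old))
            w_max = max(w_max, abs(w[j]))

        # Only pay for the duality-gap check when updates look converged
        # (or on the last iteration), mirroring sklearn's heuristic.
        if w_max == 0.0 or d_w_max / w_max < tol or it == max_iter - 1:
            XtA = X.T @ R - l2_reg * w
            dual_norm_XtA = np.abs(XtA).max()
            R_norm2 = R @ R
            if dual_norm_XtA > l1_reg:
                const = l1_reg / dual_norm_XtA
                gap = 0.5 * R_norm2 * (1.0 + const ** 2)
            else:
                const = 1.0
                gap = R_norm2
            gap += (l1_reg * np.abs(w).sum() - const * (R @ y)
                    + 0.5 * l2_reg * (1.0 + const ** 2) * (w @ w))
            if gap < tol:
                break
    return w, gap
```

The key point is that the gap is only evaluated when the coefficient updates have stalled (or on the final iteration), so the extra cost is amortized over many cheap epochs rather than paid again after the fit.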
Dask tests are failing due to #6737. Looks like there are 2 xpassing tests in the sklearn test suite. If we think this is still worth merging, I'll fix and push again.
csadorf left a comment
Looking at the benchmark results, I'll need to take another look at this, since we might be introducing a significant performance regression.
Closing this since we'll want to handle this in the solver itself. I've opened #6759 to track that.