
Don't call to_output on a cupy array #7044

Merged
rapids-bot[bot] merged 5 commits into rapidsai:branch-25.10 from Matt711:bug/cupy-to-output
Sep 18, 2025

Matt711 (Contributor) commented Jul 25, 2025

@Matt711 Matt711 requested a review from a team as a code owner July 25, 2025 13:42
@Matt711 Matt711 requested review from teju85 and vyasr July 25, 2025 13:42
@github-actions github-actions Bot added the Cython / Python label Jul 25, 2025

 def _transform_one(transformer, X, y, weight, **fit_params):
-    res = transformer.transform(X).to_output('cupy')
+    res = transformer.transform(X)
Contributor Author

If this is the right fix, where should I put the appropriate unit test?

Member

Tests for ColumnTransformer are in cuml/tests/test_compose.py, that's where I'd add a test for this.

@Matt711 Matt711 added the bug and non-breaking labels Jul 25, 2025
jcrist (Member) left a comment

Thanks for the PR! I think we can do a simpler fix here (as documented below), but we definitely should do some follow-up work to make cuml.preprocessing estimators better follow our type reflecting conventions.

-    res = transformer.transform(X).to_output('cupy')
+    res = transformer.transform(X)
+
+    if isinstance(res, cpu_np.ndarray):
Member

There are two issues here:

  • OneHotEncoder doesn't follow our type reflection conventions (I think much of cuml.preprocessing also has this issue).
  • ColumnTransformer needs to be more robust to consuming disparate output types from transformers (if nothing else, user-defined transformers may return a variety of types).

I think the best immediate fix for this issue is something like:

with cuml.using_output_type("cupy"):
    return transformer.transform(X)

This should ensure that any properly implemented cuml transformer returns a cupy array (and not a CumlArray or something else). This is already done for _fit_transform_one below, but not for _transform_one here.

The consumer of this output then handles possibly sparse outputs, so we don't want to error here if the return type is something else. So the branching structure here in _transform_one isn't needed.

Contributor Author

Thanks! Do you mind filing an issue for the follow-up work?


yinchi commented Aug 15, 2025

Hi, original submitter of #7039 here.

Given that the last two commits were simply merging from branch-25.10 and the failure from the last attempt was a totally unrelated timeout issue, is it possible to try again?

jcrist (Member) left a comment

Apologies for dropping this earlier. I've pushed a small test improvement, this LGTM!

jcrist (Member) commented Sep 16, 2025

/merge

csadorf (Contributor) commented Sep 17, 2025

Blocked by #7230.

@rapids-bot rapids-bot Bot merged commit 7e63101 into rapidsai:branch-25.10 Sep 18, 2025
101 checks passed
csadorf (Contributor) commented Sep 18, 2025

@Matt711 I am very sorry that it took us so long to be able to accept and merge your contribution. Thank you very much for your patience!


Labels

bug (Something isn't working), Cython / Python (Cython or Python issue), non-breaking (Non-breaking change)

Development

Successfully merging this pull request may close these issues.

[BUG] AttributeError when combining OneHotEncoder and ColumnTransformer

5 participants