
Conversation

@fkiraly
Collaborator

@fkiraly fkiraly commented Oct 5, 2025

This PR adds cdf support to TransformedTargetRegressor, via changes in TransformedDistribution:

  • TransformedDistribution now accepts an inverse_transform argument, which can be used for an exact cdf
  • TransformedTargetRegressor passes the inverse transform of the fitted self.transformer_ as the inverse_transform of TransformedDistribution

Together with #610, it means that TransformedTargetRegressor can now produce distributions with reasonably reliable cdf and pdf.

Goes partially towards #601.
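To illustrate why an inverse transform yields an exact cdf (a minimal sketch, not skpro's actual code): for a monotonically increasing transform t, P(t(X) <= y) = P(X <= t_inverse(y)) = F_X(t_inverse(y)), so the cdf of the transformed distribution is just the base cdf composed with the inverse transform, with no sampling involved.

```python
# Sketch only: exact cdf of a transformed distribution via the
# inverse transform. Function names here are illustrative.
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """cdf of a normal base distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def transformed_cdf(y, base_cdf, inverse_transform):
    """Exact cdf of t(X) for increasing t, given t_inverse and F_X."""
    return base_cdf(inverse_transform(y))

# Example: Y = exp(X) with X ~ N(0, 1), i.e., Y is log-normal.
# The median of Y is exp(0) = 1, so the cdf at 1 is exactly 0.5.
print(transformed_cdf(1.0, normal_cdf, math.log))  # 0.5
```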

@fkiraly fkiraly added enhancement module:probability&simulation probability distributions and simulators module:regression probabilistic regression module labels Oct 5, 2025
@fkiraly
Collaborator Author

fkiraly commented Oct 5, 2025

FYI @joshdunnlime, this is what I meant for a quick fix

@fkiraly
Collaborator Author

fkiraly commented Oct 5, 2025

@joshdunnlime, any idea why the test is failing? cdf and ppf do not seem to be inverse to each other.

@joshdunnlime
Contributor

joshdunnlime commented Oct 6, 2025

@fkiraly is cdf meant to be approximate? The class tags mark cdf as approx, but get_tags shows cdf as exact. I can see this gets set if an inverse is passed. However, I get:

<class 'skpro.distributions.trafo._transformed.TransformedDistribution'> does not have a numerically exact implementation of the 'cdf' method, it is filled in by approximating the expected value by the indicator function on 1000 samples.

So _cdf seems to be falling back to the approximation (I get that warning when running in a script). I had a similar issue with _pdf and _log_pdf, and it was an indices issue (set the index and column names on the transformed output). See here and here.
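For context, a minimal sketch of the fallback the warning describes (not skpro's actual implementation): when no exact cdf is available, the cdf at x is approximated as the expected value of the indicator function 1[X <= x] over a number of samples.

```python
# Sketch only: Monte Carlo cdf approximation by the indicator
# function on samples, as described in the warning message.
import random

def approx_cdf(sample_fn, x, n_samples=1000, seed=42):
    """Monte Carlo cdf: mean of the indicator 1[sample <= x]."""
    rng = random.Random(seed)
    samples = [sample_fn(rng) for _ in range(n_samples)]
    return sum(s <= x for s in samples) / n_samples

# Example: uniform(0, 1) base samples; the true cdf at 0.5 is 0.5.
est = approx_cdf(lambda rng: rng.random(), 0.5)
print(est)  # close to 0.5, but only approximately
```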

@joshdunnlime
Contributor

@fkiraly - What are the loc/iloc methods doing? They are being called by _pdf via the boilerplate, but the class is being initialized again without passing the inverse_transform.

@joshdunnlime
Contributor

@fkiraly merge https://github.com/joshdunnlime/skpro/tree/ttr-cdf-pdf-add-inv or just add

        inverse_transform=self.inverse_transform,

to the cls call in _iloc.
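A minimal toy sketch of the bug and the one-line fix (class and parameter names here are illustrative, not skpro's exact code): if a composite distribution is rebuilt inside _iloc without forwarding every constructor argument, the rebuilt object silently loses capabilities, here the exact-cdf path via inverse_transform.

```python
# Sketch only: why dropping a constructor argument on
# reconstruction in _iloc loses the exact-cdf capability.
class ToyTransformedDistribution:
    def __init__(self, distribution, transform, inverse_transform=None):
        self.distribution = distribution
        self.transform = transform
        self.inverse_transform = inverse_transform

    def _iloc_buggy(self, rows):
        cls = type(self)
        # BUG: inverse_transform is dropped on reconstruction
        # (row subsetting itself omitted for brevity)
        return cls(self.distribution, self.transform)

    def _iloc_fixed(self, rows):
        cls = type(self)
        return cls(
            self.distribution,
            self.transform,
            inverse_transform=self.inverse_transform,  # the fix
        )

d = ToyTransformedDistribution([0, 1, 2], abs, inverse_transform=abs)
assert d._iloc_buggy([0]).inverse_transform is None
assert d._iloc_fixed([0]).inverse_transform is abs
```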

@joshdunnlime
Contributor

This still raises an "X does not have valid feature names, but MinMaxScaler was fitted with feature names" warning, so I think the indices issue mentioned above is still there, but the tests pass.

@fkiraly
Collaborator Author

fkiraly commented Oct 6, 2025

@fkiraly - What are the methods loc/iloc doing?

These are methods to subset or reorder the indices of the array distribution. loc and iloc work like their pandas.DataFrame counterparts; the inner _loc / _iloc must define the equivalent operations for distributions.

If the distribution is parametric, the default takes care of this, but in general, e.g., if the distribution is composite, it currently needs to be done manually.

There is space to define a broader default including distribution objects as components, but that is currently an open issue:
#559

It's being called by _pdf via the boilerplate, but the class is being initialized again without passing the inverse_transform.

I see, that must be it!
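The subsetting semantics described above can be sketched with a toy parametric distribution (illustrative names; skpro's real boilerplate differs): iloc-style subsetting rebuilds the distribution from row-subset parameters, analogous to pandas.DataFrame.iloc.

```python
# Sketch only: the parametric default for iloc-style subsetting,
# which subsets every parameter the same way and reconstructs.
class ToyNormalArray:
    def __init__(self, mu, sigma):
        self.mu = list(mu)        # one entry per distribution row
        self.sigma = list(sigma)

    def iloc_rows(self, rows):
        # parametric default: subset each parameter, then rebuild
        return type(self)(
            [self.mu[i] for i in rows],
            [self.sigma[i] for i in rows],
        )

d = ToyNormalArray(mu=[0.0, 1.0, 2.0], sigma=[1.0, 1.0, 1.0])
sub = d.iloc_rows([0, 2])
print(sub.mu)  # [0.0, 2.0]
```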

@fkiraly fkiraly changed the title [ENH] TransformedDistribution and TransformedTargetRegressor cdf support &joshdunnlime [ENH] TransformedDistribution and TransformedTargetRegressor cdf support Oct 6, 2025
@fkiraly
Collaborator Author

fkiraly commented Oct 6, 2025

This still raises a X does not have valid feature names, but MinMaxScaler was fitted with feature names so I think there is still the indices issue mentioned above, but the tests pass.

can you outline your understanding of why this occurs? Which two objects exactly, where and when passed to MinMaxScaler, have missing or inconsistent feature names?

        assume_monotonic=self.assume_monotonic,
        index=new_index,
        columns=new_columns,
        **params_dict,
Collaborator Author

I think this is safer and more extensible. This might already be almost a solution for #559 (it only needs to be combined with type checks?)

@fkiraly
Collaborator Author

fkiraly commented Oct 6, 2025

If you have a fix, @joshdunnlime, could you open a PR with only the fix? So we can merge it quickly while the more complicated design questions remain open?

@joshdunnlime
Contributor

Done. It is literally that one line to allow the exact cdf.


        inv_trafo = self.inverse_transform
        inv_x = inv_trafo(x)
Contributor

This is where MinMaxScaler in my local script raised the warning.

I used:

        warnings.filterwarnings('error')

to catch and debug.
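A small self-contained sketch of this debugging trick: promoting warnings to errors makes the exact call site raise, so a traceback points at the offending line instead of a one-off warning message. The function below is a made-up stand-in for the code that emits the warning.

```python
# Sketch only: locate a warning's source by raising it as an error.
import warnings

def leaky_function():
    # stand-in for code that triggers the feature-names warning
    warnings.warn("X does not have valid feature names", UserWarning)
    return 42

with warnings.catch_warnings():
    warnings.simplefilter("error")  # same effect as filterwarnings('error')
    try:
        leaky_function()
        raised = False
    except UserWarning as exc:
        raised = True
        print(f"caught at source: {exc}")

print(raised)  # True
```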

Collaborator Author

I see.

I think this is resolved by ensuring x is a pd.DataFrame when we pass it.
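A minimal sketch of that fix (helper name is illustrative): before handing x to a transformer fitted on a DataFrame, coerce a bare numpy array back to a DataFrame carrying the fitted feature names, so sklearn's feature-name check is satisfied.

```python
# Sketch only: coerce numpy input to a labeled DataFrame before
# passing it to a DataFrame-fitted transformer.
import numpy as np
import pandas as pd

def as_frame(x, columns, index=None):
    """Coerce x to a DataFrame with the given column names."""
    if isinstance(x, pd.DataFrame):
        return x
    return pd.DataFrame(np.asarray(x), columns=columns, index=index)

x_np = np.array([[0.1], [0.7]])
x_df = as_frame(x_np, columns=["target"])
print(list(x_df.columns))  # ['target']
```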

@joshdunnlime
Contributor

joshdunnlime commented Oct 6, 2025

This still raises a X does not have valid feature names, but MinMaxScaler was fitted with feature names so I think there is still the indices issue mentioned above, but the tests pass.

can you outline your understanding of why this occurs? Which two objects exactly, where and when passed to MinMaxScaler, have missing or inconsistent feature names?

I have added a comment on where the warning is raised. The inverse_transform function returns a numpy array. This was easy to find when raising warnings as errors (see comment).

I think the solution I have implemented in #612, where every function or transformer is wrapped in a DifferentiableTransformer, would be the nicest way to fix this. It is then very easy to keep all of the transformation logic in there and out of TransformedDistribution. We can guarantee that transform, inverse_transform and their _diff functions always return a dataframe. This means much less code like:

if not isinstance(x_t, pd.DataFrame):
    x_t = pd.DataFrame(x_t, index=x.index, columns=x.columns)
else:
    x_t.columns = x.columns
    x_t.index = x.index

This is the case for _pdf, _log_pdf, _cdf and probably all the other functions that need transformer outputs.
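The wrapper idea can be sketched as follows; the class name and interface here are illustrative, echoing the DifferentiableTransformer proposal from #612, not skpro's final API. The wrapper coerces every output back to a DataFrame with the input's labels, so callers never need ad-hoc isinstance checks.

```python
# Sketch only: a wrapper that guarantees DataFrame outputs for a
# transform/inverse pair, reusing the input's index and columns.
import numpy as np
import pandas as pd

class FrameCoercingTransformer:
    """Wraps a transform/inverse pair; outputs are always DataFrames."""

    def __init__(self, transform, inverse_transform):
        self._transform = transform
        self._inverse = inverse_transform

    def _coerce(self, out, like):
        # attach the input's labels, converting numpy if needed
        if not isinstance(out, pd.DataFrame):
            return pd.DataFrame(out, index=like.index, columns=like.columns)
        out.index = like.index
        out.columns = like.columns
        return out

    def transform(self, x):
        return self._coerce(self._transform(x), x)

    def inverse_transform(self, x):
        return self._coerce(self._inverse(x), x)

x = pd.DataFrame({"y": [1.0, 2.0]})
t = FrameCoercingTransformer(lambda v: np.log(v), lambda v: np.exp(v))
out = t.inverse_transform(t.transform(x))  # round trip recovers x
print(type(out).__name__, list(out.columns))  # DataFrame ['y']
```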

@fkiraly
Collaborator Author

fkiraly commented Oct 6, 2025

We can guarantee that transform, inverse_transform and their _diff functions always return a dataframe.

Can you explain why this would be the case? The data arrives inside _cdf and _pdf as numpy already.

This may have been an unfortunate design choice (not sure about this yet), but that also means whatever we do with x or p, it starts with numpy. So, the DifferentiableTransformer may not get the pd.DataFrame it expects?

@fkiraly
Collaborator Author

fkiraly commented Oct 6, 2025

I have opened a new design issue here: #615
I think there are some interesting questions about nested/composite distributions where I would appreciate your opinion, given that you have dug into the problems above!

I will now merge this PR to expedite the various improvements.

@fkiraly fkiraly merged commit 1c54087 into main Oct 6, 2025
34 checks passed
fkiraly pushed a commit that referenced this pull request Oct 6, 2025
…rm` (#614)

#### Reference Issues/PRs
#611

#### What does this implement/fix? Explain your changes.
Fixes the call to exact instead of approx for the `_cdf` method.
@joshdunnlime
Contributor

We can guarantee that transform, inverse_transform and their _diff functions always return a dataframe.

Can you explain why this would be the case? The data arrives inside _cdf and _pdf as numpy already.

This may have been an unfortunate design choice (not sure about this yet), but that also means whatever we do with x or p, it starts with numpy. So, the DifferentiableTransformer may not get the pd.DataFrame it expects?

I'm talking more about the outputs of cdf, ppf and pdf. We are nearly always applying some transform in the TransformedDistribution (TD), so having the outputs (e.g. the Jacobian) as dataframes would tidy up the TD code. It does deviate from default sklearn behaviour, but sklearn does have a transformer-to-dataframe setting.

Skpro TD method outputs are dataframes so this keeps consistency with that and makes debugging easier IMO.

I have opened a new design issue here: #615 I think there are some interesting questions about nested/composite distributions where I would appreciate your opinion! Given that you have dug into the problems above.

I will now merge this PR to expedite the various improvements.

I'll take a look. Thanks.
