
Conversation

@joshdunnlime
Contributor

Reference Issues/PRs

#601
Problem Summary: LogLoss not included due to lack of a log_pdf. It can be computed but requires calculating the Jacobian of the transform. For many transformers this is non-trivial.

What does this implement/fix? Explain your changes.

I have implemented several pieces here. They break down as follows:

  1. Add log_pdf method to TTR.
  2. Add _jacobian method to TransformedDistribution (TD).
  3. Replace transform method/function with transformer class in the TD.
  4. Add ordered_gradient function to handle gradient(f, x) where the arrays are not sorted by x.
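An ordered_gradient of the kind described in step 4 could look roughly like this (a minimal NumPy sketch, not the PR's actual implementation): sort by x, differentiate, then scatter the result back into the original order.

```python
import numpy as np

def ordered_gradient(f, x):
    """Gradient df/dx for samples that are not sorted by x.

    Sorts (x, f) by x, applies np.gradient, then scatters the
    result back into the original (unsorted) order.
    """
    order = np.argsort(x)
    grad_sorted = np.gradient(f[order], x[order])
    grad = np.empty_like(grad_sorted)
    grad[order] = grad_sorted  # undo the sort
    return grad

# example: f = x**2 sampled in shuffled order; df/dx = 2x
x = np.array([3.0, 1.0, 2.0, 0.0])
print(ordered_gradient(x**2, x))
```

np.gradient uses second-order central differences in the interior, so the interior points of a quadratic are recovered exactly; only the two boundary points are first-order approximations.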

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

  1. Correctness of numerical calculations in log_pdf in TTR.
  2. Edge-case of the Jacobian on TD.
  3. Does replacing transform with transformer have any unintended consequences or cause a considerable breaking change?

Did you add any tests for the change?

I haven't yet added tests. I have modified the get_test_params to take a transformer instead of a single function.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

…ion.

This allows us to get more information from the transformer class, e.g. the MinMaxScaler scale_ parameter.

Though not part of the typical forward-facing interface, TransformedDistribution.transform is now no longer callable. Instead, TransformedDistribution.transformer.inverse_transform would be needed.
This is useful when testing XGBoostLLS and hyperparameter optimisation is not important. XGBoostLLS can be computationally expensive when searching over the default 30 trials.
Does so by adding the log_pdf with change-of-variables (jacobian of inverse transform) and direct method for linear transforms, and numerical method for non-linear transforms.
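The change-of-variables computation described here can be sketched as follows (hypothetical helper names; the numerical path approximates the Jacobian of the inverse transform with a central difference):

```python
import numpy as np
from scipy.stats import lognorm, norm

def log_pdf_transformed(y, base_log_pdf, inverse_transform, eps=1e-6):
    """Change of variables: log p_Y(y) = log p_X(g(y)) + log|g'(y)|, g = T^{-1}.

    g' is approximated by a central difference when no exact derivative
    is available (the numerical path for non-linear transforms).
    """
    g = inverse_transform
    jac = (g(y + eps) - g(y - eps)) / (2 * eps)  # numerical g'(y)
    return base_log_pdf(g(y)) + np.log(np.abs(jac))

# sanity check: Y = exp(X) with X ~ N(0, 1) is log-normal
y = np.array([0.5, 1.0, 2.0])
approx = log_pdf_transformed(y, norm.logpdf, np.log)
exact = lognorm.logpdf(y, s=1.0)  # the two agree to high precision
```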

TransformedDistribution now takes transformer instead of transform - that is, the transformer class instead of just the inverse transform function/method.
This is not actually intended as an example. It is a simple way to share findings. This can (should) be removed before any final merges into main.

self,
distribution,
transform,
transformer,
Collaborator

sorry, we cannot change the names of already released arguments, or it might break user code. Not without deprecation and warnings, at least.

Contributor Author

Understood. I'll make some changes.

)

trafo = self.transform
trafo = self.transformer.inverse_transform
Collaborator

why are we changing this?

Contributor Author

@joshdunnlime joshdunnlime Sep 28, 2025

The self.transform is just the inverse function at present, not the whole transformer class. The idea was to expose the whole class so we can do numerical differentiation or access the scale_ attr. Based on the comment above, this isn't favourable.

Collaborator

I think not easily possible, because the __init__ resets every estimator that is a component to a pre-fitted state on construction, and because of deprecation.

Overall, I think it is a very good idea, though one may have to find the right implementation.

Would passing inverse_transform as an optional additional argument work?

Alternatively, we have to pass something like sklearn FrozenEstimator wrapping the transformer, though it feels like a bit of an overhead software engineering wise.
The general question is whether, and if yes how, to allow, API-wise, fitted estimators to be passed to __init__ - currently this violates the scikit-learn-like design.

Collaborator

@fkiraly fkiraly left a comment

Great idea! Though we cannot merge this as is, since:

  • this changes the arguments of existing released classes, possibly breaking user code
  • I think it introduces too much coupling between TTR and the TransformedDistribution, we should avoid that.

Suggestion: how about we allow the user to pass both transform and inverse_transform (optional) to the TransformedDistribution? Then we can do numerical differentiation.
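The suggested signature could look something like this (purely a hypothetical sketch for discussion, not skpro's actual API):

```python
class TransformedDistribution:
    """Sketch: accept both directions of the transform as plain callables.

    inverse_transform is optional; when supplied, pdf/log_pdf can obtain
    the change-of-variables Jacobian by numerically differentiating it.
    """

    def __init__(self, distribution, transform, inverse_transform=None):
        self.distribution = distribution
        self.transform = transform
        self.inverse_transform = inverse_transform
```

Keeping inverse_transform optional leaves the released transform argument untouched, so no deprecation cycle would be needed.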

Collaborator

@fkiraly fkiraly left a comment

may I request:

  • remove the BaseTransformer from this pull request and open a separate one, since we will have some API discussions; let's aim to complete this with numerical differentiation, and later extend it with exact jacobians in a separate PR
  • may I suggest to move the numerical differentiation to TD for it to happen only inside _pdf and _log_pdf? And leave TTR untouched? This we could merge quickly, and it could be "version 0.1" of the feature you want.
    • testing it might also highlight aspects we have not yet thought about

@joshdunnlime
Contributor Author

Apologies, I made the DiffTransformer changes on the wrong branch.

@fkiraly
Collaborator

fkiraly commented Oct 3, 2025

No problem.

May I suggest to change this PR to revert changes across the classes, and add only an approximate log_pdf and pdf to the TD?

That would be a good starting point for adding more features in separate PR (e.g., autodiff-like).

Collaborator

@fkiraly fkiraly left a comment

Thanks - almost ready to merge. The public methods should not be overridden, private methods should be implemented.

See how, for instance, _cdf is implemented.
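The requested pattern is the usual template-method split, roughly like this (a toy illustration, not skpro's actual base class):

```python
import numpy as np

class BaseDistribution:
    """Toy base class: public pdf handles boilerplate, _pdf does the math."""

    def pdf(self, x):
        x = np.asarray(x)     # public method: validation, coercion, dispatch
        return self._pdf(x)   # delegate to the private implementation

    def _pdf(self, x):
        raise NotImplementedError

class Uniform01(BaseDistribution):
    def _pdf(self, x):        # subclasses override only the private method
        return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

print(Uniform01().pdf([0.5, 2.0]))  # [1. 0.]
```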

else:
raise NotImplementedError

jac = np.abs(self.transformer_.inverse_transform_diff(x))
Collaborator

this we should replace with an approximate jacobian

@joshdunnlime
Contributor Author

joshdunnlime commented Oct 5, 2025

I have implemented the _pdf and _log_pdf so that they give approximations (i.e. not including the Jacobian). We could try to approximate the Jacobian, but I think it is better to compute it directly (as per our discussions in this PR and the other PR).

This PR could be merged, but it does have the potential to return very approximate pdf and log_pdf results. My suggestion would instead be to make the DifferentiableTransformer or DifferentiableFunction changes on a branch from this one (branch stacking) and merge it all at once.

Notes for reviewer/@fkiraly :

I have added some warnings to inform the user that these are approximations. I have added this to the docs as well. This can be removed when we implement the Jacobian.

When changing pdf and log_pdf to the private functions I ran into some issues. The first was that the internal (non-transformed) distribution in TTR didn't have the same index and columns as TTR. This caused inf to be returned. Setting the index and columns during TTR training fixed this, which explains the changes to TTR.

The second issue, which was rather subtle, was that "distr:measuretype": "discrete" was causing the _pdf and _log_pdf calls to be ignored. I have changed this to "mixed". I assume this is correct (given the three choices). I'm not sure if the API allows this, but perhaps setting the TTR "distr:measuretype" based on the distribution it wraps would be beneficial.

Finally, to pass the "pdf_log_pdf" tests, these had to be added to the approx tags.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I have added some warnings to inform the user that these are approximations.

Yes, good thinking - this is consistent with the default approximations in BaseDistribution, if any are called the user also receives warnings about those.

I assume this is correct (given the three choices)

Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.

Finally, to pass the "pdf_log_pdf" tests, these had to be added to the approx tags

Well, looks like the tests were doing their job.

Collaborator

@fkiraly fkiraly left a comment

Some questions which may imply changed requests:

  • why are we doing the branch if hasattr(dist, "_distribution_attr")? This feels like a violation of the law of demeter. We should only use dist.pdf.
  • the wrong indices are explained by this, I think. You should not need to correct them if all we do is call dist.pdf.

Can you also explain to me how the computation is an approximation to transformed pdf? It is missing the jacobian.

@joshdunnlime
Contributor Author

Can you also explain to me how the computation is an approximation to transformed pdf? It is missing the jacobian.

I guess that depends on the definition of approximate - I might have been somewhat loose with it. Refer to my sentence from above:

This PR could be merged but it does have the potential to return very approximate pdf and log_pdf results. My suggestion would instead be to make the DifferentiableTransformer or DifferentiableFunction changes on a branch from this (branch stacking) and merge it all at once.

I recommend not merging this independently but waiting until we have a solution for the exact Jacobian in place. My suggestion is to use branch stacking to implement the DiffT or DiffFunc classes. I have this done already and will open a new PR (in place of #608).

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I assume this is correct (given the three choices)

Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.

Actually, if it was "discrete", the code in __init__ should set it to "discrete" via set_tags. Otherwise no setting needs to be done, it should be "mixed".

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I guess that depends on the definition of approximate - I might have been somewhat loose with it.

I was under the impression you were going to implement numerical derivatives.

I have now made a draft here in the base class: #610

This is due to realizing that this is nothing specific to the transformed distribution - any distribution that does not have a pdf implemented but a cdf can produce a numerical pdf like this.

This should, in particular, fix it for the TransformedDistribution - do you want to give it a test?
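The idea behind #610 can be sketched generically (assumed helper name; a central difference on the cdf gives a numerical pdf for any distribution that has a cdf but no pdf):

```python
import numpy as np
from scipy.stats import norm

def pdf_from_cdf(cdf, x, eps=1e-5):
    """Generic fallback: pdf as the central-difference derivative of the cdf."""
    return (cdf(x + eps) - cdf(x - eps)) / (2 * eps)

# compare against the exact normal density
x = np.linspace(-2.0, 2.0, 5)
max_err = np.max(np.abs(pdf_from_cdf(norm.cdf, x) - norm.pdf(x)))
```

The central difference is second-order accurate in eps, so for a smooth cdf the error here is far below typical tolerance.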

@joshdunnlime
Contributor Author

  • why are we doing the branch if hasattr(dist, "_distribution_attr")? This feels like a violation of the law of demeter. We should only use dist.pdf.
  • the wrong indices are explained by this, I think. You should not need to correct them if all we do is call dist.pdf.

Perhaps a misunderstanding on my behalf. This is now fixed.

Would we like to roll back the changes to TTR or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.

@joshdunnlime
Contributor Author

I assume this is correct (given the three choices)

Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.

Actually, if it was "discrete", the code in __init__ should set it to "discrete" via set_tags. Otherwise no setting needs to be done, it should be "mixed".

It was discrete so I changed it explicitly to mixed. It seems to inherit mixed from the base class. Should I remove it?

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

It was discrete so I changed it explicitly to mixed. It seems to inherit mixed from the base class. Should I remove it?

I would say:

  • leave for class at "mixed"
  • in __init__, set to "discrete" if and only if the inner distribution is "discrete".
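That conditional tag narrowing could be sketched like this (toy stand-ins for the tag machinery; the get_tag/set_tags names are only modelled on the skbase-style API):

```python
class TagMixin:
    """Toy stand-ins for the skbase tag machinery (illustrative only)."""
    _tags = {}

    def get_tag(self, name):
        return self._tags.get(name)

    def set_tags(self, **tags):
        self._tags = {**self._tags, **tags}  # instance-level override

class Discrete(TagMixin):
    _tags = {"distr:measuretype": "discrete"}

class TransformedDistribution(TagMixin):
    _tags = {"distr:measuretype": "mixed"}  # class default stays "mixed"

    def __init__(self, distribution):
        self.distribution = distribution
        # narrow to "discrete" iff the inner distribution is discrete
        if distribution.get_tag("distr:measuretype") == "discrete":
            self.set_tags(**{"distr:measuretype": "discrete"})

td = TransformedDistribution(Discrete())
print(td.get_tag("distr:measuretype"))  # discrete
```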

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

Would we like to roll back the changes to TTR or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.

What are the indices currently? I think they should be the same between row and column index.

@joshdunnlime
Contributor Author

joshdunnlime commented Oct 5, 2025

I guess that depends on the definition of approximate - I might have been somewhat loose with it.

OK - some confusion around approximate here:

  1. Approx in

I was under the impression you were going to implement numerical derivatives.

Yes, but I am also adding explicit differentiation in two cases: 1) the user passes the explicit inverse_func_diff or 2) a scaler transform is used.

In these cases, the Jacobian is exact and therefore so is the log_pdf.
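For the scaler case, the map is affine, so the Jacobian is a known constant and the log_pdf is exact. A sketch with a fitted MinMaxScaler (hypothetical helper, assuming a base distribution X ~ N(0, 1)):

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import MinMaxScaler

# affine map: the Jacobian of inverse_transform is exactly 1 / scale_
scaler = MinMaxScaler().fit(np.array([[-3.0], [3.0]]))  # maps [-3, 3] to [0, 1]

def exact_log_pdf(y):
    """log pdf of Y = scaler.transform(X) for X ~ N(0, 1), exact Jacobian."""
    x = scaler.inverse_transform(np.asarray(y).reshape(-1, 1)).ravel()
    return norm.logpdf(x) - np.log(scaler.scale_[0])

y = np.array([0.25, 0.5, 0.75])
# here Y is exactly N(0.5, 1/6), so this matches the closed-form density
```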

I have now made a draft here in the base class: #610

This is due to realizing that this is nothing specific to the transformed distribution - any distribution that does not have a pdf implemented but a cdf can produce a numerical pdf like this.

Understood, but the TD doesn't have a cdf, so I'm not sure this helps with the LogLoss issue.

(edit) Or for the TD, is this implemented via the BaseDistribution cdf?

This should, in particular, fix it for the TransformedDistribution - do you want to give it a test?

Yes - happy to test. It will be interesting to compare this to the exact methods above.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

Yes, but I am also adding explicit differentiation in two cases: 1) the user passes the explicit inverse_func_diff or 2) a scaler transform is used.

In these cases, the Jacobian is exact and therefore so is the log_pdf.

I know, but I think we need to add these gradually. Numerical differentiation is a sensible default in the absence of anything else.

Therefore imo it makes sense to put it even in the base class as the last fallback.

Understood, but the TD doesn't have a cdf so I'm not sure this helps with the LogLoss issue.

Oh, I see. You would need the inverse transformation for that.

@joshdunnlime
Contributor Author

Would we like to roll back the changes to TTR or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.

What are the indices currently? I think they should be the same between row and column index.

Prior to the change, the columns were always a pandas RangeIndex. I have named columns, so it was raising an index error like "col not found in y.columns".

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

Prior to the change, the columns were always a pandas RangeIndex. I have named columns, so it was raising an index error like "col not found in y.columns".

Is this a bug in the wrapped regressor then? It should return index and columns in line with the X and y of the predict.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I have now opened an additional PR #611, this adds an exact cdf for the TD using the transform of the distribution.

Combined with this, #610 should be able to produce a reasonable pdf through numerical differentiation.

The sequence of computations would be, when calling pdf:

  • the TTR passes the inner transformer's fitted transform to the TD inverse_transform
  • an exact cdf is computed using the inverse_transform argument that is passed (the TTR always has this because the inverse of the inverse is simply transform)
  • finally, the base class default of pdf does numerical differentiation on cdf, calling it multiple times

(the user should see nothing of this, optimally)
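The sequence can be illustrated end to end with a toy case, X ~ N(0, 1) and Y = exp(X), so Y is log-normal (this is a sketch, not the actual #610/#611 code):

```python
import numpy as np
from scipy.stats import lognorm, norm

# steps 1-2: exact cdf of Y = T(X) for a monotone increasing T,
# using only the inverse transform: F_Y(y) = F_X(T^{-1}(y))
def transformed_cdf(y):
    return norm.cdf(np.log(y))  # toy case: X ~ N(0, 1), Y = exp(X)

# step 3: base-class-style fallback, numerical pdf from the cdf
def transformed_pdf(y, eps=1e-5):
    return (transformed_cdf(y + eps) - transformed_cdf(y - eps)) / (2 * eps)

y = np.array([0.5, 1.0, 2.0])
err = np.max(np.abs(transformed_pdf(y) - lognorm.pdf(y, s=1.0)))
```

No Jacobian is needed anywhere here: the cdf absorbs the transform exactly, and differentiation happens only once, numerically, at the end.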
