
Conversation

@joshdunnlime
Contributor

Reference Issues/PRs

Background on why we want this: #601 and #605
Supersedes this PR: #608

What does this implement/fix? Explain your changes.

This implementation adds a DifferentiableTransformer (DT). It acts as a wrapper around an sklearn transformer and gives the user the option to use one of the following (in order of precedence):

  1. an explicit derivative via inverse_func_diff.
  2. an explicit derivative for a scaler transformer via scaler_.
  3. numerical differentiation of the inverse_transform.
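The resolution order above could be sketched roughly as follows (a hypothetical helper, not the PR's actual code; the function name `resolve_inverse_diff` and the `step` parameter are illustrative):

```python
import numpy as np


def resolve_inverse_diff(transformer, inverse_func_diff=None, step=1e-6):
    """Pick a derivative for transformer.inverse_transform, following the
    order of precedence above (hypothetical sketch, not the PR's code)."""
    # 1. an explicit user-supplied derivative wins
    if inverse_func_diff is not None:
        return inverse_func_diff
    # 2. exact derivative for scaler-type transformers: e.g. MinMaxScaler's
    #    inverse is affine, X = (X_scaled - min_) / scale_, so the
    #    elementwise derivative is 1 / scale_
    if hasattr(transformer, "scale_"):
        return lambda z: np.ones_like(z) / transformer.scale_
    # 3. fall back to a central-difference numerical derivative
    def numerical_diff(z):
        f = transformer.inverse_transform
        return (f(z + step) - f(z - step)) / (2 * step)

    return numerical_diff
```

For a fitted MinMaxScaler, case 2 returns the exact constant derivative; for anything else with an `inverse_transform`, case 3 approximates it numerically.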

The derivative is available for both the (forward) transform and the inverse_transform.

The DT has a coerce classmethod that takes an sklearn transformer or a function and coerces it to a DT.
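A minimal sketch of how such a coerce classmethod might behave (the behaviour is assumed from the description above; the class body here is hypothetical, not the PR's implementation):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import FunctionTransformer


class DifferentiableTransformer(TransformerMixin, BaseEstimator):
    """Sketch only; the real DT lives in the PR's _transformer.py."""

    def __init__(self, transformer=None):
        self.transformer = transformer

    @classmethod
    def coerce(cls, obj):
        # already a DT: pass through unchanged
        if isinstance(obj, cls):
            return obj
        # bare function: wrap it in a FunctionTransformer first
        if callable(obj) and not hasattr(obj, "transform"):
            obj = FunctionTransformer(func=obj)
        # sklearn transformer: wrap directly
        return cls(transformer=obj)
```

This lets TTR/TD accept a function, an sklearn transformer, or a DT through a single code path.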

In addition to the above, it also makes changes to #605 to apply the Jacobian (the derivative of the transform/inverse_transform) to the pdf and log_pdf. In cases 1) and 2) above this returns the exact pdf and log_pdf.

This preserves the current functionality of being able to pass a function to TTR (TransformedTargetRegressor) and allows a user to extend this by passing an sklearn transformer or their own DT. The final point allows the user to configure their DT with either the explicit derivative, or to pass kwargs that can be used for numerical differentiation. In theory, it also allows a user to pass these kwargs as hyperparameters via a gridsearch/optimisation.

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

  1. Implementation of the DT is the most important part:

    • Are we happy with the transformer inheriting from sklearn?
    • Do we need this much abstraction at this stage?
    • Do we wish to add/implement tags for this?
  2. Changes to TD (TransformedDistribution) - the main change here is internally handling a transformer instead of a function.

  3. Changes to the TTR are minor.

On points 2) and 3), note that if we wish to explicitly pass transform only as a method/function, it is very straightforward to change this back. We would simply need to add inverse_transform and inverse_func_diff as kwargs to the TD. This does mean adding more kwargs if we wish to support numerical derivative kwargs.

Secondary feedback

Did you add any tests for the change?

Yes, added a param3 to TD, passing a FunctionTransformer instead of just a function.

Any other comments?

I had the bulk of this implementation down prior to the discussion on what we pass to TTR/TD here. Again, it is trivial to move the creation of the DT into TTR and pass the transform, inverse_transform and inverse_func_diff, but I feel the current approach is much cleaner, as it keeps the majority of the new logic in the DT and within the new _transformer.py module.

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

…ion.

This allows us to get more information from the transformer class, e.g. the MinMaxScaler scale_ parameter.

Though not part of the typical forward-facing interface, TransformedDistribution.transform is now no longer callable. Instead TransformedDistribution.transform.inverse_transform would be needed.
This is useful when testing XGBoostLSS and hyperparameter optimisation is not important. XGBoostLSS can be computationally expensive when searching over the default 30 trials.
Does so by adding the log_pdf with change-of-variables (Jacobian of the inverse transform), using a direct method for linear transforms and a numerical method for non-linear transforms.

TransformedDistribution now takes transformer instead of transform - that is, the transformer class instead of just the inverse transform function/method.
This is not actually intended as an example. It is a simple way to share findings. This can (should) be removed before any final merges into main.
This reverts commit 4146a6f, reversing
changes made to 8bc5897.
This is needed so that when TransformedDistribution.distribution is called, it has the same indices as the wrapper distribution.
@joshdunnlime
Contributor Author

Usage examples:

Create some data:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

from skpro.metrics import LogLoss, CRPS
from skpro.regression.xgboostlss import XGBoostLSS
from skpro.regression.compose import TransformedTargetRegressor
from skpro.compose import DifferentiableTransformer

size = 1000

X = np.random.normal(1, 1, size)
y = (2 * X + np.random.normal(0, 0.5, size)) # / 15 + 0.5 # for Beta

Xy = pd.DataFrame(
    {"target": y, "feature": X},
    index=range(size)
)

Now we can pass an sklearn transformer to TTR:

xgb = XGBoostLSS(**params)
mms = MinMaxScaler((0.1, 0.9))
pipe = TransformedTargetRegressor(regressor=xgb, transformer=mms)

or a DT:

xgb = XGBoostLSS(**params)
mms = MinMaxScaler((0.1, 0.9))
dt = DifferentiableTransformer(transformer=mms)
pipe = TransformedTargetRegressor(regressor=xgb, transformer=dt)

Both are handled the same:

pipe.fit(X=Xy[["feature"]], y=Xy["target"])
pp = pipe.predict_proba(Xy[["feature"]])
p = pipe.predict(Xy[["feature"]])
CRPS()(y_true=Xy["target"], y_pred=pp)
LogLoss()(y_true=Xy["target"], y_pred=pp)

0.41678

And in either case, if we inspect the TD (pp) returned by predict_proba we get a DT as the transformer.

pp.transformer_

DifferentiableTransformer(transformer=MinMaxScaler(feature_range=(0.1, 0.9)))

@fkiraly
Collaborator

fkiraly commented Oct 7, 2025

I tried to resolve the conflicts, have a look

Here we consider the numerical differentiation to be classed as approx. For well-behaved functions (which change-of-variables requires), this is likely to be overkill.
@joshdunnlime
Contributor Author

@fkiraly

This should give a working implementation.

There are a couple of design choices that might be worth discussing:

  1. numdifftools was added to handle fast and accurate numerical differentiation.

    • Added as an optional extra. We could add an "install numdifftools/extras" warning when the fallback is used, so users know there is a better option.
    • The scipy derivative function has some breaking changes between versions. We could handle this by checking versions; however, the scipy implementation is a simplified version of numdifftools, so numdifftools is likely to give more robust results.
  2. Do we need a DifferentiableTransformer (DT)? And does it need to inherit from BaseTransformer, BaseEstimator etc?

    • We could have a DifferentiableFunction class, as you mentioned previously. My main concern with it being a Transformer is that it makes design choices about the transformer base class which might narrow the design choices for that in the future.
    • Does it need to inherit from sklearn's BaseEstimator or should it have a pure skpro/skbase implementation?
  3. This implementation can pass a function, an sklearn transformer (skT) or a DT to TTR (all via the transform kwarg). It doesn't support passing inverse_transform or inverse_transform_diff, but I have kept that in mind while designing this. It should be trivial to extend this (allow passing an skT, DT or inverse_transform + inverse_transform_diff), or to constrain this to only allow inverse_transform + inverse_transform_diff.

    • I am in favour of supporting skT or DT as it feels more sklearn-like. E.g. wrap an sklearn-like object in another sklearn-like object and then just fit-predict.
    • Supporting all of the above options doesn't introduce any breaking changes.
  4. Should we consider numerical differentiation as approx? The current implementation does; however, the "change-of-variables" only works with well-behaved functions. These are incredibly easy for numerical differentiation to handle, with errors typically on the order of the 7th decimal place when comparing the exact derivative of a function to the numerical derivative for CRPS, LogLoss and MAE of the log_pdf.

  5. Currently, multivariate targets are handled completely independently, that is to say, each target is transformed independently. Thus, the Jacobian is treated as a diagonal matrix with no partial cross-derivatives. This is mathematically correct for all sklearn transformers where the transform (F_i(y)) depends only on its respective target (y_i). Other custom transformers could be created where some F_i(y) depends on any/all y, but I would consider this extremely unlikely.
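For the independent-columns case, the diagonal-Jacobian treatment reduces the log-density correction to a per-column sum. A small numpy check (illustrative numbers only, not PR code):

```python
import numpy as np

# With column-wise transforms, the Jacobian is diagonal, so the
# log-density correction is just the sum of per-column log-derivatives.
# Hypothetical per-column derivatives dF_i/dy_i of the forward transforms:
diag = np.array([0.08, 2.0])
log_abs_det_J = np.sum(np.log(np.abs(diag)))

# identical to the log-determinant of the full (diagonal) Jacobian
full = np.log(np.abs(np.linalg.det(np.diag(diag))))
assert np.isclose(log_abs_det_J, full)
```

Cross-dependent transforms (some F_i depending on several y_j) would instead require the full determinant of a non-diagonal Jacobian.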

In addition to the above, the documentation needs some tidying, and comments, tags and licenses need updating.

Further improvements:

  1. Tidy up repeated handling of index and columns.
  2. Tests - do we need specific tests added?
  3. Tests don't catch a false-negative exact for pdf_log_pdf. E.g. if the tag logic is changed and something that is exact gets tagged as "approx", this won't be caught.
  4. The fallback derivative could also be improved.

@fkiraly
Collaborator

fkiraly commented Oct 14, 2025

This should give a working implementation.

Oh, very nice! I will review but first leave some comments on the design choices.

  1. numdifftools was added to handle fast and accurate numerical differentiation.

I think we should not take a dependency on numdifftools - the package looks abandoned: last release in 2022, and a sole maintainer. It implies python 3.10 or python 3.11, so it will lapse into outdated python within 2 years.

I will contact the author to see what is going on, but for now I think we should avoid it.

  1. Do we need a DifferentiableTransformer (DT)?

That is unclear to me and we should really think carefully about pros and cons.

We could have a DifferentiableFunction class, as you mentioned previously.

But then we would need to turn the fitted transformer into a DifferentiableFunction, right? Feels less extensible to me. Or how would everything fit together? Would appreciate your thoughts.

Does it need to inherit from sklearn's BaseEstimator or should it have a pure skpro/skbase implementation?

If a transformer, I am not strongly in favour of either option. If a function, I would strongly favour an skbase implementation, because it is too far from the sklearn transformer.

  1. This implementation can pass a function

I like this, though I still need time to think about it too.

  1. Should we consider numerical differentiation as approx?

Where exactly? I have already implemented numerical differentiation (with a sixth-order approximation) in the pdf and log_pdf defaults.

  1. Currently, multivariate targets are handled completely independently,

That is correct; currently skpro cannot support multivariate distributions. That is a bit of a project, and the API is not fleshed out.

I think that API question needs to be answered before we move to the differentiation question (but it might make sense to look at both in close sequence to validate the design).

E.g., do we need to have a separate method for multivariate pdf? Note that the current pdf produces marginals of an independent distribution across variables. But for joint distributions the marginals do not carry the full information, so there needs to be a way to return joint pdf.

@fkiraly
Collaborator

fkiraly commented Oct 14, 2025

FYI, I opened an issue to collect API discussion for multivariate distributions here:
#622

@joshdunnlime
Contributor Author

joshdunnlime commented Oct 19, 2025

I think we should not take a dependency on numdifftools - the package looks abandoned. Last release in 2022, and sole maintainer. It implies python 3.10 or python 3.11, so will lapse into outdated python within 2 years.

I'll revert to scipy with version handling for the misc.derivative and differentiate.derivative functions, or write our own.
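A version-handling shim along those lines might look like the following (a sketch, not the PR code; `scipy.misc.derivative` was removed in SciPy 1.12, and `scipy.differentiate.derivative` arrived in SciPy 1.15 with a different, result-object-based API):

```python
import numpy as np

try:
    # SciPy >= 1.15: adaptive finite differences, returns a result object
    from scipy.differentiate import derivative as _derivative

    def numerical_diff(f, x):
        return _derivative(f, x).df
except ImportError:
    try:
        # SciPy < 1.12: the old fixed-step central-difference helper
        from scipy.misc import derivative as _derivative

        def numerical_diff(f, x):
            return _derivative(f, x, dx=1e-6)
    except ImportError:
        # neither API available: plain central-difference fallback
        def numerical_diff(f, x, h=1e-6):
            return (f(x + h) - f(x - h)) / (2 * h)
```

Whichever branch is taken, `numerical_diff(np.exp, 1.0)` should come out close to e.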

But then we would need to turn the fitted transformer into a DifferentiableFunction, right? Feels less extensible to me. Or how would everything fit together? Would appreciate your thoughts.

I've had much more of a think about having a DifferentiableFunction vs DifferentiableTransformer. I am confident the DiffT is the right way to go:

  • It follows existing sklearn-like patterns. DiffF introduces a new pattern.
  • Adding numerical differentiation is unrestrictive to future development of BaseTransformer. The BaseDiffT only requires inverse_diff and transform_diff as public methods and _fit_with_fitted_transformer as a private method; it currently has _numerical_diff, but that is not a hard requirement.

My impression is that if we were implementing TTR and TD from scratch, with LogLoss, pdf and log_pdf support, I would probably go with something like:

# what this PR has
xgb = XGBoostLSS(**params)
mms = MinMaxScaler((0.1, 0.9))
dt = DifferentiableTransformer(transformer=mms)
pipe = TransformedTargetRegressor(regressor=xgb, transformer=dt)

or

# we could easily add "dft" with the current PR
xgb = XGBoostLSS(**params)
dft = DifferentiableFuncTransformer(func, inverse_func, func_diff, inverse_diff)
pipe = TransformedTargetRegressor(regressor=xgb, transformer=dft)

I think the main issue here is extending/changing the TTR input options, e.g. point 3 above. For example, the _fit_with_fitted_ is only used to accommodate patterns unlike the above examples.

If transformer, I am not in favour strongly of either option. If a function, I would favour an skbase implementation strongly, because it is too far from the sklearn transformer.

I would keep it inheriting from sklearn for now then. My understanding is it should be straightforward to replace this with skbase if/when needed.

  1. This implementation can pass a function

I like this, though I still need time to think about it too.

It would be good to get some more thoughts on this. As mentioned above, I think this is the main constraint in how this gets implemented.

  1. Should we consider numerical differentiation as approx?

Where?

In the pdf and log_pdf, when applying the Jacobian. The plain pdf and log_pdf are always exact for the transformed distribution. For the original distribution, the pdf and log_pdf are always exact when using scale_ or inverse_transform_diff for the Jacobian. When using _numerical_diff, the pdf and log_pdf are still exact, but the Jacobian is technically an approximation. Therefore, in this case, I have set the tag as "approx". However, for all well-behaved functions (which is a theoretical prerequisite), numerical differentiation would effectively give an exact Jacobian (to some relatively small rounding error). Do we wish to keep this as "approx" or change it to "exact"?
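For reference, the change-of-variables identity the Jacobian implements can be checked directly against a known density (a standard result, not PR code): with z = t(y), p_Y(y) = p_Z(t(y)) * |t'(y)|, and taking t = log recovers the lognormal density.

```python
import numpy as np
from scipy import stats

# Change of variables: if z = t(y) is the forward transform, then
# p_Y(y) = p_Z(t(y)) * |t'(y)|.  Here t = log, so Y = exp(Z) is lognormal.
y = np.array([0.5, 1.0, 2.5])
p_z = stats.norm.pdf(np.log(y))  # density of the transformed variable
jac = np.abs(1.0 / y)            # |d/dy log(y)|
p_y = p_z * jac

# agrees with the known lognormal density
assert np.allclose(p_y, stats.lognorm.pdf(y, s=1.0))
```

Replacing the exact 1/y Jacobian with a central-difference estimate changes the result only at the level of rounding error, which is the "approx vs exact" question above.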

  1. Currently, multivariate targets are handled completely independently,

I assumed that this was fully supported. Until these other parts are implemented, it will be difficult to design around it. I would consider this an edge case - sklearn transformers don't even handle combined column transformations (afaik).

@fkiraly
Collaborator

fkiraly commented Oct 19, 2025

ok, I also think that transformer is the better way to go after some thinking.

How do we proceed practically - do you want to give it a stab and then we maybe refine?

@joshdunnlime
Contributor Author

@fkiraly - I have removed any additional packages and the differentiation is now done by scipy with both the old and new derivative APIs implemented.

The transformer implementation we spoke about is implemented and now just needs refining. I have moved some methods down to the DiffT class so as to keep the BaseT as generic as possible.

@fkiraly
Collaborator

fkiraly commented Nov 24, 2025

Nice! Will review in the coming days - this is much appreciated, but I think I need more time to digest the API design.
(please ping if I need more than a few days)

@joshdunnlime
Contributor Author

Nice! Will review in the coming days - this is much appreciated, but I think I need more time to digest the API design. (please ping if I need more than a few days)

@fkiraly - just bringing this one to your attention again. Thanks
