
Conversation

@joshdunnlime
Contributor

Reference Issues/PRs

#601
Problem Summary: LogLoss not included due to lack of a log_pdf. It can be computed but requires calculating the Jacobian of the transform. For many transformers this is non-trivial.

What does this implement/fix? Explain your changes.

I have implemented several pieces here. They break down as follows:

  1. Add log_pdf method to TTR.
  2. Add _jacobian method to TransformedDistribution (TD).
  3. Replace transform method/function with transformer class in the TD.
  4. Add ordered_gradient function to handle gradient(f, x) where the arrays are not sorted by x.
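An ordered_gradient of the kind described in step 4 could look roughly like this (a minimal NumPy sketch, not the PR's actual implementation): sort by x, differentiate, then scatter the result back into the original order.

```python
import numpy as np

def ordered_gradient(f, x):
    """Gradient df/dx for samples that are not sorted by x.

    Sorts (x, f) by x, applies np.gradient, then scatters the
    result back into the original (unsorted) order.
    """
    order = np.argsort(x)
    grad_sorted = np.gradient(f[order], x[order])
    grad = np.empty_like(grad_sorted)
    grad[order] = grad_sorted  # undo the sort
    return grad

# example: f = x**2 sampled in shuffled order; df/dx = 2x
x = np.array([3.0, 1.0, 2.0, 0.0])
print(ordered_gradient(x**2, x))
```

np.gradient uses second-order central differences in the interior, so the interior points of a quadratic are recovered exactly; only the two boundary points are first-order approximations.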

Does your contribution introduce a new dependency? If yes, which one?

No.

What should a reviewer concentrate their feedback on?

  1. Correctness of numerical calculations in log_pdf in TTR.
  2. Edge-case of the Jacobian on TD.
  3. Does replacing transform with transformer have any unintended consequences or cause a considerable breaking change?

Did you add any tests for the change?

I haven't yet added tests. I have modified the get_test_params to take a transformer instead of a single function.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the skpro root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR). maintenance - CI, test framework, release.
    See here for full badge reference
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

…ion.

This allows us to get more information from the transformer class, e.g. the MinMaxScaler scale_ parameter.

Though not part of the typical forward-facing interface, TransformedDistribution.transform is now no longer callable. Instead, TransformedDistribution.transformer.inverse_transform would be needed.
This is useful when testing XGBoostLLS and hyperparameter optimisation is not important. XGBoostLLS can be computationally expensive when searching over the default 30 trials.
Does so by adding the log_pdf with change-of-variables (jacobian of inverse transform) and direct method for linear transforms, and numerical method for non-linear transforms.
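The change-of-variables computation described here can be sketched as follows (hypothetical helper names; the numerical path approximates the Jacobian of the inverse transform with a central difference):

```python
import numpy as np
from scipy.stats import lognorm, norm

def log_pdf_transformed(y, base_log_pdf, inverse_transform, eps=1e-6):
    """Change of variables: log p_Y(y) = log p_X(g(y)) + log|g'(y)|, g = T^{-1}.

    g' is approximated by a central difference when no exact derivative
    is available (the numerical path for non-linear transforms).
    """
    g = inverse_transform
    jac = (g(y + eps) - g(y - eps)) / (2 * eps)  # numerical g'(y)
    return base_log_pdf(g(y)) + np.log(np.abs(jac))

# sanity check: Y = exp(X) with X ~ N(0, 1) is log-normal
y = np.array([0.5, 1.0, 2.0])
approx = log_pdf_transformed(y, norm.logpdf, np.log)
exact = lognorm.logpdf(y, s=1.0)  # the two agree to high precision
```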

TransformedDistribution now takes transformer instead of transform - that is, the transformer class instead of just the inverse transform function/method.
This is not actually intended as an example. It is a simple way to share findings. This can (should) be removed before any final merges into main.

self,
distribution,
transform,
transformer,
Collaborator

sorry, we cannot change the names of already released arguments, or it might break user code. Not without deprecation and warnings, at least.

Contributor Author

Understood. I'll make some changes.

)

trafo = self.transform
trafo = self.transformer.inverse_transform
Collaborator

why are we changing this?

Contributor Author

@joshdunnlime joshdunnlime Sep 28, 2025

The self.transform is just the inverse function at present, not the whole transformer class. The idea was to expose the whole class so we can do numerical differentiation or access the scale_ attr. Based on the comment above, this isn't favourable.

Collaborator

I think not easily possible, because the __init__ resets every estimator that is a component to a pre-fitted state on construction, and because of deprecation.

Overall, I think it is a very good idea, though one may have to find the right implementation.

Would passing inverse_transform as an optional additional argument work?

Alternatively, we have to pass something like sklearn FrozenEstimator wrapping the transformer, though it feels like a bit of an overhead software engineering wise.
The general question is whether, and if yes how, to allow, API-wise, fitted estimators to be passed to __init__ - currently this violates the scikit-learn-like design.

Collaborator

@fkiraly fkiraly left a comment

Great idea! Though we cannot merge this as is, since:

  • this changes the arguments of existing released classes, possibly breaking user code
  • I think it introduces too much coupling between TTR and the TransformedDistribution, we should avoid that.

Suggestion: how about we allow the user to pass both transform and inverse_transform (optional) to the TransformedDistribution? Then we can do numerical differentiation.
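The suggested signature could look something like this (purely a hypothetical sketch for discussion, not skpro's actual API):

```python
class TransformedDistribution:
    """Sketch: accept both directions of the transform as plain callables.

    inverse_transform is optional; when supplied, pdf/log_pdf can obtain
    the change-of-variables Jacobian by numerically differentiating it.
    """

    def __init__(self, distribution, transform, inverse_transform=None):
        self.distribution = distribution
        self.transform = transform
        self.inverse_transform = inverse_transform
```

Keeping inverse_transform optional leaves the released transform argument untouched, so no deprecation cycle would be needed.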

Collaborator

@fkiraly fkiraly left a comment

may I request:

  • remove the BaseTransformer from this pull request and open a separate one, since we will have some API discussions; let's aim to complete this with numerical differentiation, and later extend it with exact jacobians in a separate PR
  • may I suggest to move the numerical differentiation to TD for it to happen only inside _pdf and _log_pdf? And leave TTR untouched? This we could merge quickly, and it could be "version 0.1" of the feature you want.
    • testing it might also highlight aspects we have not yet thought about

@joshdunnlime
Contributor Author

Apologies, I made the DiffTransformer changes on the wrong branch.

@fkiraly
Collaborator

fkiraly commented Oct 3, 2025

No problem.

May I suggest to change this PR to revert changes across the classes, and add only an approximate log_pdf and pdf to the TD?

That would be a good starting point for adding more features in separate PR (e.g., autodiff-like).

Collaborator

@fkiraly fkiraly left a comment

Thanks - almost ready to merge. The public methods should not be overridden, private methods should be implemented.

See how, for instance, _cdf is implemented.
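The requested pattern is the usual template-method split, roughly like this (a toy illustration, not skpro's actual base class):

```python
import numpy as np

class BaseDistribution:
    """Toy base class: public pdf handles boilerplate, _pdf does the math."""

    def pdf(self, x):
        x = np.asarray(x)     # public method: validation, coercion, dispatch
        return self._pdf(x)   # delegate to the private implementation

    def _pdf(self, x):
        raise NotImplementedError

class Uniform01(BaseDistribution):
    def _pdf(self, x):        # subclasses override only the private method
        return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

print(Uniform01().pdf([0.5, 2.0]))  # [1. 0.]
```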

else:
raise NotImplementedError

jac = np.abs(self.transformer_.inverse_transform_diff(x))
Collaborator

this we should replace with an approximate jacobian

@joshdunnlime
Contributor Author

joshdunnlime commented Oct 5, 2025

I have implemented the _pdf and _log_pdf so that they give approximations (i.e. not including the Jacobian). We could try to approximate the Jacobian, but I think it is better to compute it directly (as per our discussions in this PR and the other PR).

This PR could be merged, but it does have the potential to return very approximate pdf and log_pdf results. My suggestion would instead be to make the DifferentiableTransformer or DifferentiableFunction changes on a branch from this one (branch stacking) and merge it all at once.

Notes for reviewer/@fkiraly :

I have added some warnings to inform the user that these are approximations. I have added this to the docs as well. This can be removed when we implement the Jacobian.

When changing pdf and log_pdf to the private functions I ran into some issues. The first was that the internal (non-transformed) distribution in TTR didn't have the same index and columns as TTR. This caused inf to be returned. Setting the index and columns during TTR training fixed this, which explains the changes to TTR.

The second issue, which was rather subtle, was that "distr:measuretype": "discrete" was causing the _pdf and _log_pdf calls to be ignored. I have changed this to "mixed". I assume this is correct (given the three choices). I'm not sure if the API allows this, but perhaps setting the TTR "distr:measuretype" based on the distribution it wraps would be beneficial.

Finally, to pass the "pdf_log_pdf" tests, these had to be added to the approx tags.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I have added some warnings to inform the user that these are approximations.

Yes, good thinking - this is consistent with the default approximations in BaseDistribution, if any are called the user also receives warnings about those.

I assume this is correct (given the three choices)

Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.

Finally, to pass the "pdf_log_pdf" tests, these had to be added to the approx tags

Well, looks like the tests were doing their job.

Collaborator

@fkiraly fkiraly left a comment

Some questions which may imply changed requests:

  • why are we doing the branch if hasattr(dist, "_distribution_attr")? This feels like a violation of the law of demeter. We should only use dist.pdf.
  • the wrong indices are explained by this, I think. You should not need to correct them if all we do is call dist.pdf.

Can you also explain to me how the computation is an approximation to transformed pdf? It is missing the jacobian.

@joshdunnlime
Contributor Author

Can you also explain to me how the computation is an approximation to transformed pdf? It is missing the jacobian.

I guess that depends on the definition of approximate - I might have been somewhat loose with it. Refer to my sentence from above:

This PR could be merged but it does have the potential to return very approximate pdf and log_pdf results. My suggestion would instead be to make the DifferentiableTransformer or DifferentiableFunction changes on a branch from this (branch stacking) and merge it all at once.

I recommend not merging this independently but waiting until we have a solution for the exact Jacobian in place. My suggestion is to use branch stacking to implement the DiffT or DiffFunc classes. I have this done already and will open a new PR (in place of #608).

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I assume this is correct (given the three choices)

Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.

Actually, if it was "discrete", the code in __init__ should set it to "discrete" via set_tags. Otherwise no setting needs to be done, it should be "mixed".

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I guess that depends on the definition of approximate - I might have been somewhat loose with it.

I was under the impression you were going to implement numerical derivatives.

I have now made a draft here in the base class: #610

This is due to realizing that this is nothing specific to the transformed distribution - any distribution that does not have a pdf implemented but a cdf can produce a numerical pdf like this.

This should, in particular, fix it for the TransformedDistribution - do you want to give it a test?
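The idea behind #610 can be sketched generically (assumed helper name; a central difference on the cdf gives a numerical pdf for any distribution that has a cdf but no pdf):

```python
import numpy as np
from scipy.stats import norm

def pdf_from_cdf(cdf, x, eps=1e-5):
    """Generic fallback: pdf as the central-difference derivative of the cdf."""
    return (cdf(x + eps) - cdf(x - eps)) / (2 * eps)

# compare against the exact normal density
x = np.linspace(-2.0, 2.0, 5)
max_err = np.max(np.abs(pdf_from_cdf(norm.cdf, x) - norm.pdf(x)))
```

The central difference is second-order accurate in eps, so for a smooth cdf the error here is far below typical tolerance.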

@joshdunnlime
Contributor Author

  • why are we doing the branch if hasattr(dist, "_distribution_attr")? This feels like a violation of the law of demeter. We should only use dist.pdf.
  • the wrong indices are explained by this, I think. You should not need to correct them if all we do is call dist.pdf.

Perhaps a misunderstanding on my behalf. This is now fixed.

Would we like to roll back the changes to TTR or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.

@joshdunnlime
Contributor Author

I assume this is correct (given the three choices)

Yes, I think that is the correct choice to hard code, it is the most general class. We can dynamically change this later if we want to.

Actually, if it was "discrete", the code in __init__ should set it to "discrete" via set_tags. Otherwise no setting needs to be done, it should be "mixed".

It was discrete so I changed it explicitly to mixed. It seems to inherit mixed from the base class. Should I remove it?

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

It was discrete so I changed it explicitly to mixed. It seems to inherit mixed from the base class. Should I remove it?

I would say:

  • leave for class at "mixed"
  • in __init__, set to "discrete" if and only if the inner distribution is "discrete".
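That conditional tag narrowing could be sketched like this (toy stand-ins for the tag machinery; the get_tag/set_tags names are only modelled on the skbase-style API):

```python
class TagMixin:
    """Toy stand-ins for the skbase tag machinery (illustrative only)."""
    _tags = {}

    def get_tag(self, name):
        return self._tags.get(name)

    def set_tags(self, **tags):
        self._tags = {**self._tags, **tags}  # instance-level override

class Discrete(TagMixin):
    _tags = {"distr:measuretype": "discrete"}

class TransformedDistribution(TagMixin):
    _tags = {"distr:measuretype": "mixed"}  # class default stays "mixed"

    def __init__(self, distribution):
        self.distribution = distribution
        # narrow to "discrete" iff the inner distribution is discrete
        if distribution.get_tag("distr:measuretype") == "discrete":
            self.set_tags(**{"distr:measuretype": "discrete"})

td = TransformedDistribution(Discrete())
print(td.get_tag("distr:measuretype"))  # discrete
```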

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

Would we like to roll back the changes to TTR or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.

What are the indices currently? I think they should be the same between row and column index.

@joshdunnlime
Contributor Author

joshdunnlime commented Oct 5, 2025

I guess that depends on the definition of approximate - I might have been somewhat loose with it.

OK - some confusion around approximate here:

  1. Approx in

I was under the impression you were going to implement numerical derivatives.

Yes, but I am also adding explicit differentiation in two cases: 1) the user passes the explicit inverse_func_diff or 2) a scaler transform is used.

In these cases, the Jacobian is exact and therefore so is the log_pdf.
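For the scaler case, the map is affine, so the Jacobian is a known constant and the log_pdf is exact. A sketch with a fitted MinMaxScaler (hypothetical helper, assuming a base distribution X ~ N(0, 1)):

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import MinMaxScaler

# affine map: the Jacobian of inverse_transform is exactly 1 / scale_
scaler = MinMaxScaler().fit(np.array([[-3.0], [3.0]]))  # maps [-3, 3] to [0, 1]

def exact_log_pdf(y):
    """log pdf of Y = scaler.transform(X) for X ~ N(0, 1), exact Jacobian."""
    x = scaler.inverse_transform(np.asarray(y).reshape(-1, 1)).ravel()
    return norm.logpdf(x) - np.log(scaler.scale_[0])

y = np.array([0.25, 0.5, 0.75])
# here Y is exactly N(0.5, 1/6), so this matches the closed-form density
```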

I have now made a draft here in the base class: #610

This is due to realizing that this is nothing specific to the transformed distribution - any distribution that does not have a pdf implemented but a cdf can produce a numerical pdf like this.

Understood, but the TD doesn't have a cdf, so I'm not sure this helps with the LogLoss issue.

(edit) Or for the TD, is this implemented via the BaseDistribution cdf?

This should, in particular, fix it for the TransformedDistribution - do you want to give it a test?

Yes - happy to test. It will be interesting to compare this to the exact methods above.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

Yes, but I am also adding explicit differentiation in two cases: 1) the user passes the explicit inverse_func_diff or 2) a scaler transform is used.

In these cases, the Jacobian is exact and therefore so is the log_pdf.

I know, but I think we need to add these gradually. Numerical differentiation is a sensible default in the absence of anything else.

Therefore imo it makes sense to put it even in the base class as the last fallback.

Understood, but the TD doesn't have a cdf so I'm not sure this helps with the LogLoss issue.

Oh, I see. You would need the inverse transformation for that.

@joshdunnlime
Contributor Author

Would we like to roll back the changes to TTR or are these beneficial? E.g. if a user wants to dig into the internals of TTR/TD, the indices are shared between TD and the wrapped distribution.

What are the indices currently? I think they should be the same between row and column index.

Prior to the change, the columns were always a pandas RangeIndex. I have named columns, so it was raising an index error like "col not found in y.columns".

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

Prior to the change, the columns were always a pandas RangeIndex. I have named columns, so it was raising an index error like "col not found in y.columns".

Is this a bug in the wrapped regressor then? It should return index and columns in line with the X and y of the predict.

@fkiraly
Collaborator

fkiraly commented Oct 5, 2025

I have now opened an additional PR #611, this adds an exact cdf for the TD using the transform of the distribution.

Combined with this, #610 should be able to produce a reasonable pdf through numerical differentiation.

The sequence of computations would be, when calling pdf:

  • the TTR passes the inner transformer's fitted transform to the TD inverse_transform
  • an exact cdf is computed using the inverse_transform argument that is passed (the TTR always has this because the inverse of the inverse is simply transform)
  • finally, the base class default of pdf does numerical differentiation on cdf, calling it multiple times

(the user should see nothing of this, optimally)
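The sequence can be illustrated end to end with a toy case, X ~ N(0, 1) and Y = exp(X), so Y is log-normal (this is a sketch, not the actual #610/#611 code):

```python
import numpy as np
from scipy.stats import lognorm, norm

# steps 1-2: exact cdf of Y = T(X) for a monotone increasing T,
# using only the inverse transform: F_Y(y) = F_X(T^{-1}(y))
def transformed_cdf(y):
    return norm.cdf(np.log(y))  # toy case: X ~ N(0, 1), Y = exp(X)

# step 3: base-class-style fallback, numerical pdf from the cdf
def transformed_pdf(y, eps=1e-5):
    return (transformed_cdf(y + eps) - transformed_cdf(y - eps)) / (2 * eps)

y = np.array([0.5, 1.0, 2.0])
err = np.max(np.abs(transformed_pdf(y) - lognorm.pdf(y, s=1.0)))
```

No Jacobian is needed anywhere here: the cdf absorbs the transform exactly, and differentiation happens only once, numerically, at the end.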
