Merged · Changes from 6 commits
1 change: 0 additions & 1 deletion .github/workflows/AD.yml
@@ -30,7 +30,6 @@ jobs:
- Mooncake
- Tracker
- ReverseDiff
- Zygote
steps:
- uses: actions/checkout@v4

4 changes: 0 additions & 4 deletions Project.toml
@@ -29,7 +29,6 @@ LazyArrays = "5078a376-72f3-5289-bfd5-ec5146d43c02"
Mooncake = "da2b9cff-9c12-43a0-ae48-6db2b0edb7d6"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
Tracker = "9f7883ad-71c0-57eb-9f7f-b5c9e6d3789c"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[extensions]
BijectorsDistributionsADExt = "DistributionsAD"
@@ -40,7 +39,6 @@ BijectorsMooncakeExt = "Mooncake"
BijectorsReverseDiffExt = "ReverseDiff"
BijectorsReverseDiffChainRulesExt = ["ChainRules", "ReverseDiff"]
BijectorsTrackerExt = "Tracker"
BijectorsZygoteExt = "Zygote"

[compat]
ArgCheck = "1, 2"
@@ -64,7 +62,6 @@ ReverseDiff = "1"
Roots = "1.3.15, 2"
Statistics = "1"
Tracker = "0.2"
Zygote = "0.6.63, 0.7"
julia = "1.10.8"

[extras]
@@ -75,4 +72,3 @@ LazyArrays = "5078a376-72f3-5289-bfd5-ec5146d43c02"
Mooncake = "da2b9cff-9c12-43a0-ae48-6db2b0edb7d6"
ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
Tracker = "9f7883ad-71c0-57eb-9f7f-b5c9e6d3789c"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
4 changes: 1 addition & 3 deletions docs/Project.toml
@@ -2,10 +2,8 @@
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
Documenter = "0.27"
Functors = "0.3"
StableRNGs = "1"
Zygote = "0.6"
StableRNGs = "1"
4 changes: 2 additions & 2 deletions docs/src/examples.md
@@ -112,7 +112,7 @@ y = rand(rng, td)
Want to fit the flow?

```@repl normalizing-flows
using Zygote
using ForwardDiff

# Construct the flow.
b = PlanarLayer(2)
@@ -145,7 +145,7 @@ f = NLLObjective(reconstruct, MvNormal(2, 1), xs);
# Train using gradient descent.
ε = 1e-3;
for i in 1:100
(∇s,) = Zygote.gradient(f, θs)
∇s = ForwardDiff.gradient(θ -> f(θ), θs)
θs = fmap(θs, ∇s) do θ, ∇
θ - ε .* ∇
end
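A side note on the change above (not part of the diff): `Zygote.gradient(f, x)` returns a tuple with one entry per argument, while `ForwardDiff.gradient` returns a single array of the same shape as its one array argument, which is why the `(∇s,)` destructuring is dropped. A minimal sketch of the two call conventions, using a throwaway objective `f` purely for illustration:

```julia
using ForwardDiff

f(x) = sum(abs2, x)   # throwaway scalar objective for illustration
x = randn(3)

# ForwardDiff returns the gradient array directly:
∇_fd = ForwardDiff.gradient(f, x)

# Zygote (removed in this PR) returns a tuple, one entry per argument:
#   (∇_zy,) = Zygote.gradient(f, x)
```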
198 changes: 0 additions & 198 deletions ext/BijectorsZygoteExt.jl

This file was deleted.

2 changes: 1 addition & 1 deletion src/bijectors/pd.jl
@@ -1,6 +1,6 @@
struct PDBijector <: Bijector end

# This function has custom adjoints defined for Tracker, Zygote and ReverseDiff.
# This function has custom adjoints defined for Tracker and ReverseDiff.
# I couldn't find a mutation-free implementation that maintains TrackedArrays in Tracker
# and ReverseDiff, hence the need for custom adjoints.
function replace_diag(f, X)
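For context on the comment above: judging by its name and use in `PDBijector`, `replace_diag(f, X)` applies `f` to the diagonal entries of `X` while leaving the off-diagonal entries untouched. The sketch below (a hypothetical `replace_diag_sketch`, not the package's implementation) illustrates one mutation-free way to express that; as the comment notes, such formulations have not preserved TrackedArrays under Tracker/ReverseDiff, which is why the custom adjoints are kept.

```julia
# Illustration only: a mutation-free way to apply `f` to the diagonal of `X`.
function replace_diag_sketch(f, X::AbstractMatrix)
    return [i == j ? f(X[i, j]) : X[i, j] for i in axes(X, 1), j in axes(X, 2)]
end

replace_diag_sketch(exp, [1.0 2.0; 3.0 4.0])  # exp is applied to 1.0 and 4.0 only
```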
45 changes: 44 additions & 1 deletion src/chainrules.jl
@@ -286,5 +286,48 @@ function ChainRulesCore.rrule(::typeof(pd_from_upper), X::AbstractMatrix)
end
end

# Fixes Zygote's issues with `@debug`
# Fixes AD issues with `@debug`
ChainRulesCore.@non_differentiable _debug(::Any)
Comment on lines +289 to 290
Member: Do you know if this is still needed? Does some other AD backend rely on this?


# ChainRules for utility functions used by PlanarLayer
Member: Are these copied over from somewhere else, or new code? Do we need them? I think ChainRules rules are mostly for Zygote, although other AD packages use them too, so wondering if/why these are necessary.

Contributor Author: I think these functions are used by PlanarLayer (src/bijectors/planar_layer.jl). The ChainRules are necessary for proper automatic differentiation support?

Member (@penelopeysm, Jul 1, 2025): ChainRules is only directly used by Zygote; all other (major) AD systems have their own way of handling them. For example, ForwardDiff and ReverseDiff use operator overloading. For some AD systems (Mooncake, Enzyme) you can import rules from ChainRules, but this is an optional mechanism, because those two backends differentiate code at a lower level. (It's messy, but there's an overview of Julia AD at: https://juliadiff.org/DifferentiationInterface.jl/DifferentiationInterface/stable/faq/differentiability/) So I'm a bit surprised that we would need to add new rules in this PR, given that we're removing Zygote; indeed, I would have expected to see rules being removed.

If you remove these, do any of the tests fail?

(Also, please don't resolve comments unless you're sure they've been resolved! Otherwise it's hard for people to see what still needs to be discussed.)

Member: @AoifeHughes, did you get a chance to check yet what happens if you remove these?

Contributor Author: Yes, I'm doing it right now, it still breaks :(

Member: Hmm, odd. Happy to have a look at the error and figure it out together if it helps.

Contributor Author: Not gonna lie, I'm totally lost and feel like I'm just randomly changing things I don't understand here. I'd really appreciate someone taking over so I can learn a bit about how best to work through the problem.

Member: Absolutely. Ping me or Penny on Slack when you have a moment and let's have a look at it together.

function ChainRulesCore.rrule(::typeof(aT_b), a::AbstractVector, b::AbstractMatrix)
y = aT_b(a, b)
function aT_b_pullback(Δy)
# y = a' * b, where a has length m and b is m×n, so y is 1×n.
# Pullbacks: Δa = b * vec(Δy) (length m) and Δb = a * Δy (m×n).
Δa = b * vec(Δy)
Δb = a * Δy
return ChainRulesCore.NoTangent(), Δa, Δb
end
return y, aT_b_pullback
end

function ChainRulesCore.rrule(::typeof(aT_b), a::AbstractVector, b::AbstractVector)
y = aT_b(a, b)
function aT_b_pullback(Δy)
# For y = dot(a, b), we have:
# ∂y/∂a = b * Δy
# ∂y/∂b = a * Δy
return ChainRulesCore.NoTangent(), b * Δy, a * Δy
end
return y, aT_b_pullback
end

function ChainRulesCore.rrule(::typeof(_vec), x::AbstractArray)
y = _vec(x)
function _vec_pullback(Δy)
# Reshape the gradient back to the original shape
Δx = reshape(Δy, size(x))
return ChainRulesCore.NoTangent(), Δx
end
return y, _vec_pullback
end

function ChainRulesCore.rrule(::typeof(_vec), x::Real)
y = _vec(x)
function _vec_pullback(Δy)
return ChainRulesCore.NoTangent(), Δy
end
return y, _vec_pullback
end
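Not part of the diff: one way to sanity-check hand-written rrules like the ones above is ChainRulesTestUtils, which compares a rule's primal output and pullback against finite differences. A minimal sketch, assuming ChainRulesTestUtils is added to the test environment and that `aT_b` and `_vec` are reachable as internal Bijectors functions:

```julia
using ChainRulesTestUtils
using Bijectors

# Compare each rrule's output and pullback against finite differences.
test_rrule(Bijectors.aT_b, randn(3), randn(3, 4))   # vector/matrix method
test_rrule(Bijectors.aT_b, randn(3), randn(3))      # vector/vector method
test_rrule(Bijectors._vec, randn(2, 2))             # reshape helper
```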
2 changes: 0 additions & 2 deletions test/Project.toml
@@ -25,7 +25,6 @@ ReverseDiff = "37e2e3b7-166d-5795-8a7a-e32c996b4267"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
Tracker = "9f7883ad-71c0-57eb-9f7f-b5c9e6d3789c"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
AbstractMCMC = "5"
@@ -50,5 +49,4 @@ Mooncake = "0.4"
ReverseDiff = "1.4.2"
StableRNGs = "1"
Tracker = "0.2.11"
Zygote = "0.6.63, 0.7"
julia = "1.10"
8 changes: 4 additions & 4 deletions test/ad/flows.jl
@@ -1,26 +1,26 @@
@testset "PlanarLayer" begin
# logpdf of a flow with a planar layer and two-dimensional inputs
test_ad(randn(7)) do θ
test_ad(randn(7), (:EnzymeForward,)) do θ
layer = PlanarLayer(θ[1:2], θ[3:4], θ[5:5])
flow = transformed(MvNormal(zeros(2), I), layer)
x = θ[6:7]
return logpdf(flow.dist, x) - logabsdetjac(flow.transform, x)
end
test_ad(randn(11)) do θ
test_ad(randn(11), (:EnzymeForward,)) do θ
layer = PlanarLayer(θ[1:2], θ[3:4], θ[5:5])
flow = transformed(MvNormal(zeros(2), I), layer)
x = reshape(θ[6:end], 2, :)
return sum(logpdf(flow.dist, x) - logabsdetjac(flow.transform, x))
end

# logpdf of a flow with the inverse of a planar layer and two-dimensional inputs
test_ad(randn(7)) do θ
test_ad(randn(7), (:EnzymeForward,)) do θ
layer = PlanarLayer(θ[1:2], θ[3:4], θ[5:5])
flow = transformed(MvNormal(zeros(2), I), inverse(layer))
x = θ[6:7]
return logpdf(flow.dist, x) - logabsdetjac(flow.transform, x)
end
test_ad(randn(11)) do θ
test_ad(randn(11), (:EnzymeForward,)) do θ
layer = PlanarLayer(θ[1:2], θ[3:4], θ[5:5])
flow = transformed(MvNormal(zeros(2), I), inverse(layer))
x = reshape(θ[6:end], 2, :)
11 changes: 0 additions & 11 deletions test/ad/utils.jl
@@ -6,7 +6,6 @@ function test_ad(f, x, broken=(); rtol=1e-6, atol=1e-6)
if !(
b in (
:ForwardDiff,
:Zygote,
:Mooncake,
:ReverseDiff,
:Enzyme,
@@ -32,16 +31,6 @@
end
end

if AD == "All" || AD == "Zygote"
if :Zygote in broken
@test_broken Zygote.gradient(f, x)[1] ≈ finitediff rtol = rtol atol = atol
else
∇zygote = Zygote.gradient(f, x)[1]
@test (all(iszero, finitediff) && ∇zygote === nothing) ||
isapprox(∇zygote, finitediff; rtol=rtol, atol=atol)
end
end

if AD == "All" || AD == "ReverseDiff"
if :ReverseDiff in broken
@test_broken ReverseDiff.gradient(f, x) ≈ finitediff rtol = rtol atol = atol