double(x) = 2x
∂(::typeof(double)) = x -> (v -> 2v) # independent from x
Julia’s most confusing superpower?
2024-10-29
https://gdalle.github.io/JuliaOptimizationDays2024-AutoDiff/
What is a derivative?
A linear approximation of a function around a point.
Why do we care?
Derivatives of complex programs are essential in optimization and machine learning.
What do we need to do?
Not much: Automatic Differentiation (AD) computes derivatives for us!
Derivative of \(f\) at point \(x\): linear map \(\partial f(x)\) such that \[f(x + v) = f(x) + \partial f(x)[v] + o(\lVert v \rVert)\]
In other words,
\[\partial f(x)[v] = \lim_{\varepsilon \to 0} \frac{f(x + \varepsilon v) - f(x)}{\varepsilon}\]
Any derivative can be obtained from:
\[\partial (f \circ g)(x) = \partial f(g(x)) \circ \partial g(x)\]
or its adjoint1
\[\partial (f \circ g)^*(x) = \partial g(x)^* \circ \partial f(g(x))^*\]
Basic functions
Chain rule
Let’s try it out
We could multiply matrices instead of composing linear maps:
\[J_{f \circ g}(x) = J_f(g(x)) \cdot J_g(x)\]
where the Jacobian matrix is
\[J_f(x) = \left(\partial f_i / \partial x_j\right)_{i,j}\]
We don’t need Jacobian matrices as long as we can compute their products with vectors:
Jacobian-vector products
\[J_{f}(x) v = \partial f(x)[v]\]
Propagate a perturbation \(v\) from input to output
Vector-Jacobian products
\[w^\top J_{f}(x) = \partial f(x)^*[w]\]
Backpropagate a sensitivity \(w\) from output to input
Consider \(f = f_L \circ \dots \circ f_1\) and its Jacobian \(J = J_L \cdots J_1\).
Jacobian-vector products decompose from layer \(1\) to layer \(L\):
\[J_L(J_{L-1}(\dots \underbrace{J_2(\underbrace{J_1 v}_{v_1})}_{v_2}))\]
Forward mode AD relies on the chain rule.
Consider \(f = f_L \circ \dots \circ f_1\) and its Jacobian \(J = J_L \cdots J_1\).
Vector-Jacobian products decompose from layer \(L\) to layer \(1\):
\[((\underbrace{(\underbrace{w^\top J_L}_{w_L}) J_{L-1}}_{w_{L-1}} \dots ) J_2 ) J_1\]
Reverse mode AD relies on the adjoint chain rule.
Consider \(f : \mathbb{R}^n \to \mathbb{R}^m\). How to recover the full Jacobian?
Forward mode
Column by column:
\[J = \begin{pmatrix} J e_1 & \dots & J e_n \end{pmatrix}\]
where \(e_i\) is a basis vector.
Reverse mode
Row by row:
\[J = \begin{pmatrix} e_1^\top J \\ \vdots \\ e_m^\top J \end{pmatrix}\]
Consider \(f : \mathbb{R}^n \to \mathbb{R}^m\). How much does a Jacobian cost?
Theorem
Each JVP or VJP takes as much time and space as \(O(1)\) calls to \(f\).
sizes | jacobian | forward | reverse | best mode |
---|---|---|---|---|
generic | jacobian | \(O(n)\) | \(O(m)\) | depends |
\(n = 1\) | derivative | \(O(1)\) | \(O(m)\) | forward |
\(m = 1\) | gradient | \(O(n)\) | \(O(1)\) | reverse |
Fast reverse mode gradients make deep learning possible.
Image: courtesy of Adrian Hill
Image: courtesy of Adrian Hill
Full list available at https://juliadiff.org/.
MethodError: MethodError(Float64, (Dual{ForwardDiff.Tag{typeof(Main.Notebook.badcopy), Float64}}(1.0,1.0,0.0),), 0x0000000000007b14)
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2})
Closest candidates are:
(::Type{T})(::Real, !Matched::RoundingMode) where T<:AbstractFloat
@ Base rounding.jl:207
(::Type{T})(::T) where T<:Number
@ Core boot.jl:792
Float64(!Matched::IrrationalConstants.Inv2π)
@ IrrationalConstants ~/.julia/packages/IrrationalConstants/vp5v4/src/macro.jl:112
...
Stacktrace:
[1] convert(::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2})
@ Base ./number.jl:7
[2] setindex!(A::Vector{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2}, i1::Int64)
@ Base ./array.jl:1021
[3] _unsafe_copyto!(dest::Vector{Float64}, doffs::Int64, src::Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2}}, soffs::Int64, n::Int64)
@ Base ./array.jl:299
[4] unsafe_copyto!
@ ./array.jl:353 [inlined]
[5] _copyto_impl!
@ ./array.jl:376 [inlined]
[6] copyto!
@ ./array.jl:363 [inlined]
[7] copyto!
@ ./array.jl:385 [inlined]
[8] badcopy(x::Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2}})
@ Main.Notebook ~/work/JuliaOptimizationDays2024-AutoDiff/JuliaOptimizationDays2024-AutoDiff/index.qmd:327
[9] vector_mode_dual_eval!
@ ~/.julia/packages/ForwardDiff/PcZ48/src/apiutils.jl:24 [inlined]
[10] vector_mode_jacobian(f::typeof(badcopy), x::Vector{Float64}, cfg::ForwardDiff.JacobianConfig{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2}}})
@ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/jacobian.jl:125
[11] jacobian(f::Function, x::Vector{Float64}, cfg::ForwardDiff.JacobianConfig{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2}}}, ::Val{true})
@ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/jacobian.jl:21
[12] jacobian(f::Function, x::Vector{Float64}, cfg::ForwardDiff.JacobianConfig{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2, Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(badcopy), Float64}, Float64, 2}}})
@ ForwardDiff ~/.julia/packages/ForwardDiff/PcZ48/src/jacobian.jl:19
[13] top-level scope
@ ~/work/JuliaOptimizationDays2024-AutoDiff/JuliaOptimizationDays2024-AutoDiff/index.qmd:329
Allow numbers of type Dual
to pass through your functions.
ErrorException: ErrorException("Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)\nThis error occurs when you ask Zygote to differentiate operations that change\nthe elements of arrays in place (e.g. setting values with x .= ...)\n\nPossible fixes:\n- avoid mutating operations (preferred)\n- or read the documentation and solutions for this error\n https://fluxml.ai/Zygote.jl/latest/limitations\n")
Mutating arrays is not supported -- called copyto!(Vector{Float64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)
Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
https://fluxml.ai/Zygote.jl/latest/limitations
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] _throw_mutation_error(f::Function, args::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/array.jl:70
[3] (::Zygote.var"#547#548"{Vector{Float64}})(::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/array.jl:85
[4] (::Zygote.var"#2633#back#549"{Zygote.var"#547#548"{Vector{Float64}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:72
[5] badcopy
@ ~/work/JuliaOptimizationDays2024-AutoDiff/JuliaOptimizationDays2024-AutoDiff/index.qmd:327 [inlined]
[6] (::Zygote.Pullback{Tuple{typeof(badcopy), Vector{Float64}}, Tuple{Zygote.ZBack{Returns{Tuple{ChainRulesCore.NoTangent, ChainRulesCore.NoTangent}}}, Zygote.ZBack{Returns{Tuple{ChainRulesCore.NoTangent, ChainRulesCore.NoTangent}}}, Zygote.var"#2633#back#549"{Zygote.var"#547#548"{Vector{Float64}}}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface2.jl:0
[7] (::Zygote.var"#294#295"{Tuple{Tuple{Nothing}}, Zygote.Pullback{Tuple{typeof(badcopy), Vector{Float64}}, Tuple{Zygote.ZBack{Returns{Tuple{ChainRulesCore.NoTangent, ChainRulesCore.NoTangent}}}, Zygote.ZBack{Returns{Tuple{ChainRulesCore.NoTangent, ChainRulesCore.NoTangent}}}, Zygote.var"#2633#back#549"{Zygote.var"#547#548"{Vector{Float64}}}}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/lib.jl:206
[8] (::Zygote.var"#2169#back#296"{Zygote.var"#294#295"{Tuple{Tuple{Nothing}}, Zygote.Pullback{Tuple{typeof(badcopy), Vector{Float64}}, Tuple{Zygote.ZBack{Returns{Tuple{ChainRulesCore.NoTangent, ChainRulesCore.NoTangent}}}, Zygote.ZBack{Returns{Tuple{ChainRulesCore.NoTangent, ChainRulesCore.NoTangent}}}, Zygote.var"#2633#back#549"{Zygote.var"#547#548"{Vector{Float64}}}}}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:72
[9] call_composed
@ ./operators.jl:1045 [inlined]
[10] (::Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Any})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface2.jl:0
[11] call_composed
@ ./operators.jl:1044 [inlined]
[12] #_#103
@ ./operators.jl:1041 [inlined]
[13] (::Zygote.Pullback{Tuple{Base.var"##_#103", @Kwargs{}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, Vector{Float64}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}}, Tuple{Zygote.var"#2169#back#296"{Zygote.var"#294#295"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, Zygote.var"#2013#back#207"{typeof(identity)}}}, Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), typeof(Zygote._jvec)}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.Pullback{Tuple{typeof(Base.maybeconstructor), typeof(Zygote._jvec)}, Tuple{}}}}, Zygote.var"#2180#back#306"{Zygote.var"#back#305"{:outer, Zygote.Context{false}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, typeof(Zygote._jvec)}}, Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), typeof(badcopy)}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.Pullback{Tuple{typeof(Base.maybeconstructor), typeof(badcopy)}, Tuple{}}}}, Zygote.var"#2180#back#306"{Zygote.var"#back#305"{:inner, Zygote.Context{false}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, typeof(badcopy)}}}}, Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(Zygote._jvec), typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Any}, Zygote.Pullback{Tuple{typeof(Zygote._jvec), Vector{Float64}}, Tuple{Zygote.Pullback{Tuple{typeof(vec), Vector{Float64}}, Tuple{}}}}, Zygote.var"#2029#back#216"{Zygote.var"#back#214"{2, 1, Zygote.Context{false}, typeof(Zygote._jvec)}}, Zygote.var"#2141#back#284"{Zygote.var"#280#283"}}}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface2.jl:0
[14] #294
@ ~/.julia/packages/Zygote/NRp5C/src/lib/lib.jl:206 [inlined]
[15] #2169#back
@ ~/.julia/packages/ZygoteRules/M4xmc/src/adjoint.jl:72 [inlined]
[16] ComposedFunction
@ ./operators.jl:1041 [inlined]
[17] (::Zygote.Pullback{Tuple{ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, Vector{Float64}}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.var"#2366#back#423"{Zygote.var"#pairs_namedtuple_pullback#422"{(), @NamedTuple{}}}, Zygote.var"#2169#back#296"{Zygote.var"#294#295"{Tuple{Tuple{Nothing, Nothing}, Tuple{Nothing}}, Zygote.Pullback{Tuple{Base.var"##_#103", @Kwargs{}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, Vector{Float64}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}}, Tuple{Zygote.var"#2169#back#296"{Zygote.var"#294#295"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, Zygote.var"#2013#back#207"{typeof(identity)}}}, Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), typeof(Zygote._jvec)}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.Pullback{Tuple{typeof(Base.maybeconstructor), typeof(Zygote._jvec)}, Tuple{}}}}, Zygote.var"#2180#back#306"{Zygote.var"#back#305"{:outer, Zygote.Context{false}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, typeof(Zygote._jvec)}}, Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), typeof(badcopy)}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.Pullback{Tuple{typeof(Base.maybeconstructor), typeof(badcopy)}, Tuple{}}}}, Zygote.var"#2180#back#306"{Zygote.var"#back#305"{:inner, Zygote.Context{false}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, typeof(badcopy)}}}}, Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(Zygote._jvec), typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Any}, Zygote.Pullback{Tuple{typeof(Zygote._jvec), Vector{Float64}}, Tuple{Zygote.Pullback{Tuple{typeof(vec), Vector{Float64}}, Tuple{}}}}, Zygote.var"#2029#back#216"{Zygote.var"#back#214"{2, 1, Zygote.Context{false}, typeof(Zygote._jvec)}}, Zygote.var"#2141#back#284"{Zygote.var"#280#283"}}}}}}}, Zygote.Pullback{Tuple{Type{NamedTuple}}, Tuple{}}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface2.jl:0
[18] (::Zygote.var"#78#79"{Zygote.Pullback{Tuple{ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, Vector{Float64}}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.var"#2366#back#423"{Zygote.var"#pairs_namedtuple_pullback#422"{(), @NamedTuple{}}}, Zygote.var"#2169#back#296"{Zygote.var"#294#295"{Tuple{Tuple{Nothing, Nothing}, Tuple{Nothing}}, Zygote.Pullback{Tuple{Base.var"##_#103", @Kwargs{}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, Vector{Float64}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}}, Tuple{Zygote.var"#2169#back#296"{Zygote.var"#294#295"{Tuple{Tuple{Nothing}, Tuple{Nothing}}, Zygote.var"#2013#back#207"{typeof(identity)}}}, Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), typeof(Zygote._jvec)}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.Pullback{Tuple{typeof(Base.maybeconstructor), typeof(Zygote._jvec)}, Tuple{}}}}, Zygote.var"#2180#back#306"{Zygote.var"#back#305"{:outer, Zygote.Context{false}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, typeof(Zygote._jvec)}}, Zygote.Pullback{Tuple{typeof(Base.unwrap_composed), typeof(badcopy)}, Tuple{Zygote.var"#2013#back#207"{typeof(identity)}, Zygote.Pullback{Tuple{typeof(Base.maybeconstructor), typeof(badcopy)}, Tuple{}}}}, Zygote.var"#2180#back#306"{Zygote.var"#back#305"{:inner, Zygote.Context{false}, ComposedFunction{typeof(Zygote._jvec), typeof(badcopy)}, typeof(badcopy)}}}}, Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(Zygote._jvec), typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Tuple{Zygote.Pullback{Tuple{typeof(Base.call_composed), Tuple{typeof(badcopy)}, Tuple{Vector{Float64}}, @Kwargs{}}, Any}, Zygote.Pullback{Tuple{typeof(Zygote._jvec), Vector{Float64}}, Tuple{Zygote.Pullback{Tuple{typeof(vec), Vector{Float64}}, Tuple{}}}}, Zygote.var"#2029#back#216"{Zygote.var"#back#214"{2, 1, Zygote.Context{false}, typeof(Zygote._jvec)}}, Zygote.var"#2141#back#284"{Zygote.var"#280#283"}}}}}}}, Zygote.Pullback{Tuple{Type{NamedTuple}}, Tuple{}}}}})(Δ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/compiler/interface.jl:91
[19] withjacobian(f::Function, args::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/grad.jl:150
[20] jacobian(f::Function, args::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/NRp5C/src/lib/grad.jl:128
[21] top-level scope
@ ~/work/JuliaOptimizationDays2024-AutoDiff/JuliaOptimizationDays2024-AutoDiff/index.qmd:355
Define a custom rule with ChainRulesCore:
The fine print
DI may be slower than a direct call to the package’s API (mostly with Enzyme).
f(x, args...)
f!(y, x, args...)
derivative
, gradient
, jacobian
and hessian
SecondOrder
DifferentiateWith
AutoSparse
If two Jacobian columns don’t overlap:
\[J = \begin{pmatrix} 1 & \cdot & 5 \\ \cdot & 3 & 6 \\ 2 & \cdot & 7 \\ \cdot & 4 & 8 \end{pmatrix} \quad \implies \quad J(e_1 + e_2) \text{ and } J e_3\]
Find which coefficients might be nonzero.
Trace dependencies on inputs during function execution.
Split columns into non-overlapping groups.
Already used by SciML and JuliaSmoothOptimizers.
Computing derivatives is automatic and efficient.
Each AD system comes with limitations.
Learn to recognize and overcome them.