Operators

Tip

If some of the concepts on this page are unfamiliar, take a look at the book The Elements of Differentiable Programming (Blondel and Roulet, 2024).

List of operators

Given a function f(x) = y, there are several differentiation operators available. The terminology depends on:

  • the type and shape of the input x
  • the type and shape of the output y
  • the order of differentiation

Below we list and describe all the operators we support.

Warning

The package is thoroughly tested with inputs and outputs of the following types: Float64, Vector{Float64} and Matrix{Float64}. We also expect it to work on most kinds of Number and AbstractArray variables. Beyond that, you are in uncharted territory. We deliberately keep the type annotations minimal, so passing more complex objects or custom structs might work in some cases, but we make no guarantees about that yet.

High-level operators

These operators are computed using only the input x.

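| operator          | order | input x       | output y      | operator result type | operator result shape  |
|-------------------|-------|---------------|---------------|----------------------|------------------------|
| derivative        | 1     | Number        | Any           | similar to y         | size(y)                |
| second_derivative | 2     | Number        | Any           | similar to y         | size(y)                |
| gradient          | 1     | Any           | Number        | similar to x         | size(x)                |
| jacobian          | 1     | AbstractArray | AbstractArray | AbstractMatrix       | (length(y), length(x)) |
| hessian           | 2     | AbstractArray | Number        | AbstractMatrix       | (length(x), length(x)) |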
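
As an illustration of the high-level operators listed above, here is a minimal sketch. It assumes that ForwardDiff.jl is installed and uses AutoForwardDiff() as the backend; the functions sumsq and square are hypothetical examples, not part of the package.

```julia
using DifferentiationInterface  # also exports the backend types such as AutoForwardDiff
import ForwardDiff               # assumption: ForwardDiff.jl is installed

backend = AutoForwardDiff()

# Number -> Number: first and second derivative
d1 = derivative(sin, backend, 0.0)
d2 = second_derivative(sin, backend, 0.0)

# Array -> Number: gradient and Hessian (hypothetical example function)
sumsq(x) = sum(abs2, x)
x = [1.0, 2.0, 3.0]
g = gradient(sumsq, backend, x)   # same shape as x
H = hessian(sumsq, backend, x)    # matrix of size (length(x), length(x))

# Array -> Array: Jacobian (hypothetical example function)
square(x) = x .^ 2
J = jacobian(square, backend, x)  # matrix of size (length(y), length(x))
```

Each call takes only the function, the backend and the input x, which is what distinguishes these operators from the low-level ones.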

Low-level operators

These operators are computed using the input x and another argument t of type NTuple, which contains one or more tangents. You can think of tangents as perturbations propagated through the function; they live either in the same space as x or in the same space as y.

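| operator          | order | input x | output y | element type of t | operator result type | operator result shape |
|-------------------|-------|---------|----------|-------------------|----------------------|-----------------------|
| pushforward (JVP) | 1     | Any     | Any      | similar to x      | similar to y         | size(y)               |
| pullback (VJP)    | 1     | Any     | Any      | similar to y      | similar to x         | size(x)               |
| hvp               | 2     | Any     | Number   | similar to x      | similar to x         | size(x)               |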
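
The role of the tangent tuple t can be sketched as follows. This example assumes ForwardDiff.jl is installed and uses AutoForwardDiff() for both operators to stay short; a reverse-mode backend is usually the more natural choice for pullback. The function square and the tangent values are hypothetical.

```julia
using DifferentiationInterface
import ForwardDiff  # assumption: ForwardDiff.jl is installed

backend = AutoForwardDiff()

square(x) = x .^ 2  # hypothetical example function
x = [1.0, 2.0, 3.0]

# Pushforward: the tangent lives in the same space as x
dx = [1.0, 0.0, 0.0]
ty = pushforward(square, backend, x, (dx,))  # tuple with one result, shaped like y

# Pullback: the tangent lives in the same space as y
dy = [1.0, 1.0, 1.0]
tx = pullback(square, backend, x, (dy,))     # tuple with one result, shaped like x
```

Because t is a tuple, the result is also a tuple with one entry per tangent; only(ty) extracts the single result here.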

Variants

Several variants of each operator are defined:

  • out-of-place operators return a new derivative object
  • in-place operators mutate the provided derivative object

| out-of-place      | in-place           | out-of-place + primal                  | in-place + primal                       |
|-------------------|--------------------|----------------------------------------|-----------------------------------------|
| derivative        | derivative!        | value_and_derivative                   | value_and_derivative!                   |
| second_derivative | second_derivative! | value_derivative_and_second_derivative | value_derivative_and_second_derivative! |
| gradient          | gradient!          | value_and_gradient                     | value_and_gradient!                     |
| hessian           | hessian!           | value_gradient_and_hessian             | value_gradient_and_hessian!             |
| jacobian          | jacobian!          | value_and_jacobian                     | value_and_jacobian!                     |
| pushforward       | pushforward!       | value_and_pushforward                  | value_and_pushforward!                  |
| pullback          | pullback!          | value_and_pullback                     | value_and_pullback!                     |
| hvp               | hvp!               | -                                      | -                                       |
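
To make the difference between these variants concrete, here is a small sketch for the gradient. It assumes ForwardDiff.jl is installed; sumsq is a hypothetical example function.

```julia
using DifferentiationInterface
import ForwardDiff  # assumption: ForwardDiff.jl is installed

backend = AutoForwardDiff()
sumsq(x) = sum(abs2, x)  # hypothetical example function
x = [1.0, 2.0, 3.0]

# Out-of-place: a new gradient array is returned
g = gradient(sumsq, backend, x)

# In-place: the provided array is overwritten with the gradient
grad = similar(x)
gradient!(sumsq, grad, backend, x)

# Out-of-place + primal: the function value comes along with the gradient
y, g2 = value_and_gradient(sumsq, backend, x)
```

The in-place variants are mainly useful for avoiding allocations when the same operator is called repeatedly.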

Mutation and signatures

Two kinds of functions are supported:

  • out-of-place functions f(x) = y
  • in-place functions f!(y, x) = nothing

Warning

In-place functions only work with pushforward, pullback, derivative and jacobian. The other operators hvp, gradient and hessian require scalar outputs, so it makes no sense to mutate the number y.

This results in various operator signatures (the necessary arguments and their order):

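| function signature      | out-of-place operator (returns result) | in-place operator (mutates result)   |
|-------------------------|----------------------------------------|--------------------------------------|
| out-of-place function f | op(f, backend, x, [t])                 | op!(f, result, backend, x, [t])      |
| in-place function f!    | op(f!, y, backend, x, [t])             | op!(f!, y, result, backend, x, [t])  |

Warning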

The positional arguments between f/f! and backend are always mutated, regardless of the bang ! in the operator name. In particular, for in-place functions f!(y, x), every variant of every operator will mutate y.
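
The signatures for in-place functions can be sketched as below, following the argument order given in the table. The example assumes ForwardDiff.jl is installed; square! is a hypothetical in-place function.

```julia
using DifferentiationInterface
import ForwardDiff  # assumption: ForwardDiff.jl is installed

backend = AutoForwardDiff()

# Hypothetical in-place function: writes its output into y and returns nothing
function square!(y, x)
    y .= x .^ 2
    return nothing
end

x = [1.0, 2.0, 3.0]
y = zeros(3)

# Out-of-place operator on an in-place function: y is still mutated
J = jacobian(square!, y, backend, x)

# In-place operator on an in-place function: both y and J2 are mutated
J2 = zeros(3, 3)
jacobian!(square!, y, J2, backend, x)

# Variant that also returns the primal value
yval, J3 = value_and_jacobian(square!, y, backend, x)
```

Since y sits between f! and backend in every call, it should be treated as scratch storage that any of these operators may overwrite.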

Preparation

Principle

In many cases, automatic differentiation (AD) can be accelerated if the function has been called at least once (e.g. to record a tape) or if some cache objects are pre-allocated. This preparation procedure is backend-specific, but we expose a common syntax to achieve it.

| operator          | preparation (different point) | preparation (same point)       |
|-------------------|-------------------------------|--------------------------------|
| derivative        | prepare_derivative            | -                              |
| gradient          | prepare_gradient              | -                              |
| jacobian          | prepare_jacobian              | -                              |
| second_derivative | prepare_second_derivative     | -                              |
| hessian           | prepare_hessian               | -                              |
| pushforward       | prepare_pushforward           | prepare_pushforward_same_point |
| pullback          | prepare_pullback              | prepare_pullback_same_point    |
| hvp               | prepare_hvp                   | prepare_hvp_same_point         |

In addition, the preparation syntax depends on the number of arguments accepted by the function.

| function signature    | preparation signature              |
|-----------------------|------------------------------------|
| out-of-place function | prepare_op(f, backend, x, [t])     |
| in-place function     | prepare_op(f!, y, backend, x, [t]) |

Preparation creates an object called prep which contains the necessary information to speed up an operator and its variants. The idea is that you prepare only once, which can be costly, and then call the operator several times while reusing the same prep.

op(f, backend, x, [t])  # slow because it includes preparation
op(f, prep, backend, x, [t])  # fast because it skips preparation

Warning

The prep object is the last argument before backend and it is always mutated, regardless of the bang ! in the operator name.
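
A minimal sketch of the prepare-once, call-many-times pattern, assuming ForwardDiff.jl is installed (sumsq and the input values are hypothetical):

```julia
using DifferentiationInterface
import ForwardDiff  # assumption: ForwardDiff.jl is installed

backend = AutoForwardDiff()
sumsq(x) = sum(abs2, x)  # hypothetical example function
x = [1.0, 2.0, 3.0]

# Prepare once: this step may be costly
prep = prepare_gradient(sumsq, backend, x)

# Reuse prep on inputs of the same type and size
g1 = gradient(sumsq, prep, backend, x)
other_x = [4.0, 5.0, 6.0]
g2 = gradient(sumsq, prep, backend, other_x)

# The in-place variant accepts the same prep object
grad = similar(x)
gradient!(sumsq, grad, prep, backend, other_x)
```

How much time preparation saves depends on the backend and on the function, so it is worth benchmarking both call styles on your own problem.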

Reusing preparation

Deciding whether it is safe to reuse the results of preparation is not easy. Here are the general rules that we strive to implement:

For different-point preparation, the output prep of prepare_op(f, b, x, [t]), where b denotes the backend, can be reused in op(f, prep, b, other_x, [other_t]), provided that:

  • the inputs x and other_x have the same types and sizes
  • the tangents in t and other_t have the same types and sizes

For same-point preparation, the output prep of prepare_op_same_point(f, b, x, [t]) can be reused in op(f, prep, b, x, other_t), provided that:

  • the input x remains exactly the same (as well as any Constant context)
  • the tangents in t and other_t have the same types and sizes

Warning

These rules hold for the majority of backends, but there are some exceptions. The most important exception is ReverseDiff and its taping mechanism, which is sensitive to control flow inside the function.
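
Same-point preparation can be sketched as follows: the input x stays fixed while the tangent changes. The example assumes ForwardDiff.jl is installed; square and the tangent values are hypothetical.

```julia
using DifferentiationInterface
import ForwardDiff  # assumption: ForwardDiff.jl is installed

backend = AutoForwardDiff()
square(x) = x .^ 2  # hypothetical example function
x = [1.0, 2.0, 3.0]
dx = [1.0, 0.0, 0.0]

# Prepare at the point x with a tangent of the right type and size
prep = prepare_pushforward_same_point(square, backend, x, (dx,))

# Reuse prep with the same x and a different tangent of the same type and size
other_dx = [0.0, 1.0, 0.0]
ty = pushforward(square, prep, backend, x, (other_dx,))
```

Calling the operator with a different x after same-point preparation falls outside the rules above, so a fresh preparation is the safe choice in that case.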