Operators

DifferentiationInterface.jl is based on two concepts: operators and backends. This page is about the former; check out the backends page to learn about the latter.

List of operators

Given a function f(x) = y, there are several differentiation operators available. The terminology depends on:

  • the type and shape of the input x
  • the type and shape of the output y
  • the order of differentiation

Below we list and describe all the operators we support.

Tip

Read the book The Elements of Differentiable Programming for details on these concepts.

High-level operators

These operators are computed using only the input x.

| operator | order | input x | output y | operator result type | operator result shape |
|---|---|---|---|---|---|
| derivative | 1 | Number | Number or AbstractArray | same as y | size(y) |
| second_derivative | 2 | Number | Number or AbstractArray | same as y | size(y) |
| gradient | 1 | AbstractArray | Number | same as x | size(x) |
| jacobian | 1 | AbstractArray | AbstractArray | AbstractMatrix | (length(y), length(x)) |
| hessian | 2 | AbstractArray | Number | AbstractMatrix | (length(x), length(x)) |
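
As a hedged illustration, here is how the high-level operators are typically called. The backend object AutoForwardDiff and the function f are only examples (any supported backend package and function would work the same way):

using DifferentiationInterface
import ForwardDiff  # example backend package; any supported backend works

backend = AutoForwardDiff()
f(x) = sum(abs2, x)                     # AbstractArray -> Number
gradient(f, backend, [1.0, 2.0, 3.0])   # same type and shape as x
hessian(f, backend, [1.0, 2.0, 3.0])    # (length(x), length(x)) matrix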

Low-level operators

These operators are computed using the input x and a tangent t of type Tangents. This tangent is essentially an NTuple, whose elements live either

  • in the same space as x (we call it tx)
  • or in the same space as y (we call it ty)

| operator | order | input x | output y | tangent t | operator result type | operator result shape |
|---|---|---|---|---|---|---|
| pushforward (JVP) | 1 | Any | Any | tx | same as y | size(y) |
| pullback (VJP) | 1 | Any | Any | ty | same as x | size(x) |
| hvp | 2 | AbstractArray | Number | tx | same as x | size(x) |
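
For illustration, here is a hedged sketch of the low-level operators. It assumes a Tangents seed can be built by wrapping one or more arrays (e.g. Tangents(dx)) and reuses the backend defined above:

g(x) = abs2.(x)                                # AbstractArray -> AbstractArray
x, dx, dy = [1.0, 2.0], [1.0, 0.0], [0.0, 1.0]
pushforward(g, backend, x, Tangents(dx))       # JVP: lives in the space of y
pullback(g, backend, x, Tangents(dy))          # VJP: lives in the space of x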

Variants

Several variants of each operator are defined:

  • out-of-place operators return a new derivative object
  • in-place operators mutate the provided derivative object

| out-of-place | in-place | out-of-place + primal | in-place + primal |
|---|---|---|---|
| derivative | derivative! | value_and_derivative | value_and_derivative! |
| second_derivative | second_derivative! | value_derivative_and_second_derivative | value_derivative_and_second_derivative! |
| gradient | gradient! | value_and_gradient | value_and_gradient! |
| hessian | hessian! | value_gradient_and_hessian | value_gradient_and_hessian! |
| jacobian | jacobian! | value_and_jacobian | value_and_jacobian! |
| pushforward | pushforward! | value_and_pushforward | value_and_pushforward! |
| pullback | pullback! | value_and_pullback | value_and_pullback! |
| hvp | hvp! | NA | NA |
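
To make the variants concrete, a hedged sketch with the gradient operator (reusing f and backend from the earlier sketch):

x = [1.0, 2.0, 3.0]
grad = similar(x)                           # preallocated result for in-place variants
gradient!(f, grad, backend, x)              # mutates grad
value_and_gradient(f, backend, x)           # returns (f(x), gradient)
value_and_gradient!(f, grad, backend, x)    # returns (f(x), grad) with grad mutated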

Mutation and signatures

Two kinds of functions are supported:

  • out-of-place functions f(x) = y
  • in-place functions f!(y, x) = nothing

Warning

In-place functions only work with pushforward, pullback, derivative and jacobian.

This results in various operator signatures (the necessary arguments and their order):

| function signature | out-of-place operator | in-place operator |
|---|---|---|
| out-of-place function | op(f, backend, x, [t]) | op!(f, result, backend, x, [t]) |
| in-place function | op(f!, y, backend, x, [t]) | op!(f!, y, result, backend, x, [t]) |

Warning

The positional arguments between f/f! and backend are always mutated. This convention holds regardless of the bang ! in the operator name. In particular, for in-place functions f!(y, x), every variant of every operator will mutate y.
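
For instance, here is a hedged sketch of differentiating an in-place function with the jacobian operator (the function f! is an arbitrary example, and backend is reused from above):

f!(y, x) = (y .= abs2.(x); nothing)       # in-place function: fills y, returns nothing
x = [1.0, 2.0, 3.0]
y = zeros(3)                              # y will be overwritten with f(x)
jacobian(f!, y, backend, x)               # (length(y), length(x)) matrix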

Preparation

Principle

In many cases, AD can be accelerated if the function has been called at least once (e.g. to record a tape) or if some cache objects are provided. This preparation procedure is backend-specific, but we expose a common syntax to achieve it.

| operator | preparation (different point) | preparation (same point) |
|---|---|---|
| derivative | prepare_derivative | - |
| gradient | prepare_gradient | - |
| jacobian | prepare_jacobian | - |
| second_derivative | prepare_second_derivative | - |
| hessian | prepare_hessian | - |
| pushforward | prepare_pushforward | prepare_pushforward_same_point |
| pullback | prepare_pullback | prepare_pullback_same_point |
| hvp | prepare_hvp | prepare_hvp_same_point |

In addition, the preparation syntax depends on the number of arguments accepted by the function.

| function signature | preparation signature |
|---|---|
| out-of-place function | prepare_op(f, backend, x, [t]) |
| in-place function | prepare_op(f!, y, backend, x, [t]) |

Preparation creates an object called extras which contains the necessary information to speed up an operator and its variants. The idea is that you prepare only once (which can be costly) and then call the operator several times while reusing the same extras.

op(f, backend, x, [t])  # slow because it includes preparation
op(f, extras, backend, x, [t])  # fast because it skips preparation
Warning

The extras object is always mutated, regardless of the bang ! in the operator name.
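
Concretely, a hedged sketch of preparation for the gradient operator (reusing f, backend and x from the earlier sketches):

extras = prepare_gradient(f, backend, x)       # possibly slow, done once
gradient(f, extras, backend, x)                # fast, reuses extras
value_and_gradient(f, extras, backend, x)      # variants can reuse the same extras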

Reusing preparation

Deciding whether it is safe to reuse the results of preparation is not easy. Here are the general rules that we strive to implement:

|  | different point | same point |
|---|---|---|
| the output extras of... | prepare_op(f, b, x) | prepare_op_same_point(f, b, x, t) |
| can be used in... | op(f, extras, b, other_x) | op(f, extras, b, x, other_t) |
| provided that... | other_x has same type and shape as x | other_t has same type and shape as t |

These rules hold for the majority of backends, but there are some exceptions: see this page to learn more.
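
A hedged sketch of same-point preparation, following the table above (reusing g and backend, and again assuming a Tangents seed wraps one or more arrays):

x, dx, other_dx = [1.0, 2.0], [1.0, 0.0], [0.0, 1.0]
extras = prepare_pushforward_same_point(g, backend, x, Tangents(dx))
pushforward(g, extras, backend, x, Tangents(other_dx))   # same point x, different tangent is allowed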

Second order

For second-order operators, there are two options: use a single backend or combine two of them within the SecondOrder struct.

Single backend

Some backends natively support a set of second-order operators (typically only the hessian). In that case, it can be advantageous to use the backend on its own. If the operator is not supported natively, we will fall back on SecondOrder(backend, backend) (see below).

Combining backends

In general, you can use SecondOrder to combine different backends.

backend = SecondOrder(outer_backend, inner_backend)

The inner backend will be called first, and the outer backend will differentiate the generated code.

There are many possible backend combinations, a lot of which will fail. Usually, the most efficient approach for Hessians is forward-over-reverse, i.e. a forward-mode outer backend and a reverse-mode inner backend.
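
For example, a hedged forward-over-reverse combination, reusing f and x from the earlier sketches and assuming Zygote.jl is also loaded as the reverse-mode backend:

import Zygote  # example reverse-mode backend package

so_backend = SecondOrder(AutoForwardDiff(), AutoZygote())  # outer forward, inner reverse
hessian(f, so_backend, x)                                  # forward-over-reverse Hessian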

Danger

SecondOrder backends do not support first-order operators.

Sparsity

When computing sparse Jacobians or Hessians (with a lot of zeros in the matrix), it is possible to take advantage of their sparsity pattern to speed things up. For this to work, three ingredients are needed (read this survey to understand why):

  1. An underlying (dense) backend
  2. A sparsity pattern detector, like TracerSparsityDetector from SparseConnectivityTracer.jl
  3. A coloring algorithm: GreedyColoringAlgorithm from SparseMatrixColorings.jl is the only one we support.
Warning

Generic sparse AD is now located in a package extension which depends on SparseMatrixColorings.jl.

These ingredients can be combined within the AutoSparse wrapper, which DifferentiationInterface.jl re-exports. AutoSparse backends only support operators jacobian and hessian (as well as their variants). Note that for sparse Hessians, you need to put the SecondOrder backend inside AutoSparse, and not the other way around.
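
Here is a hedged sketch of assembling a sparse backend; the keyword names sparsity_detector and coloring_algorithm are taken from the AutoSparse wrapper in ADTypes.jl, and SparseConnectivityTracer.jl is only one possible detector:

using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

sparse_backend = AutoSparse(
    AutoForwardDiff();                             # underlying dense backend
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)
jacobian(g, sparse_backend, x)                     # exploits the sparsity pattern of g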

The preparation step of jacobian or hessian with an AutoSparse backend can be long, because it needs to detect the sparsity pattern and color the resulting sparse matrix. But after preparation, the more zeros are present in the matrix, the greater the speedup will be compared to dense differentiation.

Danger

The result of preparation for an AutoSparse backend cannot be reused if the sparsity pattern changes.

Info

Symbolic backends have built-in sparsity handling, so AutoSparse(AutoSymbolics()) and AutoSparse(AutoFastDifferentiation()) do not need additional configuration for pattern detection or coloring. However, they still benefit from preparation.

Going further

Non-standard types

The package is thoroughly tested with inputs and outputs of the following types: Float64, Vector{Float64} and Matrix{Float64}. We also expect it to work on most kinds of Number and AbstractArray variables. Beyond that, you are in uncharted territory. We deliberately keep the type annotations minimal, so that passing more complex objects or custom structs might work with some backends, but we make no guarantees about that.

Multiple inputs/outputs

Restricting the API to one input and one output has many coding advantages, but it is not very flexible. If you need more than that, try using ComponentArrays.jl to wrap several objects inside a single ComponentVector.
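
A hedged sketch using ComponentArrays.jl (assuming the package is installed; the function h is an arbitrary example):

using ComponentArrays: ComponentVector

x = ComponentVector(a=1.0, b=[2.0, 3.0])     # several "inputs" packed into one vector
h(x) = x.a * sum(abs2, x.b)
gradient(h, AutoForwardDiff(), x)            # result has the same component structure as x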