Operators

DifferentiationInterface.jl is based on two concepts: operators and backends. This page is about the former; check out the backends page to learn about the latter.

List of operators

Given a function f(x) = y, there are several differentiation operators available. The terminology depends on:

  • the type and shape of the input x
  • the type and shape of the output y
  • the order of differentiation

Below we list and describe all the operators we support.

Tip

Read the book The Elements of Differentiable Programming for details on these concepts.

High-level operators

These operators are computed using only the input x.

| operator | order | input x | output y | operator result type | operator result shape |
|---|---|---|---|---|---|
| derivative | 1 | Number | Number or AbstractArray | same as y | size(y) |
| second_derivative | 2 | Number | Number or AbstractArray | same as y | size(y) |
| gradient | 1 | AbstractArray | Number | same as x | size(x) |
| jacobian | 1 | AbstractArray | AbstractArray | AbstractMatrix | (length(y), length(x)) |
| hessian | 2 | AbstractArray | Number | AbstractMatrix | (length(x), length(x)) |
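
As a hedged illustration, here is how the high-level operators are typically called. The backend object AutoForwardDiff and the function f are only examples (any supported backend package and function would work the same way):

using DifferentiationInterface
import ForwardDiff  # example backend package; any supported backend works

backend = AutoForwardDiff()
f(x) = sum(abs2, x)                     # AbstractArray -> Number
gradient(f, backend, [1.0, 2.0, 3.0])   # same type and shape as x
hessian(f, backend, [1.0, 2.0, 3.0])    # (length(x), length(x)) matrix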

Low-level operators

These operators are computed using the input x and a tangent t of type Tangents. This tangent is essentially an NTuple, whose elements live either

  • in the same space as x (we call it tx)
  • or in the same space as y (we call it ty)

| operator | order | input x | output y | tangent t | operator result type | operator result shape |
|---|---|---|---|---|---|---|
| pushforward (JVP) | 1 | Any | Any | tx | same as y | size(y) |
| pullback (VJP) | 1 | Any | Any | ty | same as x | size(x) |
| hvp | 2 | AbstractArray | Number | tx | same as x | size(x) |
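
For illustration, here is a hedged sketch of the low-level operators. It assumes a Tangents seed can be built by wrapping one or more arrays (e.g. Tangents(dx)) and reuses the backend defined above:

g(x) = abs2.(x)                                # AbstractArray -> AbstractArray
x, dx, dy = [1.0, 2.0], [1.0, 0.0], [0.0, 1.0]
pushforward(g, backend, x, Tangents(dx))       # JVP: lives in the space of y
pullback(g, backend, x, Tangents(dy))          # VJP: lives in the space of x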

Variants

Several variants of each operator are defined:

  • out-of-place operators return a new derivative object
  • in-place operators mutate the provided derivative object

| out-of-place | in-place | out-of-place + primal | in-place + primal |
|---|---|---|---|
| derivative | derivative! | value_and_derivative | value_and_derivative! |
| second_derivative | second_derivative! | value_derivative_and_second_derivative | value_derivative_and_second_derivative! |
| gradient | gradient! | value_and_gradient | value_and_gradient! |
| hessian | hessian! | value_gradient_and_hessian | value_gradient_and_hessian! |
| jacobian | jacobian! | value_and_jacobian | value_and_jacobian! |
| pushforward | pushforward! | value_and_pushforward | value_and_pushforward! |
| pullback | pullback! | value_and_pullback | value_and_pullback! |
| hvp | hvp! | NA | NA |
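
To make the variants concrete, a hedged sketch with the gradient operator (reusing f and backend from the earlier sketch):

x = [1.0, 2.0, 3.0]
grad = similar(x)                           # preallocated result for in-place variants
gradient!(f, grad, backend, x)              # mutates grad
value_and_gradient(f, backend, x)           # returns (f(x), gradient)
value_and_gradient!(f, grad, backend, x)    # returns (f(x), grad) with grad mutated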

Mutation and signatures

Two kinds of functions are supported:

  • out-of-place functions f(x) = y
  • in-place functions f!(y, x) = nothing

Warning

In-place functions only work with pushforward, pullback, derivative and jacobian.

This results in various operator signatures (the necessary arguments and their order):

| function signature | out-of-place operator | in-place operator |
|---|---|---|
| out-of-place function | op(f, backend, x, [t]) | op!(f, result, backend, x, [t]) |
| in-place function | op(f!, y, backend, x, [t]) | op!(f!, y, result, backend, x, [t]) |

Warning

The positional arguments between f/f! and backend are always mutated. This convention holds regardless of the bang ! in the operator name. In particular, for in-place functions f!(y, x), every variant of every operator will mutate y.
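
For instance, here is a hedged sketch of differentiating an in-place function with the jacobian operator (the function f! is an arbitrary example, and backend is reused from above):

f!(y, x) = (y .= abs2.(x); nothing)       # in-place function: fills y, returns nothing
x = [1.0, 2.0, 3.0]
y = zeros(3)                              # y will be overwritten with f(x)
jacobian(f!, y, backend, x)               # (length(y), length(x)) matrix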

Preparation

Principle

In many cases, AD can be accelerated if the function has been called at least once (e.g. to record a tape) or if some cache objects are provided. This preparation procedure is backend-specific, but we expose a common syntax to achieve it.

| operator | preparation (different point) | preparation (same point) |
|---|---|---|
| derivative | prepare_derivative | - |
| gradient | prepare_gradient | - |
| jacobian | prepare_jacobian | - |
| second_derivative | prepare_second_derivative | - |
| hessian | prepare_hessian | - |
| pushforward | prepare_pushforward | prepare_pushforward_same_point |
| pullback | prepare_pullback | prepare_pullback_same_point |
| hvp | prepare_hvp | prepare_hvp_same_point |

In addition, the preparation syntax depends on the number of arguments accepted by the function.

| function signature | preparation signature |
|---|---|
| out-of-place function | prepare_op(f, backend, x, [t]) |
| in-place function | prepare_op(f!, y, backend, x, [t]) |

Preparation creates an object called extras which contains the necessary information to speed up an operator and its variants. The idea is that you prepare only once (which can be costly) and then call the operator several times while reusing the same extras.

op(f, backend, x, [t])  # slow because it includes preparation
op(f, extras, backend, x, [t])  # fast because it skips preparation
Warning

The extras object is always mutated, regardless of the bang ! in the operator name.
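
Concretely, a hedged sketch of preparation for the gradient operator (reusing f, backend and x from the earlier sketches):

extras = prepare_gradient(f, backend, x)       # possibly slow, done once
gradient(f, extras, backend, x)                # fast, reuses extras
value_and_gradient(f, extras, backend, x)      # variants can reuse the same extras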

Reusing preparation

Deciding whether it is safe to reuse the results of preparation is not easy. Here are the general rules that we strive to implement:

|  | different point | same point |
|---|---|---|
| the output extras of... | prepare_op(f, b, x) | prepare_op_same_point(f, b, x, t) |
| can be used in... | op(f, extras, b, other_x) | op(f, extras, b, x, other_t) |
| provided that... | other_x has same type and shape as x | other_t has same type and shape as t |

These rules hold for the majority of backends, but there are some exceptions: see this page to learn more.
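
A hedged sketch of same-point preparation, following the table above (reusing g and backend, and again assuming a Tangents seed wraps one or more arrays):

x, dx, other_dx = [1.0, 2.0], [1.0, 0.0], [0.0, 1.0]
extras = prepare_pushforward_same_point(g, backend, x, Tangents(dx))
pushforward(g, extras, backend, x, Tangents(other_dx))   # same point x, different tangent is allowed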

Second order

For second-order operators, there are two options: use a single backend or combine two of them within the SecondOrder struct.

Single backend

Some backends natively support a set of second-order operators (typically only the hessian). In that case, it can be advantageous to use the backend on its own. If the operator is not supported natively, we will fall back on SecondOrder(backend, backend) (see below).

Combining backends

In general, you can use SecondOrder to combine different backends.

backend = SecondOrder(outer_backend, inner_backend)

The inner backend will be called first, and the outer backend will differentiate the generated code.

There are many possible backend combinations, a lot of which will fail. Usually, the most efficient approach for Hessians is forward-over-reverse, i.e. a forward-mode outer backend and a reverse-mode inner backend.
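
For example, a hedged forward-over-reverse combination, reusing f and x from the earlier sketches and assuming Zygote.jl is also loaded as the reverse-mode backend:

import Zygote  # example reverse-mode backend package

so_backend = SecondOrder(AutoForwardDiff(), AutoZygote())  # outer forward, inner reverse
hessian(f, so_backend, x)                                  # forward-over-reverse Hessian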

Danger

SecondOrder backends do not support first-order operators.

Sparsity

When computing sparse Jacobians or Hessians (with a lot of zeros in the matrix), it is possible to take advantage of their sparsity pattern to speed things up. For this to work, three ingredients are needed (read this survey to understand why):

  1. An underlying (dense) backend
  2. A sparsity pattern detector, like TracerSparsityDetector from SparseConnectivityTracer.jl
  3. A coloring algorithm: GreedyColoringAlgorithm from SparseMatrixColorings.jl is the only one we support.
Warning

Generic sparse AD is now located in a package extension which depends on SparseMatrixColorings.jl.

These ingredients can be combined within the AutoSparse wrapper, which DifferentiationInterface.jl re-exports. AutoSparse backends only support operators jacobian and hessian (as well as their variants). Note that for sparse Hessians, you need to put the SecondOrder backend inside AutoSparse, and not the other way around.
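
Here is a hedged sketch of assembling a sparse backend; the keyword names sparsity_detector and coloring_algorithm are taken from the AutoSparse wrapper in ADTypes.jl, and SparseConnectivityTracer.jl is only one possible detector:

using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

sparse_backend = AutoSparse(
    AutoForwardDiff();                             # underlying dense backend
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)
jacobian(g, sparse_backend, x)                     # exploits the sparsity pattern of g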

The preparation step of jacobian or hessian with an AutoSparse backend can be long, because it needs to detect the sparsity pattern and color the resulting sparse matrix. But after preparation, the more zeros are present in the matrix, the greater the speedup will be compared to dense differentiation.

Danger

The result of preparation for an AutoSparse backend cannot be reused if the sparsity pattern changes.

Info

Symbolic backends have built-in sparsity handling, so AutoSparse(AutoSymbolics()) and AutoSparse(AutoFastDifferentiation()) do not need additional configuration for pattern detection or coloring. However, they still benefit from preparation.

Going further

Non-standard types

The package is thoroughly tested with inputs and outputs of the following types: Float64, Vector{Float64} and Matrix{Float64}. We also expect it to work on most kinds of Number and AbstractArray variables. Beyond that, you are in uncharted territory. We deliberately keep the type annotations minimal, so that passing more complex objects or custom structs might work with some backends, but we make no guarantees about that.

Multiple inputs/outputs

Restricting the API to one input and one output has many coding advantages, but it is not very flexible. If you need more than that, try using ComponentArrays.jl to wrap several objects inside a single ComponentVector.
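
A hedged sketch using ComponentArrays.jl (assuming the package is installed; the function h is an arbitrary example):

using ComponentArrays: ComponentVector

x = ComponentVector(a=1.0, b=[2.0, 3.0])     # several "inputs" packed into one vector
h(x) = x.a * sum(abs2, x.b)
gradient(h, AutoForwardDiff(), x)            # result has the same component structure as x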