Basics

The goal of this tutorial is to give users already familiar with automatic differentiation (AD) an overview of the Enzyme differentiation API for the following modes:

  • Reverse mode
  • Forward mode
  • Forward over reverse mode
  • Vector forward over reverse mode

Defining a function

In the most general case of automatic differentiation, Enzyme differentiates arbitrary multivariate vector functions

\[f: \mathbb{R}^n \rightarrow \mathbb{R}^m, y = f(x)\]

For simplicity we define a vector function with $m=1$. However, this tutorial can easily be applied to arbitrary $m \in \mathbb{N}$.

using Enzyme

function f(x::Array{Float64}, y::Array{Float64})
    y[1] = x[1] * x[1] + x[2] * x[1]
    return nothing
end;
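
Since $y_1 = x_1^2 + x_1 x_2$, the gradient and Hessian of f can be worked out by hand:

\[\nabla f(x) = \begin{pmatrix} 2 x_1 + x_2 & x_1 \end{pmatrix}, \qquad \nabla^2 f(x) = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}\]

At the point x = [2.0, 2.0] used throughout this tutorial, the gradient is therefore $(6, 2)$. We will use these closed-form values to check the AD results below.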

Reverse mode

The reverse model in AD is defined as

\[\begin{aligned} y &= f(x) \\ \bar{x} &= \bar{y} \cdot \nabla f(x) \end{aligned}\]

The bar denotes an adjoint variable. Note that executing AD in reverse mode computes both $y$ and the adjoint $\bar{x}$.

x  = [2.0, 2.0]
bx = [0.0, 0.0]
y  = [0.0]
by = [1.0];

Enzyme stores the value and adjoint of a variable in an object of type Duplicated, where the first element represents the value and the second the adjoint. Evaluating the reverse model using Enzyme is done via the following call.

Enzyme.autodiff(Reverse, f, Duplicated(x, bx), Duplicated(y, by));

This stores the gradient of f at the point x = [2.0, 2.0] in bx. by is called the seed and has to be set to $1.0$ in order to compute the gradient. Let's save the gradient for later.

g = copy(bx)
2-element Vector{Float64}:
 6.0
 2.0
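
This agrees with the closed-form gradient derived above, which we can check directly (a quick sanity check using the variables already in scope):

bx == [2 * x[1] + x[2], x[1]]
true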

Forward mode

The forward model in AD is defined as

\[\begin{aligned} y &= f(x) \\ \dot{y} &= \nabla f(x) \cdot \dot{x} \end{aligned}\]

To obtain the first element of the gradient using the forward model, we have to seed $\dot{x}$ with $\dot{x} = [1.0, 0.0]$.

x  = [2.0, 2.0]
dx = [1.0, 0.0]
y  = [0.0]
dy = [0.0];

In the forward mode the second element of Duplicated stores the tangent.

Enzyme.autodiff(Forward, f, Duplicated(x, dx), Duplicated(y, dy));

We can now verify that the reverse mode and the forward mode indeed yield the same result for the first component of the gradient. Note that acquiring the full gradient requires executing the forward model a second time with the seed dx set to [0.0, 1.0]; a sketch of this second pass follows the check below.

Let's verify whether the reverse and forward model agree.

g[1] == dy[1]
true
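
For completeness, here is that second forward pass as a sketch, reusing f, x, y, and the saved gradient g from above; the remaining component of the gradient should match g[2].

dx = [0.0, 1.0]
dy = [0.0]

Enzyme.autodiff(Forward, f, Duplicated(x, dx), Duplicated(y, dy));

g[2] == dy[1]
true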

Forward over reverse

The forward over reverse (FoR) model is obtained by applying the forward model to the reverse model using the chain rule for the product in the adjoint statement.

\[\begin{aligned} y &= f(x) \\ \dot{y} &= \nabla f(x) \cdot \dot{x} \\ \bar{x} &= \bar{y} \cdot \nabla f(x) \\ \dot{\bar{x}} &= \bar{y} \cdot \nabla^2 f(x) \cdot \dot{x} + \dot{\bar{y}} \cdot \nabla f(x) \end{aligned}\]

To obtain the first column/row of the Hessian $\nabla^2 f(x)$, we have to seed $\dot{\bar{y}}$ with $[0.0]$, $\bar{y}$ with $[1.0]$, and $\dot{x}$ with $[1.0, 0.0]$.
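
With these seeds, the last line of the model reduces to a Hessian-vector product with the first unit vector,

\[\dot{\bar{x}} = 1 \cdot \nabla^2 f(x) \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 0 \cdot \nabla f(x) = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.\]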

y = [0.0]
x = [2.0, 2.0]

dy = [0.0]
dx = [1.0, 0.0]

bx = [0.0, 0.0]
by = [1.0]
dbx = [0.0, 0.0]
dby = [0.0]

Enzyme.autodiff(
    Forward,
    (x,y) -> Enzyme.autodiff(Reverse, f, x, y),
    Duplicated(Duplicated(x, bx), Duplicated(dx, dbx)),
    Duplicated(Duplicated(y, by), Duplicated(dy, dby)),
)
()

The FoR model also computes the forward model from before, giving us again the first component of the gradient.

g[1] == dy[1]
true

In addition we now have the first row/column of the Hessian.

dbx[1] == 2.0
dbx[2] == 1.0
true
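
The second row/column can be obtained analogously. The following sketch repeats the nested call with the seed $\dot{x} = [0.0, 1.0]$ after resetting the adjoint and tangent buffers; since $\nabla^2 f(x) \cdot (0, 1)^T = (1, 0)^T$, the final check should hold.

dx  = [0.0, 1.0]
dy  = [0.0]
bx  = [0.0, 0.0]
by  = [1.0]
dbx = [0.0, 0.0]
dby = [0.0]

Enzyme.autodiff(
    Forward,
    (x,y) -> Enzyme.autodiff(Reverse, f, x, y),
    Duplicated(Duplicated(x, bx), Duplicated(dx, dbx)),
    Duplicated(Duplicated(y, by), Duplicated(dy, dby)),
)

dbx[1] == 1.0 && dbx[2] == 0.0
true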

Vector forward over reverse

The vector FoR allows us to propagate several tangents through the second-order model at once, computing the derivative of the gradient along multiple directions in a single call. We begin by defining a helper function for the gradient. Since we will not need the original result (stored in y), we can mark it DuplicatedNoNeed. Specifically, this will perform the following:

\[\begin{aligned} \bar{x} &= \bar{x} + \bar{y} \cdot \nabla f(x) \\ \bar{y} &= 0 \end{aligned}\]

function grad(x, dx, y, dy)
  Enzyme.autodiff(Reverse, f, Duplicated(x, dx), DuplicatedNoNeed(y, dy))
  nothing
end
grad (generic function with 1 method)

To compute the conventional gradient, we would call this function with our given inputs, dy = [1.0], and dx = [0.0, 0.0]. Since y is not needed, we can just set it to an undef vector.

x = [2.0, 2.0]
y = Vector{Float64}(undef, 1)
dx = [0.0, 0.0]
dy = [1.0]

grad(x, dx, y, dy)

dx now contains the gradient:

@show dx
2-element Vector{Float64}:
 6.0
 2.0

To compute the Hessian, we need to take the derivative of this gradient function along every input direction. Following the same seeding strategy as before, we now seed in both the vx[1] = [1.0, 0.0] and vx[2] = [0.0, 1.0] directions. These tangents are collected into a tuple and wrapped in a BatchDuplicated type. We then compute the forward-mode derivative along all of these directions in a single call.

vx = ([1.0, 0.0], [0.0, 1.0])
hess = ([0.0, 0.0], [0.0, 0.0])
dx = [0.0, 0.0]
dy = [1.0]

Enzyme.autodiff(Enzyme.Forward, grad,
                Enzyme.BatchDuplicated(x, vx),
                Enzyme.BatchDuplicated(dx, hess),
                Const(y),
                Const(dy))
()

Again we obtain the first-order gradient. If we did not want to compute the gradient again, we could instead have used Enzyme.BatchDuplicatedNoNeed(dx, hess); a sketch of that variant follows the check below.

g[1] == dx[1]
true
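
For reference, the variant with Enzyme.BatchDuplicatedNoNeed would look as follows. This is a sketch only: dx, hess, and dy would have to be re-initialized as above before running it, because the reverse pass accumulates into dx and hess and consumes the seed dy.

Enzyme.autodiff(Enzyme.Forward, grad,
                Enzyme.BatchDuplicated(x, vx),
                Enzyme.BatchDuplicatedNoNeed(dx, hess),
                Const(y),
                Const(dy))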

We now have the first row/column of the Hessian

hess[1][1] == 2.0

hess[1][2] == 1.0
true

as well as the second row/column

hess[2][1] == 1.0

hess[2][2] == 0.0
true
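
Stacking the two tangent tuples column-wise reproduces the full (symmetric) Hessian worked out by hand at the beginning:

hcat(hess...) == [2.0 1.0; 1.0 0.0]
true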

This page was generated using Literate.jl.