Enzyme
Documentation for Enzyme.jl, the Julia bindings for Enzyme.
Enzyme performs automatic differentiation (AD) of statically analyzable LLVM IR. It is highly efficient, and its ability to perform AD on optimized code allows Enzyme to meet or exceed the performance of state-of-the-art AD tools.
Enzyme.jl can be installed in the usual way Julia packages are installed:
] add Enzyme
The Enzyme binary dependencies will be installed automatically via Julia's binary artifact system.
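To follow the examples below, first load the package in your Julia session:
julia> using Enzyme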
The Enzyme.jl API revolves around the function autodiff. For some common operations, Enzyme additionally wraps autodiff in several convenience functions, e.g., gradient and jacobian.
The tutorial below covers the basic usage of these functions. For a complete overview of Enzyme's functionality, see the API documentation. Also see Implementing pullbacks for how to implement back-propagation for functions with non-scalar results.
Getting started
julia> rosenbrock(x, y) = (1.0 - x)^2 + 100.0 * (y - x^2)^2
rosenbrock (generic function with 1 method)
julia> rosenbrock_inp(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
rosenbrock_inp (generic function with 1 method)
Reverse mode
The return value of reverse-mode autodiff is a tuple whose first element contains the derivatives with respect to the active inputs, optionally followed by the primal return value.
julia> autodiff(Reverse, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0),)
julia> autodiff(ReverseWithPrimal, rosenbrock, Active, Active(1.0), Active(2.0))
((-400.0, 200.0), 100.0)
julia> x = [1.0, 2.0]
2-element Vector{Float64}:
1.0
2.0
julia> dx = [0.0, 0.0]
2-element Vector{Float64}:
0.0
0.0
julia> autodiff(Reverse, rosenbrock_inp, Active, Duplicated(x, dx))
((nothing,),)
julia> dx
2-element Vector{Float64}:
-400.0
200.0
Both the in-place and the "normal" variants compute the gradient. The difference is that with Active the gradient is returned directly, while with Duplicated the gradient is accumulated in place into the shadow (here dx).
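Because the gradient is accumulated into the shadow rather than overwriting it, running the same reverse-mode call a second time should add the same gradient into dx again. A small sketch, continuing from the example above:
julia> autodiff(Reverse, rosenbrock_inp, Active, Duplicated(x, dx));
julia> dx
2-element Vector{Float64}:
-800.0
400.0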
Forward mode
The return value of forward mode with a Duplicated return is a tuple containing the primal return value as the first element and the derivative as the second. In forward mode, Duplicated(x, 0.0) is equivalent to Const(x), except that Enzyme can perform more optimizations for Const.
julia> autodiff(Forward, rosenbrock, Duplicated, Const(1.0), Duplicated(3.0, 1.0))
(400.0, 400.0)
julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Const(3.0))
(400.0, -800.0)
Of note, when we seed both arguments at once, the returned tangent is the sum of the two individual tangents computed above (here -800.0 + 400.0 = -400.0).
julia> autodiff(Forward, rosenbrock, Duplicated, Duplicated(1.0, 1.0), Duplicated(3.0, 1.0))
(400.0, -400.0)
We can also use forward mode with our vector-input method rosenbrock_inp, seeding the derivative direction through a Duplicated input.
julia> x = [1.0, 3.0]
2-element Vector{Float64}:
1.0
3.0
julia> dx = [1.0, 1.0]
2-element Vector{Float64}:
1.0
1.0
julia> autodiff(Forward, rosenbrock_inp, Duplicated, Duplicated(x, dx))
(400.0, -400.0)
Note the seeding through dx: the returned tangent is the directional derivative of rosenbrock_inp along dx, i.e., the dot product of the gradient with dx (here -800.0 + 400.0 = -400.0).
Vector forward mode
We can also use vector forward mode to compute both partial derivatives at once.
julia> autodiff(Forward, rosenbrock, BatchDuplicated, BatchDuplicated(1.0, (1.0, 0.0)), BatchDuplicated(3.0, (0.0, 1.0)))
(400.0, (var"1" = -800.0, var"2" = 400.0))
julia> x = [1.0, 3.0]
2-element Vector{Float64}:
1.0
3.0
julia> dx_1 = [1.0, 0.0]; dx_2 = [0.0, 1.0];
julia> autodiff(Forward, rosenbrock_inp, BatchDuplicated, BatchDuplicated(x, (dx_1, dx_2)))
(400.0, (var"1" = -800.0, var"2" = 400.0))
Convenience functions
While the convenience functions discussed below use autodiff internally, they are generally more limited in their functionality. Beyond that, these convenience functions may also come with performance penalties, especially if one makes a closure of a multi-argument function instead of calling the appropriate multi-argument autodiff function directly.
Key convenience functions for common derivative computations are gradient (and its in-place variant gradient!) and jacobian. Like autodiff, the mode (forward or reverse) is determined by the first argument.
The functions gradient and gradient! compute the gradient of a function with vector input and scalar return.
julia> gradient(Reverse, rosenbrock_inp, [1.0, 2.0])
2-element Vector{Float64}:
-400.0
200.0
julia> # inplace variant
dx = [0.0, 0.0];
gradient!(Reverse, dx, rosenbrock_inp, [1.0, 2.0])
2-element Vector{Float64}:
-400.0
200.0
julia> dx
2-element Vector{Float64}:
-400.0
200.0
julia> gradient(Forward, rosenbrock_inp, [1.0, 2.0])
(-400.0, 200.0)
julia> # in forward mode, we can also optionally pass a chunk size
# to specify the number of derivatives computed simultaneously
# using vector forward mode
chunk_size = Val(2)
gradient(Forward, rosenbrock_inp, [1.0, 2.0], chunk_size)
(-400.0, 200.0)
The function jacobian computes the Jacobian of a function with vector input and vector return.
julia> foo(x) = [rosenbrock_inp(x), prod(x)];
julia> output_size = Val(2) # here we have to provide the output size of `foo` since it cannot be statically inferred
jacobian(Reverse, foo, [1.0, 2.0], output_size)
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
julia> chunk_size = Val(2) # By specifying the optional chunk size argument, we can use vector reverse mode to propagate derivatives of multiple outputs at once.
jacobian(Reverse, foo, [1.0, 2.0], output_size, chunk_size)
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
julia> jacobian(Forward, foo, [1.0, 2.0])
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
julia> # Again, the optional chunk size argument allows us to use vector forward mode
jacobian(Forward, foo, [1.0, 2.0], chunk_size)
2×2 Matrix{Float64}:
-400.0 200.0
2.0 1.0
Caveats / Known issues
Activity of temporary storage / Activity Unstable Code
If you pass in any temporary storage which may be involved in an active computation to a function you want to differentiate, you must also pass in a duplicated temporary storage for use in computing the derivatives. For example, consider the following function which uses a temporary buffer to compute the result.
function f(x, tmp, k, n)
    tmp[1] = 1.0
    for i in 1:n
        tmp[k] *= x
    end
    tmp[1]
end
# output
f (generic function with 1 method)
Marking the argument tmp as Const (aka non-differentiable) means that Enzyme believes that all variables loaded from or stored into tmp must also be non-differentiable, since all values inside a non-differentiable variable must, by definition, also be non-differentiable.
Enzyme.autodiff(Reverse, f, Active(1.2), Const(Vector{Float64}(undef, 1)), Const(1), Const(5)) # Incorrect
# output
((0.0, nothing, nothing, nothing),)
Passing in a duplicated (i.e., differentiable) variable for tmp now leads to the correct answer.
Enzyme.autodiff(Reverse, f, Active(1.2), Duplicated(Vector{Float64}(undef, 1), Vector{Float64}(undef, 1)), Const(1), Const(5)) # Correct (returns 10.367999999999999 == 1.2^4 * 5)
# output
((10.367999999999999, nothing, nothing, nothing),)
However, even if we ignore the semantic guarantee provided by marking tmp as constant, another issue arises. When computing the original function, intermediate computations (like in f above) can use tmp for temporary storage. When computing the derivative, Enzyme also needs additional temporary storage space for the corresponding derivative variables. If tmp is marked as Const, Enzyme does not have any temporary storage space for the derivatives!
Recent versions of Enzyme will attempt to error when they detect these latter types of situations, which we will refer to as activity unstable. This term is chosen to mirror the Julia notion of type-unstable code (e.g., where a type is not known at compile time). If an expression is activity unstable, it could be either constant or active, depending on data not known at compile time. For example, consider the following:
function g(cond, active_var, constant_var)
    if cond
        return active_var
    else
        return constant_var
    end
end

Enzyme.autodiff(Forward, g, Const(condition), Duplicated(x, dx), Const(y))
The returned value here could be either constant or duplicated, depending on the runtime value of cond. If cond is true, Enzyme simply returns the shadow of active_var as the derivative. However, if cond is false, there is no derivative shadow for constant_var and Enzyme will throw a "Mismatched activity" error. For some simple types, e.g. a float, Enzyme can circumvent this issue, for example by returning the float 0.0. Similarly, for types like Symbol, which are never differentiable, such a shadow value will never be used, and Enzyme can return the original "primal" value as its derivative. However, for arbitrary data structures, Enzyme presently has no generic mechanism to resolve this.
For example, consider a third function:
function h(cond, active_var, constant_var)
    return [g(cond, active_var, constant_var), g(cond, active_var, constant_var)]
end

Enzyme.autodiff(Forward, h, Const(condition), Duplicated(x, dx), Const(y))
Enzyme provides a utility, Enzyme.make_zero, which takes a data structure and constructs a deep copy of it with all of the floats set to zero and non-differentiable types like Symbols set to their primal value. If Enzyme gets into such a "Mismatched activity" situation, where it needs to return a differentiable data structure from a constant variable, it could try to resolve the situation by constructing a new shadow data structure, for example with Enzyme.make_zero. However, this can still lead to incorrect results.

In the case of h above, suppose that active_var and constant_var are both arrays, which are mutable (aka in-place) data types. This means that the return of h is going to be either result = [active_var, active_var] or result = [constant_var, constant_var]. Thus an update to result[1][1] would also change result[2][1], since result[1] and result[2] are the same array.
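To see the aliasing concretely, here is a small sketch (with hypothetical one-element arrays standing in for active_var and constant_var) showing that both elements of the result share the same backing memory:
active_var = [1.0]
constant_var = [2.0]
result = h(false, active_var, constant_var)
result[1] === result[2] # true: mutating result[1][1] also changes result[2][1]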
If one created a new zeroed copy of each return from g, the derivative dresult would have one copy made for the first element and a second copy made for the second element, losing this aliasing. This could lead to incorrect results and is unfortunately not a general resolution. However, for non-mutable variables (e.g. floats) or non-differentiable types (e.g. Symbols) this problem can never arise.
Instead, Enzyme has a special mode known as "Runtime Activity" which can handle these types of situations. It can come with a minor performance reduction and is therefore off by default. It can be enabled with Enzyme.API.runtimeActivity!(true) right after importing Enzyme for the first time.
The way Enzyme's runtime activity resolves this issue is to return the original primal variable as the derivative whenever it needs to denote the fact that a variable is a constant. As this issue can only arise with mutable variables, they must be represented in memory via a pointer. All additional loads and stores will now be modified to first check if the primal pointer is the same as the shadow pointer, and if so, treat it as a constant. Note that this check is not saying that the two arrays contain the same values, but rather that the same backing memory represents both the primal and the shadow (e.g. a === b, or equivalently pointer(a) == pointer(b)).
Enabling runtime activity does, therefore, come with a sharp edge: if the computed derivative of a function is mutable, one must also check whether the primal and shadow represent the same pointer, and if so, the true derivative of the function is actually zero.
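For illustration, here is a minimal sketch of that check as a hypothetical helper (not part of Enzyme's API):
# Hypothetical helper: with runtime activity enabled, a mutable "derivative"
# that shares its backing memory with the primal actually denotes a zero derivative.
function true_derivative(primal::AbstractArray, shadow::AbstractArray)
    shadow === primal ? zero(primal) : shadow
end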
Generally, the preferred solution to these kinds of activity-unstable code is to make your variables activity-stable (e.g. always containing differentiable memory or always containing non-differentiable memory). However, with care, Enzyme does support "Runtime Activity" as a way to differentiate these programs without having to modify your code.
CUDA.jl support
CUDA.jl is only supported on Julia v1.7.0 and onwards. On v1.6, attempting to differentiate CUDA kernel functions will not use device overloads correctly and will thus return fundamentally wrong results.
Sparse Arrays
At the moment there is limited support for sparse linear algebra operations. Sparse arrays may be used, but care must be taken, because sparse storage drops explicit zeros in Julia (unless told not to).
using SparseArrays
a = sparse([2.0])
da1 = sparse([0.0]) # Incorrect: sparse drops the explicit zero, leaving no stored entries in the shadow
Enzyme.autodiff(Reverse, sum, Active, Duplicated(a, da1))
da1
# output
1-element SparseVector{Float64, Int64} with 0 stored entries
da2 = sparsevec([1], [0.0]) # Correct: constructing from index/value pairs keeps the explicit zero
Enzyme.autodiff(Reverse, sum, Active, Duplicated(a, da2))
da2
# output
1-element SparseVector{Float64, Int64} with 1 stored entry:
[1] = 1.0
Sometimes, determining how to perform this zeroing can be complicated. That is why Enzyme provides a helper function, Enzyme.make_zero, that does this automatically.
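For example, a small sketch using Enzyme.make_zero to build a shadow with the same sparsity structure as a (assuming the same a as defined above, and that make_zero keeps the stored entry while zeroing its value):
da3 = Enzyme.make_zero(a) # zero-valued shadow that preserves the stored entry of a
Enzyme.autodiff(Reverse, sum, Active, Duplicated(a, da3))
da3

# output

1-element SparseVector{Float64, Int64} with 1 stored entry:
  [1]  =  1.0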