History and ecosystem

Enzyme started as a project created by William Moses and Valentin Churavy to differentiate LLVM IR, covering languages with an LLVM frontend such as C, Julia, Swift, Fortran, etc. Operating within the compiler enables Enzyme to interoperate with optimizations, allowing for higher performance than conventional approaches while not needing special handling for each language and construct. Enzyme is an LLVM Incubator project and intends to ask for upstreaming later in 2024.

In 2020, initial investigations into using Enzyme with Rust were led by Tiberius Ferreria and William Moses through the use of foreign function calls (https://internals.rust-lang.org/t/automatic-differentiation-differential-programming-via-llvm/13188/7).

In 2021, Manuel Drehwald and Lorenz Schmidt worked on Oxide-Enzyme, which aimed to integrate Enzyme directly as a compiler-aware cargo plugin.

The current Rust-Enzyme project directly embeds Enzyme into the Rust compiler and makes autodiff macros available for easy usage. The project is led by Manuel Drehwald, in collaboration with Jed Brown, William Moses, Lorenz Schmidt, Ningning Xie, and Rodrigo Vargas-Hernandez.

Development of a Rust-Enzyme frontend

We hope that Rust-Enzyme can mature relatively fast as part of the nightly releases, because:

  1. Unlike Julia, Rust does not emit code involving garbage collection, a JIT, or type instability -- simplifying the inputs to Enzyme (and reducing the need to develop support for such mechanisms, which have since been added to Enzyme.jl).
  2. Unlike Clang, Rust ships the source code of its standard library. On the Rust side, we therefore don't need to manually add support for standard-library functions, as C++ Enzyme does for libstdc++ (e.g. std::map iterator decrement).
  3. Minimizing Rust code is reasonably pleasant, and Cargo/crates.io make it easy to reproduce bugs.

Non-alternatives

The key to the performance of our solution is that AD is performed after compiler optimizations have been applied (and Enzyme is able to run additional optimizations afterwards). This observation is mostly language independent; it is motivated in the 2020 Enzyme NeurIPS paper and also mentioned towards the end of this non-Enzyme Java autodiff case study.

Wrapping cargo instead of modifying rustc

We can use Enzyme without modifying rustc, as demonstrated in oxide-enzyme.

  1. We let users specify a list of functions which they want to differentiate and how (forward/reverse, activities...). example.
  2. We manually emit the optimized LLVM IR of our Rust program and all dependencies.
  3. We llvm-link all files into a single module (equivalent to fat-lto).
  4. We call Enzyme to differentiate functions.
  5. We adjust linker visibility of the new functions and create an archive that exports those new functions.
  6. We terminate this cargo invocation (which can e.g. be achieved via -Zno-link).
  7. We call cargo a second time, this time providing our archive as an additional linker argument. The functions provided by the archive exactly match the extern fn declarations created through our macro here; see the sketch after this list.
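
To illustrate step 7, here is a minimal sketch of how user code could link against such a pre-built derivative archive. The symbol name `__enzyme_autodiff_square` and its signature are hypothetical stand-ins for whatever the external Enzyme invocation actually exports; the real declarations are generated by the macro mentioned above.

```rust
// Hypothetical sketch (names and signatures are illustrative, not the real
// oxide-enzyme API): the user marks `square` for differentiation, and the
// macro expands to an `extern` block whose symbol matches the derivative
// function that Enzyme produced in the separately linked archive.
fn square(x: f64) -> f64 {
    x * x
}

extern "C" {
    // Provided by the archive created in the first cargo invocation.
    fn __enzyme_autodiff_square(x: f64) -> f64;
}

fn main() {
    println!("x^2 at x = 3.0: {}", square(3.0));
    // The call only links once the archive from step 5 is passed to the
    // second cargo invocation as an extra linker argument.
    let dx = unsafe { __enzyme_autodiff_square(3.0) };
    println!("d/dx x^2 at x = 3.0: {dx}"); // expected: 6.0
}
```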

This PoC required the use of build-std in order to see the LLVM IR of functions from the standard library.
An alternative would have been to provide Enzyme with rules on how to differentiate every function from the Rust standard library, which seems undesirable. It would, however, not be impossible; C++ Enzyme has various rules for the C++ standard library.

This approach also assumes that linking LLVM IR generated by two different cargo invocations and passing Rust objects between them works fine.

This approach is further limited in terms of compile times and reliability; see the example at the bottom left of this poster. LLVM types are often too limited to determine the correct derivative (e.g. opaque pointers), so Enzyme has to run a usage analysis to determine the relevant type of a variable. This can be time consuming (we encountered multiple cases with > 1000x longer compile times) and it can be unreliable if Enzyme fails to deduce the correct type of a variable due to insufficient usages. When calling Enzyme from within rustc, we are able to provide high-level type information to Enzyme. For oxide-enzyme, we tried to mitigate this by using a DWARF debug-info parser (requiring debug information even in release builds), but even with these helpers we were completely unable to support enums due to their ability to represent different types. This approach was also limited because rustc (at the time we wrote it) did not emit DWARF information for all Rust types with unstable layout.
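
As a small illustration (not taken from the original project) of why enums are problematic for LLVM-level type analysis, consider a payload whose bytes may hold either a float that participates in the derivative or an integer that does not:

```rust
// At the LLVM level both variants occupy the same storage, so a usage
// analysis on the IR alone cannot reliably tell whether the payload bytes
// are an active f64 or an inactive u64 -- high-level type information from
// rustc resolves this directly.
enum Value {
    Scalar(f64), // differentiable payload
    Index(u64),  // integer payload sharing the same storage
}

fn eval(v: &Value) -> f64 {
    match v {
        Value::Scalar(x) => x * x,
        Value::Index(i) => *i as f64,
    }
}

fn main() {
    println!("{}", eval(&Value::Scalar(3.0))); // 9.0
    println!("{}", eval(&Value::Index(4)));    // 4.0
}
```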

Rust level autodiff

Various Rust libraries for training neural networks exist (burn/candle/dfdx/rai/autograph). We talked with developers from burn, rai, and autograph about comparing autodiff performance on the Microsoft ADBench benchmark suite. After some investigation, all three concluded that supporting such cases would require significant redesigns of their projects, which they can't afford in the foreseeable future.
When training neural networks, we often look at a few large variables (tensors) and a small set of functions (layers) which dominate the runtime. Using these properties, it is possible to amortize some inefficiencies by making the most expensive operations efficient. Such optimizations stop working once we look at the larger set of applications in scientific computing or HPC.