EnzymeXLA Dialect
Reactant.MLIR.Dialects.enzymexla.gpu_wrapper Method
gpu_wrapper
The optional arguments to this operation are suggestions about what block dimensions this GPU kernel should have, usually taken from the kernel launch parameters.
Reactant.MLIR.Dialects.enzymexla.lapack_gemqrt Method
lapack_gemqrt
This operation is modeled after LAPACK's *GEMQRT routines.
Reactant.MLIR.Dialects.enzymexla.lapack_geqrf Method
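As a rough illustration of what the underlying LAPACK routine does, the sketch below uses Julia's `LinearAlgebra.LAPACK` wrappers rather than the Reactant op itself; that the op follows the same semantics is an assumption. It factors a small matrix with `geqrt!` and then uses `gemqrt!` to apply the implicit Q to the stacked R factor, which should recover the original matrix without ever forming Q.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]

# Compact WY factors of A; the block size 2 is an arbitrary choice here.
V, T = LAPACK.geqrt!(copy(A), 2)

# triu(V) is R stacked on a zero row; multiply it by Q from the left ('L', 'N').
C = LAPACK.gemqrt!('L', 'N', V, T, triu(V))

@assert C ≈ A
```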
lapack_geqrf
This operation computes the QR factorization of a matrix using Householder reflections. Mathematically, it decomposes A into the product of an orthogonal matrix Q and an upper triangular matrix R, such that A = QR.
This operation is modeled after LAPACK's *GEQRF routines, which return the result in the packed QR format.
Reactant.MLIR.Dialects.enzymexla.lapack_geqrf_placeholder
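To make the packed format concrete, here is a minimal sketch using Julia's `LinearAlgebra.LAPACK.geqrf!` wrapper (not the Reactant op; identical semantics are assumed). In LAPACK's packed layout, R sits on and above the diagonal, and each Householder vector is stored below the diagonal with an implicit leading 1, so that Q = H_1 H_2 ... with H_i = I - tau_i v_i v_i'. The helper `rebuild_q` is a hypothetical name introduced only for this illustration.

```julia
using LinearAlgebra

# Hypothetical helper: rebuild the full Q from the packed factors.
function rebuild_q(F, tau)
    m = size(F, 1)
    Q = Matrix{Float64}(I, m, m)
    for i in eachindex(tau)
        v = zeros(m)
        v[i] = 1.0                      # implicit unit entry
        v[i+1:end] = F[i+1:end, i]      # part stored below the diagonal
        Q = Q * (I - tau[i] * v * v')   # accumulate H_1 * H_2 * ...
    end
    return Q
end

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
F, tau = LAPACK.geqrf!(copy(A))

R = triu(F)                             # 3 x 2: R on top of a zero row
Q = rebuild_q(F, tau)

@assert Q * R ≈ A
```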
lapack_geqrt
This operation computes the QR factorization of a matrix using Householder reflections. Mathematically, it decomposes A into the product of an orthogonal matrix Q and an upper triangular matrix R, such that A = QR.
This operation is modeled after LAPACK's *GEQRT routines, which return the result in the compact WY format.
Reactant.MLIR.Dialects.enzymexla.lapack_orgqr Method
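The compact WY representation can be sketched with Julia's `LinearAlgebra.LAPACK.geqrt!` wrapper (again an illustration of the LAPACK routine, assuming the op matches it). With a single block, the orthogonal factor is Q = I - V T V', where V is the unit lower trapezoidal matrix of reflectors and T is upper triangular.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
F, T = LAPACK.geqrt!(copy(A), 2)             # block size 2 is an arbitrary choice

V  = tril(F, -1) + Matrix{Float64}(I, 3, 2)  # reflectors with the implicit unit diagonal
Tu = triu(T)                                 # only the upper triangle of T is meaningful
Q  = I - V * Tu * V'                         # full 3 x 3 orthogonal factor
R  = triu(F)                                 # R stacked on a zero row

@assert Q' * Q ≈ I
@assert Q * R ≈ A
```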
lapack_orgqr
This operation is modeled after LAPACK's *ORGQR/*UNGQR routines.
Reactant.MLIR.Dialects.enzymexla.lapack_ormqr Method
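In LAPACK, ORGQR (UNGQR for complex types) turns the packed reflectors produced by GEQRF into an explicit Q with orthonormal columns. A minimal sketch with Julia's `LinearAlgebra.LAPACK` wrappers, assuming the op mirrors that behavior:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
F, tau = LAPACK.geqrf!(copy(A))   # packed factorization
R = triu(F)[1:2, :]               # copy R out before F is overwritten
Q = LAPACK.orgqr!(F, tau)         # explicit 3 x 2 Q, written over F

@assert Q' * Q ≈ I
@assert Q * R ≈ A
```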
lapack_ormqr
This operation is modeled after LAPACK's *ORMQR routines.
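In LAPACK, ORMQR multiplies a matrix by Q (or its transpose) directly from the packed GEQRF output, without forming Q. The sketch below uses Julia's `LinearAlgebra.LAPACK.ormqr!` wrapper and assumes the op has the same semantics; applying Q' to A itself should give R stacked on a zero row.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
F, tau = LAPACK.geqrf!(copy(A))

# 'L' = multiply from the left, 'T' = use the transpose of Q.
C = LAPACK.ormqr!('L', 'T', F, tau, copy(A))

@assert C ≈ triu(F)
```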
Reactant.MLIR.Dialects.enzymexla.linalg_qr Method
linalg_qr
This operation computes the QR factorization of a matrix using Householder reflections. Mathematically, it decomposes A into the product of an orthogonal (unitary if complex) matrix Q and an upper triangular matrix R, such that A = QR.
If A has size m x n and m > n, Q is an m x n isometric matrix (its columns are orthonormal). If m < n, R is an m x n upper trapezoidal matrix.
This operation is modeled after the mathematical formulation of the QR factorization, and not after LAPACK's compact formats.
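The shape conventions described here match those of the thin QR factorization in Julia's `LinearAlgebra`, so they can be checked with the stdlib `qr` function. This is an illustration of the shapes only, not a call to the Reactant op.

```julia
using LinearAlgebra

tall = [1.0 2.0; 3.0 4.0; 5.0 6.0]    # m = 3 > n = 2
F = qr(tall)
Q = Matrix(F.Q)                        # thin factor: 3 x 2, orthonormal columns
R = F.R                                # 2 x 2 upper triangular
@assert size(Q) == (3, 2) && Q' * Q ≈ I
@assert Q * R ≈ tall

wide = [1.0 2.0 3.0; 4.0 5.0 6.0]     # m = 2 < n = 3
G = qr(wide)
@assert size(G.R) == (2, 3)            # upper trapezoidal
@assert Matrix(G.Q) * G.R ≈ wide
```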
Reactant.MLIR.Dialects.enzymexla.memcpy Method
memcpy
The gpu.memcpy operation copies the content of one memref to another.
The op does not execute before all async dependencies have finished executing.
If the async keyword is present, the op is executed asynchronously (i.e. it does not block until the execution has finished on the device). In that case, it returns a !gpu.async.token.
Example
%token = gpu.memcpy async [%dep] %dst, %src : memref<?xf32, 1>, memref<?xf32>