Manifest & Boundary

This page documents the bundle metadata types parsed from manifest.yaml, the transport-agnostic request/response boundary types, and the canonical dtype enumeration. All of them live in ReactantServerCore (the shared substrate). See Bundles & model.jl for the manifest format.

Manifest

ReactantServerCore.Manifest — Type

Manifest

The parsed and validated manifest.yaml of a model bundle. It records the format_version, the bundle name and description, the executable input/output specs (executable_inputs/executable_outputs), the optional client-facing specs (client_inputs/client_outputs, present only when a model.jl transforms the I/O), the BatchingSpec, provenance metadata, and the derived 0-based input_batch_dim. Tensor specs are TensorSpec values; see TensorSpec and Dim for the einsum-style shape encoding.

ReactantServerCore.TensorSpec — Type

TensorSpec

One tensor in a Manifest: its name, DType, shape (a Vector{Dim}), and batch_axis (the 1-based index of the batch dim, or nothing when the tensor is not batched).

ReactantServerCore.Dim — Type

Dim

A single axis of a tensor shape. kind is one of FIXED, BATCH, or VARIABLE; size is meaningful only when kind == FIXED (it is 0 otherwise). A FIXED dim has a concrete size, a BATCH dim is the batch axis (from the reserved n/b shape letters), and a VARIABLE dim (a -1 in the manifest dims map) is a dynamic non-batch axis.
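
As a rough illustration of the three dim kinds, the sketch below classifies the axes of a shape. Everything in it is a hypothetical stand-in rather than the ReactantServerCore definitions: the names SketchDim, SketchDimKind, and sketch_parse_shape, the SK_ prefixes, and the assumption that a shape is written as a string of letters resolved through a dims map.

```julia
# Hypothetical stand-ins for illustration; not the ReactantServerCore types.
@enum SketchDimKind SK_FIXED SK_BATCH SK_VARIABLE

struct SketchDim
    kind::SketchDimKind
    size::Int   # meaningful only when kind == SK_FIXED, 0 otherwise
end

# Assumed encoding: one letter per axis, "n"/"b" reserved for the batch axis,
# and a -1 entry in the dims map marking a dynamic non-batch axis.
function sketch_parse_shape(letters::AbstractString, dims::Dict{Char,Int})
    map(collect(letters)) do c
        if c in ('n', 'b')
            SketchDim(SK_BATCH, 0)
        elseif dims[c] == -1
            SketchDim(SK_VARIABLE, 0)
        else
            SketchDim(SK_FIXED, dims[c])
        end
    end
end

shape = sketch_parse_shape("nchw", Dict('c' => 3, 'h' => -1, 'w' => 224))

# 1-based position of the batch axis, or nothing when no axis is batched.
batch_axis = findfirst(d -> d.kind == SK_BATCH, shape)
```

Here the first axis is the batch axis, `c` and `w` are fixed, and `h` is variable, which mirrors how a TensorSpec's batch_axis relates to its shape.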

ReactantServerCore.BatchingSpec — Type

BatchingSpec

The set of batch sizes a bundle was compiled for (compiled_batch_sizes). At inference the request's size along the batch axis must equal one of these; the scheduler coalesces requests up to a compiled size and selects the matching executable.
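
The batch-size rule can be sketched as follows. SketchBatchingSpec, is_compiled_size, and pick_batch_size are hypothetical names, and the selection policy shown (largest compiled size the pending rows can fill) is only one plausible reading of "coalesces requests up to a compiled size", not the scheduler's documented behaviour.

```julia
# Hypothetical stand-in for illustration; not the ReactantServerCore type.
struct SketchBatchingSpec
    compiled_batch_sizes::Vector{Int}
end

# A batch is runnable only if its size matches one of the compiled sizes.
is_compiled_size(spec::SketchBatchingSpec, n::Integer) =
    n in spec.compiled_batch_sizes

# Assumed policy: choose the largest compiled size that the pending rows
# can fill, or nothing when even the smallest compiled size does not fit.
function pick_batch_size(spec::SketchBatchingSpec, pending_rows::Integer)
    fitting = filter(<=(pending_rows), spec.compiled_batch_sizes)
    return isempty(fitting) ? nothing : maximum(fitting)
end

spec = SketchBatchingSpec([1, 4, 8])
ok_4 = is_compiled_size(spec, 4)      # true
ok_3 = is_compiled_size(spec, 3)      # false
chosen = pick_batch_size(spec, 6)     # 4
```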

ReactantServerCore.load_manifest — Function

load_manifest(path) -> Manifest

Parse a manifest YAML file at path into a Manifest. This runs the structural parsing and validation of parse_manifest but not the bundle-directory checks in validate_manifest, so it is usable wherever only the manifest's contents are needed, for example a client deriving a model's I/O spec offline.

ReactantServerCore.is_meta — Function

is_meta(m::Manifest) -> Bool

True when the bundle is a meta model (a Julia orchestration over other models) rather than a regular StableHLO executable.

Boundary

ReactantServerCore.NamedTensor — Type

NamedTensor(name, dtype, shape, data)
NamedTensor(name, data)

A named host tensor carried across the transport boundary as both an input and an output. It pairs a tensor name with its DType, its shape (Julia column-major dimensions), and the backing data array. The two-argument form derives dtype and shape from a typed host Array.
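
The relationship between the two constructor forms might look like the sketch below. SketchNamedTensor is a hypothetical stand-in: the field types are assumptions, and a plain Julia element type is stored where the real type holds a DType value.

```julia
# Hypothetical stand-in for illustration; field types are assumptions.
struct SketchNamedTensor{T,N}
    name::String
    dtype::DataType        # the real type carries a DType; a Julia type stands in
    shape::NTuple{N,Int}   # Julia column-major dimensions
    data::Array{T,N}
end

# Two-argument form: derive dtype and shape from the typed host array.
function SketchNamedTensor(name::AbstractString, data::Array{T,N}) where {T,N}
    return SketchNamedTensor{T,N}(String(name), T, size(data), data)
end

pixels = SketchNamedTensor("pixels", zeros(Float32, 224, 224, 3))
pixels.dtype   # Float32
pixels.shape   # (224, 224, 3)
```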

ReactantServerCore.InferRequest — Type

InferRequest

A decoded inference request, the scheduler's unit of work. It names the target model (model_name), the requested_outputs the caller wants returned, and the input tensors (inputs, a Vector{NamedTensor}). deadline_ns is an absolute local time_ns() deadline (0 means none): a remaining-budget timeout carried over the wire is converted to this local absolute form at decode, so cross-process monotonic-clock differences never matter. The codec produces it from a wire ModelInferRequest; the scheduler and runtime consume only this transport-agnostic form.
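
The deadline conversion described above can be sketched in a few lines. deadline_from_budget is a hypothetical helper, and treating a non-positive budget as "no deadline" is an assumption about the wire encoding; only time_ns() and the 0-means-none convention come from the description.

```julia
# Hypothetical helper: turn a remaining budget in nanoseconds (as received
# over the wire) into an absolute deadline on the local time_ns() clock.
# Assumption: a budget <= 0 means the caller set no deadline.
function deadline_from_budget(budget_ns::Integer; now_ns::UInt64 = time_ns())
    return budget_ns <= 0 ? UInt64(0) : now_ns + UInt64(budget_ns)
end

# now_ns is injectable so the arithmetic can be checked deterministically.
none = deadline_from_budget(0)                                  # 0 (no deadline)
abs_deadline = deadline_from_budget(5_000_000; now_ns = UInt64(100))
```

Because only the remaining budget crosses the wire, the sender's clock value is never compared with the receiver's.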

ReactantServerCore.DeadlineExceeded — Type

DeadlineExceeded(model_name)

Raised when a request's deadline has already passed at dispatch admission: the scheduler refuses to begin GPU work for a request that has already expired, and a meta orchestration refuses to issue a further sub-call once its budget is gone. It never interrupts a running PJRT/GPU call; it only declines to start new work. The gRPC layer maps it to DEADLINE_EXCEEDED.
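
An admission check of this kind could be sketched as below. SketchDeadlineExceeded and admit are hypothetical names, and the placement of the check is an assumption; the point is only that the test runs before work starts and never touches work already in flight.

```julia
# Hypothetical stand-in for illustration; not the ReactantServerCore exception.
struct SketchDeadlineExceeded <: Exception
    model_name::String
end

# Called before dispatching new work. A deadline of 0 means "no deadline".
# This only declines to start work; it cannot cancel a call already running.
function admit(model_name::AbstractString, deadline_ns::UInt64;
               now_ns::UInt64 = time_ns())
    if deadline_ns != 0 && now_ns >= deadline_ns
        throw(SketchDeadlineExceeded(String(model_name)))
    end
    return nothing
end

admit("resnet", UInt64(0))                        # no deadline: admitted
admit("resnet", UInt64(200); now_ns = UInt64(100))  # budget remains: admitted

rejected = try
    admit("resnet", UInt64(50); now_ns = UInt64(100))
    false
catch err
    err isa SketchDeadlineExceeded
end
```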

Datatypes

ReactantServerCore.DType — Type

DType

Canonical element-type enumeration shared across the server, the single source of truth for dtype translation. The companion maps convert between three representations: the manifest token form (e.g. "f32", "bf16"), the Julia element type (e.g. Float32, BFloat16), and the KServe V2 wire datatype string (e.g. "FP32", "BF16").

The DType to XLA primitive-type mapping deliberately lives in the Reactant backend, not here, so this layer carries no Reactant dependency.

FP8 (F8E5M2, F8E4M3) has no standard KServe wire datatype, so those two variants are intentionally absent from the wire mapping and may appear only on executable-internal tensors, never on client-facing inputs or outputs.

Conversions are performed by dtype_from_token, dtype_token, julia_type, dtype_of, dtype_size, kserve_string, and dtype_from_kserve.
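
The three-representation idea can be illustrated with a small table-driven sketch. The enum, the dictionaries, and sketch_kserve_string are hypothetical stand-ins, not the package's functions, and the "f64" and "i32" tokens are assumed by analogy with "f32". "FP32", "FP64", and "INT32" are KServe V2 datatype strings.

```julia
# Hypothetical stand-ins for illustration; not the ReactantServerCore API.
@enum SketchDType SK_F32 SK_F64 SK_I32 SK_F8E5M2

# Manifest token -> dtype ("f64" and "i32" are assumed token spellings).
const TOKEN_TO_DTYPE = Dict("f32" => SK_F32, "f64" => SK_F64, "i32" => SK_I32)

# dtype -> Julia element type.
const DTYPE_TO_JULIA = Dict(SK_F32 => Float32, SK_F64 => Float64, SK_I32 => Int32)

# dtype -> KServe V2 wire string. FP8 is deliberately missing from this map.
const DTYPE_TO_KSERVE = Dict(SK_F32 => "FP32", SK_F64 => "FP64", SK_I32 => "INT32")

function sketch_kserve_string(d::SketchDType)
    haskey(DTYPE_TO_KSERVE, d) ||
        throw(ArgumentError("$d has no KServe wire datatype"))
    return DTYPE_TO_KSERVE[d]
end

d = TOKEN_TO_DTYPE["f32"]
wire = sketch_kserve_string(d)          # "FP32"
nbytes = sizeof(DTYPE_TO_JULIA[d])      # 4
```

Keeping the wire map partial makes the FP8 restriction fall out naturally: asking for a wire string for an FP8 variant fails instead of silently inventing one.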