Transport

The wire layer beneath the boundary types: the KServe V2 codec that translates protobuf messages to and from InferRequest / NamedTensor, and the system shared-memory registry that backs the Triton-compatible zero-copy data plane. These live in ReactantServerCore (shared by the worker and the gateway) and are documented here for contributors.

Codec

ReactantServerCore.TIMEOUT_NS_PARAM — Constant

TIMEOUT_NS_PARAM

Key of the request-level KV parameter carrying the caller's REMAINING budget in nanoseconds (relative, not an absolute timestamp). Like the shared-memory region parameters, this is an extension to KServe V2 passed through ModelInferRequest.parameters. It is relative so each hop converts it to its own local absolute deadline (time_ns() + budget), which makes it robust to cross-process monotonic-clock differences and lets it ride unchanged through the gateway's raw-byte request forwarding. See deadline_params.

ReactantServerCore.deadline_params — Method

deadline_params(budget_ns) -> Dict{String,InferParameter}

Build the request-level parameters map carrying a remaining-budget timeout of budget_ns nanoseconds (see TIMEOUT_NS_PARAM). A non-positive budget_ns yields an empty map (no deadline). Merge the result into a ModelInferRequest's parameters.
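
The relative-budget convention can be illustrated with a minimal, self-contained sketch. It is not the package's implementation: the key string "timeout_ns", the names sketch_deadline_params and local_deadline_ns, and the plain Dict{String,Int64} (in place of InferParameter values) are assumptions made for illustration only.

```julia
# Hypothetical key: the actual value of TIMEOUT_NS_PARAM is not shown on this page.
const SKETCH_TIMEOUT_KEY = "timeout_ns"

# Sender side: a non-positive budget means "no deadline", so the map is empty.
sketch_deadline_params(budget_ns::Integer) =
    budget_ns > 0 ? Dict(SKETCH_TIMEOUT_KEY => Int64(budget_ns)) : Dict{String,Int64}()

# Receiver side: turn the relative budget into a local absolute deadline.
# In this sketch, 0 stands for "no deadline".
function local_deadline_ns(params::Dict{String,Int64}; now_ns::UInt64 = time_ns())
    budget = get(params, SKETCH_TIMEOUT_KEY, 0)
    return budget > 0 ? now_ns + UInt64(budget) : UInt64(0)
end
```

Because only the budget crosses the process boundary, each hop adds it to its own time_ns() reading and never compares clocks with another process.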

ReactantServerCore.decode_infer_request — Function

decode_infer_request(msg, registry=nothing) -> DecodedRequest

Translate a decoded ModelInferRequest message into the boundary InferRequest. The transport (gRPC) hands us the already-decoded protobuf message, so the codec never touches wire bytes. Input tensor data is read from a registered shared-memory region (preferred when the tensor declares one), otherwise from raw_input_contents, otherwise from the typed contents field.
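
The source precedence described above (shared memory, then raw_input_contents, then typed contents) could be sketched as follows. SketchInput and pick_source are hypothetical stand-ins rather than types from ReactantServerCore, and raising an error when an input carries no data at all is an assumption.

```julia
# Hypothetical stand-in for one input tensor's possible data sources.
struct SketchInput
    shm_region::Union{Nothing,String}             # set when the tensor declares a region
    typed_contents::Union{Nothing,Vector{UInt8}}  # stands in for the typed contents field
end

# Return a symbol naming the source a decoder following the documented
# precedence would consult for this input.
function pick_source(input::SketchInput, raw_input_contents::Vector{Vector{UInt8}})
    input.shm_region !== nothing && return :shared_memory
    !isempty(raw_input_contents) && return :raw_input_contents
    input.typed_contents !== nothing && return :typed_contents
    error("input carries no data")  # assumption: the real codec's behavior is not documented here
end
```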

ReactantServerCore.decode_infer_response — Method

decode_infer_response(msg) -> Vector{NamedTensor}

Translate a ModelInferResponse into boundary NamedTensor outputs. Data is read from raw_output_contents when present, otherwise from the typed contents field. Shared-memory-backed outputs are not supported on this path (the caller never requests them).

ReactantServerCore.encode_infer_request — Method

encode_infer_request(model_name, inputs; requested_outputs=String[], id="") -> ModelInferRequest

Build a ModelInferRequest from boundary NamedTensor inputs, with tensor data inline in raw_input_contents. requested_outputs, when non-empty, names the outputs to return.

ReactantServerCore.encode_infer_request_shm — Method

encode_infer_request_shm(model_name, inputs, region, offsets; requested_outputs, id)

Encode a request whose inputs are ALL staged in the shared-memory region named by region, at the given byte offsets (offsets is parallel to inputs). No raw_input_contents is sent; the receiver reads each tensor via shm_read, so it must have region registered. This is all-or-nothing per request: the decode path treats raw_input_contents as parallel to inputs, so a request never mixes raw and SHM inputs.

ReactantServerCore.encode_infer_response — Method

encode_infer_response(model_name, id, outputs) -> ModelInferResponse

Build the response message with outputs entirely inline (raw_output_contents). The transport serializes the returned message.

ReactantServerCore.encode_infer_response — Method

encode_infer_response(model_name, decoded, outputs, registry) -> ModelInferResponse

Build the response message. Any output whose requested-output entry names a shared-memory region is written into that region and referenced in the response instead of being sent inline. raw_output_contents holds the remaining inline outputs in order.

ReactantServerCore.encode_model_metadata — Method

encode_model_metadata(name, manifest, platform) -> ModelMetadataResponse

Build a ModelMetadataResponse message from the manifest's client-facing I/O spec.

ReactantServerCore.encode_repository_index — Method

encode_repository_index(names) -> RepositoryIndexResponse
encode_repository_index(entries::AbstractVector{<:Pair}) -> RepositoryIndexResponse

Build a RepositoryIndexResponse. The first form lists every model as READY (direct-client introspection). The second takes name => ready::Bool pairs and reports READY or UNAVAILABLE per model, so the gateway can discover which replicas actually serve a model (readiness reflects residency on the worker).

Shared memory

ReactantServerCore.same_ipc_namespace — Method

same_ipc_namespace(name) -> Bool

Return whether the POSIX shared-memory object name is visible in this process's IPC namespace. The client passes the name of an object it created; we answer true only if we can open that same object ourselves, which is what determines whether system shared-memory transport can work between the two processes. The open is read-only and detached immediately; nothing is registered or kept mapped. Any failure (object absent because we are in a different namespace, permission error, malformed name) is reported as false.

ReactantServerCore.shm_read — Method

shm_read(registry, name, offset, byte_size) -> Vector{UInt8}

Copy [offset, offset+byte_size) of the named region into a fresh byte vector. The copy is done while the region's lock is held, so a concurrent shm_unregister! of the same region cannot detach the mapping underneath the read.

ReactantServerCore.shm_register! — Method

shm_register!(registry, name, key, offset, byte_size)

Attach the existing POSIX shared-memory object key read-write and register it under name. Re-registering a name replaces (and detaches) the previous mapping.

ReactantServerCore.shm_unregister! — Method

shm_unregister!(registry, name)

Unregister and detach a region, or all regions when name is empty. Idempotent: unregistering a name that is not registered is a successful no-op. This matches KServe semantics and lets both the gateway's fan-out and the client's pre-emptive cleanup unregister succeed without a spurious error. Registration, by contrast, fails loudly (see shm_register!) so a bad region surfaces at register time rather than during inference.

ReactantServerCore.shm_write! — Method

shm_write!(registry, name, offset, bytes)

Copy bytes into [offset, offset+length(bytes)) of the named region. The copy is done while the region's lock is held, so a concurrent shm_unregister! of the same region cannot detach the mapping underneath the write.
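
The locking discipline shared by shm_read and shm_write! can be sketched without real shared memory. In the sketch below a plain byte vector stands in for the mapped object; SketchRegion, sketch_read, sketch_write!, and sketch_detach! are hypothetical names, and the range check is an assumption about behavior this page does not spell out.

```julia
# Hypothetical region: a byte vector stands in for the mapped shared-memory object.
mutable struct SketchRegion
    lock::ReentrantLock
    bytes::Union{Nothing,Vector{UInt8}}   # `nothing` once detached
end
SketchRegion(n::Integer) = SketchRegion(ReentrantLock(), zeros(UInt8, n))

# Assumed validation: reject detached regions and out-of-range requests.
function check_range(bytes, offset, n)
    bytes === nothing && error("region is detached")
    (offset >= 0 && n >= 0 && offset + n <= length(bytes)) ||
        throw(ArgumentError("range [$offset, $(offset + n)) is out of bounds"))
    return nothing
end

# Copy out while holding the lock, so a detach cannot interleave with the copy.
function sketch_read(r::SketchRegion, offset::Integer, byte_size::Integer)
    lock(r.lock) do
        check_range(r.bytes, offset, byte_size)
        r.bytes[offset+1:offset+byte_size]   # indexing with a range makes a fresh copy
    end
end

# Copy in while holding the same lock.
function sketch_write!(r::SketchRegion, offset::Integer, data::Vector{UInt8})
    lock(r.lock) do
        check_range(r.bytes, offset, length(data))
        copyto!(r.bytes, offset + 1, data, 1, length(data))
    end
    return nothing
end

# Detach takes the same lock, so it waits for any in-flight copy to finish.
function sketch_detach!(r::SketchRegion)
    lock(r.lock) do
        r.bytes = nothing
    end
    return nothing
end
```

The point of the design is that the copy, not just the lookup, happens inside the critical section; releasing the lock before copying would reopen the race with unregistration.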

Staging buffer pool

The concurrency-safe staging pool the client drives inference through (and the precompile workloads exercise). A fixed-slot allocator over one contiguous backing region; a request may span several physically contiguous slots.

ReactantServerCore.BufferPool — Method

BufferPool(n_bytes; n_slots=8, use_shm=true, name="reactant_server_pool")

Allocate a staging pool of n_bytes divided into n_slots equal slots. slot_bytes is fixed at construction (n_bytes ÷ n_slots), not recomputed per request, so the allocator can hand disjoint slots to concurrent callers. A SHM-backed pool can be registered with a server; an inline pool (use_shm=false) is the fallback transport.
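
As a worked example of the slot arithmetic, the sketch below computes the fixed slot size and the byte range covered by a run of slots. The helper names are hypothetical; deriving the span with a ceiling division and numbering slots from 1 are assumptions, not documented behavior.

```julia
# Slot size is fixed once, at construction: integer division of the pool size.
sketch_slot_bytes(n_bytes::Integer, n_slots::Integer) = n_bytes ÷ n_slots

# Assumption: a request of `nbytes` needs the smallest span that covers it.
sketch_span_for(nbytes::Integer, slot_bytes::Integer) = cld(nbytes, slot_bytes)

# Half-open byte range [lo, hi) of `span` slots starting at 1-based slot `first_slot`.
sketch_slot_range(first_slot::Integer, span::Integer, slot_bytes::Integer) =
    ((first_slot - 1) * slot_bytes, (first_slot - 1 + span) * slot_bytes)
```

For a 1024-byte pool with 8 slots, slot_bytes is 128; under the assumptions above a 300-byte request spans 3 slots, and a 3-slot run starting at slot 2 covers bytes [128, 512).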

ReactantServerCore.PoolAcquireTimeout — Type

PoolAcquireTimeout(span, waited_ns)

Raised by acquire_slot! when a deadline_ns was supplied and passed before span contiguous slots became free. The waiter is dequeued before this is thrown, so it never stalls the line. Callers that carry a request deadline (e.g. a meta model's fan-out) translate this into their own deadline-exceeded error.

ReactantServerCore.acquire_slot! — Function

acquire_slot!(pool, span=1; deadline_ns=0) -> PoolSlot

Block until span physically contiguous slots are free, then return them as one slot whose [offset, offset + span*slot_bytes) range no other in-flight slot overlaps. Throws an ArgumentError immediately if span exceeds the pool's total slot count, since such a request could never be satisfied and would otherwise deadlock. Waiters are served in FIFO order. Pair with release_slot!, which frees the whole run.

deadline_ns is an absolute time_ns() deadline (0 = wait indefinitely, the default and prior behavior). When set, a waiter that has not acquired by the deadline throws PoolAcquireTimeout instead of parking past it. A one-shot timer wakes the waiter at the deadline even if no slot is released in the meantime, so a starved waiter fails fast rather than burning a request's whole budget in the park.

ReactantServerCore.release_slot! — Method

release_slot!(slot)

Return a slot acquired with acquire_slot! to the pool's free set, freeing every physical slot in its span. Releasing a derived subslot (index == 0) is a no-op. Throws if any slot in the span is already free (double release).
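
The bookkeeping behind acquire_slot! and release_slot! can be sketched in a non-blocking form over a free-slot bitmap. This is a simplified illustration: the first-fit search and the names find_run, try_acquire!, and release_run! are assumptions, and the blocking, FIFO queueing, and deadline handling described above are deliberately left out.

```julia
# free[i] == true means physical slot i is free.
# Returns the 1-based start of the first run of `span` free slots, or nothing.
function find_run(free::Vector{Bool}, span::Int)
    span >= 1 || throw(ArgumentError("span must be positive"))
    span > length(free) &&
        throw(ArgumentError("span $span exceeds the pool's $(length(free)) slots"))
    len = 0
    for i in eachindex(free)
        len = free[i] ? len + 1 : 0        # length of the free run ending at i
        len == span && return i - span + 1
    end
    return nothing
end

# Mark a contiguous run as taken; returns its start, or nothing if none fits now.
function try_acquire!(free::Vector{Bool}, span::Int)
    start = find_run(free, span)
    start === nothing && return nothing
    free[start:start+span-1] .= false
    return start
end

# Free the whole run; a slot that is already free indicates a double release.
function release_run!(free::Vector{Bool}, start::Int, span::Int)
    any(free[start:start+span-1]) && error("double release of slot run at $start")
    free[start:start+span-1] .= true
    return nothing
end
```

Rejecting an oversized span up front mirrors the documented ArgumentError: no sequence of releases could ever satisfy it, so waiting would never end.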

ReactantServerCore.scratch — Method

scratch(slot, dims, T) -> Array{T}
scratch(slot, [dims1 => T1, dims2 => T2, ...]) -> Vector{Array}

Carve one typed buffer per dims => T spec from slot in a single call, advancing the slot's cursor so the buffers occupy disjoint, contiguous byte ranges. Each buffer is an Array{T} aliasing the pool's backing (via pool_view; zero-copy, and uniform across SHM- and Memory-backed pools since it is just pool_base + offset); write into the returned arrays directly. dims is a shape tuple (or a bare integer for a vector).

This is the buffer-request interface shared by the meta-model call.scratch and the client driver: ask for ALL buffers up front in one call. The carved buffers' lifetime is bounded by slot (and by the pool, which the caller keeps alive); they become invalid once it is released.
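
The cursor-advancing carve can be sketched over an ordinary byte vector. SketchSlot and carve are hypothetical names; unlike the documented scratch, which returns Array{T} values through pool_view, this sketch returns reshaped reinterpret views so that it stays within safe Julia, and the capacity check is an assumption.

```julia
# Hypothetical slot over a plain byte vector standing in for the pool's backing.
mutable struct SketchSlot
    backing::Vector{UInt8}
    offset::Int    # byte offset of the slot within the backing
    nbytes::Int    # slot capacity in bytes
    cursor::Int    # bytes already carved from the slot
end

# Carve one typed view per `dims => T` spec, advancing the cursor so the
# views occupy disjoint, contiguous byte ranges.
function carve(slot::SketchSlot, specs)
    map(specs) do (dims, T)
        shape = dims isa Integer ? (dims,) : dims   # bare integer means a vector
        n = prod(shape) * sizeof(T)
        slot.cursor + n <= slot.nbytes || throw(ArgumentError("slot too small"))
        lo = slot.offset + slot.cursor
        slot.cursor += n
        # Zero-copy: the result aliases `backing`, so writes land in the pool.
        reshape(reinterpret(T, view(slot.backing, lo+1:lo+n)), shape)
    end
end
```

A call such as carve(slot, [(2, 3) => Float32, 4 => Int32]) consumes 24 bytes for the first buffer and 16 for the second; requesting every buffer in one call is what lets the cursor lay them out back to back.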