
NVIDIA NeMo Relay
What Is NeMo Relay?
NVIDIA NeMo Relay is a portable execution runtime for agent systems that already have a
framework, model provider, policy layer, or observability backend. It gives those
systems one consistent way to describe, control, and observe what happens when an
agent crosses a request, tool, or LLM boundary.
Agent applications rarely live inside one clean abstraction. A production stack
might combine NeMo Agent Toolkit, LangChain, LangGraph, provider SDKs, custom
harness code, NeMo Guardrails, tracing systems, and evaluation pipelines. NeMo
Relay sits underneath those choices as the shared runtime contract for scopes,
middleware, plugins, lifecycle events, adaptive behavior, and observability.
Built as a Rust core with primary Rust, Python, and Node.js bindings, NeMo Relay
lets applications keep their orchestration model while runtime behavior stays
consistent across frameworks and languages.
Why Use It?
- 🧭 Own execution context across the whole agent run: Hierarchical scopes
attach tools, LLM calls, middleware, subscribers, and events to the same
parent-child execution tree.
- 🛡️ Package policy once: Guardrails and intercepts can block work, sanitize
observability payloads, transform requests, or wrap execution without
rewriting every call site.
- 📡 Emit one lifecycle stream: Subscribers consume canonical runtime events
in-process or export them as Agent Trajectory Interchange Format (ATIF) v1.7
trajectories, OpenTelemetry traces, or OpenInference-compatible traces.
- 🧩 Integrate without a framework migration: NeMo Relay can sit below NeMo
ecosystem components, third-party agent frameworks, provider adapters, or
direct application code.
- ⚙️ Install reusable runtime behavior: Plugins configure middleware,
subscribers, adaptive components, observability exporters, and custom runtime
behavior from one shared system.
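The parent-child execution tree from the first bullet can be pictured with a small, self-contained Python sketch. This is not the NeMo Relay API: `scope`, `events`, and the record fields are hypothetical names chosen for illustration. It assumes only that each tool or LLM call opens a child of whichever scope is currently active.

```python
import contextvars
import itertools
from contextlib import contextmanager

# Hypothetical illustration only: none of these names come from NeMo Relay.
_current_scope = contextvars.ContextVar("current_scope", default=None)
_ids = itertools.count(1)
events = []  # stands in for a lifecycle event stream


@contextmanager
def scope(name, kind="scope"):
    """Open a child of whichever scope is active and record start/end events."""
    parent = _current_scope.get()
    record = {"id": next(_ids), "parent": parent, "name": name, "kind": kind}
    events.append(("start", record))
    token = _current_scope.set(record["id"])
    try:
        yield record
    finally:
        # Restore the parent as the active scope, then record the end event.
        _current_scope.reset(token)
        events.append(("end", record))


with scope("request") as request:
    with scope("search_tool", kind="tool") as tool:
        pass  # a tool call would run here
    with scope("chat_completion", kind="llm") as llm:
        pass  # an LLM call would run here
```

Both the tool and the LLM record carry the request's `id` as their `parent`, so a consumer of `events` can rebuild the tree. Because the active scope lives in a `contextvars.ContextVar`, concurrent asyncio tasks each see their own current scope.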
What You Get
- ✅ Managed tool and LLM execution: Run tool and LLM calls through the same
lifecycle helpers, with a consistent middleware order.
- ✅ Concurrent request isolation: Keep request-local middleware and
subscribers attached to the scope that owns them, then clean them up when that
scope closes.
- ✅ Multi-language semantics: Use the same runtime model from Rust, Python,
and Node.js.
- ✅ Observability-ready events: Preserve model metadata, tool call IDs,
inputs, outputs, scope relationships, and lifecycle timing for downstream
analysis.
- ✅ Built-in observability plugin: Configure Agent Trajectory Observability
Format (ATOF), ATIF, OpenTelemetry, and OpenInference exporters without
registering subscribers by hand.
- ✅ Non-blocking subscriber delivery: Keep managed execution moving while
subscriber callbacks and exporters drain in the background. Flush subscribers
before relying on callback side effects or exported files in tests and
shutdown paths.
- ✅ Extension points for framework authors: Wrap stable tool and provider
callbacks while preserving framework-owned scheduling, retries, memory, and
result handling.
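The flush advice in the non-blocking delivery bullet is easiest to see in a toy dispatcher. The sketch below is a generic queue-and-worker pattern from the Python standard library, not NeMo Relay's implementation. `BackgroundSubscribers`, `emit`, and `flush` are hypothetical names.

```python
import queue
import threading


class BackgroundSubscribers:
    """Toy dispatcher (hypothetical): emit() never waits for callbacks, flush() does."""

    def __init__(self):
        self._queue = queue.Queue()
        self._callbacks = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def emit(self, event):
        # Enqueue and return immediately; delivery happens on the worker thread.
        self._queue.put(event)

    def flush(self):
        # Block until every queued event has been handed to all callbacks.
        self._queue.join()

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                for callback in list(self._callbacks):
                    callback(event)
            finally:
                self._queue.task_done()


seen = []
dispatcher = BackgroundSubscribers()
dispatcher.subscribe(seen.append)
for index in range(3):
    dispatcher.emit({"type": "tool_end", "index": index})

# Without this call, `seen` may still be empty or partial at this point.
dispatcher.flush()
```

After `flush()` returns, `seen` holds all three events in emission order. A test that inspects callback side effects or exported files before flushing would be racing the background worker.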
flowchart LR
App[Application or Framework]
subgraph Runtime[NeMo Relay Runtime]
direction TB
Scopes[Scopes]
Middleware[Middleware]
Plugins[Plugins]
Events[Lifecycle Events]
end