Configure custom ServingRuntime CRs on OpenShift AI for model serving frameworks not covered by built-in runtimes.
Use when:
- "Create a custom serving runtime"
- "I need a runtime for ONNX / Triton / custom framework"
- "Customize vLLM runtime parameters"
- "What serving runtimes are available?"
- "Add a custom container image for model serving"
Handles listing existing runtimes, creating new ServingRuntime CRs, and validating compatibility with target models.
NOT for deploying models (use /model-deploy after runtime is configured). NOT for NIM platform setup (use /nim-setup).
This skill uses the workspace's default tool permissions.
/serving-runtime-config Skill
Configure custom ServingRuntime custom resources on Red Hat OpenShift AI. Use when built-in runtimes (vLLM, NIM, Caikit+TGIS) do not support the target model framework, or when customizing an existing runtime's parameters (env vars, model format, container image).
Prerequisites
Required MCP Server: rhoai (RHOAI MCP Server)
Required MCP Tools (from rhoai):
- list_serving_runtimes - List available runtimes and platform templates with supported model formats
- create_serving_runtime - Instantiate a serving runtime from a platform template (no YAML needed)
- list_data_science_projects - Validate namespace is an RHOAI project
Required MCP Server: openshift (OpenShift MCP Server)
Required MCP Tools (from openshift):
- resources_get - Inspect existing ServingRuntime CRs in detail
- resources_create_or_update - Create fully custom ServingRuntime CR (when not using templates)
Optional MCP Server: ai-observability (AI Observability MCP)
Optional MCP Tools (from ai-observability):
- list_models - Verify deployed models use the new runtime
Common prerequisites (KUBECONFIG, OpenShift+RHOAI cluster, KServe, verification protocol): See skill-conventions.md.
When to Use This Skill
Use this skill when you need to:
- Create a custom ServingRuntime for a framework not covered by built-in runtimes
- Customize an existing runtime's parameters (env vars, container image, model format)
- Instantiate a platform template runtime into a namespace
- List and compare available serving runtimes and templates
Do NOT use this skill when:
- You want to deploy a model using an existing runtime (use /model-deploy)
- You need NIM platform setup (use /nim-setup)
- You need to troubleshoot a deployment (use /debug-inference)
Workflow
Step 1: Validate Target Namespace
Ask the user for:
- Namespace: Target namespace for the ServingRuntime
MCP Tool: list_data_science_projects (from rhoai)
Parameters: none
Verify the user-specified namespace is an RHOAI Data Science Project.
Error Handling:
- If namespace not found in project list -> Report: "Namespace [namespace] is not an RHOAI Data Science Project. Use /ds-project-setup to create one, or specify a different namespace." WAIT for user decision.
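The namespace check above can be sketched in Python. This is only an illustration: the shape of the list_data_science_projects result (a list of objects with a `name` key), the helper name `validate_namespace`, and the sample project names are assumptions, not documented output of the rhoai MCP server.

```python
from typing import Iterable, Mapping, Optional


def validate_namespace(namespace: str, projects: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return None if the namespace is a known project, else the message to report.

    `projects` stands in for the list_data_science_projects result; the
    {"name": ...} shape is an assumption made for this sketch.
    """
    known = {p["name"] for p in projects}
    if namespace in known:
        return None
    return (
        f"Namespace {namespace} is not an RHOAI Data Science Project. "
        "Use /ds-project-setup to create one, or specify a different namespace."
    )


# Hypothetical response data for illustration only.
projects = [{"name": "fraud-detection"}, {"name": "llm-serving"}]
print(validate_namespace("llm-serving", projects))
print(validate_namespace("default", projects))
```

A non-None result is the point at which the skill stops and waits for the user's decision.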
Step 2: Gather Requirements
Ask the user for:
- Use case: What framework/model needs serving? (e.g., "ONNX model", "custom TensorRT engine", "vLLM with custom args")
- Intent: New runtime from scratch, or customize an existing one?
Document Consultation (read before listing runtimes):
- Action: Read supported-runtimes.md using the Read tool to understand available runtimes and their capabilities
- Output to user: "I consulted supported-runtimes.md to understand available runtimes."
MCP Tool: list_serving_runtimes (from rhoai)
Parameters:
- namespace: validated namespace from Step 1 - REQUIRED
- include_templates: true - REQUIRED (shows both existing runtimes and platform templates)
Present findings in a table:
| Runtime Name | Model Format | Source | Requires Instantiation |
|---|---|---|---|
| [name] | [format] | namespace / template | [true/false] |
The response distinguishes between:
- Existing runtimes (source: "namespace") - ready to use with /model-deploy
- Platform templates (source: "template", requires_instantiation: true) - must be instantiated first
If an existing runtime fits the user's need, recommend using it directly with /model-deploy. If a platform template fits, offer to instantiate it (the template path in Step 5). Otherwise, proceed to Step 3 for custom runtime creation.
WAIT for user to confirm whether to create a new runtime, instantiate a template, or customize an existing one.
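The routing between "use existing", "instantiate template", and "create custom" might look like the sketch below. The `source` and `requires_instantiation` fields come from the description above; the `name` and `model_format` keys, the function name, and the sample entries are assumptions about the response shape.

```python
from typing import Any, Dict, List


def recommend_action(entries: List[Dict[str, Any]], wanted_format: str) -> str:
    """Suggest the next step based on where a matching runtime comes from."""
    matches = [e for e in entries if e.get("model_format") == wanted_format]
    # Prefer a runtime that already exists in the namespace.
    for e in matches:
        if e.get("source") == "namespace":
            return f"use existing runtime {e['name']} with /model-deploy"
    # Otherwise fall back to a platform template that can be instantiated.
    for e in matches:
        if e.get("source") == "template" and e.get("requires_instantiation"):
            return f"instantiate template {e['name']} (Step 5)"
    return "create a custom runtime (Step 3)"


# Hypothetical list_serving_runtimes entries for illustration only.
entries = [
    {"name": "vllm-cuda-runtime-template", "model_format": "vLLM",
     "source": "template", "requires_instantiation": True},
    {"name": "my-onnx-runtime", "model_format": "onnx",
     "source": "namespace", "requires_instantiation": False},
]
for fmt in ("onnx", "vLLM", "tensorrt"):
    print(fmt, "->", recommend_action(entries, fmt))
```

The result is only a recommendation; the user still confirms the choice before the workflow continues.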
Step 3: Determine Runtime Configuration
Based on the user's framework and model requirements, determine the ServingRuntime spec.
If customizing an existing runtime:
MCP Tool: resources_get (from openshift)
Parameters:
- apiVersion: "serving.kserve.io/v1alpha1" - REQUIRED
- kind: "ServingRuntime" - REQUIRED
- namespace: user-specified namespace - REQUIRED
- name: name of the existing runtime to customize - REQUIRED
Extract the current spec as a starting point. Present the current configuration and ask what the user wants to change.
If the user requests a runtime for an unfamiliar framework -> Trigger live doc lookup:
- Action: Read live-doc-lookup.md using the Read tool for the lookup protocol
- Output to user: "Framework [name] is not in my cached runtimes. I'll look up its serving requirements."
- Use WebFetch to retrieve specs from Red Hat OpenShift AI documentation
- Extract: container image, model format name, supported protocols, required env vars
- Output to user: "I looked up [framework] on [source] to confirm its runtime requirements: [summary]"
Collect runtime parameters:
| Parameter | Value | Source |
|---|---|---|
| Runtime name | [name] | user input |
| Container image | [image:tag] | user input / doc lookup |
| Model format name | [format] | user input / doc lookup |
| Supported protocol versions | [v1, v2, grpc-v2] | user input / default |
| Multi-model serving | [true/false] | default: false (single-model) |
| Environment variables | [list] | user input |
| GPU resource requirements | [limits] | user input |
WAIT for user to confirm or modify parameters.
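One way to hold the collected parameters before confirmation is a small record with basic sanity checks. This is a sketch: the class and field names are hypothetical, and the checks (a DNS-label style name, an image with a tag or digest) are conservative assumptions rather than rules stated by this skill.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Conservative check: lowercase alphanumerics and '-', as in a DNS label.
_NAME_RE = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


@dataclass
class RuntimeParams:
    name: str
    image: str
    model_format: str
    protocol_versions: List[str] = field(default_factory=list)
    multi_model: bool = False  # default from the table: single-model
    env: Dict[str, str] = field(default_factory=dict)
    gpu_limit: Optional[str] = None  # left unset until the user provides it

    def problems(self) -> List[str]:
        """Return human-readable issues to raise with the user before Step 4."""
        issues = []
        if len(self.name) > 63 or not _NAME_RE.match(self.name):
            issues.append(f"runtime name {self.name!r} is not a valid DNS label")
        # Look for a tag or digest in the last path segment of the image reference.
        last_segment = self.image.rsplit("/", 1)[-1]
        if ":" not in last_segment and "@" not in last_segment:
            issues.append("container image has no tag or digest")
        if not self.model_format:
            issues.append("model format name is empty")
        return issues


# Hypothetical values for illustration only.
ok = RuntimeParams(name="onnx-runtime", image="quay.io/example/onnx-server:1.0", model_format="onnx")
bad = RuntimeParams(name="ONNX_Runtime", image="localhost:5000/onnx-server", model_format="")
print(ok.problems())
print(bad.problems())
```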
Step 4: Generate ServingRuntime YAML
Generate the ServingRuntime manifest using values from Steps 2-3.

    apiVersion: serving.kserve.io/v1alpha1
    kind: ServingRuntime
    metadata:
      name: [runtime-name]
      namespace: [namespace]
      labels:
        opendatahub.io/dashboard: "true"
      annotations:
        openshift.io/display-name: "[Display Name]"
    spec:
      supportedModelFormats:
        - name: [model-format-name]
          version: "[version]"
          autoSelect: true
      multiModel: false
      containers:
        - name: kserve-container
          image: [container-image:tag]
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: [ENV_VAR_NON_SECRET]
              value: "[non-sensitive-value]"
            - name: [SECRET_ENV_VAR]
              valueFrom:
                secretKeyRef:
                  name: [k8s-secret-name]
                  key: [secret-key-name]
          resources:
            limits:
              nvidia.com/gpu: "[gpu-count]"
            requests:
              cpu: "[cpu]"
              memory: "[memory]"

Display the ServingRuntime YAML to the user, redacting any sensitive values.
Ask: "Proceed with creating this ServingRuntime? (yes/no/modify)"
WAIT for explicit confirmation.
- If yes -> Proceed to Step 5
- If no -> Abort
- If modify -> Ask what to change, regenerate YAML, return to this step
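Redacting before display could be done along these lines. The sketch assumes the manifest is held as a Python dict mirroring the YAML above; the name-based heuristic for spotting sensitive variables, the `[REDACTED]` marker, and all sample values are illustrative assumptions. Entries that use secretKeyRef carry no literal value, so they are left as they are.

```python
import copy
import json
from typing import Any, Dict

# Heuristic only (an assumption): names containing these hints are masked.
SENSITIVE_HINTS = ("TOKEN", "PASSWORD", "SECRET", "KEY")


def redact_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a display-safe copy; the original manifest is not modified."""
    shown = copy.deepcopy(manifest)
    for container in shown.get("spec", {}).get("containers", []):
        for env in container.get("env", []):
            name = env.get("name", "").upper()
            if "value" in env and any(hint in name for hint in SENSITIVE_HINTS):
                env["value"] = "[REDACTED]"
    return shown


# Hypothetical manifest fragment for illustration only.
manifest = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ServingRuntime",
    "metadata": {"name": "onnx-runtime", "namespace": "llm-serving"},
    "spec": {"containers": [{
        "name": "kserve-container",
        "image": "quay.io/example/onnx-server:1.0",
        "env": [
            {"name": "LOG_LEVEL", "value": "info"},
            {"name": "API_TOKEN", "value": "abc123"},
            {"name": "HF_TOKEN",
             "valueFrom": {"secretKeyRef": {"name": "hf-secret", "key": "token"}}},
        ],
    }]},
}
print(json.dumps(redact_manifest(manifest), indent=2))
```

Working on a deep copy keeps the real values available for the create call while only the masked copy is shown to the user.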
Step 5: Create ServingRuntime
If instantiating from a platform template (user chose a template from Step 2):
MCP Tool: create_serving_runtime (from rhoai)
Parameters:
- namespace: target namespace - REQUIRED
- template_name: name of the template to instantiate (e.g., "vllm-cuda-runtime-template") - REQUIRED
The response includes the created runtime name, display name, and supported model formats.
If creating a fully custom runtime (custom container image, non-template configuration):
MCP Tool: resources_create_or_update (from openshift)
Parameters:
- manifest: full ServingRuntime manifest as JSON string - REQUIRED
- namespace: user-specified namespace - REQUIRED
Error Handling:
- If namespace not found -> Report error, suggest creating namespace or using /ds-project-setup
- If runtime name already exists -> Ask user: "ServingRuntime [name] already exists. Update it? (yes/no)"
- If CRD not found -> Report: "ServingRuntime CRD not available. Ensure Red Hat OpenShift AI operator is installed."
- If RBAC error -> Report insufficient permissions
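For the fully custom path, the arguments and the error responses above could be prepared as in this sketch. The `manifest` and `namespace` parameter names come from this step; the helper name, the error category keys, and the wording of the namespace and RBAC messages are assumptions, since how the openshift MCP server reports each failure is not specified here.

```python
import json
from typing import Any, Dict


def build_create_args(manifest: Dict[str, Any], namespace: str) -> Dict[str, str]:
    """Build resources_create_or_update arguments, with the manifest as a JSON string."""
    if manifest.get("kind") != "ServingRuntime":
        raise ValueError("manifest kind must be ServingRuntime")
    if manifest.get("metadata", {}).get("namespace") != namespace:
        raise ValueError("metadata.namespace does not match the target namespace")
    return {"manifest": json.dumps(manifest), "namespace": namespace}


# Hypothetical error categories mapped to the responses listed in this step.
ERROR_RESPONSES = {
    "namespace_not_found": "Namespace not found. Create it or use /ds-project-setup.",
    "already_exists": "ServingRuntime {name} already exists. Update it? (yes/no)",
    "crd_not_found": "ServingRuntime CRD not available. "
                     "Ensure Red Hat OpenShift AI operator is installed.",
    "rbac": "Insufficient permissions to create ServingRuntime resources.",
}

# Hypothetical manifest for illustration only.
manifest = {
    "apiVersion": "serving.kserve.io/v1alpha1",
    "kind": "ServingRuntime",
    "metadata": {"name": "onnx-runtime", "namespace": "llm-serving"},
    "spec": {"containers": [{"name": "kserve-container",
                             "image": "quay.io/example/onnx-server:1.0"}]},
}
args = build_create_args(manifest, "llm-serving")
print(sorted(args))
print(ERROR_RESPONSES["already_exists"].format(name="onnx-runtime"))
```

Checking the namespace before the call catches a mismatch between the confirmed YAML and the target namespace early, before anything reaches the cluster.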
Step 6: Validate Runtime
MCP Tool: list_serving_runtimes (from rhoai)
Parameters:
- namespace: user-specified namespace - REQUIRED
- include_templates: false
Verify the runtime appears in the namespace runtime list.
For detailed inspection:
MCP Tool: resources_get (from openshift)
Parameters:
- apiVersion: "serving.kserve.io/v1alpha1" - REQUIRED
- kind: "ServingRuntime" - REQUIRED
- namespace: user-specified namespace - REQUIRED
- name: the created runtime name - REQUIRED
Report results showing: runtime name, namespace, model format, container image, and next steps (/model-deploy to deploy a model using this runtime).
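The final report can be assembled from the ServingRuntime CR returned by resources_get, roughly as sketched here. The field paths follow the manifest from Step 4; the function name, the report keys, and the sample CR are assumptions.

```python
from typing import Any, Dict


def summarize_runtime(cr: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields listed above from a ServingRuntime CR held as a dict."""
    spec = cr.get("spec", {})
    return {
        "runtime": cr["metadata"]["name"],
        "namespace": cr["metadata"]["namespace"],
        "model_formats": [f["name"] for f in spec.get("supportedModelFormats", [])],
        "images": [c["image"] for c in spec.get("containers", [])],
        "next_step": "/model-deploy",
    }


# Hypothetical CR for illustration only.
cr = {
    "metadata": {"name": "onnx-runtime", "namespace": "llm-serving"},
    "spec": {
        "supportedModelFormats": [{"name": "onnx", "version": "1", "autoSelect": True}],
        "containers": [{"name": "kserve-container",
                        "image": "quay.io/example/onnx-server:1.0"}],
    },
}
for key, value in summarize_runtime(cr).items():
    print(f"{key}: {value}")
```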
Common Issues
For common issues (GPU scheduling, OOMKilled, image pull errors, RBAC), see common-issues.md.
Issue 1: InferenceService Cannot Find Runtime
Error: InferenceService status shows "Unknown" or runtime not matched
Cause: The modelFormat.name in the InferenceService does not match any supportedModelFormats[].name in available ServingRuntimes.
Solution:
- Verify the model format name matches exactly (case-sensitive)
- Check the runtime is in the same namespace as the InferenceService
- Ensure the runtime has the opendatahub.io/dashboard: "true" label
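The first two checks can be illustrated with a small matcher that separates exact matches from names that differ only in case. The function name and sample data are hypothetical; the CR field paths follow the manifest from Step 4.

```python
from typing import Any, Dict, List, Tuple


def match_runtimes(
    model_format: str, namespace: str, runtimes: List[Dict[str, Any]]
) -> Tuple[List[str], List[str]]:
    """Return (exact matches, near misses) among runtimes in the given namespace."""
    exact: List[str] = []
    near: List[str] = []
    for rt in runtimes:
        # Runtimes in other namespaces are not considered.
        if rt["metadata"]["namespace"] != namespace:
            continue
        for fmt in rt.get("spec", {}).get("supportedModelFormats", []):
            if fmt["name"] == model_format:  # exact, case-sensitive
                exact.append(rt["metadata"]["name"])
            elif fmt["name"].lower() == model_format.lower():
                near.append(f"{rt['metadata']['name']} (declares {fmt['name']!r})")
    return exact, near


# Hypothetical runtime list for illustration only.
runtimes = [{
    "metadata": {"name": "onnx-runtime", "namespace": "llm-serving"},
    "spec": {"supportedModelFormats": [{"name": "onnx"}]},
}]
print(match_runtimes("ONNX", "llm-serving", runtimes))
print(match_runtimes("onnx", "llm-serving", runtimes))
```

A non-empty near-miss list with no exact match points at a casing difference between modelFormat.name and the runtime's declared format.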
Issue 2: Runtime Port Mismatch
Error: InferenceService created but health checks fail, endpoint returns connection refused
Cause: The containerPort in the ServingRuntime does not match the port the serving framework actually listens on.
Solution:
- Check the framework's documentation for its default serving port
- Update the containerPort in the ServingRuntime spec
- Or set an environment variable to configure the framework's listen port to match
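A quick comparison between the declared containerPort values and the port the framework is known to listen on might look like this. The helper names and the port numbers in the example are hypothetical; the real listen port has to come from the framework's own documentation.

```python
from typing import Any, Dict, List, Optional


def declared_ports(cr: Dict[str, Any]) -> List[int]:
    """Collect every containerPort declared in the ServingRuntime spec."""
    ports: List[int] = []
    for container in cr.get("spec", {}).get("containers", []):
        for port in container.get("ports", []):
            ports.append(port["containerPort"])
    return ports


def port_problem(cr: Dict[str, Any], framework_port: int) -> Optional[str]:
    """Return a description of the mismatch, or None if the ports agree."""
    ports = declared_ports(cr)
    if framework_port in ports:
        return None
    return (
        f"ServingRuntime declares containerPort {ports}, "
        f"but the framework listens on {framework_port}"
    )


# Hypothetical CR and listen port for illustration only.
cr = {"spec": {"containers": [{"name": "kserve-container",
                               "ports": [{"containerPort": 8080, "protocol": "TCP"}]}]}}
print(port_problem(cr, 8080))
print(port_problem(cr, 8000))
```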
Dependencies
MCP Tools
See Prerequisites for the complete list of required and optional MCP tools.
Related Skills
- /model-deploy - Deploy a model using the configured runtime
- /nim-setup - NIM platform setup (if NIM runtime is needed instead)
- /debug-inference - Troubleshoot InferenceService failures after deployment
Reference Documentation
- supported-runtimes.md - Runtime capabilities and model format names
- live-doc-lookup.md - Protocol for fetching specs for unknown frameworks
Critical: Human-in-the-Loop Requirements
See skill-conventions.md for general HITL and security conventions.
Skill-specific checkpoints:
- After namespace validation (Step 1): confirm namespace or redirect to /ds-project-setup
- After listing existing runtimes (Step 2): confirm whether to create new or customize existing
- After collecting parameters (Step 3): confirm runtime configuration
- Before creating ServingRuntime (Step 4): display full YAML, confirm
- NEVER overwrite an existing ServingRuntime without user confirmation