Skill

serving-runtime-config

Install

Install the plugin

npx claudepluginhub rhecosystemappeng/agentic-collections --plugin rh-ai-engineer

Description

Configure custom ServingRuntime CRs on OpenShift AI for model serving frameworks not covered by built-in runtimes.

Use when:

"Create a custom serving runtime"
"I need a runtime for ONNX / Triton / custom framework"
"Customize vLLM runtime parameters"
"What serving runtimes are available?"
"Add a custom container image for model serving"

Handles listing existing runtimes, creating new ServingRuntime CRs, and validating compatibility with target models.

NOT for deploying models (use /model-deploy after the runtime is configured).
NOT for NIM platform setup (use /nim-setup).

Tool Access

This skill uses the workspace's default tool permissions.

Skill Content

/serving-runtime-config Skill

Configure custom ServingRuntime custom resources on Red Hat OpenShift AI. Use when built-in runtimes (vLLM, NIM, Caikit+TGIS) do not support the target model framework, or when customizing an existing runtime's parameters (env vars, model format, container image).

Prerequisites

Required MCP Server: rhoai (RHOAI MCP Server)

Required MCP Tools (from rhoai):

list_serving_runtimes - List available runtimes and platform templates with supported model formats
create_serving_runtime - Instantiate a serving runtime from a platform template (no YAML needed)
list_data_science_projects - Validate namespace is an RHOAI project

Required MCP Server: openshift (OpenShift MCP Server)

Required MCP Tools (from openshift):

resources_get - Inspect existing ServingRuntime CRs in detail
resources_create_or_update - Create a fully custom ServingRuntime CR (when not using templates)

Optional MCP Server: ai-observability (AI Observability MCP)

Optional MCP Tools (from ai-observability):

list_models - Verify deployed models use the new runtime

Common prerequisites (KUBECONFIG, OpenShift+RHOAI cluster, KServe, verification protocol): See skill-conventions.md.

When to Use This Skill

Use this skill when you need to:

Create a custom ServingRuntime for a framework not covered by built-in runtimes
Customize an existing runtime's parameters (env vars, container image, model format)
Instantiate a platform template runtime into a namespace
List and compare available serving runtimes and templates

Do NOT use this skill when:

You want to deploy a model using an existing runtime (use /model-deploy)
You need NIM platform setup (use /nim-setup)
You need to troubleshoot a deployment (use /debug-inference)

Workflow

Step 1: Validate Target Namespace

Ask the user for:

Namespace: Target namespace for the ServingRuntime

MCP Tool: list_data_science_projects (from rhoai)

Parameters: none

Verify the user-specified namespace is an RHOAI Data Science Project.

Error Handling:

If namespace not found in project list -> Report: "Namespace [namespace] is not an RHOAI Data Science Project. Use /ds-project-setup to create one, or specify a different namespace." WAIT for user decision.
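The namespace check above can be sketched in a few lines of Python. This is only an illustration: the shape of the list_data_science_projects response is not documented here, so the sketch assumes a list of entries that expose the namespace under a "name" key, and the helper names are hypothetical.

```python
from typing import Iterable, Mapping


def is_data_science_project(namespace: str, projects: Iterable[Mapping[str, str]]) -> bool:
    """Return True if `namespace` matches the name of a listed project."""
    # Assumption: each project entry exposes its namespace under the "name" key.
    return any(project.get("name") == namespace for project in projects)


def namespace_error(namespace: str) -> str:
    """Build the message reported when the namespace is not an RHOAI project."""
    return (
        f"Namespace {namespace} is not an RHOAI Data Science Project. "
        "Use /ds-project-setup to create one, or specify a different namespace."
    )


# Hypothetical response data, for illustration only.
projects = [{"name": "fraud-detection"}, {"name": "llm-serving"}]

if not is_data_science_project("default", projects):
    print(namespace_error("default"))
```

On a failed check the skill would report the message and wait for the user's decision rather than continue.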

Step 2: Gather Requirements

Ask the user for:

Use case: What framework/model needs serving? (e.g., "ONNX model", "custom TensorRT engine", "vLLM with custom args")
Intent: New runtime from scratch, or customize an existing one?

Document Consultation (read before listing runtimes):

Action: Read supported-runtimes.md using the Read tool to understand available runtimes and their capabilities
Output to user: "I consulted supported-runtimes.md to understand available runtimes."

MCP Tool: list_serving_runtimes (from rhoai)

Parameters:

namespace: validated namespace from Step 1 - REQUIRED
include_templates: true - REQUIRED (shows both existing runtimes and platform templates)

Present findings in a table:

Runtime Name	Model Format	Source	Requires Instantiation
[name]	[format]	namespace / template	[true/false]

The response distinguishes between:

Existing runtimes (source: "namespace") - ready to use with /model-deploy
Platform templates (source: "template", requires_instantiation: true) - must be instantiated first

If an existing runtime fits the user's need, recommend using it directly with /model-deploy. If a platform template fits, offer to instantiate it using the template path in Step 5 (create_serving_runtime). Otherwise, proceed to Step 3 for custom runtime creation.
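The three-way decision described above might look like the following sketch. The "source" values come from the description of the response; the "model_formats" key and the returned labels are assumptions made for illustration.

```python
from typing import Any, Dict, List


def recommend(runtimes: List[Dict[str, Any]], wanted_format: str) -> str:
    """Pick the next action for a requested model format."""
    # Assumption: each entry lists its supported formats under "model_formats".
    matches = [r for r in runtimes if wanted_format in r.get("model_formats", [])]
    # Prefer a runtime that already exists in the namespace.
    if any(r.get("source") == "namespace" for r in matches):
        return "use-existing"
    # Otherwise fall back to a platform template that can be instantiated.
    if any(r.get("source") == "template" for r in matches):
        return "instantiate-template"
    # Nothing fits: continue to custom runtime creation.
    return "create-custom"


# Hypothetical listing, for illustration only.
runtimes = [
    {"name": "vllm-runtime", "model_formats": ["vLLM"], "source": "namespace"},
    {"name": "ovms-template", "model_formats": ["onnx"], "source": "template"},
]

for fmt in ("vLLM", "onnx", "tensorrt"):
    print(fmt, "->", recommend(runtimes, fmt))
```

The result is only a recommendation; the user still confirms which path to take.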

WAIT for user to confirm whether to create a new runtime, instantiate a template, or customize an existing one.

Step 3: Determine Runtime Configuration

Based on the user's framework and model requirements, determine the ServingRuntime spec.

If customizing an existing runtime:

MCP Tool: resources_get (from openshift)

Parameters:

apiVersion: "serving.kserve.io/v1alpha1" - REQUIRED
kind: "ServingRuntime" - REQUIRED
namespace: user-specified namespace - REQUIRED
name: name of the existing runtime to customize - REQUIRED

Extract the current spec as a starting point. Present the current configuration and ask what the user wants to change.

If the user requests a runtime for an unfamiliar framework -> Trigger live doc lookup:

Action: Read live-doc-lookup.md using the Read tool for the lookup protocol
Output to user: "Framework [name] is not in my cached runtimes. I'll look up its serving requirements."
Use WebFetch to retrieve specs from Red Hat OpenShift AI documentation
Extract: container image, model format name, supported protocols, required env vars
Output to user: "I looked up [framework] on [source] to confirm its runtime requirements: [summary]"

Collect runtime parameters:

Parameter	Value	Source
Runtime name	[name]	user input
Container image	[image:tag]	user input / doc lookup
Model format name	[format]	user input / doc lookup
Supported protocol versions	[v1, v2, grpc-v2]	user input / default
Multi-model serving	[true/false]	default: false (single-model)
Environment variables	[list]	user input
GPU resource requirements	[limits]	user input

WAIT for user to confirm or modify parameters.
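Before presenting the parameters for confirmation, it can help to run a few basic sanity checks. The sketch below is one possible shape, not part of the skill itself: the RuntimeParams class and its field names are hypothetical, and the name check uses the conservative Kubernetes DNS label rule (lowercase alphanumerics and "-", at most 63 characters).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Conservative check: lowercase alphanumerics and '-', must start and end alphanumeric.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


@dataclass
class RuntimeParams:
    """Hypothetical container for the parameters collected in Step 3."""
    name: str
    image: str
    model_format: str
    protocol_versions: List[str] = field(default_factory=list)
    multi_model: bool = False  # default from the table: single-model
    env: Dict[str, str] = field(default_factory=dict)
    gpu_count: int = 0

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means no issue was found."""
        issues = []
        if len(self.name) > 63 or not _DNS_LABEL.match(self.name):
            issues.append("runtime name is not a valid lowercase DNS label")
        # Look for a tag or digest only in the last path segment,
        # so a registry port such as localhost:5000 is not mistaken for a tag.
        if ":" not in self.image.rsplit("/", 1)[-1]:
            issues.append("container image has no explicit tag or digest")
        if not self.model_format:
            issues.append("model format name is required")
        return issues


params = RuntimeParams(name="onnx-runtime", image="quay.io/example/onnx:1.0", model_format="onnx")
print(params.problems())
```

Any reported issue would be shown to the user alongside the parameter table so it can be corrected before YAML generation.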

Step 4: Generate ServingRuntime YAML

Generate the ServingRuntime manifest using values from Steps 2-3.

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: [runtime-name]
  namespace: [namespace]
  labels:
    opendatahub.io/dashboard: "true"
  annotations:
    openshift.io/display-name: "[Display Name]"
spec:
  supportedModelFormats:
    - name: [model-format-name]
      version: "[version]"
      autoSelect: true
  multiModel: false
  protocolVersions:
    - [protocol-version]
  containers:
    - name: kserve-container
      image: [container-image:tag]
      ports:
        - containerPort: 8080
          protocol: TCP
      env:
        - name: [ENV_VAR_NON_SECRET]
          value: "[non-sensitive-value]"
        - name: [SECRET_ENV_VAR]
          valueFrom:
            secretKeyRef:
              name: [k8s-secret-name]
              key: [secret-key-name]
      resources:
        limits:
          nvidia.com/gpu: "[gpu-count]"
        requests:
          cpu: "[cpu]"
          memory: "[memory]"

Display the ServingRuntime YAML to the user, redacting any sensitive values.

Ask: "Proceed with creating this ServingRuntime? (yes/no/modify)"

WAIT for explicit confirmation.

If yes -> Proceed to Step 5
If no -> Abort
If modify -> Ask what to change, regenerate YAML, return to this step
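Redacting sensitive values before display could be done along these lines. This is a minimal sketch under stated assumptions: the name-based heuristic (TOKEN, PASSWORD, SECRET, KEY) is an illustrative choice, not a rule defined by the skill, and values supplied through secretKeyRef are left untouched because the manifest only carries a reference to them.

```python
import copy
import re
from typing import Any, Dict

# Assumption: env var names containing these words are treated as sensitive.
_SENSITIVE = re.compile(r"(TOKEN|PASSWORD|SECRET|KEY)", re.IGNORECASE)


def redact_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the manifest with sensitive literal env values masked."""
    redacted = copy.deepcopy(manifest)  # never mutate the manifest that will be applied
    for container in redacted.get("spec", {}).get("containers", []):
        for env in container.get("env", []):
            if "value" in env and _SENSITIVE.search(env.get("name", "")):
                env["value"] = "[REDACTED]"
    return redacted


manifest = {
    "spec": {
        "containers": [
            {
                "name": "kserve-container",
                "env": [
                    {"name": "LOG_LEVEL", "value": "info"},
                    {"name": "API_TOKEN", "value": "abc123"},
                ],
            }
        ]
    }
}

print(redact_manifest(manifest)["spec"]["containers"][0]["env"])
```

Only the displayed copy is masked; the original manifest is what gets submitted after the user confirms.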

Step 5: Create ServingRuntime

If instantiating from a platform template (user chose a template from Step 2):

MCP Tool: create_serving_runtime (from rhoai)

Parameters:

namespace: target namespace - REQUIRED
template_name: name of the template to instantiate (e.g., "vllm-cuda-runtime-template") - REQUIRED

The response includes the created runtime name, display name, and supported model formats.

If creating a fully custom runtime (custom container image, non-template configuration):

MCP Tool: resources_create_or_update (from openshift)

Parameters:

manifest: full ServingRuntime manifest as JSON string - REQUIRED
namespace: user-specified namespace - REQUIRED
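Since the manifest parameter is a JSON string, the manifest is typically assembled as a dictionary and serialized at the end. The sketch below shows one way to do that; build_manifest and tool_arguments are hypothetical helper names, and only a subset of the fields from the Step 4 template is included.

```python
import json
from typing import Any, Dict


def build_manifest(name: str, namespace: str, image: str,
                   model_format: str, port: int = 8080) -> Dict[str, Any]:
    """Assemble a minimal ServingRuntime manifest as a dictionary."""
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "ServingRuntime",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {"opendatahub.io/dashboard": "true"},
        },
        "spec": {
            "supportedModelFormats": [{"name": model_format, "autoSelect": True}],
            "multiModel": False,
            "containers": [
                {
                    "name": "kserve-container",
                    "image": image,
                    "ports": [{"containerPort": port, "protocol": "TCP"}],
                }
            ],
        },
    }


def tool_arguments(manifest: Dict[str, Any]) -> Dict[str, str]:
    """Build the two parameters listed above for resources_create_or_update."""
    return {
        "manifest": json.dumps(manifest),
        "namespace": manifest["metadata"]["namespace"],
    }


args = tool_arguments(
    build_manifest("onnx-runtime", "llm-serving", "quay.io/example/onnx:1.0", "onnx")
)
print(args["namespace"])
```

Deriving the namespace argument from the manifest keeps the two values from drifting apart.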

Error Handling:

If namespace not found -> Report error, suggest creating namespace or using /ds-project-setup
If runtime name already exists -> Ask user: "ServingRuntime [name] already exists. Update it? (yes/no)"
If CRD not found -> Report: "ServingRuntime CRD not available. Ensure Red Hat OpenShift AI operator is installed."
If RBAC error -> Report insufficient permissions

Step 6: Validate Runtime

MCP Tool: list_serving_runtimes (from rhoai)

Parameters:

namespace: user-specified namespace - REQUIRED
include_templates: false

Verify the runtime appears in the namespace runtime list.

For detailed inspection:

MCP Tool: resources_get (from openshift)

Parameters:

apiVersion: "serving.kserve.io/v1alpha1" - REQUIRED
kind: "ServingRuntime" - REQUIRED
namespace: user-specified namespace - REQUIRED
name: the created runtime name - REQUIRED

Report results showing: runtime name, namespace, model format, container image, and next steps (/model-deploy to deploy a model using this runtime).
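The final report can be derived directly from the ServingRuntime object returned by resources_get. The following sketch assumes the object is available as a dictionary with the same layout as the Step 4 manifest; the summarize helper and its output keys are hypothetical.

```python
from typing import Any, Dict


def summarize(runtime: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the fields listed in the Step 6 report from a ServingRuntime object."""
    meta = runtime.get("metadata", {})
    spec = runtime.get("spec", {})
    return {
        "name": meta.get("name"),
        "namespace": meta.get("namespace"),
        "model_formats": [f.get("name") for f in spec.get("supportedModelFormats", [])],
        "images": [c.get("image") for c in spec.get("containers", [])],
        "next_step": "/model-deploy",
    }


# Hypothetical object, for illustration only.
runtime = {
    "metadata": {"name": "onnx-runtime", "namespace": "llm-serving"},
    "spec": {
        "supportedModelFormats": [{"name": "onnx", "version": "1"}],
        "containers": [{"name": "kserve-container", "image": "quay.io/example/onnx:1.0"}],
    },
}

for key, value in summarize(runtime).items():
    print(f"{key}: {value}")
```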

Common Issues

For common issues (GPU scheduling, OOMKilled, image pull errors, RBAC), see common-issues.md.

Issue 1: InferenceService Cannot Find Runtime

Error: InferenceService status shows "Unknown" or runtime not matched

Cause: The modelFormat.name in the InferenceService does not match any supportedModelFormats[].name in available ServingRuntimes.

Solution:

Verify the model format name matches exactly (case-sensitive)
Check the runtime is in the same namespace as the InferenceService
Ensure the runtime has the opendatahub.io/dashboard: "true" label
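A case mismatch in the model format name is easy to overlook by eye. The sketch below shows how the first check could be automated, assuming the ServingRuntime objects are available as dictionaries; diagnose_format is a hypothetical helper.

```python
from typing import Any, Dict, List


def diagnose_format(requested: str, runtimes: List[Dict[str, Any]]) -> str:
    """Compare an InferenceService modelFormat.name with the declared runtime formats."""
    declared = [
        fmt.get("name", "")
        for rt in runtimes
        for fmt in rt.get("spec", {}).get("supportedModelFormats", [])
    ]
    if requested in declared:
        return "match"
    # The comparison is case-sensitive, so "ONNX" does not match "onnx".
    near = [name for name in declared if name.lower() == requested.lower()]
    if near:
        return f"case mismatch: runtime declares '{near[0]}'"
    return "no runtime supports this format"


# Hypothetical runtime, for illustration only.
runtimes = [{"spec": {"supportedModelFormats": [{"name": "onnx"}]}}]

for name in ("onnx", "ONNX", "pytorch"):
    print(name, "->", diagnose_format(name, runtimes))
```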

Issue 2: Runtime Port Mismatch

Error: InferenceService created but health checks fail, endpoint returns connection refused

Cause: The containerPort in the ServingRuntime does not match the port the serving framework actually listens on.

Solution:

Check the framework's documentation for its default serving port
Update the containerPort in the ServingRuntime spec
Or set an environment variable to configure the framework's listen port to match
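A quick way to spot this mismatch is to compare the ports declared in the ServingRuntime with the port the framework is known to listen on. The sketch assumes the runtime is available as a dictionary; the helper names are hypothetical, and the framework port must come from the framework's own documentation.

```python
from typing import Any, Dict, List


def declared_ports(runtime: Dict[str, Any]) -> List[int]:
    """Collect every containerPort declared across the runtime's containers."""
    return [
        port["containerPort"]
        for container in runtime.get("spec", {}).get("containers", [])
        for port in container.get("ports", [])
        if "containerPort" in port
    ]


def port_mismatch(runtime: Dict[str, Any], framework_port: int) -> bool:
    """Return True when the framework's listen port is not declared in the runtime."""
    return framework_port not in declared_ports(runtime)


# Hypothetical runtime declaring the port used in the Step 4 template.
runtime = {
    "spec": {
        "containers": [
            {"name": "kserve-container", "ports": [{"containerPort": 8080, "protocol": "TCP"}]}
        ]
    }
}

print(port_mismatch(runtime, 8080))
print(port_mismatch(runtime, 8000))
```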

Dependencies

MCP Tools

See Prerequisites for the complete list of required and optional MCP tools.

Related Skills

/model-deploy - Deploy a model using the configured runtime
/nim-setup - NIM platform setup (if NIM runtime is needed instead)
/debug-inference - Troubleshoot InferenceService failures after deployment

Reference Documentation

supported-runtimes.md - Runtime capabilities and model format names
live-doc-lookup.md - Protocol for fetching specs for unknown frameworks

Critical: Human-in-the-Loop Requirements

See skill-conventions.md for general HITL and security conventions.

Skill-specific checkpoints:

After namespace validation (Step 1): confirm namespace or redirect to /ds-project-setup
After listing existing runtimes (Step 2): confirm whether to create a new runtime, instantiate a template, or customize an existing one
After collecting parameters (Step 3): confirm runtime configuration
Before creating ServingRuntime (Step 4): display full YAML, confirm
NEVER overwrite an existing ServingRuntime without user confirmation

Stats

Stars: 4

Forks: 6

Last Commit: Mar 18, 2026
