Skill

model-monitor

Install

Install the plugin

npx claudepluginhub rhecosystemappeng/agentic-collections --plugin rh-ai-engineer


Description

Configure TrustyAI model monitoring for bias detection and data drift on deployed InferenceServices.

Use when:

- "Monitor my model for bias"
- "Set up drift detection on my inference endpoint"
- "Configure TrustyAI for my deployed model"
- "Check if my model has fairness issues"
- "I need SPD / DIR metrics for my model"

Handles TrustyAIService deployment, bias metric configuration (SPD, DIR), drift metric configuration (MeanShift, FourierMMD, KS-Test, Jensen-Shannon), threshold tuning, and monitoring validation.

NOT for deploying models (use /model-deploy first). NOT for input/output content safety guardrails (use /guardrails-config). NOT for infrastructure-level observability (use /ai-observability).

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets


references/trustyai-metrics-reference.md

Skill Content

/model-monitor Skill

Prerequisites

Required MCP Server: openshift (OpenShift MCP Server)

Required MCP Tools (from openshift):

resources_get (from openshift) - Get TrustyAIService CR status, check CRD availability
resources_list (from openshift) - List TrustyAIService instances, check CRD existence
resources_create_or_update (from openshift) - Create/update TrustyAIService CR, metric configuration ConfigMaps
pods_list (from openshift) - Verify TrustyAI pods are running
pods_log (from openshift) - Retrieve TrustyAI pod logs for troubleshooting
events_list (from openshift) - Check events for TrustyAI deployment issues
prometheus_query (from openshift) - Query TrustyAI metrics (trustyai_spd, trustyai_dir, drift metrics)

Required MCP Server: rhoai (RHOAI MCP Server)

Required MCP Tools (from rhoai):

list_inference_services - List deployed models to identify monitoring targets
get_inference_service - Get InferenceService details (model format, runtime, status)
list_data_science_projects - Validate namespace is an RHOAI Data Science Project

Optional MCP Server: ai-observability (AI Observability MCP)

Optional MCP Tools (from ai-observability):

execute_promql - Custom PromQL queries for TrustyAI metrics validation

Common prerequisites (KUBECONFIG, OpenShift+RHOAI cluster, KServe, verification protocol): See skill-conventions.md.

Additional cluster requirements:

TrustyAI operator enabled in the DataScienceCluster CR
At least one deployed InferenceService to monitor (via /model-deploy)
User Workload Monitoring enabled in OpenShift (for TrustyAI metrics scraping)

When to Use This Skill

Use this skill when you need to:

Set up bias monitoring (SPD, DIR) for a deployed model
Configure data drift detection on inference data streams
Deploy a TrustyAIService instance in a namespace
Check whether monitoring is active and metrics are flowing

Do NOT use this skill when:

You need to deploy a model first (use /model-deploy)
You need LLM input/output content safety guardrails (use /guardrails-config)
You want infrastructure-level performance metrics (use /ai-observability)
You need to troubleshoot a failed model deployment (use /debug-inference)

Workflow

Step 1: Verify TrustyAI Operator Installation

MCP Tool: resources_list (from openshift)

Parameters:

apiVersion: "apiextensions.k8s.io/v1" - REQUIRED
kind: "CustomResourceDefinition" - REQUIRED

Check that the trustyaiservices.trustyai.opendatahub.io CRD is present in the returned list. This is a hard prerequisite: nothing in this skill works without it.

Error Handling:

If CRD not found: Report that TrustyAI must be enabled in the DataScienceCluster CR with spec.components.trustyai.managementState: Managed. Offer options: (1) Show enablement instructions, (2) Abort. WAIT for user decision.
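The CRD check above can be sketched in a few lines of Python. This is only an illustration: it assumes the resources_list output has already been parsed into a list of dictionaries shaped like standard Kubernetes objects (with metadata.name), which the source does not specify. The helper name has_trustyai_crd and the sample data are hypothetical.

```python
# Name of the CRD that Step 1 looks for (taken from the skill text).
TRUSTYAI_CRD = "trustyaiservices.trustyai.opendatahub.io"


def has_trustyai_crd(crd_items):
    """Return True if the TrustyAIService CRD appears in a list of CRD objects.

    Assumption: each item is a dict with a Kubernetes-style metadata.name field.
    """
    return any(
        item.get("metadata", {}).get("name") == TRUSTYAI_CRD
        for item in crd_items
    )


# Hypothetical, trimmed-down listing used only for illustration.
sample_items = [
    {"metadata": {"name": "inferenceservices.serving.kserve.io"}},
    {"metadata": {"name": "trustyaiservices.trustyai.opendatahub.io"}},
]

if has_trustyai_crd(sample_items):
    print("TrustyAI CRD found; continue to Step 2")
else:
    print("TrustyAI CRD missing; offer enablement instructions or abort")
```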

Step 2: Gather Monitoring Requirements

Ask the user for:

Target model: Which InferenceService to monitor (name or "list all")
Namespace: Target namespace
Monitoring type: Bias detection, drift detection, or both
For bias monitoring: protected attribute, favorable outcome, privileged/unprivileged group values
For drift monitoring: which drift metrics to enable (default: all)

If the user is unsure which model to target, use list_inference_services (from rhoai) to present the available models.

MCP Tool: list_inference_services (from rhoai)

Parameters:

namespace: user-specified namespace - REQUIRED
verbosity: "standard" - OPTIONAL

Present configuration summary for confirmation. WAIT for user to confirm or modify.
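One way to keep the gathered requirements together before presenting the summary is a small data structure that reports what is still missing. The sketch below is illustrative only: the class name, field names, and sample values are hypothetical, and only the default drift metric names come from this skill.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Default drift metrics as named in Step 6 of this skill.
DEFAULT_DRIFT_METRICS = ["MEANSHIFT", "FOURIERMMD", "KSTEST", "JENSENSHANNON"]


@dataclass
class MonitoringRequirements:
    """Hypothetical container for the answers collected in Step 2."""
    model_name: str
    namespace: str
    bias: bool = False
    drift: bool = False
    protected_attribute: Optional[str] = None
    favorable_outcome: Optional[str] = None
    privileged_value: Optional[str] = None
    unprivileged_value: Optional[str] = None
    drift_metrics: List[str] = field(
        default_factory=lambda: list(DEFAULT_DRIFT_METRICS)
    )

    def missing_fields(self) -> List[str]:
        """List the answers that still need to be collected from the user."""
        missing = []
        if not (self.bias or self.drift):
            missing.append("monitoring type (bias, drift, or both)")
        if self.bias:
            for name in ("protected_attribute", "favorable_outcome",
                         "privileged_value", "unprivileged_value"):
                if getattr(self, name) is None:
                    missing.append(name)
        return missing


# Illustrative values only.
req = MonitoringRequirements(
    model_name="credit-model",
    namespace="my-project",
    bias=True,
    protected_attribute="gender",
)
print(req.missing_fields())
```

An empty list from missing_fields() would indicate that the summary is ready to show to the user for confirmation.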

Step 3: Check/Create TrustyAIService in Namespace

Document Consultation (read before configuring TrustyAI):

Action: Read trustyai-metrics-reference.md using the Read tool to understand CRD spec fields, metric names, and thresholds
Output to user: "I consulted trustyai-metrics-reference.md to understand TrustyAI CRD specifications."

MCP Tool: resources_get (from openshift)

Parameters:

apiVersion: "trustyai.opendatahub.io/v1alpha1" - REQUIRED
kind: "TrustyAIService" - REQUIRED
namespace: target namespace - REQUIRED
name: "trustyai-service" - REQUIRED

If TrustyAIService exists and is Ready: Proceed to Step 5.

If TrustyAIService exists but NOT Ready: Check pod status (Step 4). WAIT for user decision.

If TrustyAIService does NOT exist: Construct a TrustyAIService manifest using the CRD spec from trustyai-metrics-reference.md. Key values: name trustyai-service, PVC-backed storage of 1Gi, CSV data format, and a 5s metrics schedule.

Display the manifest to the user. Ask: "Proceed with creating this TrustyAIService? (yes/no/modify)"

WAIT for explicit confirmation.

MCP Tool: resources_create_or_update (from openshift)

Parameters:

manifest: the TrustyAIService YAML manifest as JSON string - REQUIRED
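A manifest with the key values listed above might be assembled and serialized as follows. Treat this as a sketch, not the authoritative spec: the field layout follows commonly published TrustyAIService examples, the folder and filename values are assumptions that do not appear in this skill, and trustyai-metrics-reference.md remains the source of truth. The function name and the namespace "my-project" are hypothetical.

```python
import json


def build_trustyai_service(namespace, name="trustyai-service",
                           size="1Gi", schedule="5s"):
    """Build a TrustyAIService manifest as a plain dict.

    Assumption: spec layout mirrors upstream TrustyAI examples; verify it
    against trustyai-metrics-reference.md before applying.
    """
    return {
        "apiVersion": "trustyai.opendatahub.io/v1alpha1",
        "kind": "TrustyAIService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # PVC-backed storage of the requested size.
            # "folder" is an assumed value, not taken from this skill.
            "storage": {"format": "PVC", "folder": "/inputs", "size": size},
            # CSV data format. "filename" is an assumed value.
            "data": {"filename": "data.csv", "format": "CSV"},
            # How often metrics are recalculated.
            "metrics": {"schedule": schedule},
        },
    }


# Serialize to a JSON string, one reading of what the manifest parameter expects.
manifest_json = json.dumps(build_trustyai_service("my-project"), indent=2)
print(manifest_json)
```

Printing the manifest first matches the confirmation checkpoint: the user sees exactly what would be sent before anything is created.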

Error Handling:

If RBAC error -> Report insufficient permissions
If quota error -> Report resource quota exceeded

Step 4: Verify TrustyAI Pods Are Running

MCP Tool: pods_list (from openshift)

Parameters:

namespace: target namespace - REQUIRED
labelSelector: "app.kubernetes.io/name=trustyai-service" - REQUIRED

Verify at least one TrustyAI pod is in Running state. Report pod status.

If pods not ready (Pending, CrashLoopBackOff, etc.):

Use pods_log and events_list (from openshift) to diagnose. Present findings and options: (1) View full logs, (2) Check events, (3) Delete and recreate TrustyAIService, (4) Abort. WAIT for user decision. NEVER auto-delete TrustyAIService.
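The pod check can be sketched as below, assuming the pods_list output has been parsed into standard Kubernetes Pod objects (the actual return shape of the tool is not given here). One detail worth noting: CrashLoopBackOff is reported as a container waiting reason rather than a pod phase, so a pod can show phase Running while its container is crash-looping. The function name and sample pods are hypothetical.

```python
def summarize_pods(pods):
    """Return (any_running, [(pod_name, state)]) for a list of Pod dicts.

    state is the pod phase, unless a container is waiting, in which case
    the waiting reason (for example CrashLoopBackOff) takes precedence.
    """
    summary = []
    for pod in pods:
        name = pod.get("metadata", {}).get("name", "<unknown>")
        status = pod.get("status", {})
        state = status.get("phase", "Unknown")
        for container in status.get("containerStatuses", []):
            reason = container.get("state", {}).get("waiting", {}).get("reason")
            if reason:
                state = reason
        summary.append((name, state))
    any_running = any(state == "Running" for _, state in summary)
    return any_running, summary


# Hypothetical pods for illustration.
sample_pods = [
    {
        "metadata": {"name": "trustyai-service-abc"},
        "status": {
            "phase": "Running",
            "containerStatuses": [
                {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}
            ],
        },
    },
    {
        "metadata": {"name": "trustyai-service-def"},
        "status": {
            "phase": "Running",
            "containerStatuses": [{"state": {"running": {}}}],
        },
    },
]

ok, details = summarize_pods(sample_pods)
print(ok, details)
```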

Step 5: Configure Bias Metrics

Condition: Only when monitoring type includes bias detection.

Create a ConfigMap named trustyai-bias-config-[isvc-name] with SPD and DIR configurations, using the field schema from trustyai-metrics-reference.md. Populate modelId, protectedAttribute, favorableOutcome, outcomeName, privilegedAttribute, and unprivilegedAttribute with the user-provided values from Step 2. Use the default thresholds (SPD ±0.1, DIR 0.8–1.2) unless the user specifies otherwise.

MCP Tool: resources_create_or_update (from openshift)

Parameters:

manifest: ConfigMap YAML manifest as JSON string - REQUIRED

Display manifest to user with threshold explanation. Ask: "Proceed with these bias metric configurations? (yes/no/modify)"

WAIT for explicit confirmation.
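When explaining the thresholds to the user, a small worked example may help. The sketch below uses the standard definitions: SPD is the favorable-outcome rate of the unprivileged group minus that of the privileged group, and DIR is the ratio of the two rates. The counts are invented for illustration, the function names are hypothetical, and treating the threshold bounds as inclusive is an assumption.

```python
def fairness_metrics(fav_unpriv, total_unpriv, fav_priv, total_priv):
    """Compute (SPD, DIR) from favorable-outcome counts per group.

    Raises ZeroDivisionError if a group is empty or the privileged group
    has no favorable outcomes; real data should be checked first.
    """
    rate_unpriv = fav_unpriv / total_unpriv
    rate_priv = fav_priv / total_priv
    spd = rate_unpriv - rate_priv      # 0 means parity
    dir_ = rate_unpriv / rate_priv     # 1 means parity
    return spd, dir_


def within_defaults(spd, dir_, spd_bound=0.1, dir_low=0.8, dir_high=1.2):
    """Check values against the default thresholds (SPD ±0.1, DIR 0.8 to 1.2)."""
    return abs(spd) <= spd_bound and dir_low <= dir_ <= dir_high


# Invented counts: 30 of 100 unprivileged and 50 of 100 privileged
# requests received the favorable outcome.
spd, dir_ = fairness_metrics(fav_unpriv=30, total_unpriv=100,
                             fav_priv=50, total_priv=100)
print(round(spd, 3), round(dir_, 3), within_defaults(spd, dir_))
```

Here SPD is about -0.2 and DIR is about 0.6, so both fall outside the default ranges and the model would be flagged for review.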

Step 6: Configure Drift Metrics

Condition: Only when monitoring type includes drift detection.

Create ConfigMap trustyai-drift-config-[isvc-name] using the drift schema from trustyai-metrics-reference.md. Include selected metrics (default: MEANSHIFT, FOURIERMMD, KSTEST, JENSENSHANNON) with recommended thresholds from the reference doc.

MCP Tool: resources_create_or_update (from openshift)

Parameters:

manifest: ConfigMap YAML manifest as JSON string - REQUIRED

Display manifest to user. Ask: "Proceed with these drift metric configurations? (yes/no/modify)"

WAIT for explicit confirmation.
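To give the user some intuition for what a drift score measures, the textbook Jensen-Shannon divergence between a reference histogram and a recent histogram can be computed as follows. This is only a sketch of the general definition; how TrustyAI's JENSENSHANNON metric bins data and applies thresholds may differ, so the reference doc remains authoritative. The histograms are invented.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two histograms over the same bins.

    Uses base-2 logarithms, so the result lies in [0, 1]:
    0 for identical distributions, 1 for distributions with no overlap.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    def kl(a, b):
        # Terms with a == 0 contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalize(p), normalize(q)
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Invented bin counts: training-time reference vs. recent inference inputs.
reference = [25, 25, 25, 25]
recent = [10, 20, 30, 40]

print(jensen_shannon_divergence(reference, reference))
print(jensen_shannon_divergence(reference, recent))
```

The first call compares a distribution with itself and yields zero; the second yields a small positive value that grows as the recent data moves further from the reference.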

Step 7: Validate Monitoring Is Active

Wait 30-60 seconds after configuration, then verify metrics are being produced.

MCP Tool: prometheus_query (from openshift) or execute_promql (from ai-observability)

Parameters:

query: "trustyai_spd{model=\"[isvc-name]\"}" - REQUIRED (for bias)
query: "trustyai_meanshift{model=\"[isvc-name]\"}" - REQUIRED (for drift)

If metrics are present: Report current values and confirm monitoring is active.

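If metrics are NOT present: This is expected if no inference requests have been sent yet. Inform the user that ~100 requests are needed for stable bias metrics per trustyai-metrics-reference.md. If results stay empty after inference traffic has been sent, see Issue 2 under Common Issues.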
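The validation queries and the present/absent decision can be sketched as below. The metric names and the model label come from this skill; the response handling assumes the result follows the standard Prometheus HTTP API instant-query format, which may not match what prometheus_query or execute_promql actually return. The function names and the sample response are hypothetical.

```python
def build_query(metric, isvc_name):
    """Build a PromQL selector such as trustyai_spd{model="my-model"}."""
    return f'{metric}{{model="{isvc_name}"}}'


def extract_values(response):
    """Return sample values from a Prometheus-style instant-vector response.

    Assumption: the response has the shape
    {"status": "success", "data": {"result": [{"value": [ts, "1.23"]}]}}.
    An empty list means no metrics are being produced yet.
    """
    if response.get("status") != "success":
        return []
    results = response.get("data", {}).get("result", [])
    return [float(sample["value"][1]) for sample in results]


# Hypothetical response for illustration.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "trustyai_spd", "model": "credit-model"},
                "value": [1700000000, "-0.05"],
            }
        ],
    },
}

query = build_query("trustyai_spd", "credit-model")
values = extract_values(sample_response)
print(query)
print("monitoring active" if values else "no metrics yet", values)
```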

Step 8: Summary and Next Steps

Present summary showing: TrustyAI status, configured metrics with thresholds, PromQL queries for dashboards (from trustyai-metrics-reference.md), and next steps (/ai-observability, /guardrails-config).

Common Issues

For common issues (GPU scheduling, OOMKilled, image pull errors, RBAC), see common-issues.md.

Issue 1: TrustyAI Pod CrashLoopBackOff

Error: TrustyAI pod restarts repeatedly with storage-related errors

Cause: PVC for TrustyAI data storage cannot be provisioned, or the storage class is unavailable.

Solution:

Check PVC status: resources_list for PVCs in namespace with TrustyAI labels
Verify a default StorageClass exists: resources_list for StorageClass
If no default StorageClass, specify one in the TrustyAIService CR spec.storage.storageClass
Check pod logs for specific storage errors

Issue 2: No Metrics Appearing in Prometheus

Error: PromQL queries return empty results even after inference requests

Cause: User Workload Monitoring is not enabled, or the TrustyAI ServiceMonitor is missing.

Solution:

Verify User Workload Monitoring is enabled: check cluster-monitoring-config ConfigMap in openshift-monitoring namespace for enableUserWorkload: true
Check that a ServiceMonitor exists for TrustyAI: resources_list for ServiceMonitor in the namespace
Verify TrustyAI pods expose the /q/metrics endpoint

Issue 3: Bias Metrics Show Insufficient Data

Error: SPD/DIR metrics return NaN or insufficient data warnings

Cause: Not enough inference requests with the protected attribute. TrustyAI requires ~100 requests for stable metrics.

Solution:

Send more inference requests with varied protected attribute values
Ensure the inference payload includes the protected attribute field
Verify the protectedAttribute field name matches the model's input schema exactly

Dependencies

MCP Tools

See Prerequisites for the complete list of required and optional MCP tools.

Related Skills

/model-deploy - Deploy the InferenceService before configuring monitoring
/debug-inference - Troubleshoot issues found by monitoring alerts
/ai-observability - Infrastructure-level performance metrics (complements TrustyAI fairness metrics)
/guardrails-config - Add content safety guardrails to the monitored model

Reference Documentation

trustyai-metrics-reference.md - TrustyAI CRD specs, Prometheus metric names, ConfigMap schemas, and threshold guidance

Critical: Human-in-the-Loop Requirements

See skill-conventions.md for general HITL and security conventions.

Skill-specific checkpoints:

After gathering requirements (Step 2): confirm monitoring configuration
Before creating TrustyAIService (Step 3): display manifest, confirm creation
On TrustyAI pod failure (Step 4): present diagnostic options, wait for user decision
Before configuring bias metrics (Step 5): confirm metric parameters and thresholds
Before configuring drift metrics (Step 6): confirm metric parameters and thresholds
NEVER auto-delete TrustyAIService or metric configurations
NEVER modify fairness thresholds without explicit user confirmation


Last Commit: Mar 17, 2026
