npx claudepluginhub rhecosystemappeng/agentic-collections --plugin rh-ai-engineer
Configure TrustyAI model monitoring for bias detection and data drift on deployed InferenceServices.
Use when:
- "Monitor my model for bias"
- "Set up drift detection on my inference endpoint"
- "Configure TrustyAI for my deployed model"
- "Check if my model has fairness issues"
- "I need SPD / DIR metrics for my model"
Handles TrustyAIService deployment, bias metric configuration (SPD, DIR), drift metric configuration (MeanShift, FourierMMD, KS-Test, Jensen-Shannon), threshold tuning, and monitoring validation.
- NOT for deploying models (use /model-deploy first)
- NOT for input/output content safety guardrails (use /guardrails-config)
- NOT for infrastructure-level observability (use /ai-observability)
This skill uses the workspace's default tool permissions.
references/trustyai-metrics-reference.md
/model-monitor Skill
Prerequisites
Required MCP Server: openshift (OpenShift MCP Server)
Required MCP Tools (from openshift):
- resources_get (from openshift) - Get TrustyAIService CR status, check CRD availability
- resources_list (from openshift) - List TrustyAIService instances, check CRD existence
- resources_create_or_update (from openshift) - Create/update TrustyAIService CR, metric configuration ConfigMaps
- pods_list (from openshift) - Verify TrustyAI pods are running
- pods_log (from openshift) - Retrieve TrustyAI pod logs for troubleshooting
- events_list (from openshift) - Check events for TrustyAI deployment issues
- prometheus_query (from openshift) - Query TrustyAI metrics (trustyai_spd, trustyai_dir, drift metrics)
Required MCP Server: rhoai (RHOAI MCP Server)
Required MCP Tools (from rhoai):
- list_inference_services - List deployed models to identify monitoring targets
- get_inference_service - Get InferenceService details (model format, runtime, status)
- list_data_science_projects - Validate namespace is an RHOAI Data Science Project
Optional MCP Server: ai-observability (AI Observability MCP)
Optional MCP Tools (from ai-observability):
- execute_promql - Custom PromQL queries for TrustyAI metrics validation
Common prerequisites (KUBECONFIG, OpenShift+RHOAI cluster, KServe, verification protocol): See skill-conventions.md.
Additional cluster requirements:
- TrustyAI operator enabled in the DataScienceCluster CR
- At least one deployed InferenceService to monitor (via /model-deploy)
- User Workload Monitoring enabled in OpenShift (for TrustyAI metrics scraping)
When to Use This Skill
Use this skill when you need to:
- Set up bias monitoring (SPD, DIR) for a deployed model
- Configure data drift detection on inference data streams
- Deploy a TrustyAIService instance in a namespace
- Check whether monitoring is active and metrics are flowing
Do NOT use this skill when:
- You need to deploy a model first (use /model-deploy)
- You need LLM input/output content safety guardrails (use /guardrails-config)
- You want infrastructure-level performance metrics (use /ai-observability)
- You need to troubleshoot a failed model deployment (use /debug-inference)
Workflow
Step 1: Verify TrustyAI Operator Installation
MCP Tool: resources_list (from openshift)
Parameters:
- apiVersion: "apiextensions.k8s.io/v1" - REQUIRED
- kind: "CustomResourceDefinition" - REQUIRED
Check for the presence of trustyaiservices.trustyai.opendatahub.io CRD. This is a hard prerequisite — nothing in this skill works without it.
Error Handling:
- If CRD not found: Report that TrustyAI must be enabled in the DataScienceCluster CR with spec.components.trustyai.managementState: Managed. Offer options: (1) Show enablement instructions, (2) Abort. WAIT for user decision.
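The CRD check above can be sketched in a few lines. This is a minimal illustration, assuming the resources_list result can be reduced to a list of CRD objects shaped like standard Kubernetes resources; the `crd_items` argument and the `has_trustyai_crd` helper are hypothetical names, not part of the MCP tool's API.

```python
# Name of the CRD that Step 1 looks for (taken from the text above).
TRUSTYAI_CRD = "trustyaiservices.trustyai.opendatahub.io"


def has_trustyai_crd(crd_items):
    """Return True if the TrustyAIService CRD is among the listed CRDs.

    `crd_items` is a hypothetical stand-in for the list of
    CustomResourceDefinition objects returned by resources_list.
    """
    return any(
        item.get("metadata", {}).get("name") == TRUSTYAI_CRD
        for item in crd_items
    )


# Sample input shaped like Kubernetes objects (illustrative only).
sample = [
    {"metadata": {"name": "inferenceservices.serving.kserve.io"}},
    {"metadata": {"name": TRUSTYAI_CRD}},
]

if has_trustyai_crd(sample):
    print("TrustyAI CRD found, continue to Step 2")
else:
    print("TrustyAI CRD missing, offer enablement instructions or abort")
```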
Step 2: Gather Monitoring Requirements
Ask the user for:
- Target model: Which InferenceService to monitor (name or "list all")
- Namespace: Target namespace
- Monitoring type: Bias detection, drift detection, or both
- For bias monitoring: protected attribute, favorable outcome, privileged/unprivileged group values
- For drift monitoring: which drift metrics to enable (default: all)
If user is unsure about target model, use list_inference_services (from rhoai) to present available models.
MCP Tool: list_inference_services (from rhoai)
Parameters:
- namespace: user-specified namespace - REQUIRED
- verbosity: "standard" - OPTIONAL
Present configuration summary for confirmation. WAIT for user to confirm or modify.
Step 3: Check/Create TrustyAIService in Namespace
Document Consultation (read before configuring TrustyAI):
- Action: Read trustyai-metrics-reference.md using the Read tool to understand CRD spec fields, metric names, and thresholds
- Output to user: "I consulted trustyai-metrics-reference.md to understand TrustyAI CRD specifications."
MCP Tool: resources_get (from openshift)
Parameters:
- apiVersion: "trustyai.opendatahub.io/v1alpha1" - REQUIRED
- kind: "TrustyAIService" - REQUIRED
- namespace: target namespace - REQUIRED
- name: "trustyai-service" - REQUIRED
If TrustyAIService exists and is Ready: Proceed to Step 5.
If TrustyAIService exists but NOT Ready: Check pod status (Step 4). WAIT for user decision.
If TrustyAIService does NOT exist: Construct TrustyAIService manifest using the CRD spec from trustyai-metrics-reference.md. Key values: name=trustyai-service, storage PVC 1Gi, CSV format, 5s schedule.
Display the manifest to the user. Ask: "Proceed with creating this TrustyAIService? (yes/no/modify)"
WAIT for explicit confirmation.
MCP Tool: resources_create_or_update (from openshift)
Parameters:
manifest: the TrustyAIService YAML manifest as JSON string - REQUIRED
Error Handling:
- If RBAC error -> Report insufficient permissions
- If quota error -> Report resource quota exceeded
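As a rough sketch of the manifest this step describes, the snippet below assembles a TrustyAIService object and serializes it to the JSON string that resources_create_or_update expects. The name, PVC storage, 1Gi size, CSV format, and 5s schedule come from this step. The field layout, the `/inputs` folder, and the `data.csv` filename are assumptions based on commonly published TrustyAIService examples; trustyai-metrics-reference.md remains the authority for the actual spec. `build_trustyai_service` is a hypothetical helper.

```python
import json


def build_trustyai_service(namespace):
    """Build a TrustyAIService manifest as a plain dict.

    The spec field names below are assumptions; confirm them against
    trustyai-metrics-reference.md before use.
    """
    return {
        "apiVersion": "trustyai.opendatahub.io/v1alpha1",
        "kind": "TrustyAIService",
        "metadata": {"name": "trustyai-service", "namespace": namespace},
        "spec": {
            # PVC-backed storage, 1Gi (values from Step 3).
            # The folder path is an assumed example value.
            "storage": {"format": "PVC", "folder": "/inputs", "size": "1Gi"},
            # CSV data format (from Step 3); the filename is assumed.
            "data": {"filename": "data.csv", "format": "CSV"},
            # Metric calculation schedule (from Step 3).
            "metrics": {"schedule": "5s"},
        },
    }


# Serialize for display to the user and for the `manifest` parameter.
manifest_json = json.dumps(build_trustyai_service("my-project"), indent=2)
print(manifest_json)
```

Printing the manifest before submitting it matches the confirmation checkpoint in this step: the user sees exactly what would be created.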
Step 4: Verify TrustyAI Pods Are Running
MCP Tool: pods_list (from openshift)
Parameters:
- namespace: target namespace - REQUIRED
- labelSelector: "app.kubernetes.io/name=trustyai-service" - REQUIRED
Verify at least one TrustyAI pod is in Running state. Report pod status.
If pods not ready (Pending, CrashLoopBackOff, etc.):
Use pods_log and events_list (from openshift) to diagnose. Present findings and options: (1) View full logs, (2) Check events, (3) Delete and recreate TrustyAIService, (4) Abort. WAIT for user decision. NEVER auto-delete TrustyAIService.
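One detail worth illustrating: in Kubernetes, CrashLoopBackOff is a container waiting reason rather than a pod phase, so a pod can report phase Running while its container is crash-looping. The sketch below, assuming pods_list output can be read as standard Pod objects, reports the waiting reason when one exists and falls back to the phase. `pod_state` and `any_pod_running` are hypothetical helper names.

```python
def pod_state(pod):
    """Return a container waiting reason if present, else the pod phase."""
    status = pod.get("status", {})
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting") or {}
        if waiting.get("reason"):
            # e.g. "CrashLoopBackOff" or "ImagePullBackOff"
            return waiting["reason"]
    return status.get("phase", "Unknown")


def any_pod_running(pods):
    """True if at least one pod is Running with no waiting container."""
    return any(pod_state(pod) == "Running" for pod in pods)


# Illustrative pod objects; names and states are made up for the example.
pods = [
    {
        "metadata": {"name": "trustyai-service-a"},
        "status": {
            "phase": "Running",
            "containerStatuses": [
                {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}
            ],
        },
    },
    {
        "metadata": {"name": "trustyai-service-b"},
        "status": {
            "phase": "Running",
            "containerStatuses": [{"state": {"running": {}}}],
        },
    },
]

for pod in pods:
    print(pod["metadata"]["name"], pod_state(pod))
print("ready:", any_pod_running(pods))
```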
Step 5: Configure Bias Metrics
Condition: Only when monitoring type includes bias detection.
Create ConfigMap trustyai-bias-config-[isvc-name] with SPD and DIR configurations using the field schema from trustyai-metrics-reference.md. Populate modelId, protectedAttribute, favorableOutcome, outcomeName, privilegedAttribute, unprivilegedAttribute with the user-provided values from Step 2. Use default thresholds (SPD ±0.1, DIR 0.8–1.2) unless the user specifies otherwise.
MCP Tool: resources_create_or_update (from openshift)
Parameters:
manifest: ConfigMap YAML manifest as JSON string - REQUIRED
Display manifest to user with threshold explanation. Ask: "Proceed with these bias metric configurations? (yes/no/modify)"
WAIT for explicit confirmation.
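When explaining the thresholds to the user, a small worked example can help. The sketch below uses the standard definitions of Statistical Parity Difference (favorable-outcome rate of the unprivileged group minus that of the privileged group) and Disparate Impact Ratio (the ratio of the same two rates), then checks them against the default thresholds from this step. It illustrates the arithmetic only and is not TrustyAI's implementation; the function names and counts are made up.

```python
def spd_and_dir(priv_favorable, priv_total, unpriv_favorable, unpriv_total):
    """Compute SPD and DIR from favorable-outcome counts per group."""
    p_priv = priv_favorable / priv_total
    p_unpriv = unpriv_favorable / unpriv_total
    spd = p_unpriv - p_priv   # 0 means parity
    dir_ = p_unpriv / p_priv  # 1 means parity
    return spd, dir_


def within_defaults(spd, dir_, spd_limit=0.1, dir_low=0.8, dir_high=1.2):
    """Check values against the default thresholds (SPD ±0.1, DIR 0.8-1.2)."""
    return abs(spd) <= spd_limit and dir_low <= dir_ <= dir_high


# Privileged group: 60 of 100 favorable. Unprivileged group: 45 of 100.
spd, dir_ = spd_and_dir(60, 100, 45, 100)
print(f"SPD={spd:.3f} DIR={dir_:.3f} within thresholds: {within_defaults(spd, dir_)}")
```

With these counts SPD is about -0.15 and DIR about 0.75, so both fall outside the defaults and the model would be flagged.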
Step 6: Configure Drift Metrics
Condition: Only when monitoring type includes drift detection.
Create ConfigMap trustyai-drift-config-[isvc-name] using the drift schema from trustyai-metrics-reference.md. Include selected metrics (default: MEANSHIFT, FOURIERMMD, KSTEST, JENSENSHANNON) with recommended thresholds from the reference doc.
MCP Tool: resources_create_or_update (from openshift)
Parameters:
manifest: ConfigMap YAML manifest as JSON string - REQUIRED
Display manifest to user. Ask: "Proceed with these drift metric configurations? (yes/no/modify)"
WAIT for explicit confirmation.
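For users unsure what a drift metric measures, the following sketch computes the two-sample Kolmogorov-Smirnov statistic, the quantity behind a KS test: the largest gap between the empirical distribution of a reference sample and that of current inference data. This is a plain-Python illustration of the idea, not TrustyAI's implementation, and the sample values are invented.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    # The maximum gap is always attained at one of the observed values.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


training_ages = [23, 31, 35, 42, 47, 52, 58, 61]
same_population = [23, 31, 35, 42, 47, 52, 58, 61]
shifted_population = [65, 68, 70, 72, 75, 77, 80, 83]

print(ks_statistic(training_ages, same_population))     # identical samples
print(ks_statistic(training_ages, shifted_population))  # fully shifted samples
```

A statistic near 0 suggests the incoming data resembles the reference data, while a value near 1 suggests the distributions barely overlap.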
Step 7: Validate Monitoring Is Active
Wait 30-60 seconds after configuration, then verify metrics are being produced.
MCP Tool: prometheus_query (from openshift) or execute_promql (from ai-observability)
Parameters:
- query: "trustyai_spd{model=\"[isvc-name]\"}" - REQUIRED (for bias)
- query: "trustyai_meanshift{model=\"[isvc-name]\"}" - REQUIRED (for drift)
If metrics are present: Report current values and confirm monitoring is active.
If metrics are NOT present: Expected if no inference requests have been made yet. Inform user that ~100 requests are needed for stable bias metrics per trustyai-metrics-reference.md.
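The validation queries follow one pattern, a metric name with a `model` label matcher, so they can be generated rather than typed by hand. A minimal sketch, assuming the metric names and the `model` label shown in this step; `trustyai_query` is a hypothetical helper.

```python
def trustyai_query(metric, isvc_name):
    """Build a PromQL selector such as trustyai_spd{model="my-model"}."""
    # Escape backslashes and double quotes so the label value stays valid.
    escaped = isvc_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'{metric}{{model="{escaped}"}}'


# Metric names taken from Step 7; the model name is an example value.
for metric in ("trustyai_spd", "trustyai_meanshift"):
    print(trustyai_query(metric, "fraud-model"))
```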
Step 8: Summary and Next Steps
Present summary showing: TrustyAI status, configured metrics with thresholds, PromQL queries for dashboards (from trustyai-metrics-reference.md), and next steps (/ai-observability, /guardrails-config).
Common Issues
For common issues (GPU scheduling, OOMKilled, image pull errors, RBAC), see common-issues.md.
Issue 1: TrustyAI Pod CrashLoopBackOff
Error: TrustyAI pod restarts repeatedly with storage-related errors
Cause: PVC for TrustyAI data storage cannot be provisioned, or the storage class is unavailable.
Solution:
- Check PVC status: resources_list for PVCs in namespace with TrustyAI labels
- Verify a default StorageClass exists: resources_list for StorageClass
- If no default StorageClass, specify one in the TrustyAIService CR spec.storage.storageClass
- Check pod logs for specific storage errors
Issue 2: No Metrics Appearing in Prometheus
Error: PromQL queries return empty results even after inference requests
Cause: User Workload Monitoring is not enabled, or the TrustyAI ServiceMonitor is missing.
Solution:
- Verify User Workload Monitoring is enabled: check cluster-monitoring-config ConfigMap in openshift-monitoring namespace for enableUserWorkload: true
- Check that a ServiceMonitor exists for TrustyAI: resources_list for ServiceMonitor in the namespace
- Verify TrustyAI pods expose the /q/metrics endpoint
Issue 3: Bias Metrics Show Insufficient Data
Error: SPD/DIR metrics return NaN or insufficient data warnings
Cause: Not enough inference requests with the protected attribute. TrustyAI requires ~100 requests for stable metrics.
Solution:
- Send more inference requests with varied protected attribute values
- Ensure the inference payload includes the protected attribute field
- Verify the protectedAttribute field name matches the model's input schema exactly
Dependencies
MCP Tools
See Prerequisites for the complete list of required and optional MCP tools.
Related Skills
- /model-deploy - Deploy the InferenceService before configuring monitoring
- /debug-inference - Troubleshoot issues found by monitoring alerts
- /ai-observability - Infrastructure-level performance metrics (complements TrustyAI fairness metrics)
- /guardrails-config - Add content safety guardrails to the monitored model
Reference Documentation
- trustyai-metrics-reference.md - TrustyAI CRD specs, Prometheus metric names, ConfigMap schemas, and threshold guidance
Critical: Human-in-the-Loop Requirements
See skill-conventions.md for general HITL and security conventions.
Skill-specific checkpoints:
- After gathering requirements (Step 2): confirm monitoring configuration
- Before creating TrustyAIService (Step 3): display manifest, confirm creation
- On TrustyAI pod failure (Step 4): present diagnostic options, wait for user decision
- Before configuring bias metrics (Step 5): confirm metric parameters and thresholds
- Before configuring drift metrics (Step 6): confirm metric parameters and thresholds
- NEVER auto-delete TrustyAIService or metric configurations
- NEVER modify fairness thresholds without explicit user confirmation