Slash Command

/mlops-review

Runs an MLOps maturity audit on ML systems, covering experiment tracking, model registry, serving SLOs, drift detection, retraining pipelines, A/B testing, and data quality. Produces a prioritized findings report with a maturity score via the mlops-architect agent.

ai-ml

devops

npx claudepluginhub marvinrichter/clarc --plugin clarc

Invocation

How this command is triggered — by the user, by Claude, or both

Slash command

/clarc:mlops-review

Model invocable

No pre-commands

Context Preview

The summary Claude sees in its command listing — used to decide when to auto-load this command

# MLOps Review

This command runs a comprehensive audit of an ML system's operational maturity across the full MLOps lifecycle.

## What This Command Does

1. **Experiment Tracking** — Are all training runs logged and reproducible?
2. **Model Registry** — Are versioning, lineage, and a promotion workflow in place?
3. **Serving** — Are latency SLOs defined and GPU utilization monitored?
4. **Monitoring** — Is drift detection active with automated alerts?
5. **Retraining** — Is there an automated trigger and pipeline?
6. **A/B Testing** — Can new models be rolled out safely without user impact?
...

Command Content

223 lines · ~2.1k tokens

Stats

Language: JavaScript

Stars: 9

Maintenance: Excellent

Last Commit: Apr 7, 2026

## Review Checklist

### 1. Experiment Tracking

| Check | Tool | Severity if missing |
|---|---|---|
| All training runs logged (params, metrics, artifacts) | MLflow / W&B | HIGH |
| Experiments reproducible from logged config | DVC + MLflow | HIGH |
| Hyperparameter search tracked | Optuna / Ray Tune + W&B | MEDIUM |
| Model artifact saved with input signature | MLflow `infer_signature` | MEDIUM |
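
The experiment-tracking checks above come down to two questions: did the run record its params, metrics, and artifacts, and can the same config be identified again later? The sketch below illustrates both with the standard library only. It is not how MLflow or W&B store runs; `RunRecord` and `config_fingerprint` are hypothetical names introduced for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field


def config_fingerprint(params: dict) -> str:
    """Return a stable SHA-256 hash of a training config.

    Keys are sorted so the same params always hash to the same value,
    regardless of insertion order.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class RunRecord:
    """Hypothetical stand-in for what a tracking server keeps per run."""
    params: dict
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)

    def missing_fields(self) -> list:
        """List which of params / metrics / artifacts were never logged."""
        return [name for name in ("params", "metrics", "artifacts")
                if not getattr(self, name)]


run = RunRecord(params={"lr": 0.01, "epochs": 5}, metrics={"f1": 0.91})
print(run.missing_fields())
print(config_fingerprint(run.params) == config_fingerprint({"epochs": 5, "lr": 0.01}))
```

A reviewer could flag any run whose `missing_fields()` is non-empty as a HIGH finding, in line with the table above.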

### 2. Model Registry

| Check | Expected | Severity if missing |
|---|---|---|
| Models registered with semantic versions | `model:v1.2.0` | MEDIUM |
| Lineage metadata (training run, dataset version) | Tags on model version | HIGH |
| Staging → Production promotion workflow | MLflow stages or aliases | HIGH |
| Model cards with intended use, limitations | README in registry | MEDIUM |
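
As a rough illustration of the first two registry checks, the sketch below parses a reference in the `model:v1.2.0` form and reports missing lineage tags. The tag keys `training_run_id` and `dataset_version` are assumptions chosen to mirror the table; a real registry may use different names.

```python
import re

# Matches the `model:v1.2.0` convention shown in the table above.
VERSION_RE = re.compile(
    r"^(?P<name>[\w.-]+):v(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)$"
)

# Hypothetical tag keys; substitute whatever your registry actually stores.
REQUIRED_LINEAGE_TAGS = ("training_run_id", "dataset_version")


def parse_model_ref(ref: str):
    """Split 'name:vMAJOR.MINOR.PATCH' into (name, (major, minor, patch))."""
    match = VERSION_RE.match(ref)
    if match is None:
        raise ValueError(f"not a semantic model version: {ref!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["name"], version


def missing_lineage(tags: dict) -> list:
    """Return the required lineage tags that are absent or empty."""
    return [tag for tag in REQUIRED_LINEAGE_TAGS if not tags.get(tag)]
```

Returning the version as an integer tuple means `v1.10.0` correctly sorts after `v1.9.3`, which a plain string comparison would get wrong.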

### 3. Model Serving

| Check | Expected | Severity if missing |
|---|---|---|
| Latency SLO defined (p50, p95, p99) | `< 100ms p95` | HIGH |
| GPU utilization monitored | DCGM Exporter + Grafana | HIGH |
| Health endpoint implemented | `/v2/health/ready` | CRITICAL |
| Graceful shutdown with in-flight request draining | SIGTERM handler | HIGH |
| Horizontal scaling configured | HPA or KEDA | MEDIUM |
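
To make the latency SLO check concrete, here is a minimal sketch that derives p50, p95, and p99 from raw request latencies and compares p95 against the `< 100ms` budget from the table. The function names and the default budget parameter are illustrative; production systems would normally read these percentiles from histogram metrics rather than raw samples.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return p50, p95, and p99 for a list of request latencies in ms."""
    # quantiles(n=100) yields 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def meets_slo(samples_ms, p95_budget_ms=100.0):
    """True when the observed p95 is strictly under the budget."""
    return latency_percentiles(samples_ms)["p95"] < p95_budget_ms
```

Checking p95 rather than the mean matters because a small share of slow requests can leave the average looking healthy while the tail breaches the SLO.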

### 4. Drift Detection & Monitoring

| Check | Expected | Severity if missing |
|---|---|---|
| Data drift detection active | Evidently AI / WhyLogs | HIGH |
| Model performance monitoring | Accuracy/F1 on ground truth | CRITICAL |
| Alerts configured for degradation | PagerDuty / Slack via Prometheus | HIGH |
| Ground truth collection pipeline | Labels from user feedback or manual labeling | HIGH |
| Dashboard for model KPIs | Grafana | MEDIUM |
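
For readers unfamiliar with what a data drift check computes, the sketch below implements the Population Stability Index (PSI) in plain Python. This is one common drift statistic swapped in for illustration, not the method Evidently AI or WhyLogs necessarily apply; the bin count and the 0.25 alert threshold mentioned below are assumptions based on a widely cited rule of thumb.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample.

    Bins are equal-width over the reference range; eps keeps empty bins
    from producing log(0) or division by zero.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back for constant reference data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job could compute `psi(training_feature, last_24h_feature)` per feature and raise an alert when the value exceeds a chosen threshold such as 0.25.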

### 5. Retraining Pipeline

| Check | Expected | Severity if missing |
|---|---|---|
| Automated retraining trigger exists | Drift alert / cron / data threshold | HIGH |
| Pipeline is reproducible | Kubeflow / Airflow / Prefect DAG | HIGH |
| Evaluation gate before promotion | Accuracy threshold check | CRITICAL |
| Data versioning with DVC | `dvc.yaml` + remote | HIGH |
| Rollback procedure documented | Runbook | MEDIUM |
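
The evaluation gate is the only CRITICAL item in the retraining checks, so a small sketch may help show what it looks like in a pipeline step. The table only names an accuracy threshold check; the additional comparison against the current production model, along with the default values `0.90` and `0.01`, are assumptions added here for illustration.

```python
def passes_evaluation_gate(candidate, production,
                           min_accuracy=0.90, max_regression=0.01):
    """Decide whether a retrained model may be promoted.

    Both arguments are metric dicts with an "accuracy" key.
    Returns (allowed, reason).
    """
    if candidate["accuracy"] < min_accuracy:
        return False, "below absolute accuracy threshold"
    # Assumed extra rule: do not promote a model clearly worse than production.
    if candidate["accuracy"] < production["accuracy"] - max_regression:
        return False, "regresses against production model"
    return True, "ok"
```

In an Airflow or Prefect DAG this would sit between the evaluation task and the registry promotion task, failing the run when the first element of the result is `False`.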

### 6. A/B Testing Capability

| Check | Expected | Severity if missing |
|---|---|---|
| Traffic splitting infrastructure | Istio / nginx / feature flags | HIGH |
| Shadow mode tested | Challenger receives traffic but its predictions are not returned to users | HIGH |
| Statistical significance framework | p-value / Bayesian testing | MEDIUM |
| Business metric integration | CTR, conversion alongside ML metrics | MEDIUM |
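
As one possible reading of the "statistical significance framework" check, the sketch below computes a two-sided p-value for a difference in conversion rate between a champion and a challenger using a pooled two-proportion z-test. This is a standard frequentist test chosen for illustration; the source also lists Bayesian testing as an acceptable alternative, and the function name is hypothetical.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both variants share one conversion rate.

    conv_* are conversion counts, n_* are the numbers of exposed users.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

The same function works for business metrics such as CTR, since those are also proportions of exposed users.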

### 7. Data Quality (Input Validation)

| Check | Expected | Severity if missing |
|---|---|---|
| Schema validation before inference | Pydantic / Great Expectations | HIGH |
| Out-of-distribution detection | Statistical bounds on features | MEDIUM |
| Missing feature handling | Imputation or rejection | HIGH |
| Prediction logging for ground truth | Feature store / event log | HIGH |
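
The first three input-validation checks can be combined in a single pre-inference step. The sketch below uses only the standard library to stay self-contained; in practice Pydantic or Great Expectations would replace the hand-written type checks. `FeatureSpec` and `validate_request` are hypothetical names, and treating out-of-range values as warnings rather than errors is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    low: float            # lower bound observed in training data
    high: float           # upper bound observed in training data
    required: bool = True


def validate_request(payload, specs):
    """Return (errors, warnings) for one inference request.

    Errors mean the request should be rejected; warnings flag values
    outside the training range (a simple out-of-distribution signal).
    """
    errors, warnings = [], []
    for spec in specs:
        value = payload.get(spec.name)
        if value is None:
            if spec.required:
                errors.append(f"missing required feature: {spec.name}")
            continue
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(
                f"{spec.name}: expected a number, got {type(value).__name__}"
            )
        elif not spec.low <= value <= spec.high:
            warnings.append(
                f"{spec.name}={value} outside training range "
                f"[{spec.low}, {spec.high}]"
            )
    return errors, warnings
```

This version rejects requests with missing required features; a service that prefers imputation would fill a default at the `value is None` branch instead.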

## Severity Levels

| Severity | Condition | Action |
|---|---|---|
| CRITICAL | No health endpoint; no evaluation gate before production promotion | Fix before any deployment |
| HIGH | Missing drift detection; no retraining trigger; no latency SLO | Fix within sprint |
| MEDIUM | No model card; no A/B testing; missing lineage | Backlog with priority |
| LOW | Dashboard improvements; nice-to-have tooling | Opportunistic |
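
A prioritized findings report is essentially a list sorted by these severity levels, with each finding paired to the action from the table. The sketch below shows one way to do that; the finding structure (a dict with `check` and `severity` keys) is an assumption, since the command's actual report format is not shown here.

```python
# Lower rank sorts first; order and actions follow the severity table above.
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
ACTIONS = {
    "CRITICAL": "Fix before any deployment",
    "HIGH": "Fix within sprint",
    "MEDIUM": "Backlog with priority",
    "LOW": "Opportunistic",
}


def prioritize(findings):
    """Sort findings most severe first and attach the matching action.

    sorted() is stable, so findings of equal severity keep their order.
    """
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
    return [{**f, "action": ACTIONS[f["severity"]]} for f in ordered]


def deployment_blocked(findings):
    """Any CRITICAL finding blocks deployment."""
    return any(f["severity"] == "CRITICAL" for f in findings)
```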

## Maturity Levels

| Score | Level | Description |
|---|---|---|
| 1–2 | Initial | Manual, ad-hoc processes |
| 3–4 | Defined | Documented, some automation |
| 5–6 | Measured | Monitored, data-driven decisions |
| 7 | Optimizing | Fully automated with continuous improvement |
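
The score-to-level mapping above can be expressed as a small lookup. How the command derives the score itself is not described in this section, so the sketch only covers the mapping; `maturity_level` is a hypothetical helper name.

```python
def maturity_level(score: int) -> str:
    """Map a 1-7 maturity score to its level name from the table above."""
    if not 1 <= score <= 7:
        raise ValueError(f"score must be between 1 and 7, got {score}")
    if score <= 2:
        return "Initial"
    if score <= 4:
        return "Defined"
    if score <= 6:
        return "Measured"
    return "Optimizing"
```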
