From godmode
Guides MLOps workflows for ML model deployment: readiness checklists, serving infrastructure (FastAPI, SageMaker, Triton), inference optimization, versioning, A/B testing, drift detection, retraining, and monitoring.
npx claudepluginhub arbazkhan971/godmode
Slash command: /godmode:mlops
- `/godmode:mlops`, "deploy model", "model serving"
Model: <name and version>
Source: EXP-<ID>
Checklist:
[ ] Evaluation complete (test metrics documented)
[ ] Bias/fairness check passed
[ ] Artifacts saved (weights, config, preprocessor)
[ ] Input/output schema documented
[ ] Latency benchmarked (< target p99 ms)
[ ] Size acceptable (< N MB)
IF latency p99 > 100ms: apply optimization. IF model size > 500MB: consider distillation/pruning.
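The checklist and the two threshold rules above can be expressed as a small gate function. This is a minimal sketch, not part of the skill itself: `ReadinessReport` and `readiness_actions` are hypothetical names, and the 100 ms and 500 MB limits are simply the values quoted in the rule above.

```python
from dataclasses import dataclass

# Thresholds copied from the rule above; adjust per project.
LATENCY_P99_LIMIT_MS = 100.0
MODEL_SIZE_LIMIT_MB = 500.0


@dataclass
class ReadinessReport:
    """Hypothetical container for the checklist items."""
    evaluation_complete: bool
    fairness_passed: bool
    artifacts_saved: bool
    schema_documented: bool
    latency_p99_ms: float
    size_mb: float


def readiness_actions(report: ReadinessReport) -> list[str]:
    """Return the follow-up actions implied by the checklist and thresholds."""
    actions = []
    if not report.evaluation_complete:
        actions.append("complete evaluation")
    if not report.fairness_passed:
        actions.append("resolve bias/fairness check")
    if not report.artifacts_saved:
        actions.append("save artifacts")
    if not report.schema_documented:
        actions.append("document input/output schema")
    if report.latency_p99_ms > LATENCY_P99_LIMIT_MS:
        actions.append("apply optimization")
    if report.size_mb > MODEL_SIZE_LIMIT_MB:
        actions.append("consider distillation/pruning")
    return actions
```

An empty list means every checklist item passed and neither threshold was exceeded.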
Options:
TF Serving: TensorFlow models, gRPC/REST
Triton: multi-framework, ONNX/TensorRT
SageMaker: managed AWS, auto-scaling
FastAPI/Ray Serve: custom, flexible
```bash
# Check for serving frameworks
pip list | grep -iE "fastapi|ray|triton|sagemaker"
ls model_repository/ serve/ 2>/dev/null
```
| Optimization | Latency | Size | Accuracy |
|---|---|---|---|
| Baseline FP32 | <ms> | <MB> | <val> |
| FP16 quant | <ms> | <MB> | <val> |
| INT8 quant | <ms> | <MB> | <val> |
| ONNX | <ms> | <MB> | <val> |
| Distillation | <ms> | <MB> | <val> |
IF accuracy drop > 1% from quantization: use FP16 only. IF latency target not met: try TensorRT or distillation.
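One possible reading of the two rules above, as a sketch. It assumes the accuracy drop is measured as a relative percentage against the FP32 baseline and that the quantized candidate is INT8; both function names are hypothetical.

```python
def pick_quantization(baseline_acc: float, int8_acc: float,
                      max_drop_pct: float = 1.0) -> str:
    """Choose INT8 unless its relative accuracy drop exceeds max_drop_pct."""
    # Assumption: "accuracy drop" means relative drop versus the FP32 baseline.
    drop_pct = (baseline_acc - int8_acc) / baseline_acc * 100.0
    return "FP16" if drop_pct > max_drop_pct else "INT8"


def latency_next_step(latency_ms: float, target_ms: float) -> str:
    """Apply the second rule: escalate when the latency target is missed."""
    if latency_ms > target_ms:
        return "try TensorRT or distillation"
    return "target met"
```

If the project defines the drop in absolute percentage points instead, replace the `drop_pct` line with `(baseline_acc - int8_acc) * 100.0`.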
Batching: static (fixed workload), dynamic (variable traffic, max_queue_delay_ms), adaptive (auto-tune).
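To illustrate how dynamic batching trades latency for throughput, the sketch below groups request arrival times into batches offline. It is a simulation under stated assumptions, not a serving-framework API: `dynamic_batches` is a hypothetical helper, and it flushes a batch when the batch is full or when a new request arrives after the oldest queued request has already waited longer than `max_queue_delay_ms`.

```python
def dynamic_batches(arrivals_ms, max_batch_size, max_queue_delay_ms):
    """Group sorted arrival timestamps (ms) into batches.

    A batch is flushed when it reaches max_batch_size, or when the next
    request arrives more than max_queue_delay_ms after the oldest request
    still waiting in the queue.
    """
    batches, current = [], []
    for t in arrivals_ms:
        # Oldest queued request has waited too long: flush before enqueueing.
        if current and t - current[0] > max_queue_delay_ms:
            batches.append(current)
            current = []
        current.append(t)
        # Batch is full: flush immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


print(dynamic_batches([0, 1, 2, 10, 11, 30], max_batch_size=3, max_queue_delay_ms=5))
# → [[0, 1, 2], [10, 11], [30]]
```

A real server flushes on a timer rather than on the next arrival, so this only approximates the resulting batch shapes.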
| Version | Metric | Status | Traffic |
|---|---|---|---|
| v3.1 | F1=0.891 | CHAMPION | 90% |
| v3.2 | F1=0.903 | CANARY | 10% |
| v3.0 | F1=0.879 | ARCHIVED | 0% |
Lifecycle: STAGED->CANARY->CHAMPION->ARCHIVED
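The lifecycle above is a strictly linear state machine. A minimal sketch, assuming the registry is a plain dict from version to status (a hypothetical representation) and that promoting a canary also archives the current champion:

```python
LIFECYCLE = ["STAGED", "CANARY", "CHAMPION", "ARCHIVED"]


def next_status(status: str) -> str:
    """Return the next lifecycle state; ARCHIVED is terminal."""
    index = LIFECYCLE.index(status)  # raises ValueError for unknown states
    if index == len(LIFECYCLE) - 1:
        raise ValueError("ARCHIVED is terminal")
    return LIFECYCLE[index + 1]


def promote_canary(registry: dict) -> dict:
    """Advance the canary to champion and the old champion to archived."""
    updated = dict(registry)
    for version, status in registry.items():
        if status in ("CANARY", "CHAMPION"):
            updated[version] = next_status(status)
    return updated


registry = {"v3.0": "ARCHIVED", "v3.1": "CHAMPION", "v3.2": "CANARY"}
print(promote_canary(registry))
# → {'v3.0': 'ARCHIVED', 'v3.1': 'ARCHIVED', 'v3.2': 'CHAMPION'}
```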
Champion: v<N> Challenger: v<N>
Split: <champion%>/<challenger%>
Routing: random|user-hash|feature-flag
Duration: <minimum days>
Sample size: <minimum per variant>
Success: primary metric >= <threshold> improvement
Guardrails: latency p99, error rate, business KPIs
IF p-value > 0.05 after min samples: no winner. IF guardrail regresses > 2%: stop test, revert.
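The user-hash routing option and the stopping rules above could look roughly like this. The sketch makes several assumptions: the primary metric is a success proportion (for example a conversion rate), significance is checked with a pooled two-proportion z-test, and all function names and the "significant difference" label are hypothetical. It does not check the success threshold on the primary metric.

```python
import hashlib
import math


def assign_variant(user_id: str, challenger_pct: int) -> str:
    """Deterministic user-hash routing: a user always sees the same variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return "challenger" if bucket < challenger_pct else "champion"


def two_proportion_p_value(success_a, n_a, success_b, n_b) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variance at all: nothing to distinguish
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value: float, guardrail_regression_pct: float,
           alpha: float = 0.05, guardrail_limit_pct: float = 2.0) -> str:
    """Apply the rules above; guardrails take priority over significance."""
    if guardrail_regression_pct > guardrail_limit_pct:
        return "stop test, revert"
    if p_value > alpha:
        return "no winner"
    return "significant difference"
```

Call `decide` only after the minimum sample size per variant has been reached, since the rule above is conditioned on it.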
Feature drift (PSI):
< 0.1: no drift
0.1-0.2: moderate — monitor closely
> 0.2: significant — trigger retraining
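For reference, PSI over pre-binned counts can be computed as the sum over bins of (actual% - expected%) * ln(actual% / expected%). The sketch below assumes both distributions use the same bins; `psi` and `drift_level` are hypothetical helper names, and the small `eps` floor for empty bins is a common convention rather than something the thresholds above prescribe.

```python
import math


def psi(expected_counts, actual_counts, eps: float = 1e-6) -> float:
    """Population Stability Index over aligned, pre-binned counts."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # eps avoids log(0) on empty bins
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total


def drift_level(value: float) -> str:
    """Map a PSI value onto the thresholds listed above."""
    if value < 0.1:
        return "no drift"
    if value <= 0.2:
        return "moderate"
    return "significant"
```

For example, a uniform training distribution `[25, 25, 25, 25]` compared with live counts `[10, 20, 30, 40]` gives a PSI of roughly 0.23, which falls in the "significant" band.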
Performance:
< 2% drop: normal variance
2-5% drop: warning — schedule review
> 5% drop: alert — trigger retraining
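The performance bands above map onto a simple classifier. This sketch assumes a higher-is-better metric and a drop measured relative to a baseline value; `performance_status` is a hypothetical name.

```python
def performance_status(baseline: float, current: float) -> str:
    """Classify the relative drop of a higher-is-better metric."""
    # Assumption: drop is relative to the baseline, expressed in percent.
    drop_pct = (baseline - current) / baseline * 100.0
    if drop_pct < 2.0:
        return "normal variance"
    if drop_pct <= 5.0:
        return "warning: schedule review"
    return "alert: trigger retraining"
```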
Trigger: scheduled|drift-based|performance-based
Frequency: daily|weekly|monthly
Data window: last N days
Auto_deploy: false (requires A/B or human gate)
Cooldown: minimum time between retraining runs
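The retraining settings above fit naturally into a small config object with a cooldown check. A sketch only: `RetrainPolicy` and `may_retrain` are hypothetical names, and the 30-day window and 24-hour cooldown are placeholder values, not recommendations from the skill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrainPolicy:
    trigger: str = "drift-based"       # scheduled | drift-based | performance-based
    frequency: str = "weekly"          # daily | weekly | monthly
    data_window_days: int = 30         # placeholder for "last N days"
    auto_deploy: bool = False          # deployment still needs an A/B or human gate
    cooldown: timedelta = timedelta(hours=24)  # placeholder minimum gap


def may_retrain(policy: RetrainPolicy, last_run: datetime, now: datetime) -> bool:
    """Allow a new run only once the cooldown since the last run has elapsed."""
    return now - last_run >= policy.cooldown
```

Keeping `auto_deploy` false by default mirrors the rule that a retrained model goes through an A/B test or human review before serving traffic.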
Requests/sec: <current> (avg/peak)
Latency p50/p95/p99: <ms>/<ms>/<ms>
Error rate: <pct>
Primary metric (7d rolling): <val>
Drift status: NONE|LOW|MODERATE|HIGH
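The latency line of this snapshot can be derived from raw request timings. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions; `percentile` and `latency_summary` are hypothetical helpers.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile of a non-empty sample list."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return (p50, p95, p99) latency in milliseconds."""
    return tuple(percentile(samples_ms, p) for p in (50, 95, 99))


print(latency_summary(list(range(1, 101))))
# → (50, 95, 99)
```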
Append one row per action to .godmode/mlops-results.tsv (tab-separated columns):
timestamp model version action latency_p99 status
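A minimal sketch of the logging step using only the standard library. Writing a header row on first use and recording timestamps in UTC ISO 8601 are assumptions, and `append_result` is a hypothetical helper name.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["timestamp", "model", "version", "action", "latency_p99", "status"]


def append_result(path, model, version, action, latency_p99, status):
    """Append one tab-separated row, creating the file and header if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        if write_header:
            writer.writerow(COLUMNS)  # assumption: header row on first write
        timestamp = datetime.now(timezone.utc).isoformat()
        writer.writerow([timestamp, model, version, action, latency_p99, status])
```

Usage would be along the lines of `append_result(".godmode/mlops-results.tsv", "<model>", "v3.2", "canary", 87.5, "KEEP")`.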
KEEP if: metrics improve AND no guardrail regression
AND pipeline runs end-to-end.
DISCARD if: metrics regress OR pipeline fails.
Revert and log reason.
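The KEEP/DISCARD rule reduces to a single boolean check. In this sketch, any case that does not satisfy all three KEEP conditions is treated as DISCARD, which is an assumption: the rule above does not say what to do when metrics are unchanged. The function name is hypothetical.

```python
def keep_or_discard(metrics_improved: bool, guardrail_regressed: bool,
                    pipeline_ok: bool) -> str:
    """Return KEEP only when all three conditions of the rule hold."""
    if metrics_improved and not guardrail_regressed and pipeline_ok:
        return "KEEP"
    # Assumption: anything short of a clean improvement is discarded.
    return "DISCARD"
```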
STOP when FIRST of:
- Model stable at 100% for 24h
- Drift monitoring configured
- Rollback tested < 5 min
On failure: git reset --hard HEAD~1. Never pause.
| Failure | Action |
|---|---|
| OOM during training | Resume checkpoint, reduce batch |
| Performance degrades | Check drift, trigger retrain |
| A/B no difference | Verify sample size, document null |