npx claudepluginhub arbazkhan971/godmode
This skill uses the workspace's default tool permissions.
/godmode:mlops, "deploy model", "model serving"
Model: <name and version>
Source: EXP-<ID>
Checklist:
[ ] Evaluation complete (test metrics documented)
[ ] Bias/fairness check passed
[ ] Artifacts saved (weights, config, preprocessor)
[ ] Input/output schema documented
[ ] Latency benchmarked (p99 < target ms)
[ ] Size acceptable (< N MB)
IF latency p99 > 100ms: apply optimization. IF model size > 500MB: consider distillation/pruning.
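The latency checklist item can be verified before deploy with a tiny harness. A minimal sketch, assuming a callable `predict` (the `time.sleep` stand-in below merely simulates a ~1 ms model; the helper name is mine):

```python
import time

def benchmark_p99(predict, sample, n=100):
    """Measure p99 latency (ms) of a prediction callable over n runs."""
    times = []
    for _ in range(n):
        start = time.perf_counter()
        predict(sample)
        times.append((time.perf_counter() - start) * 1000.0)
    times.sort()
    return times[int(0.99 * (len(times) - 1))]

# Stand-in model: p99 should land near the simulated 1 ms cost.
p99 = benchmark_p99(lambda x: time.sleep(0.001), sample=None)
print(f"p99 = {p99:.2f} ms")
```

Compare the result against the 100 ms rule above before moving on to serving.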
Options:
TF Serving: TensorFlow models, gRPC/REST
Triton: multi-framework, ONNX/TensorRT
SageMaker: managed AWS, auto-scaling
FastAPI/Ray Serve: custom, flexible
# Check for serving frameworks
pip list | grep -iE "fastapi|ray|triton|sagemaker"
ls model_repository/ serve/ 2>/dev/null
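Of the options above, the FastAPI/Ray Serve route is essentially a thin HTTP wrapper around `predict`. A stdlib-only sketch of the same shape (the route, the `features` field, and the stand-in model are assumptions, not any framework's API):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in model: replace with the loaded artifact (weights + preprocessor)."""
    return {"score": sum(features) / max(len(features), 1)}

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Input schema check: the documented contract from the checklist above.
        if "features" not in body:
            self.send_response(400)
            self.end_headers()
            return
        out = json.dumps(predict(body["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

# HTTPServer(("0.0.0.0", 8080), PredictHandler).serve_forever()  # uncomment to run
```

A managed framework (TF Serving, Triton, SageMaker) replaces this wrapper but keeps the same input/output contract.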
| Optimization | Latency | Size | Accuracy |
|---|---|---|---|
| Baseline FP32 | <ms> | <MB> | <val> |
| FP16 quant | <ms> | <MB> | <val> |
| INT8 quant | <ms> | <MB> | <val> |
| ONNX | <ms> | <MB> | <val> |
| Distillation | <ms> | <MB> | <val> |
IF accuracy drop > 1% from quantization: use FP16 only. IF latency target not met: try TensorRT or distillation.
Batching: static (fixed workload), dynamic (variable traffic, max_queue_delay_ms), adaptive (auto-tune).
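Dynamic batching as described above (flush on batch size or on `max_queue_delay_ms`) can be sketched as a queue with two flush conditions. A minimal single-threaded sketch, not tied to any serving framework:

```python
import time

class DynamicBatcher:
    """Collects requests until max_batch_size is reached or max_queue_delay_ms elapses."""
    def __init__(self, max_batch_size=8, max_queue_delay_ms=5.0):
        self.max_batch_size = max_batch_size
        self.max_delay = max_queue_delay_ms / 1000.0
        self.queue = []
        self.oldest = None

    def submit(self, item):
        if not self.queue:
            self.oldest = time.perf_counter()  # deadline starts at first queued item
        self.queue.append(item)
        return self.maybe_flush()

    def maybe_flush(self):
        full = len(self.queue) >= self.max_batch_size
        stale = len(self.queue) > 0 and (time.perf_counter() - self.oldest) >= self.max_delay
        if full or stale:
            batch, self.queue = self.queue, []
            return batch  # hand off to the model as one batched call
        return None

b = DynamicBatcher(max_batch_size=3)
assert b.submit("a") is None
assert b.submit("b") is None
print(b.submit("c"))  # flushes on size: ['a', 'b', 'c']
```

Production batchers (e.g. Triton's) run this logic on a background thread; the size-or-deadline trade-off is the same.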
| Version | Metric | Status | Traffic |
|---|---|---|---|
| v3.1 | F1=0.891 | CHAMPION | 90% |
| v3.2 | F1=0.903 | CANARY | 10% |
| v3.0 | F1=0.879 | ARCHIVED | 0% |
Lifecycle: STAGED->CANARY->CHAMPION->ARCHIVED
Champion: v<N>
Challenger: v<N>
Split: <champion%>/<challenger%>
Routing: random|user-hash|feature-flag
Duration: <minimum days>
Sample size: <minimum per variant>
Success: primary metric >= <threshold> improvement
Guardrails: latency p99, error rate, business KPIs
IF p-value > 0.05 after min samples: no winner. IF guardrail regresses > 2%: stop test, revert.
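The user-hash routing option above gives sticky assignment: the same user always lands in the same arm, which keeps the variants comparable. A minimal sketch using a stable hash (the function name and default split are illustrative):

```python
import hashlib

def assign_variant(user_id, challenger_pct=10):
    """Deterministic champion/challenger split; same user_id always maps to the same arm."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < challenger_pct else "champion"

print(assign_variant("user-42"))
```

Random routing is simpler but breaks per-user consistency; a feature flag adds an override path for internal testing.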
Feature drift (PSI):
< 0.1: no drift
0.1-0.2: moderate — monitor closely
> 0.2: significant — trigger retraining
Performance:
< 2% drop: normal variance
2-5% drop: warning — schedule review
> 5% drop: alert — trigger retraining
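The PSI thresholds above assume a per-feature PSI computed against the training distribution. A minimal sketch with fixed equal-width bins (binning strategy is an assumption; quantile bins are also common):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index of one feature: actual (serving) vs expected (training).
    Bin edges are fixed from the expected sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]

    def frac(xs, a, b, last):
        n = sum(1 for x in xs if a <= x < b or (last and x == b))
        return max(n / len(xs), 1e-6)  # floor avoids log(0) on empty bins

    total = 0.0
    for i in range(bins):
        last = i == bins - 1
        e = frac(expected, edges[i], edges[i + 1], last)
        a = frac(actual, edges[i], edges[i + 1], last)
        total += (a - e) * math.log(a / e)
    return total

train = [i / 100 for i in range(100)]
assert psi(train, train) < 0.1  # identical data: no drift
```

Values outside the training range fall into no bin, which correctly inflates PSI for out-of-range drift.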
Trigger: scheduled|drift-based|performance-based
Frequency: daily|weekly|monthly
Data window: last N days
Auto_deploy: false (requires A/B or human gate)
Cooldown: minimum time between retraining runs
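The drift- and performance-based triggers combine with the cooldown as a simple gate; a sketch using the thresholds stated in the monitoring section (PSI > 0.2, > 5% metric drop; the function name and 7-day default cooldown are mine):

```python
from datetime import datetime, timedelta

def should_retrain(drift_psi, metric_drop_pct, last_run, now,
                   cooldown=timedelta(days=7)):
    """Fire retraining on drift or performance regression, but never inside the cooldown."""
    if now - last_run < cooldown:
        return False  # cooldown gate: no back-to-back retraining runs
    return drift_psi > 0.2 or metric_drop_pct > 5.0
```

Even when this returns True, Auto_deploy stays false: the retrained model goes through the A/B or human gate above.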
Requests/sec: <current> (avg/peak)
Latency p50/p95/p99: <ms>/<ms>/<ms>
Error rate: <pct>
Primary metric (7d rolling): <val>
Drift status: NONE|LOW|MODERATE|HIGH
Append .godmode/mlops-results.tsv:
timestamp model version action latency_p99 status
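The append step is one row per run; a sketch writing the columns listed above to the stated path (the helper name is mine):

```python
import os
import time

def log_result(model, version, action, latency_p99, status,
               path=".godmode/mlops-results.tsv"):
    """Append one tab-separated row: timestamp, model, version, action, latency_p99, status."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    row = [time.strftime("%Y-%m-%dT%H:%M:%S"), model, version,
           action, f"{latency_p99:.1f}", status]
    with open(path, "a") as f:
        f.write("\t".join(row) + "\n")
```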
KEEP if: metrics improve AND no guardrail regression AND pipeline runs end-to-end.
DISCARD if: metrics regress OR pipeline fails.
Revert and log reason.
STOP when FIRST of:
- Model stable at 100% for 24h
- Drift monitoring configured
- Rollback tested < 5 min
On failure: git reset --hard HEAD~1. Never pause.
| Failure | Action |
|---|---|
| OOM during training | Resume checkpoint, reduce batch |
| Performance degrades | Check drift, trigger retrain |
| A/B no difference | Verify sample size, document null |