From architect
Configure observability stack with metrics, tracing, logging, alerts, and dashboards
Install:

`npx claudepluginhub navraj007in/architecture-cowork-plugin --plugin architect`

# /architect:setup-monitoring
## Trigger

`/architect:setup-monitoring` — generate the full observability stack (default).
`/architect:setup-monitoring [stack:X]` — generate one observability layer only.
`/architect:setup-monitoring [stack:X,Y]` — generate a named subset.
### Stack layers

| Layer | What it generates |
|-------|-------------------|
| `metrics` | Prometheus/Datadog config, RED + USE method instrumentation |
| `tracing` | OpenTelemetry setup, trace exporters, span instrumentation |
| `logging` | Structured logging config, log aggregation (Loki/CloudWatch) |
| `alerts` | Alerting rules, notification channels, escalation policies |
| `slos` | SLO definitions, error budgets, burn-rate alerts |
| `dashboards` | Grafana dashboard JSON, golden signals panels |
Examples:

- `/architect:setup-monitoring [stack:metrics,tracing]`
- `/architect:setup-monitoring [stack:alerts]`
- `/architect:setup-monitoring [stack:slos,dashboards] [non_interactive:true]`
When a [stack:...] tag is present, generate only the named layers and skip all others. When absent, generate the full stack.
- `[non_interactive:true]` — skip all questions, derive from SDL and existing project
- `[provider:<name>]` — override metrics provider (e.g., `[provider:datadog]`)

After /architect:scaffold creates services, they have no observability wired end-to-end. This command generates a production-ready monitoring stack with metrics collection (Prometheus/Datadog/New Relic), distributed tracing (OpenTelemetry), structured logging (Loki/CloudWatch), alerting rules, and Grafana dashboards. It follows observability-skill patterns: RED method, USE method, golden signals, SLO templates, and stage-appropriate alert thresholds.
| Phase | Steps |
|---|---|
| Setup | Step 1 · Step 1.5 |
| Configuration | Step 2 · Step 2.5 |
| Generation | Step 3 · Step 3.5 |
| Completion | Step 4 · Step 4.5 · Step 5 |
ℹ️ CONTEXT LOADING: _state.json → SDL → scaffolded components
First, read architecture-output/_state.json if it exists. Extract:
- project.name, project.stage (MVP/growth/enterprise)
- tech_stack.backend, tech_stack.frontend (frameworks per component)
- components[] (list of services, types, directories, frameworks)

Then, check if a blueprint with SDL exists:
- solution.sdl.yaml first; if absent, check sdl/README.md + module files
- nonFunctional.observability: section if present (provider, tracing, logging, alerting preferences)
- integrations.monitoring: section if present (existing dashboards, provider details)

Check for scaffolded projects:
- <component-name>/ exists
- src/lib/ contains metrics.ts, tracing.ts, or logger.ts (might already be partially instrumented by scaffold)

If no scaffolded components found, respond:
"I need a scaffolded project to set up monitoring for. Run /architect:scaffold first, then come back here."
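The context-loading step above can be sketched as follows. The state shape mirrors the fields listed in this section; the `loadState` helper name and `ProjectState` interface are illustrative assumptions, not part of the plugin.

```typescript
// Sketch: load architecture-output/_state.json and extract the fields
// this step needs. Missing file -> empty state (fresh project).
import * as fs from "fs";
import * as path from "path";

interface ProjectState {
  project?: { name?: string; stage?: "MVP" | "growth" | "enterprise" };
  tech_stack?: { backend?: string; frontend?: string };
  components?: { name: string; type: string; directory: string; framework: string }[];
}

function loadState(outputDir: string): ProjectState {
  const statePath = path.join(outputDir, "_state.json");
  if (!fs.existsSync(statePath)) return {};
  return JSON.parse(fs.readFileSync(statePath, "utf8")) as ProjectState;
}

const state = loadState("architecture-output");
const components = state.components ?? [];
if (components.length === 0) {
  // Matches the abort message above: monitoring needs scaffolded components.
  console.log("I need a scaffolded project to set up monitoring for.");
}
```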
❓ DECISION POINT: Framework detection and provider compatibility
For each component, detect if monitoring is already partially configured:
- package.json (Node.js) for installed packages: prometheus-client, @opentelemetry/api, prom-client, datadog, newrelic, winston, pino
- requirements.txt (Python) for prometheus-client, opentelemetry-api, dd-trace, newrelic, structlog
- go.mod (Go) for prometheus/client_golang, go.opentelemetry.io/otel
- .csproj (C#) for OpenTelemetry.*, Datadog.*

If observability is already partially configured (some packages detected):
If no observability packages found, this is a fresh setup → proceed to Step 2
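A minimal sketch of the Node.js branch of this detection. The package list comes from the step above; the `detectObservability` function name is an assumption for illustration.

```typescript
// Sketch: scan a component's package.json for known observability packages.
// Empty result -> fresh setup; non-empty -> extend, don't regenerate.
import * as fs from "fs";
import * as path from "path";

const OBSERVABILITY_PACKAGES = [
  "prom-client", "prometheus-client", "@opentelemetry/api",
  "datadog", "dd-trace", "newrelic", "winston", "pino",
];

function detectObservability(componentDir: string): string[] {
  const pkgPath = path.join(componentDir, "package.json");
  if (!fs.existsSync(pkgPath)) return [];
  const pkg = JSON.parse(fs.readFileSync(pkgPath, "utf8"));
  // Check runtime and dev dependencies alike.
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return OBSERVABILITY_PACKAGES.filter((name) => name in deps);
}
```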
❓ DECISION POINT: Interactive mode questions (skip if [non_interactive:true])
If not in non-interactive mode, ask:
Metrics Provider (default: Prometheus if self-hosted, Datadog if company has account):
"Which metrics provider for RED/USE signals?"
- Prometheus + Grafana (recommended: self-hosted, full control)
- Datadog (recommended: managed, APM included)
- New Relic (managed, strong tracing)
- AWS CloudWatch (if on AWS)
- None (skip metrics for now)
Distributed Tracing (if stage is growth or enterprise):
"Enable distributed tracing for request flows?"
- OpenTelemetry + Jaeger (recommended: vendor-neutral)
- Datadog APM (if Datadog selected above)
- New Relic APM (if New Relic selected above)
- Skip tracing for now
Error & Exception Tracking (optional):
"Wire error tracking?"
- Sentry (recommended: free tier generous)
- Rollbar (production-focused)
- Built-in (use structured logs only)
- None
Log Aggregation (if stage is growth or enterprise):
"Aggregate logs to a central system?"
- Loki + Grafana (recommended: lightweight, part of Prometheus stack)
- ELK Stack (Elasticsearch + Kibana; heavier)
- Datadog Logs (if Datadog selected above)
- CloudWatch Logs (if on AWS)
- Structured logs only (no aggregation)
Alert Severity (default from stage):
"Alert aggressiveness?"
- MVP: Alert on service down only (minimal noise)
- Growth: Alert on errors, latency, resource usage (moderate)
- Enterprise: SLO-based alerts + error budget burn (strict)
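The enterprise tier's "error budget burn" alerts rest on a small piece of arithmetic, sketched below. The 99.5% availability target and the 14.4 fast-burn threshold are illustrative examples, not generated defaults.

```typescript
// Illustrative arithmetic behind SLO-based burn-rate alerts.
const slo = 0.995;             // example: 99.5% availability target
const errorBudget = 1 - slo;   // fraction of requests allowed to fail (0.5%)

// Burn rate = observed error rate / error budget.
// A burn rate of 1 consumes the budget exactly over the full SLO window;
// a common fast-burn page fires around burn rate 14.4 sustained for 1h.
function burnRate(observedErrorRate: number): number {
  return observedErrorRate / errorBudget;
}

// At 0.5% errors the budget burns at the sustainable rate (burn rate ~1);
// at 7.2% errors it burns ~14.4x too fast and should page immediately.
```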
If [non_interactive:true], derive answers from SDL:
- nonFunctional.observability.metrics.provider → use directly
- nonFunctional.observability.tracing.enabled → decide OpenTelemetry
- nonFunctional.observability.errors.provider → use directly
- nonFunctional.observability.logs.aggregation → use directly
- project.stage → map to alert severity

🔄 SKILL LOAD: Read skills/observability/SKILL.md
Before delegating, read skills/observability/SKILL.md in full; it is the authoritative guide for this phase.
The monitoring-setup agent will reference this skill for all code and config generation.
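As a concrete illustration of the RED pattern (Rate, Errors, Duration) the skill prescribes, here is a minimal hand-rolled sketch. Real generated code would use prom-client or the selected provider's SDK; the class and method names here are assumptions for illustration only.

```typescript
// Minimal RED-metrics sketch: count requests (Rate), count 5xx responses
// (Errors), and record latencies for percentile reporting (Duration).
class RedMetrics {
  private requests = 0;
  private errors = 0;
  private durationsMs: number[] = [];

  observe(statusCode: number, durationMs: number): void {
    this.requests += 1;                      // Rate: total request count
    if (statusCode >= 500) this.errors += 1; // Errors: failed requests
    this.durationsMs.push(durationMs);       // Duration: latency samples
  }

  snapshot() {
    const sorted = [...this.durationsMs].sort((a, b) => a - b);
    const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * 0.99));
    return {
      requests: this.requests,
      errorRate: this.requests ? this.errors / this.requests : 0,
      p99LatencyMs: sorted[idx] ?? 0,
    };
  }
}

const red = new RedMetrics();
red.observe(200, 12);
red.observe(500, 40);
console.log(red.snapshot()); // { requests: 2, errorRate: 0.5, p99LatencyMs: 40 }
```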
🔄 AGENT DELEGATION: Launch monitoring-setup agent (autonomous, config-generating)
Pass the following to the monitoring-setup agent:
Component list from _state.json.components[]:
Monitoring configuration:
Project context:
- _state.json.project.stage — MVP/growth/enterprise (affects alert thresholds)
- _state.json.tech_stack — languages and frameworks per component
- nonFunctional.observability section

Reference materials:
- skills/observability/SKILL.md — agent will read and follow

The agent MUST:
- generate src/lib/metrics.ts (or equivalent) per component with RED/USE signal collection
- generate src/lib/tracing.ts with OpenTelemetry initialization and semantic conventions
- generate src/lib/logger.ts with structured logging setup (Winston/pino/structlog/serilog)
- generate a monitoring/ directory with config files:
  - prometheus.yml (if Prometheus selected)
  - grafana/dashboards/ — pre-built RED, USE, golden signals dashboards (JSON)
  - monitoring/alerts/rules.yaml — alert rules per stage
  - docker-compose.monitoring.yml with Prometheus + Grafana + optional Loki stack
  - monitoring/runbooks/ — markdown playbooks for each alert (how to respond, debugging steps)
  - monitoring/slo/ — SLO template per component with availability/latency/error targets
- update package.json / Makefile / requirements.txt with observability dependencies

The agent MUST NOT:
- overwrite existing application code in src/lib/ — only add new monitoring-specific modules
- generate .env files or credentials

✅ QUALITY GATE: Check generated files before proceeding
After the agent completes, verify the monitoring structure:
For each component:
- src/lib/metrics.ts (or equivalent) exists and exports RED/USE metrics
- src/lib/tracing.ts exists and initializes OpenTelemetry with service name
- src/lib/logger.ts exists with structured log format (JSON with trace_id, user_id, etc.)
- monitoring/alerts/rules.yaml exists and has alert thresholds matching the selected stage
- monitoring/slo/ has at least one SLO template
- docker-compose.monitoring.yml exists if Prometheus was selected
- package.json (or equivalent) has observability dependencies added

If verification fails:
Append one line to architecture-output/_activity.jsonl:
{"ts":"<ISO-8601>","phase":"setup-monitoring","outcome":"completed","components":["api-server","web-app"],"provider":"prometheus","tracing":"opentelemetry","alerts":"growth","slos_defined":2,"files_generated":22,"summary":"Monitoring stack configured: Prometheus + Grafana + OpenTelemetry tracing + structured logs. 2 SLOs defined. Alert thresholds: growth stage."}
For each component, also append to <component-name>/_activity.jsonl:
{"ts":"<ISO-8601>","phase":"setup-monitoring","metrics_provider":"prometheus","tracing":"enabled","status":"configured","files_created":["src/lib/metrics.ts","src/lib/tracing.ts","src/lib/logger.ts"],"summary":"Instrumented with RED metrics, OpenTelemetry traces, structured logs (Winston). Trace ID propagation via W3C headers."}
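The append step above can be sketched as a small helper. Field names follow the example records; the `appendActivity` name and demo file path are illustrative assumptions.

```typescript
// Sketch: append one activity record to an _activity.jsonl file
// (JSONL: one JSON object per line, newline-terminated).
import * as fs from "fs";

function appendActivity(file: string, record: Record<string, unknown>): void {
  // Stamp ts first so a record may override it if needed.
  const line = JSON.stringify({ ts: new Date().toISOString(), ...record });
  fs.appendFileSync(file, line + "\n");
}

appendActivity("demo_activity.jsonl", {
  phase: "setup-monitoring",
  outcome: "completed",
  provider: "prometheus",
});
```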
Read existing architecture-output/_state.json (or start with {}).
Merge ONLY the monitoring field:
```json
{
  "monitoring": {
    "generated_at": "<ISO-8601>",
    "metrics_provider": "prometheus",
    "tracing": "opentelemetry",
    "error_tracking": "sentry",
    "log_aggregation": "loki",
    "alert_severity": "growth",
    "dashboards": {
      "red_metrics": "monitoring/grafana/dashboards/red-metrics.json",
      "use_metrics": "monitoring/grafana/dashboards/use-metrics.json",
      "slo_status": "monitoring/grafana/dashboards/slo-status.json"
    },
    "alert_rules": "monitoring/alerts/rules.yaml",
    "slos": {
      "api-server": {
        "availability": 0.995,
        "latency_p99_ms": 200,
        "error_rate": 0.005
      }
    },
    "files_generated": 22
  }
}
```
Write back to architecture-output/_state.json without overwriting other fields.
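The merge-without-overwrite requirement can be sketched as below. The `mergeMonitoring` helper name is an assumption; the key point is that only the `monitoring` top-level field is replaced.

```typescript
// Sketch: merge ONLY the monitoring field into _state.json,
// preserving every other top-level field.
import * as fs from "fs";

function mergeMonitoring(statePath: string, monitoring: object): void {
  const state = fs.existsSync(statePath)
    ? JSON.parse(fs.readFileSync(statePath, "utf8"))
    : {};
  state.monitoring = monitoring; // replace just this one key
  fs.writeFileSync(statePath, JSON.stringify(state, null, 2) + "\n");
}
```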
🚀 COMPLETION MARKER: Emit [SETUP_MONITORING_DONE]
Emit the completion marker:
[SETUP_MONITORING_DONE]
This ensures the monitoring setup phase is marked as complete in the project state.
If no scaffolded components exist:
"I need a scaffolded project to set up monitoring for. Run /architect:scaffold first, then come back here."
If skills/observability/SKILL.md cannot be read:
If selected provider (Datadog, New Relic) authentication fails:
outcome: "partial"

If monitoring/ directory cannot be created due to permissions:
If user selects Prometheus stack but Docker Compose is not installed:
If monitoring configs already exist (prometheus.yml, grafana configs, etc.):
- src/lib/ — only generate new monitoring-specific code
- monitoring/ directory for configs; code into src/lib/
- docker-compose.monitoring.yml with Prometheus, Grafana, and optional Loki