From home-lab-ops
Validates Prometheus metrics, Grafana dashboards, and the monitoring stack configuration to prevent silent breakage when exporter versions change
npx claudepluginhub infiquetra/infiquetra-claude-plugins --plugin home-lab-ops
This skill uses the workspace's default tool permissions.
The monitoring stack runs on VM 203 (10.220.1.63) as Docker Compose services:
monitoring/
├── prometheus/ # Metrics collection + alerting
├── grafana/ # Dashboards + visualization
├── loki/ # Log aggregation
├── promtail/ # Log shipping (runs on all cluster nodes)
└── exporters/
├── node-exporter # All 6 Proxmox nodes + all VMs
├── pve-exporter # Proxmox VE metrics (runs on monitoring VM)
├── ceph-exporter # Ceph cluster metrics
├── ipmi-exporter # iDRAC hardware metrics
├── pbs-exporter # Proxmox Backup Server metrics
└── unifi-poller # UniFi Dream Machine Pro metrics
# SSH to monitoring VM
ssh ubuntu@10.220.1.63
# Check all containers running
docker compose ps
# Check each exporter is actually serving metrics
curl -s localhost:9100/metrics | head -5 # node_exporter
curl -s -o /dev/null -w '%{http_code}\n' localhost:8082/pve # pve-exporter (expect 200)
curl -s localhost:9283/metrics | head -5 # ceph-exporter
# Prometheus health
curl -s localhost:9090/-/healthy
curl -s localhost:9090/api/v1/targets | python3 -m json.tool | grep -E '"health"|"job"'
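The grep above prints every job and health value, which gets noisy once many targets are configured. As a minimal sketch, assuming the standard shape of the `/api/v1/targets` response (`data.activeTargets[]` with `labels`, `health`, and `lastError`), the same payload can be reduced to only the targets that are not up. The job names and error text in the sample are hypothetical.

```python
def unhealthy_targets(payload):
    """Return (job, instance, last_error) for each active target whose health is not 'up'."""
    targets = payload.get("data", {}).get("activeTargets", [])
    return [
        (
            t.get("labels", {}).get("job", "?"),
            t.get("labels", {}).get("instance", "?"),
            t.get("lastError", ""),
        )
        for t in targets
        if t.get("health") != "up"
    ]


# Sample payload trimmed to the fields used here; job names are hypothetical.
sample_targets = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "node", "instance": "10.220.1.8:9100"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "pve", "instance": "localhost:8082"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

for job, instance, error in unhealthy_targets(sample_targets):
    print(f"{job} {instance}: {error}")
```

To run it against the live stack, the output of `curl -s localhost:9090/api/v1/targets` could be loaded with `json.load(sys.stdin)` and passed to `unhealthy_targets`.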
# On monitoring VM — verify exporter returns data
curl -s localhost:<exporter_port>/metrics | grep <metric_name>
# Check target health in Prometheus
curl -s 'localhost:9090/api/v1/targets' | python3 -m json.tool | grep -A3 '"job": "<job_name>"'
# Query for a metric
curl -s 'localhost:9090/api/v1/query?query=<metric_name>' | python3 -m json.tool
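An instant query for a metric that no longer exists still returns `"status": "success"`, just with an empty result list, so it is easy to misread the JSON as healthy. The sketch below, assuming the standard `/api/v1/query` response shape, counts the returned series so that zero can be treated as "metric missing". The sample payloads are illustrative only.

```python
def series_count(payload):
    """Number of series in an instant-query response; 0 means the metric returned no data."""
    if payload.get("status") != "success":
        # Error responses carry an 'error' message instead of 'data'.
        raise ValueError(payload.get("error", "query failed"))
    return len(payload["data"]["result"])


# Illustrative payloads: one series found, and an empty (missing metric) result.
present = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node"}, "value": [1700000000, "1"]}
        ],
    },
}
missing = {"status": "success", "data": {"resultType": "vector", "result": []}}

print(series_count(present), series_count(missing))
```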
See references/metric-registry.md for known metric name changes between exporter versions.
In Prometheus, explore available metrics:
curl -s 'localhost:9090/api/v1/label/__name__/values' | python3 -m json.tool | grep <keyword>
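One way to catch renames is to save the metric name list before an exporter upgrade and compare it with the list afterwards. The following is a sketch of that comparison, assuming both lists come from the `data` array of the `__name__` label values endpoint queried above. The example uses the well-known node_exporter rename of `node_cpu` to `node_cpu_seconds_total`; the helper name is hypothetical.

```python
def diff_metric_names(before, after):
    """Return (removed, added) metric names between two snapshots, each sorted."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)


# Snapshots as they might look around an exporter upgrade (illustrative data).
names_before = ["node_cpu", "node_load1", "up"]
names_after = ["node_cpu_seconds_total", "node_load1", "up"]

removed, added = diff_metric_names(names_before, names_after)
print("removed:", removed)
print("added:", added)
```

Any name in `removed` that a dashboard or alert rule still references is a candidate for silent breakage.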
# List alert and recording rules with their evaluation health
curl -s 'localhost:9090/api/v1/rules' | python3 -m json.tool | grep -E '"name"|"health"'
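The grep on the rules endpoint interleaves group names, rule names, and health values, so a failing rule is hard to attribute. A minimal sketch, assuming the standard `/api/v1/rules` response shape (`data.groups[].rules[]` with `name`, `health`, and `lastError`), lists only the rules whose health is not `ok`. Note that a rule reports `unknown` until its first evaluation. Group and rule names in the sample are hypothetical.

```python
def failing_rules(payload):
    """Return (group, rule, last_error) for each rule whose health is not 'ok'."""
    return [
        (group.get("name", "?"), rule.get("name", "?"), rule.get("lastError", ""))
        for group in payload.get("data", {}).get("groups", [])
        for rule in group.get("rules", [])
        if rule.get("health") != "ok"
    ]


# Sample payload trimmed to the fields used here; names are hypothetical.
sample_rules = {
    "status": "success",
    "data": {
        "groups": [
            {
                "name": "node-alerts",
                "rules": [
                    {"name": "NodeDown", "health": "ok", "lastError": ""},
                    {"name": "DiskFull", "health": "err",
                     "lastError": "unknown metric in expression"},
                ],
            }
        ]
    },
}

for group, rule, error in failing_rules(sample_rules):
    print(f"{group}/{rule}: {error}")
```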
When modifying a Grafana dashboard JSON:
- Check every metric referenced in the "expr": fields in the dashboard JSON
- instance, job, host labels must match what Prometheus records
# Extract all metric expressions from a dashboard JSON
cat roles/monitoring/files/grafana/dashboards/<dashboard>.json | \
python3 -c "import json,sys; d=json.load(sys.stdin); \
[print(p.get('expr','')) for panel in d.get('panels',[]) \
for t in panel.get('targets',[]) for p in [t]]"
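The one-liner above only reads top-level panels, so expressions inside collapsed rows (which nest their own `panels` list) are skipped. The sketch below walks nested panels as well and flags expressions that mention none of the metric names Prometheus currently knows. The token match is a rough heuristic rather than a PromQL parser, and the sample dashboard and metric names are hypothetical.

```python
import re

# Rough identifier pattern; also matches function and label names, which is
# harmless because tokens are only tested for membership in the known set.
TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def extract_exprs(node):
    """Yield every non-empty 'expr' from a dashboard, including panels nested in rows."""
    for panel in node.get("panels", []):
        for target in panel.get("targets", []):
            expr = target.get("expr", "")
            if expr:
                yield expr
        yield from extract_exprs(panel)  # collapsed rows carry their own 'panels'


def exprs_without_known_metric(dashboard, known_metrics):
    """Return expressions that reference none of the known metric names."""
    known = set(known_metrics)
    return [e for e in extract_exprs(dashboard) if not known.intersection(TOKEN.findall(e))]


# Hypothetical dashboard: one top-level panel and one panel nested in a row.
dashboard = {
    "panels": [
        {"targets": [{"expr": 'rate(node_cpu_seconds_total{mode="idle"}[5m])'}]},
        {"type": "row", "panels": [{"targets": [{"expr": "old_metric_total"}]}]},
    ]
}

print(exprs_without_known_metric(dashboard, ["node_cpu_seconds_total", "up"]))
```

The `known_metrics` argument could be the `data` array returned by the `__name__` label values endpoint.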
Containers that run as a non-root user can't read config files that were created with mode 0600 and are owned by a different user (typically root).
# Check permissions on config files
ls -la /opt/monitoring/prometheus/
# All .yml files should be 0644
sudo chmod 644 /opt/monitoring/prometheus/*.yml
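Before running a blanket chmod, it can help to see which files are actually affected. This sketch lists the `.yml` files in a directory that lack the world-read bit, which is used here as a proxy for "a non-root container user cannot read it" (a file owned by the container's own user would still be readable at 0600). The demo runs on a temporary directory with hypothetical file names; pointing it at `/opt/monitoring/prometheus/` is left to the reader.

```python
import os
import stat
import tempfile
from pathlib import Path


def not_world_readable(directory, pattern="*.yml"):
    """Return names of matching files that lack the world-read bit (o+r)."""
    return sorted(
        p.name
        for p in Path(directory).glob(pattern)
        if not p.stat().st_mode & stat.S_IROTH
    )


# Demo on a throwaway directory so nothing under /opt/monitoring is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name, mode in (("prometheus.yml", 0o600), ("alerts.yml", 0o644)):
        path = Path(tmp) / name
        path.write_text("# placeholder\n")
        os.chmod(path, mode)
    flagged = not_world_readable(tmp)

print(flagged)
```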
# Verify node_exporter is running on a Proxmox host
ssh root@10.220.1.8 "systemctl status prometheus-node-exporter"
# Should be active/running and listening on port 9100
# Verify from monitoring VM
curl -s 10.220.1.8:9100/metrics | head
Promtail runs on each Proxmox node. Check:
# On a Proxmox node
systemctl status promtail
journalctl -u promtail -n 30
# rsyslog must be forwarding to promtail's port
cat /etc/rsyslog.d/99-promtail.conf
# Should contain: *.* action(type="omfwd" target="localhost" port="1514" protocol="tcp")
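The forwarding rule only helps if something is listening on the target port. As a small sketch, the check below attempts a TCP connection and reports whether it succeeded. Port 1514 on localhost comes from the rsyslog rule above; the function name and timeout are assumptions.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port taken from the rsyslog forwarding rule; run this on the Proxmox node.
    print("promtail syslog listener reachable:", port_open("localhost", 1514))
```

A `False` result alongside an active promtail service would suggest that promtail's syslog listener is not configured on that port.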
When adding a new exporter to the monitoring stack:
1. Add the container to roles/monitoring/templates/docker-compose.yml.j2
2. Add its config file to roles/monitoring/templates/ or files/
3. Add a scrape job to roles/monitoring/files/prometheus/prometheus.yml
4. Store any credentials in the vault (vault_<exporter>_api_token)
5. Set the config file mode to 0644 in the role task
6. Test with docker compose up -d <new_container> before applying via Ansible

See references/metric-registry.md for the metric names exposed by each exporter.