nw-production-readiness
Monitoring, observability, operational procedures, CI/CD lessons learned, and quality gate definitions. Load when assessing production readiness or validating operational excellence.
From nwnpx claudepluginhub nwave-ai/nwave --plugin nwThis skill uses the workspace's default tool permissions.
Production Readiness
Monitoring and Observability
Application Monitoring
- Performance: response time | throughput | latency percentiles (P50, P95, P99)
- Resources: CPU | memory | database connections | cache hit rates
- Errors: exception tracking | error rate trends | integration failure detection
- Business: KPI tracking | conversion funnels | feature usage | revenue impact
Infrastructure Monitoring
Server/container health and resource utilization | Network performance and connectivity | Storage capacity and I/O performance | Security event detection.
Alerting Tiers
| Tier | Condition | Response |
|---|---|---|
| Page | Service down, data loss risk, security breach | Immediate response |
| Urgent | Error rate >2x baseline, latency SLA breach | Response within 15 min |
| Warning | Capacity >80%, error rate trending up | Response within 1 hour |
| Info | Deployment complete, metric threshold crossed | Review next business day |
Operational Procedures
Incident Response
- Detect: automated alerting identifies issue
- Triage: classify severity, assign responder
- Communicate: notify stakeholders per severity level
- Resolve: apply fix or rollback
- Review: post-incident review within 48 hours
- Improve: update runbooks and monitoring based on findings
Maintenance Procedures
Regular update and patching schedule | Backup verification (test restores quarterly) | Security vulnerability scanning (automated, weekly) | Performance baseline recalibration (after major changes).
Knowledge Transfer
Operational runbooks for common procedures | Architecture documentation with system diagrams | Deployment procedures and configuration management | Troubleshooting guides for known failure modes.
Quality Gates for Production Readiness
Before declaring production-ready, all must pass:
- All acceptance tests passing
- Unit coverage meets project standard (default: >= 80%)
- Integration tests validated
- Performance validated under realistic load
- Security scan completed (0 critical, 0 high)
- Monitoring and alerting configured
- Logging structured and searchable
- Rollback procedure documented and tested
- Runbook created for operational procedures
- On-call team trained on new feature
For CI/CD architecture lessons and measurement coupling pitfalls, see cicd-and-deployment skill.