Sets up free production monitoring for SaaS apps: UptimeRobot for uptime, Sentry for errors, platform metrics for performance. Configures alerts and incident response.
npx claudepluginhub whawkinsiv/solo-founder-superpowers --plugin solo-founder-superpowersThis skill uses the workspace's default tool permissions.
**This skill is for production monitoring and incident response.** For debugging specific bugs, use **debug**. For pre-launch readiness checks, use **go-live**. For security-specific monitoring (auth events, API abuse), use **secure**. For analytics and user behavior tracking, use **analytics**.
Provides Ktor server patterns for routing DSL, plugins (auth, CORS, serialization), Koin DI, WebSockets, services, and testApplication testing.
Conducts multi-source web research with firecrawl and exa MCPs: searches, scrapes pages, synthesizes cited reports. For deep dives, competitive analysis, tech evaluations, or due diligence.
Provides demand forecasting, safety stock optimization, replenishment planning, and promotional lift estimation for multi-location retailers managing 300-800 SKUs.
This skill is for production monitoring and incident response. For debugging specific bugs, use debug. For pre-launch readiness checks, use go-live. For security-specific monitoring (auth events, API abuse), use secure. For analytics and user behavior tracking, use analytics.
Basic Monitoring:
- [ ] Uptime monitoring (is site up?)
- [ ] Error tracking (are errors happening?)
- [ ] Performance monitoring (is it slow?)
- [ ] User activity (are people using it?)
- [ ] Critical alerts configured
- [ ] Check dashboard daily
See MONITORING-SETUP.md for implementation.
Without monitoring:
With monitoring:
Goal: Know about problems before users tell you.
Uptime monitoring - Pings your app every minute
Free tools:
Setup:
1. Sign up for UptimeRobot
2. Add monitor for https://yourapp.com
3. Add your email for alerts
4. Get texted if site is down
Error tracking - Captures JavaScript errors and API failures
Free tools:
Claude Code:
Add Sentry error tracking to my app:
- Install @sentry/nextjs (or appropriate package)
- Capture all frontend errors and API errors
- Include user context (email, ID)
- Configure source maps for readable stack traces
- Set up Sentry.init in both client and server entry points
Lovable / Replit (paste into chat):
Add error tracking to my app. I want to be notified when errors happen.
Use Sentry (free tier). Show me how to:
1. Create a Sentry account and project
2. Add the tracking code to my app
3. Test that errors are being captured
Performance monitoring - Tracks page load times
Free tools:
Setup:
Must monitor:
Nice to have:
For MVP: Focus on the "must monitor" only.
Configure alerts for:
Critical (text me immediately):
Important (email within hour):
Informational (daily digest):
Tell AI:
Configure monitoring alerts:
- Critical: Text to [phone]
- Important: Email to [email]
- Send summary: Daily at 9am
5-minute morning check:
Daily Check:
1. Open monitoring dashboard
2. Check uptime (should be 100% yesterday)
3. Check error count (any spikes?)
4. Check performance (slower than usual?)
5. Review any alerts from overnight
If all green: You're done, 5 minutes.
If red: Investigate using debug skill.
Green: Site responding
Red: Site down or slow to respond
What to check:
Look for:
Priority:
Look for:
When errors spike:
1. Open error tracking dashboard (Sentry)
2. Find the most frequent error
3. Read error message and stack trace
4. Note: How many users affected?
5. Note: Started when?
6. Check: Did we deploy recently?
Give to AI:
Error in production:
[Paste error message and stack trace]
Affected: [X] users in last [Y] hours
Started: [timestamp]
Recent deploys: [any?]
Please:
1. Explain what's wrong
2. Propose hotfix
3. How to test before deploying
When user reports problem:
User Report Investigation:
1. Can you reproduce it?
2. Check monitoring for errors at that time
3. Check logs for that user
4. Check if others affected
5. Determine severity
Then use debug skill to fix.
Tell AI:
User reported: [issue description]
User: [email or ID]
Timestamp: [when it happened]
Check monitoring and logs for this user at this time.
What errors or issues do you see?
Catch issues before users:
Weekly checks:
Weekly Review:
- [ ] Error trends (going up or down?)
- [ ] Performance trends (slower?)
- [ ] New error types introduced
- [ ] Uptime issues resolved
- [ ] Alert noise (too many false alerts?)
Monthly checks:
Monthly Health:
- [ ] Compare to last month
- [ ] Any degradation?
- [ ] Any improvements?
- [ ] Monitoring gaps (what's not tracked?)
Recommended for MVP:
Uptime:
Errors:
Performance:
Logs:
Cost: $0/month until you need more.
Upgrade when:
Paid tiers (typically $20-50/mo):
For < 1000 users: Free tiers sufficient.
| Mistake | Fix |
|---|---|
| No monitoring set up | Set up before launch |
| Alert fatigue (too many alerts) | Only alert on critical issues |
| Checking once a month | Check daily (5 minutes) |
| Ignoring trends | Watch for degradation over time |
| No alerts configured | Set up text alerts for downtime |
| Monitoring but not acting | Use monitoring to find and fix issues |
Good trends:
Warning trends:
Critical trends:
Action: Address warning trends before they become critical.
Logging:
Monitoring:
Both needed: Monitoring alerts you, logs help debug.
Tell AI:
Add application logging:
- Log all errors with context
- Log API requests/responses
- Log slow operations (>1s)
- Log authentication events
- Don't log sensitive data
Format: JSON with timestamp, level, message, context
Send to: [Platform logs or external service]
Log levels:
Production: Log ERROR and WARN only.
Third-party services:
Payments (Stripe):
Email (SendGrid):
Database:
Tell AI:
Add monitoring for [service]:
- Alert on failures
- Track success rate
- Log errors with context
When alerts fire:
Incident Response:
1. Acknowledge alert (mark as seen)
2. Assess severity:
- Critical: Site down, payments failing
- High: Errors affecting many users
- Medium: Isolated issues
3. Immediate action:
- Critical: Hotfix or rollback
- High: Fix within hours
- Medium: Fix in next deploy
4. Update users if needed
5. Post-mortem after resolved
Critical incidents:
1. Assess impact (how many affected?)
2. Quick fix or rollback
3. Deploy hotfix
4. Verify fixed
5. Monitor closely for hour
6. Update status page if you have one
✅ Know about issues before users report them ✅ Uptime >99.9% ✅ Errors caught and fixed quickly ✅ Performance trends stable or improving ✅ Daily monitoring routine (5 minutes) ✅ Alerts configured and actionable ✅ Issues resolved proactively