Monitors PostgreSQL, MySQL, and MongoDB health via CLI queries for connections, throughput, disk usage, cache ratios, and locks, with alerting thresholds and automation guidance.
```
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin database-health-monitor
```

This skill is limited to using the following tools:
Monitor database server health across PostgreSQL, MySQL, and MongoDB by tracking key performance indicators including connection utilization, query throughput, replication lag, disk usage, cache hit ratios, vacuum activity, and lock contention.
Requirements:

- Read access to the database statistics interfaces (pg_stat_*, performance_schema, serverStatus)
- psql, mysql, or mongosh CLI tools for running health check queries
- pg_monitor role (PostgreSQL), PROCESS privilege (MySQL)

Check connection utilization:
PostgreSQL:

```sql
SELECT count(*) AS active_connections,
       (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections,
       round(count(*)::numeric / (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') * 100, 1) AS utilization_pct
FROM pg_stat_activity;
```

MySQL:

```sql
SELECT VARIABLE_VALUE AS connections
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Threads_connected';
```

Monitor query throughput and error rate:
PostgreSQL:

```sql
SELECT datname,
       xact_commit AS commits_total,
       xact_rollback AS rollbacks_total,
       xact_rollback::float / GREATEST(xact_commit, 1) AS rollback_ratio
FROM pg_stat_database
WHERE datname = current_database();
```

MySQL: run `SHOW GLOBAL STATUS LIKE 'Com_commit'` and `SHOW GLOBAL STATUS LIKE 'Com_rollback'`.

Check disk usage and growth:
PostgreSQL database size:

```sql
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;
```

Ten largest tables:

```sql
SELECT tablename,
       pg_size_pretty(pg_total_relation_size(tablename::text)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(tablename::text) DESC
LIMIT 10;
```

Monitor cache hit ratio:
PostgreSQL:

```sql
SELECT sum(heap_blks_hit)::float / GREATEST(sum(heap_blks_hit) + sum(heap_blks_read), 1) AS cache_hit_ratio
FROM pg_statio_user_tables;
```

MySQL (InnoDB buffer pool):

```sql
SELECT (1 - (VARIABLE_VALUE / (SELECT VARIABLE_VALUE
                               FROM performance_schema.global_status
                               WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests'))) AS hit_ratio
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads';
```

Check vacuum and autovacuum health (PostgreSQL):
```sql
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / GREATEST(n_live_tup, 1) * 100, 1) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 10;
```

Monitor replication lag (if replicas exist):
PostgreSQL:

```sql
SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
```

MySQL: run `SHOW REPLICA STATUS\G` and check `Seconds_Behind_Source`.

Check for long-running queries:
```sql
SELECT pid, now() - query_start AS duration, state, query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > interval '5 minutes'
ORDER BY duration DESC;
```

Monitor lock contention:
```sql
SELECT count(*) AS waiting_queries
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

Compile all health checks into a single monitoring script that runs via cron every 60 seconds, outputs metrics in a structured format (JSON), and triggers alerts when thresholds are breached.
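A minimal sketch of such a script in Python, assuming `psql` is on PATH and connection details come from the environment; the `run_psql` helper, `THRESHOLDS` values, and metric names are illustrative, not part of the skill:

```python
import json
import subprocess

# Illustrative thresholds -- tune per environment (assumption, not from the skill).
THRESHOLDS = {"connection_pct": 80.0, "cache_hit_ratio": 0.90}

def run_psql(sql):
    """Run one health check query via the psql CLI; returns its single-value output."""
    out = subprocess.run(
        ["psql", "-At", "-c", sql],  # -A unaligned, -t tuples-only
        capture_output=True, text=True, check=True, timeout=5,
    )
    return out.stdout.strip()

def evaluate(metrics):
    """Return the names of metrics that breach their thresholds."""
    alerts = []
    if metrics.get("connection_pct", 0.0) > THRESHOLDS["connection_pct"]:
        alerts.append("connection_pct")
    if metrics.get("cache_hit_ratio", 1.0) < THRESHOLDS["cache_hit_ratio"]:
        alerts.append("cache_hit_ratio")
    return alerts

def report(metrics):
    """Emit one JSON line per run, suitable for log shipping or cron mail."""
    return json.dumps({"metrics": metrics, "alerts": evaluate(metrics)})
```

A cron entry such as `* * * * * /usr/bin/python3 /opt/monitoring/db_health.py >> /var/log/db_health.jsonl` (paths hypothetical) would give the 60-second cadence described above.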
Create a health summary dashboard query that returns a single-row result with RAG (Red/Amber/Green) status for each health dimension: connections, throughput, disk, cache, vacuum, replication, queries, and locks.
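The RAG classification itself is just a pair of threshold comparisons per dimension. A sketch of the logic in Python, with illustrative thresholds that are assumptions, not values defined by the skill:

```python
def rag(value, amber, red, higher_is_worse=True):
    """Classify a metric as GREEN/AMBER/RED against two thresholds."""
    if not higher_is_worse:
        # Flip the comparison for metrics where lower values are bad (e.g. cache hit ratio).
        value, amber, red = -value, -amber, -red
    if value >= red:
        return "RED"
    if value >= amber:
        return "AMBER"
    return "GREEN"

def health_summary(m):
    """Single-row style summary for a few health dimensions (thresholds illustrative)."""
    return {
        "connections": rag(m["connection_pct"], amber=70, red=90),
        "cache": rag(m["cache_hit_ratio"], amber=0.95, red=0.90, higher_is_worse=False),
        "locks": rag(m["lock_waiters"], amber=5, red=20),
        "rollbacks": rag(m["rollback_ratio"], amber=0.05, red=0.10),
    }
```

The same two-threshold comparison can be expressed directly in SQL with `CASE` expressions if the dashboard query must live in the database.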
| Error | Cause | Solution |
|---|---|---|
| pg_stat_activity returns incomplete data | track_activities = off in postgresql.conf | Enable track_activities = on and track_counts = on; reload configuration |
| Health check query itself times out | Database under heavy load or lock contention | Set statement_timeout = '5s' for monitoring queries; use a dedicated monitoring connection |
| False alerts during maintenance windows | Planned maintenance triggers threshold breaches | Implement alert suppression windows; add maintenance mode flag to monitoring script |
| Disk usage alert but no obvious growth | WAL files, temporary files, or pg_stat_tmp consuming space | Check pg_wal directory size; check for orphaned temporary files; verify wal_keep_size setting |
| Cache hit ratio drops after restart | Buffer pool/shared_buffers cold after database restart | Implement cache warming script that runs key queries after restart; alert will self-resolve as cache warms |
PostgreSQL health dashboard for a production SaaS application: A single cron-based script checks 8 health dimensions every 60 seconds, writing results to a metrics table. A dashboard query shows: connections 45/200 (GREEN), cache hit 98.5% (GREEN), dead tuples 2.1% (GREEN), disk 62% (GREEN), replication lag 0.5s (GREEN), long queries 0 (GREEN), lock waiters 1 (GREEN), rollback ratio 0.3% (GREEN).
Detecting impending disk full condition: Health monitor tracks daily disk growth rate. Current usage: 72%, daily growth: 1.2GB, remaining: 280GB. Projected full date: 233 days. Alert triggers at 80% with recommendation to archive old data or add storage. A second alert at 90% escalates to PagerDuty.
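The projection in this example is simple division of remaining space by the daily growth rate; a sketch using the figures quoted above:

```python
def days_until_full(remaining_gb, daily_growth_gb):
    """Project how many whole days remain before the volume fills."""
    return int(remaining_gb / daily_growth_gb)

# Figures from the example: 280 GB remaining, 1.2 GB/day growth.
print(days_until_full(280, 1.2))  # → 233
```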
Identifying autovacuum falling behind: Health check shows the events table with 15M dead tuples (45% dead ratio) and last autovacuum 3 days ago. Root cause: autovacuum_vacuum_cost_delay too conservative for a high-write table. Fix: set per-table storage parameters, e.g. `ALTER TABLE events SET (autovacuum_vacuum_cost_delay = 2, autovacuum_vacuum_scale_factor = 0.01);`.