From cockroachdb
Ranks and analyzes statement fingerprints using aggregated SQL statistics from crdb_internal.statement_statistics to identify slow, resource-intensive, or error-prone query patterns. Use when investigating historical performance trends, identifying optimization opportunities, or diagnosing recurring slowness without DB Console access.
npx claudepluginhub cockroachdb/claude-plugin --plugin cockroachdbThis skill uses the workspace's default tool permissions.
Analyzes historical statement performance patterns using aggregated SQL statistics to identify slow, resource-intensive, or error-prone query fingerprints. Uses `crdb_internal.statement_statistics` for time-windowed analysis of latency, CPU, contention, admission delays, and failure rates - entirely via SQL without requiring DB Console access.
Fetches query-level metrics from AlloyDB Postgres via PromQL in Google Cloud Monitoring to troubleshoot slow performance, query plans, and resource-heavy processes.
Audits Cloud SQL Postgres health: identifies storage bloat, invalid indexes, table statistics, autovacuum configs. Fetches query metrics via PromQL from Cloud Monitoring.
Analyzes slow queries in PostgreSQL, MySQL, MongoDB using EXPLAIN plans, buffer stats, index usage to detect sequential scans, missing indexes, and recommend optimizations.
Share bugs, ideas, or general feedback.
Analyzes historical statement performance patterns using aggregated SQL statistics to identify slow, resource-intensive, or error-prone query fingerprints. Uses crdb_internal.statement_statistics for time-windowed analysis of latency, CPU, contention, admission delays, and failure rates - entirely via SQL without requiring DB Console access.
Complement to triaging-live-sql-activity: This skill analyzes historical patterns; for immediate triage of currently running queries, see triaging-live-sql-activity.
For immediate incident response: Use triaging-live-sql-activity to triage currently running queries and cancel runaway work. For transaction-level analysis: Use profiling-transaction-fingerprints to analyze retry patterns, commit latency, and statement composition at the transaction boundary. For background job monitoring: Use monitoring-background-jobs for long-running schema changes and automatic jobs excluded from statement statistics.
VIEWACTIVITY or VIEWACTIVITYREDACTED cluster privilege for cluster-wide visibilitysql.stats.automatic_collection.enabled = trueCheck collection status:
SHOW CLUSTER SETTING sql.stats.automatic_collection.enabled; -- Should return: true
See triaging-live-sql-activity permissions reference for RBAC setup (same privileges).
Statement fingerprint: Normalized SQL pattern with parameterized constants (e.g., SELECT * FROM users WHERE id = $1 vs SELECT * FROM users WHERE id = 123)
Key differences:
aggregated_ts: Hourly UTC buckets (e.g., 2026-02-21 14:00:00 = 14:00-14:59 executions)
Data retention: Default ~7 days (check sql.stats.persisted_rows.max)
Best practice: Always filter by time window: WHERE aggregated_ts > now() - INTERVAL '24 hours'
| Metric Category | JSON Path | Scope | Use Case |
|---|---|---|---|
| Aggregated | statistics.statistics.* | All executions | Latency, row counts, execution counts |
| Sampled | statistics.execution_statistics.* | ~10% sample | CPU, contention, admission wait, memory/disk |
Critical: Always check sampled metrics presence: WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL
Operators:
->: Extract JSON object (returns JSON)->>: Extract as text (returns text)::TYPE: Cast to specific typeExamples:
metadata->>'db' -- Database name
(statistics->'statistics'->>'cnt')::INT -- Execution count
(statistics->'statistics'->'runLat'->>'mean')::FLOAT8 -- Mean latency (seconds)
(statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 / 1e9 -- CPU (convert nanos to seconds)
Units: Latency = seconds, CPU/admission = nanoseconds (÷ 1e9), Memory/disk = bytes (÷ 1048576 for MB)
See JSON field reference for complete schema.
SELECT
fingerprint_id,
metadata->>'db' AS database,
metadata->>'query' AS query_text,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS mean_run_lat_seconds,
(statistics->'statistics'->'runLat'->>'max')::FLOAT8 AS max_run_lat_seconds,
(metadata->>'fullScan')::BOOL AS full_scan,
metadata->'index_recommendations' AS index_recommendations,
aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 > 1.0 -- > 1 second mean latency
ORDER BY (statistics->'statistics'->'runLat'->>'mean')::FLOAT8 DESC
LIMIT 20;
Focus: Slowest queries; check full_scan and index_recommendations for optimization opportunities.
SELECT
fingerprint_id,
metadata->>'db' AS database,
metadata->>'query' AS query_text,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 / 1e9 AS mean_admission_wait_seconds,
(statistics->'statistics'->'runLat'->>'mean')::FLOAT8 AS mean_run_lat_seconds,
aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
AND (statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 > 0
ORDER BY (statistics->'execution_statistics'->'admissionWaitTime'->>'mean')::FLOAT8 DESC
LIMIT 20;
Interpretation: High admission wait = cluster at resource limits (CPU, memory, I/O). Ratio > 1.0 (wait > runtime) indicates severe queueing.
SELECT
fingerprint_id,
metadata->>'db' AS database,
metadata->>'query' AS query_text,
COUNT(DISTINCT plan_hash) AS distinct_plan_count,
array_agg(DISTINCT plan_hash ORDER BY plan_hash) AS plan_hashes,
SUM((statistics->'statistics'->>'cnt')::INT) AS total_executions
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '7 days'
GROUP BY fingerprint_id, metadata->>'db', metadata->>'query'
HAVING COUNT(DISTINCT plan_hash) > 1
ORDER BY COUNT(DISTINCT plan_hash) DESC, SUM((statistics->'statistics'->>'cnt')::INT) DESC
LIMIT 20;
Interpretation: Multiple plans indicate instability from schema changes, statistics updates, or routing changes. Performance can vary significantly between plans.
SELECT
fingerprint_id,
metadata->>'db' AS database,
metadata->>'app' AS application,
substring(metadata->>'query', 1, 150) AS query_preview,
(statistics->'statistics'->>'cnt')::INT AS execution_count,
(statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9 AS mean_contention_seconds,
ROUND(
((statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 / 1e9) /
NULLIF((statistics->'statistics'->'runLat'->>'mean')::FLOAT8, 0) * 100, 2
) AS contention_pct_of_runtime,
aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
AND (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 > 0
ORDER BY (statistics->'execution_statistics'->'contentionTime'->>'mean')::FLOAT8 DESC
LIMIT 20;
Interpretation: >20% contention = transaction conflicts, hot row access. Remediate with batching, transaction boundary changes, or schema redesign.
SELECT
fingerprint_id,
metadata->>'db' AS database,
substring(metadata->>'query', 1, 150) AS query_preview,
(statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 / 1e9 AS mean_cpu_seconds,
(statistics->'statistics'->>'cnt')::INT AS total_executions,
ROUND(
((statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 / 1e9) *
(statistics->'statistics'->>'cnt')::INT, 2
) AS estimated_total_cpu_seconds,
(metadata->>'fullScan')::BOOL AS full_scan,
aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
AND (statistics->'execution_statistics'->'cpuSQLNanos'->>'mean')::FLOAT8 > 0
ORDER BY estimated_total_cpu_seconds DESC
LIMIT 20;
Focus: estimated_total_cpu_seconds shows cluster impact. High mean CPU often correlates with full_scan = true.
SELECT
fingerprint_id,
metadata->>'db' AS database,
substring(metadata->>'query', 1, 150) AS query_preview,
(statistics->'execution_statistics'->'maxMemUsage'->>'mean')::FLOAT8 / 1048576 AS mean_mem_mb,
(statistics->'execution_statistics'->'maxMemUsage'->>'max')::FLOAT8 / 1048576 AS max_mem_mb,
(statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 / 1048576 AS mean_disk_mb,
(statistics->'execution_statistics'->'maxDiskUsage'->>'max')::FLOAT8 / 1048576 AS max_disk_mb,
metadata->>'stmtType' AS statement_type,
aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'execution_statistics'->>'cnt') IS NOT NULL
AND (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 > 0 -- Has disk spills
ORDER BY (statistics->'execution_statistics'->'maxDiskUsage'->>'mean')::FLOAT8 DESC
LIMIT 20;
Interpretation: Disk usage > 0 = memory spill (~100-1000x slower than in-memory). Common for large aggregations, sorts, hash joins. Fix with indexes or increased sql.distsql.temp_storage.workmem.
SELECT
fingerprint_id,
metadata->>'db' AS database,
substring(metadata->>'query', 1, 150) AS query_preview,
(statistics->'statistics'->>'cnt')::INT AS total_executions,
COALESCE((statistics->'statistics'->>'failureCount')::INT, 0) AS failure_count,
ROUND(
COALESCE((statistics->'statistics'->>'failureCount')::INT, 0)::NUMERIC /
NULLIF((statistics->'statistics'->>'cnt')::INT, 0) * 100, 2
) AS failure_rate_pct,
aggregated_ts
FROM crdb_internal.statement_statistics
WHERE aggregated_ts > now() - INTERVAL '24 hours'
AND (statistics->'statistics'->>'cnt')::INT > 10
AND COALESCE((statistics->'statistics'->>'failureCount')::INT, 0) > 0
ORDER BY failure_rate_pct DESC, failure_count DESC
LIMIT 20;
Common causes: Constraint violations, query timeouts, transaction retry errors (40001), permission denied.
mean_run_lat_seconds > 5 and high execution countsfull_scan = true, review index_recommendationsmetadata->>'app', contact teams with specific patternscontention_pct_of_runtime > 20%SELECT FOR UPDATE, partition hot tables, denormalize schemaaggregated_ts to find peak periodsmax_disk_mb > 100GROUP BY, ORDER BY, hash joinsRead-only operations: All queries are SELECT statements against production-approved crdb_internal.statement_statistics.
Performance impact:
| Consideration | Impact | Mitigation |
|---|---|---|
| Large table | Many rows with high statement diversity | Always use time filters and LIMIT |
| JSON parsing | CPU overhead | Use specific time windows, avoid tight loops |
| Broad windows | 7-day queries = more rows | Default to 24h; expand only when needed |
Privacy: Use VIEWACTIVITYREDACTED to redact query constants in multi-tenant environments.
| Issue | Cause | Fix |
|---|---|---|
| Empty results | No data or stats collection disabled | Check sql.stats.automatic_collection.enabled = true |
column does not exist | JSON field typo or version mismatch | Verify field names; check CockroachDB version |
| NULL in sampled metrics | Metric not sampled in bucket | Filter: WHERE (statistics->'execution_statistics'->>'cnt') IS NOT NULL |
Query text shows <hidden> | Using VIEWACTIVITYREDACTED | Expected; use VIEWACTIVITY if authorized |
| "invalid input syntax for type json" | Malformed JSON path | Check operators: -> for JSON, ->> for text |
| Very slow query | Large table, no time filter | Always add time window and LIMIT |
Empty index_recommendations | No recommendations or optimal | Normal if indexes exist |
cnt)sql.stats.persisted_rows.maxSkill references:
Official CockroachDB Documentation:
Related skills: