From cockroachdb
Diagnoses live CockroachDB cluster performance issues by identifying long-running queries, busy sessions, and active transactions using SQL-only interfaces. Use when users report cluster slowness, high CPU, or need to find runaway queries and their source applications without DB Console access.
npx claudepluginhub cockroachdb/claude-plugin --plugin cockroachdbThis skill uses the workspace's default tool permissions.
Diagnoses live cluster performance issues by identifying currently active long-running queries, busy sessions, and active transactions. Uses SQL-only interfaces (SHOW statements and `crdb_internal` views) to provide immediate triage without requiring DB Console, HTTP endpoints, or Prometheus access.
Fetches query-level metrics from AlloyDB Postgres via PromQL in Google Cloud Monitoring to troubleshoot slow performance, query plans, and resource-heavy processes.
Monitors transactions in PostgreSQL, MySQL, MongoDB for long-running queries, lock contention, idle sessions, and throughput using CLI queries and scripts.
Fetches query-level metrics from Cloud SQL Postgres instances using PromQL to troubleshoot performance bottlenecks, analyze execution plans, and monitor resource usage.
Share bugs, ideas, or general feedback.
Diagnoses live cluster performance issues by identifying currently active long-running queries, busy sessions, and active transactions. Uses SQL-only interfaces (SHOW statements and crdb_internal views) to provide immediate triage without requiring DB Console, HTTP endpoints, or Prometheus access.
For historical performance analysis: Use profiling-statement-fingerprints to analyze query patterns over time, identify slow fingerprints, and investigate trends without needing live queries. For transaction-level analysis: Use profiling-transaction-fingerprints to analyze historical transaction retry patterns, commit latency trends, and statement composition. For background job monitoring: Use monitoring-background-jobs to monitor schema changes, backups, and automatic jobs that don't appear in SHOW CLUSTER STATEMENTS.
Required SQL access:
VIEWACTIVITY or VIEWACTIVITYREDACTED privilege
VIEWACTIVITYREDACTED: Redacts constants in other users' queries (recommended for privacy)VIEWACTIVITY: Shows full query text for all usersCANCELQUERY / CANCELSESSION privileges for cancellation operationsCheck your privileges:
SHOW GRANTS ON ROLE <username>;
See permissions reference for detailed RBAC setup.
CockroachDB provides SQL-only interfaces for live activity triage:
| Interface | Purpose | Cluster-wide? |
|---|---|---|
SHOW CLUSTER STATEMENTS | Currently executing queries | Yes (with VIEWACTIVITY) |
SHOW CLUSTER SESSIONS | Active client sessions | Yes (with VIEWACTIVITY) |
crdb_internal.cluster_transactions | In-progress transactions | Yes (with VIEWACTIVITY) |
Triage workflow:
Safety: All diagnostic queries are read-only. Cancellation is opt-in with explicit warnings.
Identify queries running longer than a specified threshold:
-- Queries running longer than 5 minutes
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT
query_id,
node_id,
session_id,
user_name,
client_address,
application_name,
start,
now() - start AS running_for,
substring(query, 1, 200) AS query_preview,
distributed,
phase
FROM q
WHERE start < now() - INTERVAL '5 minutes'
ORDER BY start
LIMIT 50;
Key columns:
running_for: How long the query has been executingquery_preview: First 200 characters (protects against massive queries)phase: execution phase (preparing, executing, etc.)distributed: whether query spans multiple nodesCustomizable thresholds:
INTERVAL '5 minutes' to '10 minutes', '30 seconds', etc.LIMIT based on cluster size and expected loadFind sessions with long-running active queries:
-- Sessions with active queries running > 5 minutes
WITH s AS (SHOW CLUSTER SESSIONS)
SELECT
node_id,
session_id,
user_name,
client_address,
application_name,
status,
active_query_start,
now() - active_query_start AS active_query_for,
substring(active_queries, 1, 200) AS active_queries_preview,
substring(last_active_query, 1, 200) AS last_query_preview
FROM s
WHERE active_query_start IS NOT NULL
AND active_query_start < now() - INTERVAL '5 minutes'
ORDER BY active_query_start
LIMIT 50;
Key columns:
active_query_for: Duration of current active queryapplication_name: Source application for drill-downclient_address: Client IP/hostname for troubleshootingstatus: Session state (Idle, Active, etc.)Identify long-running transactions (potential blockers):
-- Transactions running > 5 minutes
SELECT
id AS txn_id,
node_id,
session_id,
application_name,
start,
now() - start AS running_for,
num_stmts,
num_retries,
num_auto_retries,
substring(txn_string, 1, 200) AS txn_string_preview
FROM crdb_internal.cluster_transactions
WHERE start < now() - INTERVAL '5 minutes'
ORDER BY start
LIMIT 50;
Key columns:
num_retries / num_auto_retries: High retry counts indicate contentionnum_stmts: Number of statements in transaction (large = potentially problematic)txn_string: Transaction fingerprintProduction safety note: crdb_internal.cluster_transactions is production-approved and safe for triage.
Once you identify suspicious activity, drill down by filtering:
-- All activity from specific application
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT query_id, user_name, start, now() - start AS running_for,
substring(query, 1, 200) AS query_preview
FROM q
WHERE application_name = 'payments-api'
ORDER BY start;
-- All activity from specific user
WITH s AS (SHOW CLUSTER SESSIONS)
SELECT session_id, application_name, client_address,
active_query_start, substring(active_queries, 1, 200) AS active_queries_preview
FROM s
WHERE user_name = 'app_user'
AND active_query_start IS NOT NULL
ORDER BY active_query_start;
-- All sessions from specific client IP
WITH s AS (SHOW CLUSTER SESSIONS)
SELECT session_id, user_name, application_name,
status, substring(active_queries, 1, 200) AS active_queries_preview
FROM s
WHERE client_address LIKE '10.0.1.%'
ORDER BY active_query_start;
-- Long queries from specific app and user
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT query_id, node_id, start, now() - start AS running_for,
substring(query, 1, 200) AS query_preview
FROM q
WHERE application_name = 'payments-api'
AND user_name = 'app_user'
AND start < now() - INTERVAL '10 minutes'
ORDER BY start;
Read-only operations:
All diagnostic queries (SHOW statements, crdb_internal.cluster_transactions) are read-only and safe to run in production.
Cancellation operations (opt-in):
CAUTION: Canceling queries/sessions terminates user work
Only proceed if:
CANCELQUERY or CANCELSESSION privileges-- 1. Identify the query_id from triage queries above
-- 2. Cancel it
CANCEL QUERY '<query_id>';
Example:
CANCEL QUERY '15f9e0e91f072f0f0000000000000001';
-- 1. Identify the session_id from triage queries above
-- 2. Cancel all queries in that session
CANCEL SESSION '<session_id>';
Example:
CANCEL SESSION '15f9e0e91f072f0f';
Verification: After canceling, re-run the triage queries to confirm the query/session is gone.
Required privileges:
CANCELQUERY system privilege to cancel queriesCANCELSESSION system privilege to cancel sessionsSee permissions reference for granting these privileges.
Scenario: Users report general slowness.
Check for long-running queries:
-- Run the "Long-Running Queries" diagnostic
-- Look for queries running > 5-10 minutes
Identify source applications:
-- Group by application to find culprits
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT application_name, COUNT(*) AS num_queries,
AVG(now() - start) AS avg_duration
FROM q
WHERE start < now() - INTERVAL '5 minutes'
GROUP BY application_name
ORDER BY num_queries DESC;
Drill down into specific app:
-- Filter by top application from step 2
-- Use "Filter by Application" query
Decide on action:
Scenario: Suspect contention issues.
Check for high retry counts:
SELECT application_name, AVG(num_retries) AS avg_retries,
MAX(num_retries) AS max_retries, COUNT(*) AS num_txns
FROM crdb_internal.cluster_transactions
WHERE start < now() - INTERVAL '5 minutes'
GROUP BY application_name
HAVING AVG(num_retries) > 5
ORDER BY avg_retries DESC;
Investigate specific transactions:
-- Find transactions with >10 retries
SELECT id, application_name, num_retries, num_stmts,
substring(txn_string, 1, 200) AS txn_preview
FROM crdb_internal.cluster_transactions
WHERE num_retries > 10
ORDER BY num_retries DESC;
Next steps:
Scenario: Need to attribute load to specific users.
Count active queries per user:
WITH q AS (SHOW CLUSTER STATEMENTS)
SELECT user_name, COUNT(*) AS num_active_queries,
AVG(now() - start) AS avg_duration
FROM q
GROUP BY user_name
ORDER BY num_active_queries DESC;
Drill down to specific user's activity:
-- Use "Filter by User" query
Take action:
| Issue | Cause | Fix |
|---|---|---|
SHOW CLUSTER STATEMENTS returns empty | No active queries, or insufficient privileges | Grant VIEWACTIVITY or VIEWACTIVITYREDACTED; verify cluster has active load |
Query text shows <hidden> | Using VIEWACTIVITYREDACTED privilege | This is expected for privacy; use VIEWACTIVITY if full text needed |
| Can't cancel query: "permission denied" | Missing CANCELQUERY privilege | Grant CANCELQUERY system privilege to your user |
crdb_internal.cluster_transactions slow | High transaction volume on cluster | Add filters (application_name, time threshold) to reduce result set |
| "relation does not exist" error | Typo in table name or old CockroachDB version | Verify you're using production-approved tables; check CockroachDB version compatibility |
| Triage queries themselves are slow | Cluster under extreme load | Use more aggressive filters (shorter time window, specific apps); consider canceling obvious runaway work first |
VIEWACTIVITYREDACTED instead of VIEWACTIVITY to protect sensitive query constants in multi-tenant environmentsLIMIT to prevent overwhelming output on large clustersINTERVAL based on your workload (5 minutes is a reasonable default, but fast OLTP may need 30 seconds)SHOW CLUSTER STATEMENTS, SHOW CLUSTER SESSIONS, and crdb_internal.cluster_transactions for production triageSkill references:
Related skills:
Official CockroachDB Documentation: