From amg-toolkit
Fleet-wide Cosmos DB for MongoDB (RU) health check — scans NormalizedRU consumption, service availability, server-side latency, throttling (429s), and replication metrics across all accounts, then deep-dives into abnormal accounts with resource logs and correlation analysis. Tracks known issues across sessions via persistent report. Uses AMG-MCP pulse check for Tier 1 triage, then batched Azure Monitor queries for Tier 2 investigation. On first run, auto-discovers datasource UID and prompts for subscription ID.
npx claudepluginhub azure/amg-skills --plugin amg-toolkit
How this skill is triggered: by the user, by Claude, or both
Slash command
/amg-toolkit:amg-check-cosmosdb-mongo-ru
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
- Current UTC time: !`date -u +%Y-%m-%dT%H:%M:%SZ`
- Current UTC time: `date -u +%Y-%m-%dT%H:%M:%SZ`
- Config: `cat memory/amg-check-cosmosdb-mongo-ru/config.md 2>/dev/null || echo "NOT_CONFIGURED"`
- Report: `[ -f memory/amg-check-cosmosdb-mongo-ru/report.md ] && echo "exists ($(grep -c '^### BUG-' memory/amg-check-cosmosdb-mongo-ru/report.md) bugs documented)" || echo "not found"`

Known Issues: Before presenting findings, cross-reference results against memory/amg-check-cosmosdb-mongo-ru/report.md.
- Use from/to for time ranges. NEVER use timespan (it causes errors).
- Use PT1H as the interval; it works for all Cosmos DB metrics. PT6H is NOT supported. DataUsage, IndexUsage, and DocumentCount do NOT support P1D.
- Use node -e "..." if installed; otherwise fall back to python -c "...", jq, or pwsh -Command "...". Bash permission for the chosen interpreter will be prompted on first use.

Update checkboxes as you complete each phase:
memory/amg-check-cosmosdb-mongo-ru/report.md

If Config shows NOT_CONFIGURED: Run First-Run Setup at the bottom of this file, then return here.
If Config is populated: Extract the datasource UID and subscription ID from the pre-loaded Runtime Context above and use them for all queries. Use $1 as the subscription override if provided.
- Datasource UID: from the config section `## Azure Monitor Datasource` > UID
- Subscription ID: from the config section `## Subscription` (or $1 if provided)
- Resource type: microsoft.documentdb/databaseaccounts (lowercase) with kind == 'MongoDB'
- Time range: Default is 7 days for metrics, 24 hours for logs. Override with $0 (e.g., 3d). Keep log queries to 1-2 days to avoid timeouts.
Call amgmcp_datasource_list (no parameters). Find entry with type == "grafana-azure-monitor-datasource".
memory/amg-check-cosmosdb-mongo-ru/config.md, warn user, use new UID.

azureMonitorDatasourceUid: {DATASOURCE_UID}
query: |
  resources
  | where type == 'microsoft.documentdb/databaseaccounts'
  | where kind == 'MongoDB'
  | project name, resourceGroup, location, subscriptionId, id, properties.provisioningState
  | order by location asc, name asc
If the config specifies subscription IDs (not "all"), add | where subscriptionId in ('{ID1}', '{ID2}'). Derive region summary by counting accounts per location. Flag accounts not in "Succeeded" state. Stop if zero accounts found.
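The post-processing described above (subscription filter, per-region counts, non-Succeeded flagging, stop on zero) could be sketched roughly as follows. This is only an illustration: the function name `summarize_accounts` is hypothetical, and the row key `properties_provisioningState` is an assumption about how the projected `properties.provisioningState` column is named in the query result.

```python
from collections import Counter

def summarize_accounts(rows, subscription_ids=None):
    """Summarize Resource Graph rows (hypothetical helper, not part of the skill).

    rows: list of dicts with keys name, location, subscriptionId and
    properties_provisioningState (assumed column name).
    subscription_ids: optional list of IDs; None means "all".
    """
    if subscription_ids:
        rows = [r for r in rows if r["subscriptionId"] in subscription_ids]
    if not rows:
        # Mirrors "Stop if zero accounts found."
        raise ValueError("no Cosmos DB for MongoDB (RU) accounts found")
    by_region = Counter(r["location"] for r in rows)
    not_succeeded = [
        r["name"] for r in rows
        if r.get("properties_provisioningState") != "Succeeded"
    ]
    return {
        "total": len(rows),
        "by_region": dict(by_region),
        "not_succeeded": not_succeeded,
    }
```

The returned `total` is the account count that later phases compare against the pulse check scan summary.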
Why `kind == 'MongoDB'`? It filters for RU-based MongoDB API accounts. vCore-based MongoDB uses microsoft.documentdb/mongoclusters.
If any accounts are not in "Succeeded" state, query the activity log for up to 3 of them:
azureMonitorDatasourceUid: {DATASOURCE_UID}
scope: {account's full ARM resource ID}
startTime: now-3d
endTime: now
select: eventTimestamp,operationName,status,caller,subStatus
If the response exceeds 500 KB, retry with startTime: now-1d. Summarize: operations performed, caller type, success/in-progress status, likely cause.
Call amgmcp_query_resource_metric_definition on the first account from Phase 1. Confirm expected metrics exist. Run only once — definitions are the same across all accounts.
azureMonitorDatasourceUid: {DATASOURCE_UID}
pastDays: 7
scenarios: cosmosdb_mongo
Scans all accounts across 3 scenarios: cosmosdb_mongo_ru, cosmosdb_mongo_throttling, cosmosdb_mongo_availability.
Before moving to Phase 4, verify:
- scanSummary.totalResourcesScanned matches Phase 1 account count.
- status: "completed" in scenarioResults.
- If errors is non-empty, retry affected scenarios individually.
- Use amgmcp_query_resource_metric for unscanned accounts.

Accounts in the findings array are abnormal. Also flag any non-Succeeded accounts from Phase 1.
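As a rough sketch, the pre-Phase 4 checks could be expressed like this. The response layout is an assumption: only the names `scanSummary.totalResourcesScanned`, `scenarioResults`, `status`, and `errors` come from the text above, while the nesting (a list of scenario dicts with a `scenario` key, and a top-level `errors` list) and the helper name `verify_pulse_check` are hypothetical.

```python
def verify_pulse_check(result, expected_accounts):
    """Return a list of problems found in a pulse check response.

    Assumed shape (not confirmed by the skill):
      result["scanSummary"]["totalResourcesScanned"] -> int
      result["scenarioResults"] -> list of {"scenario": str, "status": str}
      result["errors"] -> list
    """
    problems = []
    scanned = result.get("scanSummary", {}).get("totalResourcesScanned")
    if scanned != expected_accounts:
        problems.append(f"scanned {scanned} of {expected_accounts} accounts")
    for scenario in result.get("scenarioResults", []):
        if scenario.get("status") != "completed":
            name = scenario.get("scenario", "unknown")
            problems.append(f"scenario {name} has status {scenario.get('status')}")
    if result.get("errors"):
        problems.append(f"{len(result['errors'])} error(s) reported; retry affected scenarios")
    return problems
```

An empty list means the scan can be trusted as complete; anything else points at the retry or fallback steps listed above.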
Note: Sustained-high detection (>50% for 6+ hours), RU spike pattern detection (>30pp jump in 1h), and latency analysis require hourly time-series data and are performed in Phase 4 on flagged accounts only.
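To make the two time-series rules in this note concrete, here is a minimal sketch. It assumes the input is a list of hourly NormalizedRU percentages in chronological order with no gaps; the function names are hypothetical, and the skill's actual Phase 4 logic lives in reference/phase4-deep-metrics.md and may differ.

```python
def sustained_high(hourly_pct, threshold=50.0, min_hours=6):
    """True if the series stays above threshold for min_hours consecutive hours."""
    run = 0
    for value in hourly_pct:
        run = run + 1 if value > threshold else 0
        if run >= min_hours:
            return True
    return False

def spike_hours(hourly_pct, jump_pp=30.0):
    """Indices of hours whose value rose by more than jump_pp points over the previous hour."""
    return [
        i for i in range(1, len(hourly_pct))
        if hourly_pct[i] - hourly_pct[i - 1] > jump_pp
    ]
```

For example, `sustained_high([40, 55, 60, 70, 65, 58, 52, 30])` is true because six consecutive hours exceed 50%, and `spike_hours([20, 55, 60, 95])` returns `[1, 3]`.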
Read reference/phase4-deep-metrics.md before starting Phase 4. It contains:
Read reference/phase5-resource-logs.md before starting Phase 5. It contains:
Present the report using the structure in reference/output-format.md.
Classification:
| Severity | Criteria |
|---|---|
| CRITICAL | NormalizedRU = 100% sustained, OR ServiceAvailability < 99.9%, OR latency avg > 50ms |
| HIGH | NormalizedRU max 85-100% with frequent spikes, OR ReplicationLatency > 1000ms |
| WARNING | NormalizedRU max 70-85% sustained, OR sustained RU > 50% for 6h+, OR RU spike >30pp in 1h, OR ServiceAvailability < 99.99%, OR latency avg > 10ms, OR ReplicationLatency > 100ms |
| MODERATE | NormalizedRU max 50-70% |
| HEALTHY | All metrics within normal ranges (NormalizedRU < 50%) |
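The table can be read as a top-down cascade where the first matching tier wins. The sketch below is one possible encoding, not the skill's own implementation: `AccountMetrics` and `classify` are hypothetical names, the boolean fields stand in for the "sustained" and "frequent spikes" judgments made in Phase 4, and cases the table leaves open (for example a max of 90% without frequent spikes) are assumed to fall to WARNING.

```python
from dataclasses import dataclass

@dataclass
class AccountMetrics:
    ru_max: float                      # max NormalizedRU, percent
    sustained_at_100: bool = False     # NormalizedRU = 100% sustained
    frequent_spikes: bool = False      # judged from hourly data
    sustained_over_50_6h: bool = False
    spike_30pp_1h: bool = False
    availability_pct: float = 100.0
    latency_avg_ms: float = 0.0
    replication_latency_ms: float = 0.0

def classify(m: AccountMetrics) -> str:
    if m.sustained_at_100 or m.availability_pct < 99.9 or m.latency_avg_ms > 50:
        return "CRITICAL"
    if (m.ru_max >= 85 and m.frequent_spikes) or m.replication_latency_ms > 1000:
        return "HIGH"
    # Assumption: any max >= 70% counts here, without a separate "sustained" check.
    if (m.ru_max >= 70 or m.sustained_over_50_6h or m.spike_30pp_1h
            or m.availability_pct < 99.99 or m.latency_avg_ms > 10
            or m.replication_latency_ms > 100):
        return "WARNING"
    if m.ru_max >= 50:
        return "MODERATE"
    return "HEALTHY"
```

Checking the tiers from most to least severe means an account that meets several criteria is always reported at its worst level.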
After presenting findings, update memory/amg-check-cosmosdb-mongo-ru/report.md:
Only add genuine issues: sustained throttling, availability drops, high latency patterns, or replication problems. Skip transient single-hour spikes or expected maintenance windows.
See reference/error-handling.md for the full recovery table.
- RU-based resource type: microsoft.documentdb/databaseaccounts (kind: MongoDB)
- vCore-based resource type: microsoft.documentdb/mongoclusters
- Latency metrics: ServerSideLatencyDirect and ServerSideLatencyGateway (the old ServerSideLatency is deprecated)
- Resource log tables: CDBMongoRequests (primary), CDBDataPlaneRequests (fallback)
- Error codes: 429 / 16500 (throttling), 50 (server error), 13 (unauthorized)
- Interval: PT1H for all metrics (PT6H NOT supported)
- Report file: memory/amg-check-cosmosdb-mongo-ru/report.md
- Config file: memory/amg-check-cosmosdb-mongo-ru/config.md

Run only when Config shows NOT_CONFIGURED. After completing, return to the Workflow above.
1. Discover Datasource UID: Call amgmcp_datasource_list. Filter type == "grafana-azure-monitor-datasource". Prefer uid == "azure-monitor-oob" if multiple match. Abort if zero match.
2. Discover Subscription ID: Run this Resource Graph query to list all subscriptions with Cosmos DB for MongoDB (RU) accounts, then present the results as a table and ask the user which subscription(s) to use:
resources
| where type == 'microsoft.documentdb/databaseaccounts'
| where kind == 'MongoDB'
| join kind=inner (
resourcecontainers
| where type == 'microsoft.resources/subscriptions'
| project subscriptionId, subscriptionName=name
) on subscriptionId
| summarize AccountCount=count() by subscriptionId, subscriptionName
| order by AccountCount desc
Present the results as a table with columns: Subscription Name, Subscription ID, Account Count. Then ask the user: "Which subscription ID(s) should I configure for this health check? Or type 'all' to scan all subscriptions."
3. Write config: Write memory/amg-check-cosmosdb-mongo-ru/config.md:
# amg-check-cosmosdb-mongo-ru Configuration
User-specific values for the Cosmos DB for MongoDB (RU) health check skill.
This file is auto-generated on first run and can be edited manually.
## Azure Monitor Datasource
- **UID**: {discovered_uid}
- **Name**: {discovered_name}
## Subscription
- {subscription_id_or_"all"}
4. Confirm: Show the resolved config and ask for confirmation before proceeding.
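On later runs, the values written in step 3 have to be read back out of config.md. A minimal sketch of that extraction is shown below; it assumes the file follows the template above exactly and that multiple subscriptions are written as one bullet each (the template does not say). The function name `parse_config` is hypothetical.

```python
import re

def parse_config(text):
    """Extract the datasource UID and subscription list from config.md text.

    Returns None when no UID line is present (for example when the
    pre-loaded context is the literal string NOT_CONFIGURED).
    """
    uid = re.search(r"^- \*\*UID\*\*:\s*(.+)$", text, re.MULTILINE)
    if uid is None:
        return None
    # Everything between "## Subscription" and the next "## " heading (or end of file).
    section = re.search(
        r"^## Subscription\s*\n(.*?)(?=^## |\Z)",
        text,
        re.MULTILINE | re.DOTALL,
    )
    subs = re.findall(r"^- (.+)$", section.group(1), re.MULTILINE) if section else []
    return {
        "uid": uid.group(1).strip(),
        "subscriptions": [s.strip() for s in subs],
    }
```

A subscription list of `["all"]` would mean no `subscriptionId` filter is added to the Resource Graph query.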