From vmkteam-developer
Accesses Prometheus metrics for vmkteam services (appkit HTTP/RPC, zenrpc, cron) via PromQL queries using pcurl. Includes metric tables, labels, and example commands.
npx claudepluginhub vmkteam/claude-plugins --plugin vmkteam-developer
This skill uses the workspace's default tool permissions.
Prometheus for metrics. Metrics come from vmkteam/appkit, vmkteam/zenrpc, and vmkteam/cron.
Specific hosts and job names are determined during onboarding.
Profile: @{prom_profile}
Base URL: https://{prom_host}/api/v1
Also available through the Grafana proxy (see /grafana).
| Metric | Type | Labels |
|---|---|---|
| app_http_requests_total | counter | job, code, method, uri, server |
| app_http_responses_duration_seconds_count | counter | job, code, method, uri, server |
| app_http_responses_duration_seconds_sum | counter | job, code, method, uri, server |
| app_http_client_requests_total | counter | job |
| app_http_client_requests_inflight | gauge | job |
| Metric | Type | Labels |
|---|---|---|
| app_rpc_responses_duration_seconds_count | counter | job, code, method, platform, server, version |
| app_rpc_responses_duration_seconds_sum | counter | job, code, method, platform, server, version |
| app_rpc_error_requests_total | counter | job, code, method, platform, server, version |
| Metric | Type | Labels |
|---|---|---|
| app_cron_active | gauge | job |
| app_cron_evaluated_total | counter | job |
| app_cron_evaluated_duration_seconds_* | counter | job |
| Metric | Description |
|---|---|
| app_log_events_total | Log events |
| app_metadata_db_connections_total | DB connections |
| Label | Description |
|---|---|
| job | Service (determined during onboarding) |
| code | HTTP/RPC status code |
| method | HTTP method or RPC method name (lowercase) |
| uri | HTTP URI |
| platform | Client platform (RPC) |
| version | Client version (RPC) |
# General format (--data-urlencode is required for queries containing curly braces)
pcurl @{profile} 'https://{host}/api/v1/query' -s -G --data-urlencode 'query={promql}'
# Health
pcurl @{profile} 'https://{host}/api/v1/query' -s -G --data-urlencode 'query=up{job="{service}"}'
# RPC RPS
pcurl @{profile} 'https://{host}/api/v1/query' -s -G --data-urlencode 'query=sum(rate(app_rpc_responses_duration_seconds_count{job="{service}"}[5m]))'
# RPC error rate
pcurl @{profile} 'https://{host}/api/v1/query' -s -G --data-urlencode 'query=sum(rate(app_rpc_error_requests_total{job="{service}"}[5m]))/sum(rate(app_rpc_responses_duration_seconds_count{job="{service}"}[5m]))'
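The commands above keep the PromQL in single quotes, so a shell variable placed inside them would not expand. A minimal sketch of a wrapper that takes the service name as an argument might look like the following. The function names `prom_query` and `rpc_rps` and the environment variables `PROM_PROFILE` and `PROM_HOST` are assumptions made for illustration. Only the `pcurl` invocation form is taken from the commands above.

```shell
# Hypothetical helper: run an instant PromQL query through pcurl.
# PROM_PROFILE and PROM_HOST are assumed environment variables holding
# the profile name and Prometheus host obtained during onboarding.
prom_query() {
  pcurl "@${PROM_PROFILE}" "https://${PROM_HOST}/api/v1/query" -s -G \
    --data-urlencode "query=$1"
}

# Hypothetical helper: RPC requests per second for one service.
# Double quotes let "$1" expand; the inner quotes of the label matcher
# are escaped so PromQL still receives job="<service>".
rpc_rps() {
  prom_query "sum(rate(app_rpc_responses_duration_seconds_count{job=\"$1\"}[5m]))"
}
```

With both variables exported, `rpc_rps my-service` should send the same request as the "RPC RPS" command above, with `my-service` substituted for `{service}`.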
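# Range query over the last hour (date -v-1H is BSD/macOS syntax; with GNU date use date -d '1 hour ago' +%s)
pcurl @{profile} 'https://{host}/api/v1/query_range' -s -G \
--data-urlencode 'query={promql}' \
--data-urlencode "start=$(date -v-1H +%s)" \
--data-urlencode "end=$(date +%s)" \
--data-urlencode 'step=60'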
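The `start` and `end` parameters of the range query are Unix timestamps, so the window can also be computed with plain shell arithmetic, which avoids depending on any particular `date` flavor. The sketch below assumes a one-hour window; the variable names are illustrative, and `{profile}`, `{host}`, and `{promql}` remain placeholders, so the request itself is left commented out.

```shell
# Compute a one-hour window from the current Unix time.
# Only `date +%s` is used, which behaves the same on BSD and GNU date.
window_seconds=3600
end=$(date +%s)
start=$((end - window_seconds))

# Substitute real values for the placeholders before uncommenting:
# pcurl @{profile} 'https://{host}/api/v1/query_range' -s -G \
#   --data-urlencode 'query={promql}' \
#   --data-urlencode "start=${start}" \
#   --data-urlencode "end=${end}" \
#   --data-urlencode 'step=60'
echo "start=${start} end=${end}"
```

Changing `window_seconds` (for example to 86400 for a day) adjusts the range without touching the request; a larger `step` is usually sensible for longer windows.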
pcurl @{profile} 'https://{host}/api/v1/label/job/values' -s
pcurl @{profile} 'https://{host}/api/v1/label/method/values' -s -G --data-urlencode 'match[]=app_rpc_responses_duration_seconds_count{job="{service}"}'
# RPS by method
sum(rate(app_rpc_responses_duration_seconds_count{job="{service}"}[5m])) by (method)
# Error rate
sum(rate(app_rpc_error_requests_total{job="{service}"}[5m])) / sum(rate(app_rpc_responses_duration_seconds_count{job="{service}"}[5m]))
# Avg latency by method
sum(rate(app_rpc_responses_duration_seconds_sum{job="{service}"}[5m])) by (method) / sum(rate(app_rpc_responses_duration_seconds_count{job="{service}"}[5m])) by (method)
# Top-10 errors
topk(10, sum(rate(app_rpc_error_requests_total{job="{service}"}[5m])) by (method, code))
# Top-10 slowest methods
topk(10, sum(rate(app_rpc_responses_duration_seconds_sum{job="{service}"}[5m])) by (method) / sum(rate(app_rpc_responses_duration_seconds_count{job="{service}"}[5m])) by (method))
sum(rate(app_http_requests_total{job="{service}"}[5m]))
sum(rate(app_http_requests_total{job="{service}",code=~"5.."}[5m]))
go_goroutines{job="{service}"}
process_resident_memory_bytes{job="{service}"}
rate(process_cpu_seconds_total{job="{service}"}[5m])
app_metadata_db_connections_total{job="{service}"}
Jobs: node_exporter, consul_node_exporter.
# CPU usage by node
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Memory usage
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
# Memory available %
node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100
# Disk available
node_filesystem_avail_bytes{mountpoint="/"}
# Disk usage %
100 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100)
# Load average
node_load1
node_load5
node_load15
# Network traffic
rate(node_network_receive_bytes_total{device="eth0"}[5m])
rate(node_network_transmit_bytes_total{device="eth0"}[5m])
# Disk I/O
rate(node_disk_io_time_seconds_total[5m])
# Which services exist (version, dependencies)
app_metadata_service
# Inter-service links (sync/async/external)
app_metadata_services
# DB connections
app_metadata_db_connections_total
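Instant: data.result[].metric (labels) + data.result[].value[1] (sample value, a string!); value[0] is the timestamp.
Range: data.result[].metric (labels) + data.result[].values[][1] (sample value, a string!); values[][0] is the timestamp.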
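Because sample values arrive as strings, they need an explicit conversion before any numeric comparison. The sketch below shows one way to flatten both response shapes into tab-separated lines. It assumes `jq` is installed, that the series carry a `method` label (as in the `by (method)` queries), and that all values are finite. The function names are hypothetical.

```shell
# Hypothetical helpers; both read a Prometheus JSON response on stdin.
# Assumption: jq is installed and sample values are finite numbers.

# Instant query: one "method<TAB>value" line per series.
# value[1] is a string, so tonumber converts it; "-" stands in for a
# missing method label.
prom_instant_values() {
  jq -r '.data.result[] | [(.metric.method // "-"), (.value[1] | tonumber)] | @tsv'
}

# Range query: one "method<TAB>timestamp<TAB>value" line per sample.
prom_range_values() {
  jq -r '.data.result[] | (.metric.method // "-") as $m | .values[] | [$m, .[0], (.[1] | tonumber)] | @tsv'
}
```

Either function can be appended to a query command with a pipe, for example `pcurl ... | prom_instant_values`.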