Skill

monitoring-observability

From ork

Provides patterns for Prometheus metrics, Grafana dashboards, Langfuse v4 LLM tracing, and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.

Python

monitoring

npx claudepluginhub yonatangross/orchestkit --plugin ork

Tool Access

This skill is limited to using the following tools:

Read, Glob, Grep, WebFetch, WebSearch

Stats

Parent Repo Stars: 133

Parent Repo Forks: 14

Last Commit: Mar 21, 2026

Monitoring & Observability

Comprehensive patterns for infrastructure monitoring, LLM observability, and quality drift detection. Each category has individual rule files in `rules/`, loaded on demand.

Quick Reference

Category	Rules	Impact	When to Use
Infrastructure Monitoring	3	CRITICAL	Prometheus metrics, Grafana dashboards, alerting rules
LLM Observability	3	HIGH	Langfuse tracing, cost tracking, evaluation scoring
Drift Detection	3	HIGH	Statistical drift, quality regression, drift alerting
Silent Failures	3	HIGH	Tool skipping, quality degradation, loop/token spike alerting

Total: 12 rules across 4 categories

Quick Start

# Prometheus metrics with RED method
from prometheus_client import Counter, Histogram

http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
    buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])

# Langfuse v4 LLM tracing — semantic as_type + inline scoring
from langfuse import observe, get_client

@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    # `llm` stands in for your application's async LLM client; it is not defined in this snippet.
    result = await llm.generate(content)
    get_client().score_current_span(name="response_quality", value=0.85)
    return result

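# PSI drift detection
# calculate_psi() and alert() are helpers you supply, not library functions;
# baseline_scores and current_scores are sequences of numeric quality scores.
psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:  # PSI >= 0.25 is conventionally read as a significant shift
    alert("Significant quality drift detected!")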
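
The Quick Start calls `calculate_psi` without showing what it computes. The following is a minimal, standard-library sketch of the Population Stability Index, assuming both inputs are flat sequences of numeric scores. The bin count of 10, the `eps` floor for empty bins, and the choice of baseline quantiles as bin edges are assumptions of this sketch, not something the skill's rule files are known to prescribe.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def calculate_psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a current sample."""
    if len(baseline) < 2 or len(current) == 0:
        raise ValueError("need at least two baseline values and one current value")

    # Interior cut points come from baseline quantiles, so each bin holds
    # roughly equal baseline mass. Duplicate cuts (tied values) are collapsed.
    cuts = sorted(set(quantiles(baseline, n=bins, method="inclusive")))
    n_bins = len(cuts) + 1

    def proportions(values):
        counts = [0] * n_bins
        for value in values:
            counts[bisect_right(cuts, value)] += 1
        # Floor at eps so an empty bin does not produce log(0) or division by zero.
        return [max(count / len(values), eps) for count in counts]

    expected = proportions(baseline)
    actual = proportions(current)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Values outside the baseline range fall into the first or last bin, so a wholesale shift in scores shows up as mass piling into one edge bin.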

Infrastructure Monitoring

Prometheus metrics, Grafana dashboards, and alerting for application health.

Rule	File	Key Pattern
Prometheus Metrics	`rules/monitoring-prometheus.md`	RED method, counters, histograms, cardinality
Grafana Dashboards	`rules/monitoring-grafana.md`	Golden Signals, SLO/SLI, health checks
Alerting Rules	`rules/monitoring-alerting.md`	Severity levels, grouping, escalation, fatigue prevention
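
Cardinality is listed as a key pattern for Prometheus metrics. One common source of runaway series counts is using raw request paths as the `endpoint` label. The sketch below shows one possible way to collapse ID-like path segments before labelling. The function names, the `:id` and `:uuid` placeholders, and the idea of bucketing status codes are illustrative assumptions, not taken from the rule files.

```python
import re

# Hypothetical patterns for this sketch; UUIDs are checked before plain integers.
_UUID = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)
_INT = re.compile(r"^\d+$")


def normalize_endpoint(path: str) -> str:
    """Collapse ID-like path segments so the `endpoint` label stays low-cardinality."""
    path = path.split("?", 1)[0]  # drop the query string before labelling
    segments = []
    for segment in path.strip("/").split("/"):
        if _UUID.match(segment):
            segments.append(":uuid")
        elif _INT.match(segment):
            segments.append(":id")
        else:
            segments.append(segment)
    return "/" + "/".join(segments)


def status_class(status_code: int) -> str:
    """Bucket status codes (200 -> "2xx") to keep the `status` label to a few values."""
    return f"{status_code // 100}xx"
```

With the `http_requests` counter from the Quick Start, a request could then be recorded as `http_requests.labels(method="GET", endpoint=normalize_endpoint(path), status=status_class(code)).inc()`. If your web framework exposes the matched route template, using that directly is usually more reliable than pattern matching.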

LLM Observability

Langfuse-based tracing, cost tracking, and evaluation for LLM applications.

Rule	File	Key Pattern
Langfuse Traces	`rules/llm-langfuse-traces.md`	@observe decorator, OTEL spans, agent graphs
Cost Tracking	`rules/llm-cost-tracking.md`	Token usage, spend alerts, Metrics API v2
Eval Scoring	`rules/llm-eval-scoring.md`	Custom scores, evaluator tracing, quality monitoring
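
To make the cost-tracking row concrete, here is a rough sketch of turning token usage into spend and raising a budget signal. It does not call Langfuse. It assumes token counts have already been read from your traces, and the model names, prices, and the 80% warning ratio are placeholders invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical USD prices per 1M tokens; replace with your provider's current rates.
PRICES_PER_MILLION = {
    "example-small": {"input": 0.50, "output": 1.50},
    "example-large": {"input": 5.00, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(usage: Usage) -> float:
    """Cost of one call: tokens times the per-million price for each direction."""
    price = PRICES_PER_MILLION[usage.model]
    return (
        usage.input_tokens * price["input"]
        + usage.output_tokens * price["output"]
    ) / 1_000_000


class SpendTracker:
    """Accumulates spend and reports when a budget threshold is crossed."""

    def __init__(self, daily_budget_usd: float, warn_ratio: float = 0.8):
        self.daily_budget_usd = daily_budget_usd
        self.warn_ratio = warn_ratio
        self.spent_usd = 0.0

    def record(self, usage: Usage) -> str:
        """Add one call's cost and return "ok", "warning", or "over_budget"."""
        self.spent_usd += cost_usd(usage)
        if self.spent_usd >= self.daily_budget_usd:
            return "over_budget"
        if self.spent_usd >= self.warn_ratio * self.daily_budget_usd:
            return "warning"
        return "ok"
```

A caller would reset `spent_usd` at the start of each budget period and route the `"warning"` and `"over_budget"` states into whatever alerting channel is already in place.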

Drift Detection

Statistical and quality drift detection for production LLM systems.

Rule	File	Key Pattern
Statistical Drift	`rules/drift-statistical.md`	PSI, KS test, KL divergence, EWMA
Quality Drift	`rules/drift-quality.md`	Score regression, baseline comparison, canary prompts
Drift Alerting	`rules/drift-alerting.md`	Dynamic thresholds, correlation, anti-patterns
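
The dynamic-threshold pattern can be sketched as a rolling percentile: rather than alerting on a fixed number, alert when a value exceeds the 95th percentile of recent history. The class name, window size, and minimum sample count below are assumptions made for this example.

```python
from collections import deque
from statistics import quantiles


class DynamicThreshold:
    """Flags values above a percentile of a rolling window of recent values."""

    def __init__(self, window: int = 200, min_samples: int = 30, percentile: int = 95):
        self.history = deque(maxlen=window)  # oldest values drop off automatically
        self.min_samples = min_samples
        self.percentile = percentile

    def threshold(self):
        """Current threshold, or None while there is too little history."""
        if len(self.history) < self.min_samples:
            return None
        # quantiles(n=100) returns the 99 cut points P1..P99.
        cut_points = quantiles(self.history, n=100, method="inclusive")
        return cut_points[self.percentile - 1]

    def check(self, value: float) -> bool:
        """Return True if `value` breaches the current threshold, then record it."""
        limit = self.threshold()
        breached = limit is not None and value > limit
        self.history.append(value)
        return breached
```

Note that this sketch records breaching values too, so a sustained shift gradually raises the threshold. Whether that adaptation is desirable, or whether breaches should be excluded from the window, depends on the metric being watched.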

Silent Failures

Detection and alerting for silent failures in LLM agents.

Rule	File	Key Pattern
Tool Skipping	`rules/silent-tool-skipping.md`	Expected vs actual tool calls, Langfuse traces
Quality Degradation	`rules/silent-degraded-quality.md`	Heuristics + LLM-as-judge, z-score baselines
Silent Alerting	`rules/silent-alerting.md`	Loop detection, token spikes, escalation workflow
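
Two of the checks named above, expected versus actual tool calls and loop detection, reduce to simple comparisons once an agent run's tool calls are available as a list. The sketch below assumes that list has already been extracted from a trace. `ToolCall`, the function names, and the default repeat limit of 3 are hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class ToolCall(NamedTuple):
    name: str
    args: str  # serialized arguments, e.g. a JSON string


def missing_tools(expected: list[str], calls: list[ToolCall]) -> list[str]:
    """Tools the agent was expected to call but never did, in expected order."""
    called = {call.name for call in calls}
    return [name for name in expected if name not in called]


def repeated_call(calls: list[ToolCall], max_repeats: int = 3) -> Optional[ToolCall]:
    """Return a call issued more than `max_repeats` times with identical arguments."""
    if not calls:
        return None
    call, count = Counter(calls).most_common(1)[0]
    return call if count > max_repeats else None
```

A non-empty result from `missing_tools` suggests the agent answered without doing the work it was supposed to do. A non-`None` result from `repeated_call` suggests it is stuck retrying the same step.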

Key Decisions

Decision	Recommendation	Rationale
Metric methodology	RED method (Rate, Errors, Duration)	Industry standard, covers essential service health
Log format	Structured JSON	Machine-parseable, supports log aggregation
Tracing	OpenTelemetry	Vendor-neutral, auto-instrumentation, broad ecosystem
LLM observability	Langfuse (not LangSmith)	Open source, self-hostable, built-in prompt management
LLM tracing API	`@observe(as_type=...)` + `score_current_span()`	v4: semantic types, inline scoring, span filtering
Langfuse APIs	Observations API v2 + Metrics API v2	v4 (Mar 2026): faster querying, aggregations at scale
Drift method	PSI for production, KS for small samples	PSI is stable on large datasets; KS is more sensitive, so it suits small samples
Threshold strategy	Dynamic (95th percentile) over static	Reduces alert fatigue, context-aware
Alert severity	4 levels (Critical, High, Medium, Low)	Clear escalation paths, appropriate response times
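
As an illustration of the structured JSON log format recommended above, here is a minimal sketch using only the standard library `logging` module. The field names and the `context` key passed through `extra` are conventions chosen for this example. A dedicated structured-logging library may be a better fit in practice.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"context": {...}} are merged in. "context" is
        # a naming convention of this sketch, not part of the logging module.
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_json_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `get_json_logger("api").info("request done", extra={"context": {"trace_id": "abc"}})` emits one JSON line with `trace_id` as a top-level field, which is what lets a log aggregator filter and join on it.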

Detailed Documentation

Resource	Description
`${CLAUDE_SKILL_DIR}/references/`	Logging, metrics, tracing, Langfuse, drift analysis guides
`${CLAUDE_SKILL_DIR}/checklists/`	Implementation checklists for monitoring and Langfuse setup
`${CLAUDE_SKILL_DIR}/examples/`	Real-world monitoring dashboard and trace examples
`${CLAUDE_SKILL_DIR}/scripts/`	Templates: Prometheus, OpenTelemetry, health checks, Langfuse

Related Skills

defense-in-depth - Layer 8 observability as part of security architecture
devops-deployment - Observability integration with CI/CD and Kubernetes
resilience-patterns - Monitoring circuit breakers and failure scenarios
llm-evaluation - Evaluation patterns that integrate with Langfuse scoring
caching - Caching strategies that reduce costs tracked by Langfuse