Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse v4 LLM tracing (as_type, score_current_span, should_export_span, LangfuseMedia), and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.
Monitoring & Observability
Comprehensive patterns for infrastructure monitoring, LLM observability, and quality drift detection. Each category has its own rule files in rules/, loaded on demand.
Quick Reference
| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |
Total: 12 rules across 4 categories
Quick Start
# Prometheus metrics with RED method
from prometheus_client import Counter, Histogram

# Rate and Errors: one counter, split by the status label
http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
# Duration: latency histogram, bucket bounds in seconds
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
                          buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])

# Langfuse v4 LLM tracing: semantic as_type + inline scoring
from langfuse import observe, get_client

@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    # llm is a placeholder for your own async LLM client; it is not defined here
    result = await llm.generate(content)
    get_client().score_current_span(name="response_quality", value=0.85)
    return result

# PSI drift detection
# calculate_psi, baseline_scores, current_scores, and alert are placeholders for your own code
import numpy as np
psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:
    alert("Significant quality drift detected!")
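
The Quick Start calls `calculate_psi` without defining it. The following is a minimal sketch of one way such a helper could look, not the skill's own implementation. It assumes scores fall in a known range (0 to 1 by default), uses ten equal-width bins, and substitutes a small epsilon for empty bins; the bin count, range, and epsilon are all assumptions chosen for illustration.

```python
import math
from typing import Sequence


def calculate_psi(
    baseline: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    lo: float = 0.0,
    hi: float = 1.0,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between two samples of scores in [lo, hi]."""
    if len(baseline) == 0 or len(current) == 0:
        raise ValueError("both samples must be non-empty")

    def bin_fractions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        width = (hi - lo) / bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero.
        return [max(c / len(values), eps) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(current)
    # PSI = sum over bins of (actual - expected) * ln(actual / expected)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Identical samples give a PSI of zero, and the score grows as the current distribution moves away from the baseline. Equal-width bins are the simplest choice; quantile bins derived from the baseline are a common alternative when scores cluster in a narrow range.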
Infrastructure Monitoring
Prometheus metrics, Grafana dashboards, and alerting for application health.
| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md | RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md | Severity levels, grouping, escalation, fatigue prevention |
LLM Observability
Langfuse-based tracing, cost tracking, and evaluation for LLM applications.
| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md | Token usage, spend alerts, Metrics API v2 |
| Eval Scoring | rules/llm-eval-scoring.md | Custom scores, evaluator tracing, quality monitoring |
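
The cost tracking rule pairs token usage with spend alerts. As a rough illustration of the arithmetic involved, the sketch below derives a per-call cost from token counts and checks it against a budget. It does not use the Langfuse API; the model name, the prices, and the `Usage`, `cost_usd`, and `over_budget` names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; replace with your provider's real rates.
PRICES_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def cost_usd(usage: Usage) -> float:
    """Cost of one call, computed from its input and output token counts."""
    price = PRICES_PER_MILLION[usage.model]
    return (
        usage.input_tokens * price["input"]
        + usage.output_tokens * price["output"]
    ) / 1_000_000


def over_budget(calls: list[Usage], daily_budget_usd: float) -> bool:
    """Spend alert condition: the summed cost of the calls exceeds the budget."""
    return sum(cost_usd(c) for c in calls) > daily_budget_usd
```

In practice the token counts would come from the traced generations rather than being built by hand, and the budget check would feed an alert rather than return a boolean.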
Drift Detection
Statistical and quality drift detection for production LLM systems.
| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md | PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md | Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md | Dynamic thresholds, correlation, anti-patterns |
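
The drift alerting rule favors dynamic thresholds over static ones. One way to picture that, as a sketch under assumptions rather than the rule file's actual code, is a rolling window whose 95th percentile serves as the alert threshold. The class name, window size, and minimum sample count below are hypothetical.

```python
from collections import deque
from statistics import quantiles


class DynamicThreshold:
    """Flag values above the 95th percentile of a rolling window of history."""

    def __init__(self, window: int = 500, min_samples: int = 20):
        self.history: deque[float] = deque(maxlen=window)
        self.min_samples = min_samples

    def check(self, value: float) -> bool:
        """Return True if value breaches the current threshold, then record it."""
        breached = False
        if len(self.history) >= self.min_samples:
            # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
            p95 = quantiles(self.history, n=100)[94]
            breached = value > p95
        self.history.append(value)
        return breached
```

Because the threshold is recomputed from recent data, it adapts as the metric's normal range shifts. The trade-off is that a slow, steady degradation can pull the threshold along with it, so a dynamic threshold is usually paired with a baseline comparison.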
Silent Failures
Detection and alerting for silent failures in LLM agents.
| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md | Loop detection, token spikes, escalation workflow |
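
The three rules above come down to simple comparisons once the data has been pulled from a trace. The sketch below shows the comparisons only: expected versus actual tool calls, repeated calls as a loop signal, and a z-score test for token spikes. The function names, the repeat limit, and the z-score limit are hypothetical, and extracting the call list from Langfuse traces is left out.

```python
from collections import Counter


def skipped_tools(expected: set[str], actual_calls: list[str]) -> set[str]:
    """Tools the agent was expected to call but never did."""
    return expected - set(actual_calls)


def looping_tools(actual_calls: list[str], max_repeats: int = 3) -> set[str]:
    """Tools called more than max_repeats times in one run, a possible loop."""
    counts = Counter(actual_calls)
    return {tool for tool, n in counts.items() if n > max_repeats}


def token_spike(
    tokens: int,
    baseline_mean: float,
    baseline_std: float,
    z_limit: float = 3.0,
) -> bool:
    """Flag a run whose token count is more than z_limit std devs above baseline."""
    if baseline_std == 0:
        # No variance in the baseline: any increase counts as a spike.
        return tokens > baseline_mean
    return (tokens - baseline_mean) / baseline_std > z_limit
```

These checks are cheap enough to run on every trace; the more expensive LLM-as-judge evaluation can then be reserved for runs that trip one of them.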
Key Decisions
| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | @observe(as_type=...) + score_current_span() | v4: semantic types, inline scoring, span filtering |
| Langfuse APIs | Observations API v2 + Metrics API v2 | v4 (Mar 2026): faster querying, aggregations at scale |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
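
For the structured JSON log format recommended above, a minimal sketch using only the standard library might look like the following. The `JsonFormatter` and `get_json_logger` names and the `context` key passed through `extra` are assumptions made for this example, not part of the skill.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"context": {...}} land on the record as an attribute.
        context = getattr(record, "context", None)
        if context:
            payload.update(context)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_json_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_json_logger("api").info("request done", extra={"context": {"endpoint": "/analyze", "duration_ms": 42}})` then emits a single JSON line that a log aggregator can parse without custom patterns.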
Detailed Documentation
| Resource | Description |
|---|---|
| ${CLAUDE_SKILL_DIR}/references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| ${CLAUDE_SKILL_DIR}/checklists/ | Implementation checklists for monitoring and Langfuse setup |
| ${CLAUDE_SKILL_DIR}/examples/ | Real-world monitoring dashboard and trace examples |
| ${CLAUDE_SKILL_DIR}/scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |
Related Skills
- defense-in-depth: Layer 8 observability as part of security architecture
- devops-deployment: Observability integration with CI/CD and Kubernetes
- resilience-patterns: Monitoring circuit breakers and failure scenarios
- llm-evaluation: Evaluation patterns that integrate with Langfuse scoring
- caching: Caching strategies that reduce costs tracked by Langfuse