Skill

monitoring-observability

Install

Install the plugin:

$ npx claudepluginhub yonatangross/orchestkit --plugin ork


Description

Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse v4 LLM tracing (as_type, score_current_span, should_export_span, LangfuseMedia), and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.

Tool Access

This skill is limited to using the following tools:

Read, Glob, Grep, WebFetch, WebSearch

Supporting Assets
checklists/langfuse-setup-checklist.md
checklists/monitoring-implementation-checklist.md
examples/orchestkit-langfuse-traces.md
examples/orchestkit-monitoring-dashboard.md
metadata.json
references/agent-observability.md
references/alerting-dashboards.md
references/alerting-strategies.md
references/annotation-queues.md
references/cost-tracking.md
references/dashboards.md
references/dev-agent-lens.md
references/distributed-tracing.md
references/embedding-drift.md
references/evaluation-scores.md
references/ewma-baselines.md
references/experiments-api.md
references/framework-integrations.md
references/langfuse-evidently-integration.md
references/logging-patterns.md
Skill Content

Monitoring & Observability

Comprehensive patterns for infrastructure monitoring, LLM observability, and quality drift detection. Each category has individual rule files in rules/ loaded on-demand.

Quick Reference

| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Infrastructure Monitoring | 3 | CRITICAL | Prometheus metrics, Grafana dashboards, alerting rules |
| LLM Observability | 3 | HIGH | Langfuse tracing, cost tracking, evaluation scoring |
| Drift Detection | 3 | HIGH | Statistical drift, quality regression, drift alerting |
| Silent Failures | 3 | HIGH | Tool skipping, quality degradation, loop/token spike alerting |

Total: 12 rules across 4 categories

Quick Start

# Prometheus metrics with RED method
from prometheus_client import Counter, Histogram

http_requests = Counter('http_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
http_duration = Histogram('http_request_duration_seconds', 'Request latency',
    buckets=[0.01, 0.05, 0.1, 0.5, 1, 2, 5])

# Langfuse v4 LLM tracing: semantic as_type + inline scoring
from langfuse import observe, get_client

@observe(as_type="generation", name="analyze_content")
async def analyze_content(content: str):
    get_client().update_current_trace(
        user_id="user_123", session_id="session_abc",
        tags=["production", "orchestkit"],
    )
    result = await llm.generate(content)  # llm: placeholder for your async LLM client
    get_client().score_current_span(name="response_quality", value=0.85)
    return result
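
# PSI drift detection
# calculate_psi() and alert() are placeholders that this snippet does not define
import numpy as np

psi_score = calculate_psi(baseline_scores, current_scores)
if psi_score >= 0.25:  # threshold this skill treats as significant drift
    alert("Significant quality drift detected!")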
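
The PSI snippet above calls `calculate_psi` without showing it. The following is a minimal standard-library sketch of one way such a function could look, not the skill's own implementation. It assumes equal-width bins over the combined range of both samples (production setups often bin on baseline quantiles instead), and the `bins` and `eps` parameters are illustrative choices.

```python
import math
from typing import List, Sequence


def calculate_psi(baseline: Sequence[float], current: Sequence[float],
                  bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a baseline and a current sample."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    if hi == lo:
        return 0.0  # every value is identical, so there is no shift to measure
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp the top edge into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor each proportion at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    # PSI = sum over bins of (q - p) * ln(q / p); every term is non-negative.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical samples give a PSI of 0, and the value grows as the current distribution moves away from the baseline, which is what the `>= 0.25` check in the Quick Start relies on.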

Infrastructure Monitoring

Prometheus metrics, Grafana dashboards, and alerting for application health.

| Rule | File | Key Pattern |
|---|---|---|
| Prometheus Metrics | rules/monitoring-prometheus.md | RED method, counters, histograms, cardinality |
| Grafana Dashboards | rules/monitoring-grafana.md | Golden Signals, SLO/SLI, health checks |
| Alerting Rules | rules/monitoring-alerting.md | Severity levels, grouping, escalation, fatigue prevention |
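
The cardinality concern listed for Prometheus metrics usually comes down to keeping label values bounded. As a rough illustration, the sketch below collapses ID-like URL segments before a path is used as the `endpoint` label. The helper name `normalize_endpoint`, the `{id}` placeholder, and the two patterns (numeric IDs and UUIDs) are assumptions for this example, not something taken from the rule file.

```python
import re

# Assumed shapes of high-cardinality path segments: numeric IDs and UUIDs.
_NUMERIC = re.compile(r"^\d+$")
_UUID = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def normalize_endpoint(path: str) -> str:
    """Replace ID-like path segments so the label set stays bounded."""
    path = path.split("?", 1)[0]  # drop the query string
    segments = []
    for seg in path.split("/"):
        if _NUMERIC.match(seg) or _UUID.match(seg):
            segments.append("{id}")
        else:
            segments.append(seg)
    return "/".join(segments)
```

With the Quick Start counter, the normalized value would be passed as the label, for example `http_requests.labels(method, normalize_endpoint(path), status).inc()`, so that `/users/1` and `/users/2` share one time series.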

LLM Observability

Langfuse-based tracing, cost tracking, and evaluation for LLM applications.

| Rule | File | Key Pattern |
|---|---|---|
| Langfuse Traces | rules/llm-langfuse-traces.md | @observe decorator, OTEL spans, agent graphs |
| Cost Tracking | rules/llm-cost-tracking.md | Token usage, spend alerts, Metrics API v2 |
| Eval Scoring | rules/llm-eval-scoring.md | Custom scores, evaluator tracing, quality monitoring |
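
To make the token-usage and spend-alert idea concrete, here is a small sketch that is independent of Langfuse. The model names, the per-million-token prices, and the helpers `call_cost` and `over_budget` are all hypothetical; real rates should come from your provider's price list.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical prices in USD per 1M tokens; replace with real provider rates.
PRICES_PER_MILLION = {
    "example-small": {"input": 0.50, "output": 1.50},
    "example-large": {"input": 5.00, "output": 15.00},
}


@dataclass
class Usage:
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(usage: Usage) -> float:
    """Cost in USD of one LLM call, from its token counts."""
    price = PRICES_PER_MILLION[usage.model]
    total = (usage.input_tokens * price["input"]
             + usage.output_tokens * price["output"])
    return total / 1_000_000


def over_budget(usages: Iterable[Usage], daily_budget_usd: float) -> bool:
    """True when the summed cost of the calls exceeds the daily budget."""
    return sum(call_cost(u) for u in usages) > daily_budget_usd
```

A spend alert can then be a periodic check of `over_budget` over the day's recorded usage.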

Drift Detection

Statistical and quality drift detection for production LLM systems.

| Rule | File | Key Pattern |
|---|---|---|
| Statistical Drift | rules/drift-statistical.md | PSI, KS test, KL divergence, EWMA |
| Quality Drift | rules/drift-quality.md | Score regression, baseline comparison, canary prompts |
| Drift Alerting | rules/drift-alerting.md | Dynamic thresholds, correlation, anti-patterns |
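
A dynamic threshold can be sketched as a rolling 95th percentile of recent observations. The class below is an illustrative assumption rather than the rule file's code: the name `DynamicThreshold`, the window size, and the minimum sample count are arbitrary, and it flags high values only (for a score where lower is worse, the comparison would be inverted).

```python
from collections import deque
from statistics import quantiles


class DynamicThreshold:
    """Flag values above the 95th percentile of a rolling window."""

    def __init__(self, window: int = 200, min_samples: int = 20):
        self.history = deque(maxlen=window)
        self.min_samples = min_samples

    def is_anomalous(self, value: float) -> bool:
        """Compare value to the p95 of earlier observations, then record it."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            # n=20 yields 19 cut points; the last one is the 95th percentile.
            p95 = quantiles(self.history, n=20)[18]
            anomalous = value > p95
        self.history.append(value)
        return anomalous
```

Because the threshold follows the recent distribution, a metric that is normally noisy does not page as often as it would under a fixed cutoff.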

Silent Failures

Detection and alerting for silent failures in LLM agents.

| Rule | File | Key Pattern |
|---|---|---|
| Tool Skipping | rules/silent-tool-skipping.md | Expected vs actual tool calls, Langfuse traces |
| Quality Degradation | rules/silent-degraded-quality.md | Heuristics + LLM-as-judge, z-score baselines |
| Silent Alerting | rules/silent-alerting.md | Loop detection, token spikes, escalation workflow |
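
Two of these checks are simple enough to sketch directly: comparing expected against actual tool calls, and spotting an agent stuck repeating the same call. The function names, the `(tool, args)` tuple shape, and the `max_repeats` default are assumptions for illustration; in practice the call list would be read from trace data.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence


def missing_tool_calls(expected: Iterable[str],
                       actual: Iterable[str]) -> Dict[str, int]:
    """Map each under-called tool to the number of calls it is short by."""
    need = Counter(expected)
    have = Counter(actual)
    return {tool: n - have[tool] for tool, n in need.items() if have[tool] < n}


def detect_loop(calls: Sequence[Hashable], max_repeats: int = 3) -> bool:
    """True if one identical call occurs more than max_repeats times in a row."""
    run = 0
    prev = None
    for call in calls:
        run = run + 1 if call == prev else 1
        prev = call
        if run > max_repeats:
            return True
    return False
```

An empty result from `missing_tool_calls` means every expected tool was invoked at least as often as required; a non-empty one is a candidate silent failure even when the agent returned a plausible answer.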

Key Decisions

| Decision | Recommendation | Rationale |
|---|---|---|
| Metric methodology | RED method (Rate, Errors, Duration) | Industry standard, covers essential service health |
| Log format | Structured JSON | Machine-parseable, supports log aggregation |
| Tracing | OpenTelemetry | Vendor-neutral, auto-instrumentation, broad ecosystem |
| LLM observability | Langfuse (not LangSmith) | Open-source, self-hosted, built-in prompt management |
| LLM tracing API | @observe(as_type=...) + score_current_span() | v4: semantic types, inline scoring, span filtering |
| Langfuse APIs | Observations API v2 + Metrics API v2 | v4 (Mar 2026): faster querying, aggregations at scale |
| Drift method | PSI for production, KS for small samples | PSI is stable for large datasets, KS more sensitive |
| Threshold strategy | Dynamic (95th percentile) over static | Reduces alert fatigue, context-aware |
| Alert severity | 4 levels (Critical, High, Medium, Low) | Clear escalation paths, appropriate response times |
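
For the structured JSON log format, a minimal standard-library sketch might look like the following. The `JsonFormatter` class, the `get_json_logger` helper, and the `context` attribute passed through `extra=` are hypothetical names chosen for this example; dedicated libraries offer the same result with less code.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # `context` is a hypothetical dict supplied via extra={"context": {...}}.
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


def get_json_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `get_json_logger("api").info("request done", extra={"context": {"trace_id": "abc"}})` then emits one parseable line per event, which is what log aggregation pipelines expect.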

Detailed Documentation

| Resource | Description |
|---|---|
| ${CLAUDE_SKILL_DIR}/references/ | Logging, metrics, tracing, Langfuse, drift analysis guides |
| ${CLAUDE_SKILL_DIR}/checklists/ | Implementation checklists for monitoring and Langfuse setup |
| ${CLAUDE_SKILL_DIR}/examples/ | Real-world monitoring dashboard and trace examples |
| ${CLAUDE_SKILL_DIR}/scripts/ | Templates: Prometheus, OpenTelemetry, health checks, Langfuse |

Related Skills

  • defense-in-depth - Layer 8 observability as part of security architecture
  • devops-deployment - Observability integration with CI/CD and Kubernetes
  • resilience-patterns - Monitoring circuit breakers and failure scenarios
  • llm-evaluation - Evaluation patterns that integrate with Langfuse scoring
  • caching - Caching strategies that reduce costs tracked by Langfuse

Stats

Stars: 128
Forks: 14
Last commit: Mar 21, 2026