Help us improve
Share bugs, ideas, or general feedback.
From python-development
Patterns for structured logging, metrics collection, and distributed tracing in Python. Use when debugging production systems or adding observability.
npx claudepluginhub wshobson/agents --plugin python-developmentHow this skill is triggered — by the user, by Claude, or both
Slash command
/python-development:python-observabilityThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Instrument Python applications with structured logs, metrics, and traces. When something breaks in production, you need to answer "what, where, and why" without deploying new code.
Instruments Python apps with structured logging via structlog, Prometheus metrics, distributed tracing, and correlation IDs for production observability and debugging.
Implements observability patterns for Python apps: structured logging with structlog, Prometheus metrics, request correlation IDs, and FastAPI middleware. Useful for production monitoring.
Observability discipline: structured logging, metrics instrumentation, distributed tracing, and signal correlation. Invoke whenever task involves any interaction with observability concerns — adding logging, designing metrics, instrumenting traces, correlating signals, reviewing instrumentation, or understanding when to use which pillar.
Share bugs, ideas, or general feedback.
Instrument Python applications with structured logs, metrics, and traces. When something breaks in production, you need to answer "what, where, and why" without deploying new code.
Emit logs as JSON with consistent fields for production environments. Machine-readable logs enable powerful queries and alerts. For local development, consider human-readable formats.
Track latency, traffic, errors, and saturation for every service boundary.
Thread a unique ID through all logs and spans for a single request, enabling end-to-end tracing.
Keep metric label values bounded. Unbounded labels (like user IDs) explode storage costs.
import structlog
structlog.configure(
processors=[
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.JSONRenderer(),
],
)
logger = structlog.get_logger()
logger.info("Request processed", user_id="123", duration_ms=45)
Configure structlog for JSON output with consistent fields.
import logging
import structlog
def configure_logging(log_level: str = "INFO") -> None:
"""Configure structured logging for the application."""
structlog.configure(
processors=[
structlog.contextvars.merge_contextvars,
structlog.processors.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.processors.JSONRenderer(),
],
wrapper_class=structlog.make_filtering_bound_logger(
getattr(logging, log_level.upper())
),
context_class=dict,
logger_factory=structlog.PrintLoggerFactory(),
cache_logger_on_first_use=True,
)
# Initialize at application startup
configure_logging("INFO")
logger = structlog.get_logger()
Every log entry should include standard fields for filtering and correlation.
import structlog
from contextvars import ContextVar
# Store correlation ID in context
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")
logger = structlog.get_logger()
def process_request(request: Request) -> Response:
"""Process request with structured logging."""
logger.info(
"Request received",
correlation_id=correlation_id.get(),
method=request.method,
path=request.path,
user_id=request.user_id,
)
try:
result = handle_request(request)
logger.info(
"Request completed",
correlation_id=correlation_id.get(),
status_code=200,
duration_ms=elapsed,
)
return result
except Exception as e:
logger.error(
"Request failed",
correlation_id=correlation_id.get(),
error_type=type(e).__name__,
error_message=str(e),
)
raise
Use log levels consistently across the application.
| Level | Purpose | Examples |
|---|---|---|
DEBUG | Development diagnostics | Variable values, internal state |
INFO | Request lifecycle, operations | Request start/end, job completion |
WARNING | Recoverable anomalies | Retry attempts, fallback used |
ERROR | Failures needing attention | Exceptions, service unavailable |
# DEBUG: Detailed internal information
logger.debug("Cache lookup", key=cache_key, hit=cache_hit)
# INFO: Normal operational events
logger.info("Order created", order_id=order.id, total=order.total)
# WARNING: Abnormal but handled situations
logger.warning(
"Rate limit approaching",
current_rate=950,
limit=1000,
reset_seconds=30,
)
# ERROR: Failures requiring investigation
logger.error(
"Payment processing failed",
order_id=order.id,
error=str(e),
payment_provider="stripe",
)
Never log expected behavior at ERROR. A user entering a wrong password is INFO, not ERROR.
Generate a unique ID at ingress and thread it through all operations.
from contextvars import ContextVar
import uuid
import structlog
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")
def set_correlation_id(cid: str | None = None) -> str:
"""Set correlation ID for current context."""
cid = cid or str(uuid.uuid4())
correlation_id.set(cid)
structlog.contextvars.bind_contextvars(correlation_id=cid)
return cid
# FastAPI middleware example
from fastapi import Request
async def correlation_middleware(request: Request, call_next):
"""Middleware to set and propagate correlation ID."""
# Use incoming header or generate new
cid = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
set_correlation_id(cid)
response = await call_next(request)
response.headers["X-Correlation-ID"] = cid
return response
Propagate to outbound requests:
import httpx
async def call_downstream_service(endpoint: str, data: dict) -> dict:
"""Call downstream service with correlation ID."""
async with httpx.AsyncClient() as client:
response = await client.post(
endpoint,
json=data,
headers={"X-Correlation-ID": correlation_id.get()},
)
return response.json()
Detailed sections (starting with ## Advanced Patterns) live in references/details.md. Read that file when the navigation summary above is insufficient.