OpenTelemetry Python instrumentation knowledge base covering distributed tracing, async context propagation, custom transport propagators, sampling strategies, exporter configuration, and production observability patterns. SDK v1.40.0 target. TRIGGER WHEN: working with OpenTelemetry, distributed tracing, span instrumentation, context propagation, OTLP exporters, sampling strategies, or observability pipelines. DO NOT TRIGGER WHEN: general logging without trace correlation, or application monitoring tools unrelated to OTel.
npx claudepluginhub acaprino/alfio-claude-plugins --plugin opentelemetry

This skill uses the workspace's default tool permissions.
Knowledge base for instrumenting Python services with OpenTelemetry -- distributed tracing, metrics, and log correlation.
For most Python services, start with:
- `opentelemetry-bootstrap -a install` + the `opentelemetry-instrument` wrapper
- Resource attributes: `service.name`, `service.version`, `deployment.environment`
- `ParentBased(TraceIdRatioBased(0.1))` -- 10% head sampling
- OTLP export to `localhost:4317`
- `BatchSpanProcessor` with default tuning
- `provider.shutdown()` in lifespan/atexit

Then upgrade incrementally based on needs:
- Manual `tracer.start_as_current_span()` calls

Pipeline:
TracerProvider -> SpanProcessor (batch/simple) -> Exporter (OTLP gRPC/HTTP)
Context:
contextvars.ContextVar stores current span + baggage per async task / thread
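The per-task mechanics can be sketched with the standard library alone, no OTel needed: `asyncio.create_task` snapshots the caller's context, so a value set by the parent coroutine is visible in the child task. Names here are illustrative:

```python
import asyncio
import contextvars

# Stand-in for the SDK's internal "current span" variable
current_span = contextvars.ContextVar("current_span", default=None)

async def child():
    # The task received a copy of the parent's context at creation time
    return current_span.get()

async def parent():
    current_span.set("parent-span")
    return await asyncio.create_task(child())

result = asyncio.run(parent())
print(result)  # parent-span
```

Note the copy is one-way: a `set()` inside the child task would not leak back into the parent's context.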
Signal flow:
API (opentelemetry.trace) -> SDK (TracerProvider impl) -> Processor -> Exporter -> Collector -> Backend
- API is the interface (`opentelemetry-api`). SDK is the implementation (`opentelemetry-sdk`). Libraries depend on API only.
- Auto-instrumentation hooks in via a `sitecustomize.py` entry point
- Key env vars: `OTEL_SERVICE_NAME`, `OTEL_TRACES_SAMPLER`, `OTEL_EXPORTER_OTLP_ENDPOINT`, `OTEL_SDK_DISABLED`
- `service.name` is required -- always set explicitly

| Layer | Approach | Examples |
|---|---|---|
| HTTP frameworks | Auto | FastAPI, Django, Flask |
| Database clients | Auto | SQLAlchemy, psycopg2, asyncpg |
| HTTP clients | Auto | httpx, requests, aiohttp |
| Message queues | Auto | Celery, Kafka |
| Cache | Auto | redis, memcached |
| Business logic | Manual | Order processing, payment flows |
| Custom transport | Manual | AMQP payload, ZMQ events |
Combined pattern: `opentelemetry-instrument` wraps the app for auto-instrumentation; manual spans are added inside routes/handlers for business logic.
FastAPI:
- `FastAPIInstrumentor.instrument_app(app, excluded_urls="health,ready,metrics")`
- `provider.shutdown()` in the lifespan (`async with`) cleanup block
- Enrich spans with `request_id`, `tenant_id`, `user_id`

Celery:
- Re-initialize the provider in a `@worker_process_init.connect` handler
- `BatchSpanProcessor` threads do not survive `fork()` -- re-init required
- `CeleryInstrumentor` injects context into message headers automatically

SQLAlchemy async:
- Pass `.sync_engine` to the instrumentor, not the async engine
- `enable_commenter=True` for SQL comment injection with trace context

Django:
- `DjangoInstrumentor().instrument()` in `manage.py` or the WSGI entrypoint

Rules for non-HTTP transports (AMQP, ZMQ, custom sockets):
- `inject()` on the producer side -- serialize trace context into message headers/payload
- `extract()` on the consumer side -- deserialize and attach to the current context
- Use the W3C Trace Context format (`traceparent` + `tracestate` keys)

```python
# Producer
from opentelemetry.propagate import inject

headers = {}
inject(headers)  # fills in traceparent/tracestate
message.payload["_trace_context"] = headers

# Consumer
from opentelemetry import context, trace
from opentelemetry.propagate import extract

tracer = trace.get_tracer(__name__)

ctx = extract(carrier=message.payload.get("_trace_context", {}))
token = context.attach(ctx)
try:
    with tracer.start_as_current_span("process"):
        handle(message)
finally:
    context.detach(token)
```
Decision tree:
- `ParentBased` wrapper -- respects the parent's sampling decision
- `TraceIdRatioBased(rate)` -- cheap, consistent, blind to content
- Collector `tail_sampling` processor -- keeps errors and slow traces; requires all spans for a trace to hit the same Collector instance

| Strategy | Where | Pros | Cons |
|---|---|---|---|
| AlwaysOn | SDK | Complete data | Expensive at scale |
| TraceIdRatio | SDK | Predictable cost | Drops interesting traces |
| ParentBased | SDK | Consistent per-trace | Depends on root decision |
| Tail sampling | Collector | Content-aware | Memory-heavy, needs affinity |
Env vars: `OTEL_TRACES_SAMPLER=parentbased_traceidratio`, `OTEL_TRACES_SAMPLER_ARG=0.1`
Default values work for most services. Tune only when observing dropped spans or export latency:
| Parameter | Default | Guidance |
|---|---|---|
| `max_queue_size` | 2048 | Increase for bursty workloads (4096-8192) |
| `schedule_delay_millis` | 5000 | Lower (1000-2000) for near-realtime export |
| `max_export_batch_size` | 512 | Match exporter batch limits |
| `export_timeout_millis` | 30000 | Reduce so a hung exporter cannot block shutdown |
- Monitor the `otel.bsp.spans.dropped` metric for queue overflow
- `SimpleSpanProcessor` only in tests -- it blocks on every span end

Log-trace correlation:
- `opentelemetry-instrumentation-logging` injects `otelTraceID`, `otelSpanID` into log records
- Enable via `OTEL_PYTHON_LOG_CORRELATION=true`
- Add a `TraceContextFilter` to the logging handler for `trace_id`/`span_id` in JSON output

Metrics:
- `MeterProvider` + instruments: Counter, Histogram, UpDownCounter, Gauge
- Share the `Resource` with the `TracerProvider` for correlation
- `ObservableGauge` for system metrics (pool sizes, queue depth)

Logs SDK:
- The `_logs` module is experimental -- use the instrumentation bridge, not a direct `LoggerProvider`

Errors and span status:

- `span.record_exception(e, attributes={})` -- unexpected errors; creates a span event with stacktrace
- `span.add_event("name", attributes={})` -- expected business outcomes (e.g. a declined payment)
- Status transitions: `UNSET -> OK` (explicit success) or `UNSET -> ERROR` (failures; 5xx only for HTTP)
- `start_as_current_span()` auto-records exceptions and sets ERROR status by default
- Do not set OK on every span -- UNSET is the correct default for successful operations
- Raise `OTEL_SPAN_ATTRIBUTE_COUNT_LIMIT` if needed.

Detailed reference documents are in the references/ directory:
- `async-context-propagation.md` -- contextvars mechanics, asyncio task propagation, thread boundary traps, TracedThreadPoolExecutor, Python 3.12+ improvements
- `instrumentation-patterns.md` -- auto-instrumentation setup, FastAPI/Celery/SQLAlchemy patterns, custom decorators, error handling in spans
- `exporters-and-backends.md` -- OTLP gRPC vs HTTP, BatchSpanProcessor tuning, propagation formats, custom SpanProcessors, Collector pipelines
- `aws-deployment.md` -- ADOT distro, X-Ray integration, Lambda layers, ECS sidecar, collector-less direct export, migration from aws-xray-sdk
- `production-checklist.md` -- do/don't operational rules, resource detection, SDK signal maturity, package versions, breaking changes