Provides guidance on OpenTelemetry SDK setup, custom instrumentation, and sending data to Honeycomb. Trigger phrases: "instrument my app", "add tracing", "set up OpenTelemetry", "configure OTel", "add custom spans", "add attributes to spans", "send traces to Honeycomb", "set up OTLP", "configure sampling", "add span events", "add span links", "set up tracing for [any language]", "configure the OTel Collector", or any request about OpenTelemetry SDK setup, custom instrumentation, or sending data to Honeycomb.
From honeycomb (npx claudepluginhub honeycombio/agent-skill --plugin honeycomb). This skill uses the workspace's default tool permissions.
This skill covers SDK setup, custom spans, attributes, span events, sampling, and layered telemetry. For conceptual foundations (why wide events matter, how attributes connect to investigation), see the observability-fundamentals skill.
Every OTel SDK needs three environment variables to send data to Honeycomb:
OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, and OTEL_SERVICE_NAME.
For the env var values, language-specific dependencies, and setup code (Go, Python,
Node.js, Java, Ruby, .NET, Rust), see
${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/sdk-setup-by-language.md.
Add business context to auto-instrumented spans — no new spans needed. Get the current
span from context and call SetAttributes (Go), set_attribute (Python), or
setAttribute (Node.js) with user, tenant, business, and deployment context.
Wrap important business operations for visibility in the trace waterfall. Use
tracer.Start(ctx, "operation-name") (Go), tracer.start_as_current_span("operation-name")
(Python), or tracer.startActiveSpan("operation-name", callback) (Node.js).
For full code examples in all languages, consult
${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/custom-instrumentation.md.
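Not every function needs a span. Two questions determine whether a span is worth creating: is the operation interesting (variable latency, or it can fail), and is it aggregable (can you group many instances by a meaningful dimension such as route, table, or outcome)?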
| Operation | Interesting? | Aggregable? | Create a Span? |
|---|---|---|---|
| HTTP request handler | Yes — variable latency, can fail | Yes — group by route, method, status | Yes |
| Database query | Yes — I/O bound, failure-prone | Yes — group by query type, table | Yes |
| External API call | Yes — network latency, dependencies | Yes — group by endpoint, status | Yes |
| Cache lookup | Yes — fast vs slow path | Yes — group by cache name, hit/miss | Yes |
| Message queue pub/consume | Yes — async boundary, delays | Yes — group by queue, message type | Yes |
| Business logic transaction | Yes — meaningful state change | Yes — group by type, outcome | Yes |
| Private helper function | No — trivial CPU, predictable | No — too granular | No |
| Loop iteration | Maybe — if slow | No — unbounded cardinality | No |
| Getter/setter | No — no meaningful duration | No — nothing to group by | No |
| Input validation (pure CPU) | No — fast, predictable | Maybe | No |
| Business logic orchestration | No — just calls instrumented code | No — duration is sum of children | No |
Common mistakes:
When in doubt, prefer attributes on existing spans over creating new child spans.
Record important sub-operation durations as attributes on the parent span. These are easier to query than child spans and work directly with BubbleUp.
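```go
// Go: time auth and record on the existing span
// (imports: time, go.opentelemetry.io/otel/attribute, go.opentelemetry.io/otel/trace)
span := trace.SpanFromContext(r.Context())
authStart := time.Now()
user, err := authenticate(r) // handle err and use user as usual
// Microseconds()/1000 keeps sub-millisecond precision; Milliseconds() truncates to a whole number
span.SetAttributes(attribute.Float64("auth.duration_ms", float64(time.Since(authStart).Microseconds())/1000))
```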
```python
# Python: time auth and record on the existing span
import time
from opentelemetry import trace

span = trace.get_current_span()
auth_start = time.monotonic()
user = authenticate(request)
span.set_attribute("auth.duration_ms", (time.monotonic() - auth_start) * 1000)
```
Tag each error throw site with a unique static string (exception.slug). This creates
a low-cardinality, greppable identifier that connects dashboards directly to code.
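```go
// Go: static slug; greppable and safe for GROUP BY
// (codes is go.opentelemetry.io/otel/codes)
span.SetAttributes(
	attribute.String("exception.slug", "err-stripe-charge-failed"),
	attribute.Bool("error", true),
)
span.RecordError(err)
// RecordError adds an exception event but does not change the span status
span.SetStatus(codes.Error, "stripe charge failed")
```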
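```python
# Python: static slug; greppable and safe for GROUP BY
from opentelemetry.trace import Status, StatusCode

span.set_attribute("exception.slug", "err-stripe-card-error")
span.set_attribute("error", True)
span.record_exception(e)
span.set_status(Status(StatusCode.ERROR))  # record_exception alone leaves status unset
```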
Find unhandled errors (missing slugs): WHERE error = true AND exception.slug does-not-exist.
For extended examples in all languages, see
${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/custom-instrumentation.md.
Infrastructure operations (HTTP handlers, database queries, outbound HTTP calls) are typically auto-instrumented by OTel SDKs and form the skeleton of your traces.
Business operations are your business logic, and they need custom spans. Without them, you can see that a request was slow but not why: the trace waterfall has gaps where the important work happens invisibly.
Attributes are the dimensions BubbleUp uses during investigations. Every attribute you
add is a new axis BubbleUp can diff on to find what's different about outlier requests.
For the complete catalog organized by category with rationale and example queries, see
${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/wide-event-attributes.md.
For why attributes matter conceptually, see the observability-fundamentals skill.
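Span events mark a point-in-time occurrence within a span: span.add_event("event_name", {attributes}).
Span links connect a span to a related span context, such as a span in another trace.
See ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/custom-instrumentation.md for full examples of both patterns.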
Sampling is a tradeoff between cost and visibility; there is no free lunch.
The math matters: if an error occurs on 0.1% of requests and you head-sample at 1%, you keep only 1 in 100 of those errors, or about 1 in 100,000 requests overall. At moderate traffic, that error may never appear in your data.
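The arithmetic above as a small stdlib Python sketch. It treats each request as an independent trial, which is a simplifying assumption; the function names are illustrative.

```python
def expected_kept_errors(requests: int, error_rate: float, sample_rate: float) -> float:
    """Expected number of error traces that survive uniform head sampling."""
    return requests * error_rate * sample_rate


def p_at_least_one_kept(requests: int, error_rate: float, sample_rate: float) -> float:
    """Probability that at least one error trace survives, assuming independent requests."""
    return 1 - (1 - error_rate * sample_rate) ** requests


# 0.1% error rate, 1% head sampling.
for n in (10_000, 100_000, 1_000_000):
    print(n, expected_kept_errors(n, 0.001, 0.01), round(p_at_least_one_kept(n, 0.001, 0.01), 3))
```

At 100,000 requests you expect about one kept error trace, and there is still roughly a one-in-three chance of seeing none at all.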
Head sampling decides whether to sample a trace at creation time. It is simple but can miss interesting traces.
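Configure head sampling with the OTEL_TRACES_SAMPLER env var:
- always_on, always_off: keep or drop every trace
- traceidratio: sample a fixed fraction of traces (e.g., 10% with OTEL_TRACES_SAMPLER_ARG=0.1)
- parentbased_traceidratio: applies the ratio to root spans and otherwise respects the parent's sampling decision
- parentbased_always_on is the default when the variable is unset

Tail sampling decides after the trace is complete, so it can keep the interesting traces (errors, slow requests).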
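To make the tail-sampling idea concrete, here is a stdlib-only Python sketch of the kind of policy a tail sampler applies. It is a conceptual illustration, not how the Collector's tail_sampling processor is implemented or configured: it assumes every span of each trace has already been buffered, and FinishedSpan, tail_sample, and the thresholds are hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class FinishedSpan:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool


def tail_sample(
    spans: Iterable[FinishedSpan],
    slow_ms: float = 1000.0,
    baseline_rate: float = 0.05,
    rand: Callable[[], float] = random.random,
) -> Dict[str, str]:
    """Group buffered spans by trace, then decide once per complete trace."""
    traces: Dict[str, List[FinishedSpan]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)

    kept: Dict[str, str] = {}  # trace_id -> reason it was kept
    for trace_id, trace_spans in traces.items():
        if any(s.is_error for s in trace_spans):
            kept[trace_id] = "error"  # always keep traces containing an error
        elif max(s.duration_ms for s in trace_spans) >= slow_ms:
            kept[trace_id] = "slow"  # always keep slow traces
        elif rand() < baseline_rate:
            kept[trace_id] = "baseline"  # small random sample of normal traffic
    return kept


spans = [
    FinishedSpan("t1", "HTTP GET /api/users", 40.0, False),
    FinishedSpan("t1", "db.query SELECT", 12.0, False),
    FinishedSpan("t2", "HTTP POST /api/checkout", 95.0, False),
    FinishedSpan("t2", "process-payment", 80.0, True),
    FinishedSpan("t3", "HTTP GET /api/report", 2400.0, False),
]
kept = tail_sample(spans, rand=lambda: 0.99)  # deterministic: no baseline picks
print(kept)  # {'t2': 'error', 't3': 'slow'}
```

Unlike the head-sampling math above, the rare error trace (t2) is kept regardless of how low the baseline rate is; the cost is buffering whole traces before deciding.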
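In the OTel Collector, tail sampling is done with the tail_sampling processor; see ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/collector-config.md.

OpenTelemetry is "trace-first": context propagation is the glue that correlates all signals. But effective observability layers multiple signal types for different purposes.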
A three-question test for choosing the right signal:
The histogram-alongside-spans pattern: For high-throughput HTTP services, emit both a span and a histogram metric for each handled request. This lets you head-sample traces for cost while histograms provide last-ditch alerting — and exemplars link outlier metric points back to specific traces for deeper investigation.
The technique is layering (not duplication) because each signal provides a different view at a different level of detail.
For architectural patterns where layering is essential (streaming, async jobs, ETL), see
${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/architectural-patterns.md.
OTel can send logs too. If you have existing log infrastructure, the OTel Collector can ingest logs and forward them to Honeycomb as structured events:
- Log bridges hook into your existing logging library (slog in Go, logging in Python, winston/pino in Node.js) and export its records as OTel log records.
- The Collector's filelog receiver reads log files, parses them, and exports them as OTLP.

Logs sent through OTel arrive in Honeycomb as structured events with the same query capabilities as spans.
Naming conventions:
- Span names describe the operation (e.g., HTTP GET /api/users, db.query SELECT, process-payment)
- Custom attributes use dot-separated names (e.g., user.id, order.total, cache.hit)
- Semantic-convention attributes are used where they exist (e.g., http.method, db.system, rpc.service)
- Custom attributes get a namespace prefix (e.g., app., checkout., mycompany.)

References:
- ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/sdk-setup-by-language.md: OTLP configuration and SDK setup for Go, Python, Node.js, Java, Ruby, .NET, Rust
- ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/custom-instrumentation.md: Custom instrumentation patterns with full code examples (timing attributes, exception slugs, async request summaries)
- ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/collector-config.md: OTel Collector configuration for format conversion, processing, and sampling
- ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/wide-event-attributes.md: Canonical attribute catalog organized by category with example queries
- ${CLAUDE_PLUGIN_ROOT}/skills/otel-instrumentation/references/architectural-patterns.md: Trace design patterns for streaming, async, ETL, and serverless architectures