From harness-claude
Configures OpenTelemetry head, tail, and priority-based sampling to control trace volume/costs, retain error traces, and manage high-traffic services.
> Control trace volume and costs with head sampling, tail sampling, and priority-based strategies
1. Use `TraceIdRatioBasedSampler` for probabilistic head sampling.
2. Wrap it in `ParentBasedSampler` to respect upstream sampling decisions (if the parent was sampled, the child should be too).
3. Use the OpenTelemetry Collector's `tail_sampling` processor for tail sampling.

```typescript
// Head sampling: SDK-level
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Context, SpanKind, Attributes, Link } from '@opentelemetry/api';
import {
  Sampler,
  SamplingDecision,
  SamplingResult,
  TraceIdRatioBasedSampler,
  ParentBasedSampler,
} from '@opentelemetry/sdk-trace-base';

// Sample 10% of traces, but respect parent decisions
const sampler = new ParentBasedSampler({
  root: new TraceIdRatioBasedSampler(0.1), // 10% of root spans
  // If parent was sampled, always sample child
  // If parent was not sampled, never sample child
});

const sdk = new NodeSDK({
  sampler,
  // ...
});

// Custom head sampler: drop health checks, always keep payment routes,
// ratio-sample everything else. A head sampler runs at span start, so it
// cannot see errors or latency; use tail sampling for those.
// Assumes span names contain the URL path; HTTP instrumentations may name spans differently.
class PrioritySampler implements Sampler {
  // Typed as Sampler so the full six-argument shouldSample signature is available
  private ratioSampler: Sampler = new TraceIdRatioBasedSampler(0.1);

  shouldSample(
    context: Context,
    traceId: string,
    spanName: string,
    spanKind: SpanKind,
    attributes: Attributes,
    links: Link[]
  ): SamplingResult {
    // Never sample health checks
    if (spanName.includes('/health')) {
      return { decision: SamplingDecision.NOT_RECORD };
    }
    // Always sample payment routes
    if (spanName.includes('/api/payments')) {
      return { decision: SamplingDecision.RECORD_AND_SAMPLED };
    }
    // Default: ratio-based
    return this.ratioSampler.shouldSample(context, traceId, spanName, spanKind, attributes, links);
  }

  toString(): string {
    return 'PrioritySampler';
  }
}

// Apply the custom rules to root spans only; children follow their parent.
// Pass this as `sampler` to NodeSDK instead of the plain ratio sampler.
const prioritySampler = new ParentBasedSampler({ root: new PrioritySampler() });
```
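To make the decision flow concrete without pulling in the SDK, here is a simplified, library-free model of the same chain: parent-based delegation first, then the priority rules, then a ratio check. It is a sketch under stated assumptions, not the SDK's code. The function names are hypothetical, and deriving a number from the last 8 hex digits of the trace ID is an illustrative choice, not necessarily how `TraceIdRatioBasedSampler` computes its value. What it does show is why ratio sampling keyed on the trace ID is consistent: the same trace ID and ratio always produce the same answer.

```typescript
// Simplified, library-free model of the head-sampling chain.
// Hypothetical names; this is NOT the OpenTelemetry SDK implementation.

type Decision = "SAMPLE" | "DROP";

// Map a hex trace ID to a number in [0, 1) using its last 8 hex digits.
// Assumption: trace IDs are random, so these bits are uniformly distributed.
function traceIdToUnit(traceId: string): number {
  const low32 = parseInt(traceId.slice(-8), 16); // 0 .. 2^32 - 1
  return low32 / 2 ** 32;
}

// Ratio rule: depends only on the trace ID and the ratio, so every service
// that sees the same trace ID with the same ratio reaches the same decision.
function ratioDecision(traceId: string, ratio: number): Decision {
  return traceIdToUnit(traceId) < ratio ? "SAMPLE" : "DROP";
}

// Root-span rules mirroring PrioritySampler.
function priorityDecision(spanName: string, traceId: string, ratio: number): Decision {
  if (spanName.includes("/health")) return "DROP";
  if (spanName.includes("/api/payments")) return "SAMPLE";
  return ratioDecision(traceId, ratio);
}

// Parent-based wrapper: children copy the parent's decision;
// only root spans (parentSampled === undefined) consult the root rules.
function parentBasedDecision(
  parentSampled: boolean | undefined,
  spanName: string,
  traceId: string,
  ratio: number
): Decision {
  if (parentSampled === undefined) return priorityDecision(spanName, traceId, ratio);
  return parentSampled ? "SAMPLE" : "DROP";
}

const traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
console.log(parentBasedDecision(undefined, "GET /api/payments", traceId, 0.1)); // prints SAMPLE
console.log(parentBasedDecision(false, "GET /api/payments", traceId, 0.1)); // prints DROP
```

The second call shows the parent-based behaviour: a child of an unsampled parent is dropped even on a payment route, so the trace is dropped whole rather than fragmented.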
```yaml
# Tail sampling in OpenTelemetry Collector
processors:
  tail_sampling:
    decision_wait: 10s # Wait for all spans in a trace
    num_traces: 100000 # Max traces held in memory
    # Policies are OR-ed: a trace is kept if any policy samples it
    policies:
      # Always keep error traces
      - name: errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Always keep slow traces (> 2s)
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 2000
      # Sample 10% of everything else
      - name: default
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
      # Always keep payment-related traces
      - name: payments
        type: string_attribute
        string_attribute:
          key: http.route
          values: ['/api/payments.*']
          enabled_regex_matching: true # treat values as regular expressions
  batch: {}

service:
  pipelines:
    traces:
      receivers: [otlp] # assumes an otlp receiver is defined elsewhere in the config
      processors: [tail_sampling, batch]
      exporters: [otlp] # assumes an otlp exporter is defined elsewhere in the config
```
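As a mental model of what the Collector does with this config, the sketch below buffers spans per trace and, once the caller decides the trace is complete (the Collector waits `decision_wait` for this), keeps the trace if any policy matches. Everything here is hypothetical and simplified: the `SpanRecord` shape, the `TailSampler` class, and the injected random source are assumptions for illustration, and the real processor handles details this model omits, such as the `num_traces` limit.

```typescript
// Simplified model of tail sampling: buffer spans per trace, then decide from the whole trace.
// Hypothetical names and types; this is NOT the Collector's tail_sampling implementation.

interface SpanRecord {
  traceId: string;
  status: "UNSET" | "OK" | "ERROR";
  startMs: number;
  endMs: number;
  attributes: Record<string, string>;
}

// A policy inspects every buffered span of one trace and votes to keep it.
type Policy = (spans: SpanRecord[]) => boolean;

class TailSampler {
  private buffer = new Map<string, SpanRecord[]>();
  private policies: Policy[];

  // `random` is injectable so the probabilistic policy can be tested deterministically.
  constructor(random: () => number = Math.random) {
    this.policies = [
      // errors: any span with ERROR status
      (spans) => spans.some((s) => s.status === "ERROR"),
      // slow-traces: earliest start to latest end exceeds 2000 ms
      (spans) =>
        Math.max(...spans.map((s) => s.endMs)) - Math.min(...spans.map((s) => s.startMs)) > 2000,
      // payments: http.route matches an unanchored regex
      (spans) => spans.some((s) => /\/api\/payments.*/.test(s.attributes["http.route"] ?? "")),
      // default: keep roughly 10% of traces
      () => random() < 0.1,
    ];
  }

  // Buffer a finished span until its trace is decided.
  add(span: SpanRecord): void {
    const spans = this.buffer.get(span.traceId) ?? [];
    spans.push(span);
    this.buffer.set(span.traceId, spans);
  }

  // Called once the trace is considered complete.
  // Returns true to keep the trace; the buffered spans are released either way.
  decide(traceId: string): boolean {
    const spans = this.buffer.get(traceId) ?? [];
    this.buffer.delete(traceId);
    return spans.length > 0 && this.policies.some((policy) => policy(spans));
  }
}

const tail = new TailSampler(() => 0.99); // never takes the 10% branch
tail.add({ traceId: "t1", status: "ERROR", startMs: 0, endMs: 40, attributes: {} });
tail.add({ traceId: "t2", status: "OK", startMs: 0, endMs: 40, attributes: {} });
console.log(tail.decide("t1")); // prints true (error trace)
console.log(tail.decide("t2")); // prints false (fast, successful, not a payment)
```

This is why tail sampling is more expensive: every span of every trace sits in the buffer until `decide` runs, whereas a head sampler discards unsampled spans immediately.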
Head vs tail sampling:
| | Head Sampling | Tail Sampling |
|---|---|---|
| Decision point | Trace start | Trace end |
| Can consider outcome | No | Yes (errors, latency) |
| Resource cost | Low (decide once) | High (buffer all spans) |
| Implementation | SDK sampler | Collector processor |
| Consistency | All spans in trace agree (with `ParentBasedSampler`) | All spans in trace agree (if all of a trace's spans reach the same Collector) |
Recommended strategy for production:
ParentBasedSampler is critical: Without it, a sampled parent trace can have unsampled children, creating broken traces. Always wrap your root sampler with ParentBasedSampler.
Cost estimation: a typical span is 200 to 500 bytes. At 1,000 requests/second with 10 spans per request, that is 10K spans/second. At 10% sampling, that drops to 1K spans/second, which at 500 bytes per span is roughly 500 KB/s or 1.3 TB per 30-day month.
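The arithmetic can be checked with a small helper. It assumes a 30-day month and decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes); the function and interface names are hypothetical, and real per-span sizes depend on attribute counts and the exporter's encoding.

```typescript
// Back-of-the-envelope trace volume after head sampling.
// Assumptions: 30-day month, decimal units (1 KB = 1e3 bytes, 1 TB = 1e12 bytes).
interface VolumeEstimate {
  spansPerSec: number;
  kbPerSec: number;
  tbPerMonth: number;
}

function estimateVolume(
  requestsPerSec: number,
  spansPerRequest: number,
  sampleRatio: number,
  bytesPerSpan: number
): VolumeEstimate {
  const spansPerSec = requestsPerSec * spansPerRequest * sampleRatio;
  const bytesPerSec = spansPerSec * bytesPerSpan;
  const secondsPerMonth = 30 * 24 * 60 * 60; // 2,592,000 s
  return {
    spansPerSec,
    kbPerSec: bytesPerSec / 1e3,
    tbPerMonth: (bytesPerSec * secondsPerMonth) / 1e12,
  };
}

// 1,000 req/s, 10 spans/request, 10% sampling, at both ends of the 200-500 byte range
const small = estimateVolume(1000, 10, 0.1, 200);
const large = estimateVolume(1000, 10, 0.1, 500);
console.log(small, large);
```

With 500-byte spans this gives 1,000 spans/s, 500 KB/s, and about 1.3 TB/month, matching the estimate above; with 200-byte spans it is about 200 KB/s and 0.52 TB/month.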
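Environment variable control (equivalent to the `ParentBasedSampler` + 10% `TraceIdRatioBasedSampler` setup; applies when no sampler is passed in code):

```shell
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1 # 10%
```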
https://opentelemetry.io/docs/concepts/sampling/