From acc
Provides an observability knowledge base covering the three pillars (logs, metrics, traces), structured logging with Monolog, distributed tracing via OpenTelemetry, RED/USE metrics collection, and SLI/SLO/SLA definitions for PHP apps.
Install: npx claudepluginhub dykyi-roman/awesome-claude-code --plugin acc
Slash command: /acc:observability-knowledge
Quick reference for the three pillars of observability, instrumentation patterns, and SLI/SLO/SLA definitions in PHP applications.
┌─────────────────────────────────────────────────────────────────────────────┐
│                       THREE PILLARS OF OBSERVABILITY                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐     │
│  │       LOGS       │     │     METRICS      │     │      TRACES      │     │
│  │                  │     │                  │     │                  │     │
│  │  What happened   │     │  How much/many   │     │  How requests    │     │
│  │  (discrete       │     │  (aggregated     │     │  flow through    │     │
│  │  events)         │     │  measurements)   │     │  services        │     │
│  │                  │     │                  │     │                  │     │
│  │  • Errors        │     │  • Counters      │     │  • Spans         │     │
│  │  • Audit trail   │     │  • Gauges        │     │  • Context       │     │
│  │  • Debug info    │     │  • Histograms    │     │  • Latency       │     │
│  │                  │     │                  │     │                  │     │
│  │  JSON structured │     │  Prometheus      │     │  OpenTelemetry   │     │
│  │  Monolog         │     │  StatsD          │     │  Jaeger/Zipkin   │     │
│  └────────┬─────────┘     └────────┬─────────┘     └────────┬─────────┘     │
│           │                        │                        │               │
│           └────────────────────────┼────────────────────────┘               │
│                                    │                                        │
│                          ┌─────────▼─────────┐                              │
│                          │  CORRELATION ID   │                              │
│                          │ (links all three  │                              │
│                          │     pillars)      │                              │
│                          └───────────────────┘                              │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
| Field | Type | Description | Required |
|---|---|---|---|
| timestamp | ISO 8601 | When event occurred | Yes |
| level | string | RFC 5424 log level | Yes |
| message | string | Human-readable description | Yes |
| channel | string | Logger channel name | Yes |
| context | object | Structured event data | No |
| correlation_id | string | Request/trace identifier | Yes |
| service | string | Service/app name | Yes |
| environment | string | prod/staging/dev | Yes |
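To make the field list concrete, here is a minimal sketch that assembles one JSON log line in plain PHP. The helper name `buildLogLine` and the sample values are hypothetical and exist only for illustration; in a real application Monolog's formatter would produce the output, and its field layout differs from this flat shape.

```php
<?php
declare(strict_types=1);

/**
 * Builds one JSON log line containing the fields from the table above.
 * Hypothetical helper for illustration only; it is not part of Monolog.
 */
function buildLogLine(
    string $level,
    string $message,
    string $channel,
    string $correlationId,
    string $service,
    string $environment,
    array $context = [],
): string {
    $record = [
        // DATE_ATOM yields an ISO 8601 compatible timestamp, here in UTC.
        'timestamp' => (new DateTimeImmutable('now', new DateTimeZone('UTC')))->format(DATE_ATOM),
        'level' => strtolower($level),
        'message' => $message,
        'channel' => $channel,
        'correlation_id' => $correlationId,
        'service' => $service,
        'environment' => $environment,
    ];

    // "context" is the only optional field, so it is omitted when empty.
    if ($context !== []) {
        $record['context'] = $context;
    }

    return json_encode($record, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
}
```

Calling `buildLogLine('info', 'Order placed', 'orders', 'req-123', 'checkout', 'prod', ['order_id' => 42])` returns a single-line JSON object that a log aggregator can index field by field.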
| Level | Code | When to Use |
|---|---|---|
| EMERGENCY | 0 | System is unusable |
| ALERT | 1 | Immediate action required |
| CRITICAL | 2 | Critical conditions (component failure) |
| ERROR | 3 | Runtime errors (not requiring immediate action) |
| WARNING | 4 | Exceptional but handled conditions |
| NOTICE | 5 | Normal but significant events |
| INFO | 6 | Informational messages (request processed) |
| DEBUG | 7 | Detailed debug information |
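The numeric codes matter because a lower code means a more severe event, so a minimum-level filter keeps records whose code is at or below the threshold. The sketch below shows that comparison in plain PHP; `SEVERITY` and `shouldLog` are hypothetical names, and Monolog handlers perform equivalent filtering internally.

```php
<?php
declare(strict_types=1);

/** RFC 5424 severity codes keyed by lowercase level name (lower code = more severe). */
const SEVERITY = [
    'emergency' => 0,
    'alert' => 1,
    'critical' => 2,
    'error' => 3,
    'warning' => 4,
    'notice' => 5,
    'info' => 6,
    'debug' => 7,
];

/**
 * Returns true when a record at $level passes a handler configured with $minimumLevel.
 * Hypothetical helper for illustration only.
 */
function shouldLog(string $level, string $minimumLevel): bool
{
    $level = strtolower($level);
    $minimumLevel = strtolower($minimumLevel);

    if (!array_key_exists($level, SEVERITY) || !array_key_exists($minimumLevel, SEVERITY)) {
        throw new InvalidArgumentException('Unknown log level');
    }

    return SEVERITY[$level] <= SEVERITY[$minimumLevel];
}
```

With a minimum level of `warning`, an `error` record passes while `info` and `debug` records are dropped.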
<?php

declare(strict_types=1);

namespace Infrastructure\Logging;

use Monolog\LogRecord;
use Monolog\Processor\ProcessorInterface;

/**
 * Adds the correlation ID, service name, and environment to every log record.
 */
final readonly class CorrelationIdProcessor implements ProcessorInterface
{
    public function __construct(
        private CorrelationIdHolder $holder,
    ) {}

    public function __invoke(LogRecord $record): LogRecord
    {
        // with() returns a copy of the record with the given fields replaced.
        return $record->with(
            extra: array_merge($record->extra, [
                'correlation_id' => $this->holder->get(),
                'service' => $_ENV['APP_SERVICE_NAME'] ?? 'unknown',
                'environment' => $_ENV['APP_ENV'] ?? 'unknown',
            ]),
        );
    }
}
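<?php

declare(strict_types=1);

namespace Infrastructure\Logging;

/**
 * Holds the correlation ID for the current request, generating one on first use.
 */
final class CorrelationIdHolder
{
    private ?string $correlationId = null;

    public function set(string $correlationId): void
    {
        $this->correlationId = $correlationId;
    }

    public function get(): string
    {
        if ($this->correlationId === null) {
            // uuid_create() comes from the PECL uuid extension; fall back to random hex without it.
            $this->correlationId = \function_exists('uuid_create')
                ? uuid_create(UUID_TYPE_RANDOM)
                : bin2hex(random_bytes(16));
        }

        return $this->correlationId;
    }
}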
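The holder only stores an ID; something still has to decide where the ID comes from at the start of each request. A minimal sketch of that decision follows, assuming the incoming headers arrive as an array keyed by lowercase header name. The function name `resolveCorrelationId` and the accepted character set are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Picks a correlation ID from incoming headers, or generates a new one.
 * $headers is assumed to map lowercase header names to single string values.
 */
function resolveCorrelationId(array $headers): string
{
    foreach (['x-correlation-id', 'x-request-id'] as $name) {
        $value = trim((string) ($headers[$name] ?? ''));
        // Accept only short, safe tokens so hostile input cannot pollute the logs.
        if ($value !== '' && preg_match('/^[A-Za-z0-9._-]{1,128}$/', $value) === 1) {
            return $value;
        }
    }

    // Fallback: a random version-4 UUID built from random_bytes(), no extension needed.
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

A request middleware could pass the result to `CorrelationIdHolder::set()` before any log line is written, so every record in the request shares the same ID.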
| Concept | Description |
|---|---|
| Trace | End-to-end journey of a request across services |
| Span | Single unit of work within a trace (has start/end time) |
| SpanContext | Trace ID + Span ID + flags, propagated across boundaries |
| Attributes | Key-value metadata on spans |
| Events | Timestamped annotations within a span |
| Links | Connections between spans in different traces |
| Baggage | Cross-cutting key-value pairs propagated with context |
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: vendor1=value1,vendor2=value2
| Part | Length | Description |
|---|---|---|
| version | 2 hex | Currently always 00 |
| trace-id | 32 hex | Globally unique trace identifier |
| parent-id | 16 hex | ID of parent span |
| trace-flags | 2 hex | 01 = sampled |
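The header layout above can be checked with a small parser. This is a sketch in plain PHP that handles only version `00`; the function name `parseTraceparent` and the returned array shape are hypothetical, and a production service would normally rely on the OpenTelemetry propagator instead.

```php
<?php
declare(strict_types=1);

/**
 * Parses a W3C traceparent header into its four parts.
 * Returns null for malformed values. Only version 00 is handled in this sketch.
 */
function parseTraceparent(string $header): ?array
{
    $pattern = '/^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/';
    if (preg_match($pattern, trim($header), $m) !== 1) {
        return null;
    }

    // All-zero trace or parent IDs are invalid.
    if ($m[2] === str_repeat('0', 32) || $m[3] === str_repeat('0', 16)) {
        return null;
    }

    return [
        'version' => $m[1],
        'trace_id' => $m[2],
        'parent_id' => $m[3],
        // The lowest bit of trace-flags is the "sampled" flag.
        'sampled' => (hexdec($m[4]) & 0x01) === 1,
    ];
}
```

Feeding it the example header from above yields the trace ID `4bf92f3577b34da6a3ce929d0e0e4736`, the parent ID `00f067aa0ba902b7`, and `sampled` set to true.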
<?php

declare(strict_types=1);

namespace Infrastructure\Telemetry;

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\API\Trace\TracerInterface;

/**
 * Wraps callables in OpenTelemetry spans and records their outcome.
 */
final readonly class TracingService
{
    private TracerInterface $tracer;

    public function __construct(string $serviceName = 'my-app')
    {
        $this->tracer = Globals::tracerProvider()->getTracer($serviceName);
    }

    public function traceOperation(string $operationName, callable $operation, array $attributes = []): mixed
    {
        $span = $this->tracer
            ->spanBuilder($operationName)
            ->setSpanKind(SpanKind::KIND_INTERNAL)
            ->startSpan();
        // Make the span current so nested spans become its children.
        $scope = $span->activate();

        try {
            foreach ($attributes as $key => $value) {
                $span->setAttribute($key, $value);
            }

            $result = $operation();
            $span->setStatus(StatusCode::STATUS_OK);

            return $result;
        } catch (\Throwable $e) {
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            $span->recordException($e);

            throw $e;
        } finally {
            // Always restore the previous context and close the span.
            $scope->detach();
            $span->end();
        }
    }

    public function traceHttpClient(string $method, string $url, callable $request): mixed
    {
        $span = $this->tracer
            ->spanBuilder(sprintf('%s %s', $method, $url))
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->setAttribute('http.method', $method)
            ->setAttribute('http.url', $url)
            ->startSpan();
        $scope = $span->activate();

        try {
            $result = $request();
            $span->setStatus(StatusCode::STATUS_OK);

            return $result;
        } catch (\Throwable $e) {
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            // Record the exception here too, matching traceOperation().
            $span->recordException($e);

            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}
| Metric | What | Unit | Example |
|---|---|---|---|
| Rate | Requests per second | req/s | HTTP requests per second by endpoint |
| Errors | Failed requests per second | err/s | 5xx responses per second |
| Duration | Latency distribution | ms | Response time p50, p95, p99 |
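As a rough illustration of how the three RED numbers relate to raw request data, the sketch below derives them from a list of samples taken over a fixed window. The sample shape (`status`, `duration_ms`), the function name `redMetrics`, and the nearest-rank percentile method are assumptions for this example; a Prometheus setup would compute the same values from counters and histograms instead.

```php
<?php
declare(strict_types=1);

/**
 * Computes RED metrics from raw request samples collected over a time window.
 * Each sample is ['status' => int, 'duration_ms' => int|float]; the shape is hypothetical.
 */
function redMetrics(array $samples, float $windowSeconds): array
{
    if ($windowSeconds <= 0.0) {
        throw new InvalidArgumentException('Window must be positive');
    }

    $durations = array_column($samples, 'duration_ms');
    sort($durations);
    $errors = count(array_filter($samples, static fn (array $s): bool => $s['status'] >= 500));

    // Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    $percentile = static function (float $p) use ($durations): ?float {
        if ($durations === []) {
            return null;
        }
        $rank = max(1, (int) ceil($p / 100 * count($durations)));

        return (float) $durations[$rank - 1];
    };

    return [
        'rate_per_s' => count($samples) / $windowSeconds,
        'errors_per_s' => $errors / $windowSeconds,
        'p50_ms' => $percentile(50),
        'p95_ms' => $percentile(95),
        'p99_ms' => $percentile(99),
    ];
}
```

Percentiles are used for Duration rather than the mean because a handful of very slow requests can hide behind a healthy-looking average.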
| Metric | What | Example |
|---|---|---|
| Utilization | % time resource is busy | CPU usage, disk I/O |
| Saturation | Queued work | Request queue length |
| Errors | Error count | Disk errors, connection failures |
| Signal | Description | RED Equivalent |
|---|---|---|
| Latency | Time to service a request | Duration |
| Traffic | Demand on the system | Rate |
| Errors | Rate of failed requests | Errors |
| Saturation | How full the system is | (USE method) |
<?php

declare(strict_types=1);

namespace Infrastructure\Metrics;

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\Redis;

/**
 * Records RED-style HTTP metrics and renders them for a Prometheus scrape.
 */
final class PrometheusMetricsCollector
{
    private readonly CollectorRegistry $registry;

    public function __construct(\Redis $redis)
    {
        // Redis-backed storage lets separate PHP worker processes share metric state.
        $adapter = Redis::fromExistingConnection($redis);
        $this->registry = new CollectorRegistry($adapter);
    }

    public function incrementRequestCount(string $method, string $route, int $statusCode): void
    {
        $counter = $this->registry->getOrRegisterCounter(
            'app',
            'http_requests_total',
            'Total HTTP requests',
            ['method', 'route', 'status_code'],
        );
        $counter->inc([$method, $route, (string) $statusCode]);
    }

    public function observeRequestDuration(string $method, string $route, float $durationSeconds): void
    {
        $histogram = $this->registry->getOrRegisterHistogram(
            'app',
            'http_request_duration_seconds',
            'HTTP request duration in seconds',
            ['method', 'route'],
            // Bucket upper bounds in seconds.
            [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
        );
        $histogram->observe($durationSeconds, [$method, $route]);
    }

    public function setActiveConnections(int $count): void
    {
        $gauge = $this->registry->getOrRegisterGauge(
            'app',
            'active_connections',
            'Current active connections',
            [],
        );
        $gauge->set($count, []);
    }

    public function renderMetrics(): string
    {
        // Text exposition format, served from the /metrics endpoint.
        $renderer = new RenderTextFormat();

        return $renderer->render($this->registry->getMetricFamilySamples());
    }
}
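The bucket list passed to the histogram above is easier to reason about once you see that Prometheus buckets are cumulative: each bucket counts every observation at or below its upper bound. The following sketch reproduces that counting in plain PHP without the client library; the function name `cumulativeBuckets` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Counts observations into cumulative "less than or equal" buckets,
 * mirroring how a Prometheus histogram exposes its data.
 */
function cumulativeBuckets(array $observations, array $upperBounds): array
{
    sort($upperBounds);

    $counts = [];
    foreach ($upperBounds as $bound) {
        $counts[(string) $bound] = 0;
    }
    $counts['+Inf'] = 0;

    foreach ($observations as $value) {
        foreach ($upperBounds as $bound) {
            // An observation increments every bucket whose bound it fits under.
            if ($value <= $bound) {
                $counts[(string) $bound]++;
            }
        }
        $counts['+Inf']++; // every observation falls into the +Inf bucket
    }

    return $counts;
}
```

For observations of 0.05, 0.3, and 2.0 seconds with bounds 0.1 and 0.5, the buckets hold 1 and 2 observations, and `+Inf` holds all 3. Bucket bounds are worth placing near the latency thresholds you care about, since percentile estimates are only as fine as the bucket spacing.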
| Concept | Definition | Example |
|---|---|---|
| SLI (Service Level Indicator) | Measurable metric of service behavior | Proportion of requests served within 200ms |
| SLO (Service Level Objective) | Target value for an SLI | 99.9% of requests within 200ms |
| SLA (Service Level Agreement) | Contract with consequences | 99.5% uptime or credit issued |
| SLI Type | Formula | Target (SLO) |
|---|---|---|
| Availability | successful_requests / total_requests | 99.9% (three nines) |
| Latency | requests < threshold / total_requests | 99% < 200ms, 99.9% < 1s |
| Error Rate | error_requests / total_requests | < 0.1% |
| Throughput | requests / time_window | > 1000 req/s |
| Freshness | time_since_last_update | < 5 minutes |
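The availability and latency formulas in the table are both ratios of "good" events to total events. A minimal sketch of those two calculations and the SLO comparison follows; all function names are hypothetical, and treating an empty window as fully compliant is an assumption of this example rather than a rule.

```php
<?php
declare(strict_types=1);

/** Availability SLI: successful_requests / total_requests. */
function availabilitySli(int $successful, int $total): float
{
    // Assumption: with no traffic there were no failures, so report 1.0.
    return $total === 0 ? 1.0 : $successful / $total;
}

/** Latency SLI: share of requests that completed faster than the threshold. */
function latencySli(array $durationsMs, float $thresholdMs): float
{
    if ($durationsMs === []) {
        return 1.0;
    }
    $fast = count(array_filter($durationsMs, static fn ($d): bool => $d < $thresholdMs));

    return $fast / count($durationsMs);
}

/** An SLO is met when the measured SLI reaches the target ratio. */
function meetsSlo(float $sli, float $target): bool
{
    return $sli >= $target;
}
```

For example, 9,990 successful requests out of 10,000 give an availability SLI of 0.999, which just meets a 99.9% SLO.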
Error Budget = 1 - SLO
Example: SLO = 99.9%
Error Budget = 0.1% = ~43 minutes/month downtime allowed
Budget remaining = Error Budget - Actual Errors
If budget exhausted → freeze deployments, focus on reliability
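The arithmetic above can be sketched in a few lines of PHP. The function names are hypothetical, and the 30-day month is an assumption that matches the "~43 minutes" figure: 0.1% of 30 × 24 × 60 minutes is 43.2 minutes.

```php
<?php
declare(strict_types=1);

/** Error Budget = 1 - SLO, with the SLO given as a ratio such as 0.999. */
function errorBudget(float $slo): float
{
    if ($slo <= 0.0 || $slo >= 1.0) {
        throw new InvalidArgumentException('SLO must be a ratio between 0 and 1');
    }

    return 1.0 - $slo;
}

/** Downtime allowed by the budget, assuming a month of $days days. */
function allowedDowntimeMinutes(float $slo, int $days = 30): float
{
    return errorBudget($slo) * $days * 24 * 60;
}

/** Budget remaining = Error Budget - actual error ratio (negative when overspent). */
function budgetRemaining(float $slo, int $failedRequests, int $totalRequests): float
{
    $actualErrors = $totalRequests > 0 ? $failedRequests / $totalRequests : 0.0;

    return errorBudget($slo) - $actualErrors;
}

/** Exhausted budget is the signal to freeze deployments and focus on reliability. */
function shouldFreezeDeployments(float $slo, int $failedRequests, int $totalRequests): bool
{
    return budgetRemaining($slo, $failedRequests, $totalRequests) <= 0.0;
}
```

With a 99.9% SLO, 5 failures in 10,000 requests leave roughly half the budget, while 20 failures in 10,000 overspend it and trigger the freeze.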
| Need | Tool/Library | PHP Integration |
|---|---|---|
| Structured logging | Monolog | monolog/monolog |
| Log aggregation | ELK Stack, Loki | Monolog handlers |
| Metrics collection | Prometheus | promphp/prometheus_client_php |
| Metrics visualization | Grafana | Prometheus data source |
| Distributed tracing | Jaeger, Zipkin | OpenTelemetry PHP SDK |
| APM | Datadog, New Relic | PHP extensions/agents |
| Error tracking | Sentry | sentry/sentry-php |
| Health checks | Custom endpoint | PSR-15 middleware |
| Alert | Condition | Severity |
|---|---|---|
| High error rate | > 1% of requests 5xx | Critical |
| High latency | p99 > 2s for 5 min | Warning |
| Service down | Health check fails 3x | Critical |
| Disk usage | > 85% used | Warning |
| Queue backlog | > 10k unprocessed | Warning |
| Memory usage | > 90% for 10 min | Critical |
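Conditions such as "p99 > 2s for 5 min" only fire when the threshold is breached continuously for the whole duration, which avoids paging on a single spike. The sketch below shows one way to evaluate that over timestamped samples; the function name `sustainedAbove` and the sample format are assumptions, and alerting systems such as Prometheus implement their own evaluation rules.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when the value stayed above the threshold for the whole trailing window.
 * $samples maps Unix timestamps (int) to measured values.
 */
function sustainedAbove(array $samples, float $threshold, int $windowSeconds, int $now): bool
{
    ksort($samples);
    $windowStart = $now - $windowSeconds;
    $breachedAtStart = false;
    $inWindow = 0;

    foreach ($samples as $timestamp => $value) {
        if ($timestamp > $now) {
            break; // ignore samples from the future
        }
        if ($timestamp <= $windowStart) {
            // Only the latest sample at or before the window start matters.
            $breachedAtStart = $value > $threshold;
            continue;
        }
        if ($value <= $threshold) {
            return false; // the condition recovered inside the window
        }
        $inWindow++;
    }

    return $breachedAtStart && $inWindow > 0;
}
```

Requiring a breaching sample at or before the window start means the alert stays silent until a full window of evidence exists.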
| Violation | Where to Look | Severity |
|---|---|---|
| No structured logging (plain text) | Logger config, log output | Warning |
| Missing correlation IDs | Middleware, log processors | Critical |
| No metrics endpoint | Routes, health controllers | Warning |
| Untraced external calls | HTTP clients, adapters | Warning |
| Swallowed exceptions without logging | Catch blocks | Critical |
| No health check endpoint | Routes, controllers | Warning |
| Missing request/response logging | Middleware | Warning |
| No alerting rules defined | Monitoring config | Warning |
# Logging setup
Grep: "Monolog|LoggerInterface|Psr\\Log\\LoggerInterface" --glob "**/*.php"
Grep: "monolog" --glob "**/composer.json"
Grep: "structured|json_formatter|JsonFormatter" --glob "**/*.php"
# Correlation IDs
Grep: "correlation.id|correlationId|X-Correlation-ID|X-Request-ID" --glob "**/*.php"
# Metrics
Grep: "Prometheus|CollectorRegistry|Counter|Histogram|Gauge" --glob "**/*.php"
Grep: "prometheus|promphp" --glob "**/composer.json"
Grep: "/metrics|metricsEndpoint" --glob "**/*.php"
# Tracing
Grep: "OpenTelemetry|Tracer|Span|SpanBuilder" --glob "**/*.php"
Grep: "open-telemetry|opentelemetry" --glob "**/composer.json"
Grep: "traceparent|tracestate|W3C" --glob "**/*.php"
# Health checks
Grep: "health|healthcheck|readiness|liveness" --glob "**/*.php"
Grep: "/health|/ready|/live" --glob "**/routes*.php"
# Error tracking
Grep: "Sentry|sentry|Bugsnag|Rollbar" --glob "**/*.php"
Grep: "sentry/sentry" --glob "**/composer.json"
# Log levels and context
Grep: "->error\(|->critical\(|->warning\(|->info\(" --glob "**/*.php"
Grep: "LogLevel::" --glob "**/*.php"
For detailed information, load these reference files:
- references/logging-patterns.md: Structured logging, Monolog setup, context processors, log aggregation patterns
- references/metrics-patterns.md: Counter/Gauge/Histogram types, Prometheus PHP client, RED metrics, alerting rules
- references/tracing-patterns.md: OpenTelemetry PHP SDK, span creation, context propagation, sampling strategies