Observability Knowledge Base
Quick reference for the three pillars of observability, instrumentation patterns, and SLI/SLO/SLA definitions in PHP applications.
Three Pillars Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                       THREE PILLARS OF OBSERVABILITY                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐    │
│   │       LOGS       │     │     METRICS      │     │      TRACES      │    │
│   │                  │     │                  │     │                  │    │
│   │  What happened   │     │  How much/many   │     │  How requests    │    │
│   │  (discrete       │     │  (aggregated     │     │  flow through    │    │
│   │  events)         │     │  measurements)   │     │  services)       │    │
│   │                  │     │                  │     │                  │    │
│   │  • Errors        │     │  • Counters      │     │  • Spans         │    │
│   │  • Audit trail   │     │  • Gauges        │     │  • Context       │    │
│   │  • Debug info    │     │  • Histograms    │     │  • Latency       │    │
│   │                  │     │                  │     │                  │    │
│   │  JSON structured │     │  Prometheus      │     │  OpenTelemetry   │    │
│   │  Monolog         │     │  StatsD          │     │  Jaeger/Zipkin   │    │
│   └────────┬─────────┘     └────────┬─────────┘     └────────┬─────────┘    │
│            │                        │                        │              │
│            └────────────────────────┼────────────────────────┘              │
│                                     │                                       │
│                           ┌─────────▼─────────┐                             │
│                           │  CORRELATION ID   │                             │
│                           │  (links all three │                             │
│                           │  pillars)         │                             │
│                           └───────────────────┘                             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Structured Logging
JSON Log Format
| Field | Type | Description | Required |
|---|---|---|---|
| timestamp | ISO 8601 | When event occurred | Yes |
| level | string | RFC 5424 log level | Yes |
| message | string | Human-readable description | Yes |
| channel | string | Logger channel name | Yes |
| context | object | Structured event data | No |
| correlation_id | string | Request/trace identifier | Yes |
| service | string | Service/app name | Yes |
| environment | string | prod/staging/dev | Yes |
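To make the field list concrete, here is a minimal sketch that builds one log line by hand with plain `json_encode`. It is not how Monolog formats records internally; `formatLogLine()` and its default values (`app`, `my-app`, `dev`) are assumptions chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Build one JSON log line containing the fields from the table above.
 * formatLogLine() is a hypothetical helper, not part of Monolog.
 */
function formatLogLine(
    string $level,
    string $message,
    string $correlationId,
    array $context = [],
    string $channel = 'app',
    string $service = 'my-app',
    string $environment = 'dev',
): string {
    $record = [
        // ISO 8601 timestamp in UTC
        'timestamp' => (new DateTimeImmutable('now', new DateTimeZone('UTC')))->format(DATE_ATOM),
        'level' => strtoupper($level),
        'message' => $message,
        'channel' => $channel,
        'correlation_id' => $correlationId,
        'service' => $service,
        'environment' => $environment,
    ];

    // "context" is the only optional field, so it is omitted when empty.
    if ($context !== []) {
        $record['context'] = $context;
    }

    return json_encode($record, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
}

echo formatLogLine('info', 'Order created', 'req-12345678', ['order_id' => 42]), PHP_EOL;
```

Emitting one JSON object per line keeps the output easy to ship to aggregators that parse line-delimited JSON.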
Log Levels (RFC 5424)
| Level | Code | When to Use |
|---|---|---|
| EMERGENCY | 0 | System is unusable |
| ALERT | 1 | Immediate action required |
| CRITICAL | 2 | Critical conditions (component failure) |
| ERROR | 3 | Runtime errors (not requiring immediate action) |
| WARNING | 4 | Exceptional but handled conditions |
| NOTICE | 5 | Normal but significant events |
| INFO | 6 | Informational messages (request processed) |
| DEBUG | 7 | Detailed debug information |
Monolog Context Processor
<?php

declare(strict_types=1);

namespace Infrastructure\Logging;

use Monolog\LogRecord;
use Monolog\Processor\ProcessorInterface;

final readonly class CorrelationIdProcessor implements ProcessorInterface
{
    public function __construct(
        private CorrelationIdHolder $holder,
    ) {}

    public function __invoke(LogRecord $record): LogRecord
    {
        return $record->with(
            extra: array_merge($record->extra, [
                'correlation_id' => $this->holder->get(),
                'service' => $_ENV['APP_SERVICE_NAME'] ?? 'unknown',
                'environment' => $_ENV['APP_ENV'] ?? 'unknown',
            ]),
        );
    }
}
Correlation ID Holder
<?php

declare(strict_types=1);

namespace Infrastructure\Logging;

final class CorrelationIdHolder
{
    private ?string $correlationId = null;

    public function set(string $correlationId): void
    {
        $this->correlationId = $correlationId;
    }

    public function get(): string
    {
        // Lazily generate an ID when none was set for this request.
        // uuid_create() and UUID_TYPE_RANDOM come from the PECL uuid extension (ext-uuid).
        if ($this->correlationId === null) {
            $this->correlationId = uuid_create(UUID_TYPE_RANDOM);
        }

        return $this->correlationId;
    }
}
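The holder only stores an ID, so something has to decide which ID to store at the start of each request. A minimal sketch of that decision follows, assuming the inbound ID arrives in an `X-Correlation-ID` or `X-Request-ID` header. `resolveCorrelationId()` and its validation pattern are hypothetical, and the fallback uses `random_bytes` so that no extension is required.

```php
<?php
declare(strict_types=1);

/**
 * Pick the correlation ID for a request: reuse a well-formed inbound value,
 * otherwise generate a fresh one. Header names and the pattern are assumptions.
 *
 * @param array<string, mixed> $headers Header name => value
 */
function resolveCorrelationId(array $headers): string
{
    // HTTP header names are case-insensitive, so normalise the keys first.
    $normalised = array_change_key_case($headers, CASE_LOWER);

    foreach (['x-correlation-id', 'x-request-id'] as $name) {
        $value = $normalised[$name] ?? null;
        // Accept only short, safe tokens so untrusted input cannot pollute logs.
        if (is_string($value) && preg_match('/\A[A-Za-z0-9._-]{8,128}\z/', $value) === 1) {
            return $value;
        }
    }

    // Fallback: 128 random bits rendered as 32 hex characters.
    return bin2hex(random_bytes(16));
}
```

In a middleware this would typically run once per request, for example `$holder->set(resolveCorrelationId($headers));`, before any log record is written.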
Distributed Tracing
OpenTelemetry Concepts
| Concept | Description |
|---|---|
| Trace | End-to-end journey of a request across services |
| Span | Single unit of work within a trace (has start/end time) |
| SpanContext | Trace ID + Span ID + flags, propagated across boundaries |
| Attributes | Key-value metadata on spans |
| Events | Timestamped annotations within a span |
| Links | Connections between spans in different traces |
| Baggage | Cross-cutting key-value pairs propagated with context |
W3C Trace Context Header
traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: vendor1=value1,vendor2=value2
| Part | Length | Description |
|---|---|---|
| version | 2 hex | Currently 00 (the only version defined so far) |
| trace-id | 32 hex | Globally unique trace identifier |
| parent-id | 16 hex | ID of parent span |
| trace-flags | 2 hex | 01 = sampled |
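The header layout above can be checked with a small parser. This is a sketch that handles version `00` only; `parseTraceparent()` and the array keys it returns are illustrative names and not part of the OpenTelemetry SDK, which ships its own propagator.

```php
<?php
declare(strict_types=1);

/**
 * Parse a version-00 traceparent header into its four parts.
 * Returns null when the header is malformed.
 *
 * @return array{version: string, trace_id: string, parent_id: string, sampled: bool}|null
 */
function parseTraceparent(string $header): ?array
{
    $pattern = '/\A(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/';
    if (preg_match($pattern, trim($header), $m) !== 1) {
        return null;
    }

    // All-zero trace or parent IDs are invalid in W3C Trace Context.
    if ($m[2] === str_repeat('0', 32) || $m[3] === str_repeat('0', 16)) {
        return null;
    }

    return [
        'version' => $m[1],
        'trace_id' => $m[2],
        'parent_id' => $m[3],
        // Bit 0 of trace-flags is the "sampled" flag.
        'sampled' => (hexdec($m[4]) & 0x01) === 1,
    ];
}

$parsed = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
var_dump($parsed['sampled']); // bool(true)
```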
OpenTelemetry PHP SDK Setup
<?php

declare(strict_types=1);

namespace Infrastructure\Telemetry;

use OpenTelemetry\API\Globals;
use OpenTelemetry\API\Trace\SpanKind;
use OpenTelemetry\API\Trace\StatusCode;
use OpenTelemetry\API\Trace\TracerInterface;

final readonly class TracingService
{
    private TracerInterface $tracer;

    public function __construct(string $serviceName = 'my-app')
    {
        $this->tracer = Globals::tracerProvider()->getTracer($serviceName);
    }

    /** Runs $operation inside an internal span and records its outcome. */
    public function traceOperation(string $operationName, callable $operation, array $attributes = []): mixed
    {
        $span = $this->tracer
            ->spanBuilder($operationName)
            ->setSpanKind(SpanKind::KIND_INTERNAL)
            ->startSpan();
        $scope = $span->activate();

        try {
            foreach ($attributes as $key => $value) {
                $span->setAttribute($key, $value);
            }

            $result = $operation();
            $span->setStatus(StatusCode::STATUS_OK);

            return $result;
        } catch (\Throwable $e) {
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            $span->recordException($e);

            throw $e;
        } finally {
            // Always restore the previous context and close the span.
            $scope->detach();
            $span->end();
        }
    }

    /** Wraps an outgoing HTTP call in a client span. */
    public function traceHttpClient(string $method, string $url, callable $request): mixed
    {
        // These attribute keys follow the older HTTP semantic conventions;
        // newer releases name them http.request.method and url.full.
        $span = $this->tracer
            ->spanBuilder(sprintf('%s %s', $method, $url))
            ->setSpanKind(SpanKind::KIND_CLIENT)
            ->setAttribute('http.method', $method)
            ->setAttribute('http.url', $url)
            ->startSpan();
        $scope = $span->activate();

        try {
            $result = $request();
            $span->setStatus(StatusCode::STATUS_OK);

            return $result;
        } catch (\Throwable $e) {
            $span->setStatus(StatusCode::STATUS_ERROR, $e->getMessage());
            // Record the exception here too, matching traceOperation().
            $span->recordException($e);

            throw $e;
        } finally {
            $scope->detach();
            $span->end();
        }
    }
}
Metrics
RED Method (Request-Driven Services)
| Metric | What | Unit | Example |
|---|---|---|---|
| Rate | Requests per second | req/s | HTTP requests per second by endpoint |
| Errors | Failed requests per second | err/s | 5xx responses per second |
| Duration | Latency distribution | ms | Response time p50, p95, p99 |
USE Method (Resource-Oriented)
| Metric | What | Example |
|---|---|---|
| Utilization | % time resource is busy | CPU usage, disk I/O |
| Saturation | Queued work | Request queue length |
| Errors | Error count | Disk errors, connection failures |
Golden Signals (Google SRE)
| Signal | Description | RED Equivalent |
|---|---|---|
| Latency | Time to service a request | Duration |
| Traffic | Demand on the system | Rate |
| Errors | Rate of failed requests | Errors |
| Saturation | How full the system is | (USE method) |
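As a rough illustration of how the RED numbers are derived, the sketch below summarises a batch of request samples. In practice the metrics backend computes these from counters and histograms; the sample shape (`status`, `duration_ms`), the function name `redSummary()`, and the nearest-rank percentile method are all assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Summarise request samples into RED metrics.
 * Each sample is ['status' => int, 'duration_ms' => float] (assumed shape).
 *
 * @param list<array{status: int, duration_ms: float|int}> $samples
 */
function redSummary(array $samples, float $windowSeconds): array
{
    $total = count($samples);
    if ($total === 0 || $windowSeconds <= 0.0) {
        return ['rate' => 0.0, 'errors_per_second' => 0.0, 'p50_ms' => null, 'p95_ms' => null, 'p99_ms' => null];
    }

    // Errors: count 5xx responses, as in the RED table above.
    $errors = count(array_filter($samples, static fn (array $s): bool => $s['status'] >= 500));

    $durations = array_map(static fn (array $s): float => (float) $s['duration_ms'], $samples);
    sort($durations);

    // Nearest-rank percentile: smallest value with at least p% of samples at or below it.
    $percentile = static function (float $p) use ($durations, $total): float {
        $rank = max(1, (int) ceil($p / 100 * $total));
        return $durations[$rank - 1];
    };

    return [
        'rate' => $total / $windowSeconds,               // requests per second
        'errors_per_second' => $errors / $windowSeconds, // failed requests per second
        'p50_ms' => $percentile(50.0),
        'p95_ms' => $percentile(95.0),
        'p99_ms' => $percentile(99.0),
    ];
}
```

Percentiles computed from raw samples are exact for the batch, whereas Prometheus histograms estimate them from bucket boundaries, so the two can differ slightly.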
Prometheus PHP Client
<?php

declare(strict_types=1);

namespace Infrastructure\Metrics;

use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;
use Prometheus\Storage\Redis;

final class PrometheusMetricsCollector
{
    private readonly CollectorRegistry $registry;

    public function __construct(\Redis $redis)
    {
        $adapter = Redis::fromExistingConnection($redis);
        $this->registry = new CollectorRegistry($adapter);
    }

    public function incrementRequestCount(string $method, string $route, int $statusCode): void
    {
        $counter = $this->registry->getOrRegisterCounter(
            'app',
            'http_requests_total',
            'Total HTTP requests',
            ['method', 'route', 'status_code'],
        );
        $counter->inc([$method, $route, (string) $statusCode]);
    }

    public function observeRequestDuration(string $method, string $route, float $durationSeconds): void
    {
        $histogram = $this->registry->getOrRegisterHistogram(
            'app',
            'http_request_duration_seconds',
            'HTTP request duration in seconds',
            ['method', 'route'],
            [0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0],
        );
        $histogram->observe($durationSeconds, [$method, $route]);
    }

    public function setActiveConnections(int $count): void
    {
        $gauge = $this->registry->getOrRegisterGauge(
            'app',
            'active_connections',
            'Current active connections',
            [],
        );
        $gauge->set($count, []);
    }

    public function renderMetrics(): string
    {
        $renderer = new RenderTextFormat();

        return $renderer->render($this->registry->getMetricFamilySamples());
    }
}
SLI / SLO / SLA
| Concept | Definition | Example |
|---|---|---|
| SLI (Service Level Indicator) | Measurable metric of service behavior | Proportion of requests served within 200ms |
| SLO (Service Level Objective) | Target value for an SLI | 99.9% of requests within 200ms |
| SLA (Service Level Agreement) | Contract with consequences | 99.5% uptime or credit issued |
Common SLIs
| SLI Type | Formula | Target (SLO) |
|---|---|---|
| Availability | successful_requests / total_requests | 99.9% (three nines) |
| Latency | requests_under_threshold / total_requests | 99% < 200ms, 99.9% < 1s |
| Error Rate | error_requests / total_requests | < 0.1% |
| Throughput | requests / time_window | > 1000 req/s |
| Freshness | time_since_last_update | < 5 minutes |
Error Budget
Error Budget = 1 - SLO
Example: SLO = 99.9%
Error Budget = 0.1% = ~43 minutes/month downtime allowed
Budget remaining = Error Budget - Actual Errors
If budget exhausted → freeze deployments, focus on reliability
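The arithmetic above can be sketched in a few lines. The example assumes a request-based availability SLO for the budget calculation and a 30-day month for the downtime figure; `errorBudget()` and `allowedDowntimeMinutes()` are illustrative names.

```php
<?php
declare(strict_types=1);

/**
 * Error budget for a request-based availability SLO.
 * $slo is a fraction, e.g. 0.999 for 99.9%.
 */
function errorBudget(float $slo, int $totalRequests, int $failedRequests): array
{
    $budgetFraction = 1.0 - $slo;                        // Error Budget = 1 - SLO
    $allowedFailures = $budgetFraction * $totalRequests; // budget expressed in requests
    $remaining = $allowedFailures - $failedRequests;     // Budget remaining

    return [
        'budget_fraction' => $budgetFraction,
        'allowed_failures' => $allowedFailures,
        'remaining_failures' => $remaining,
        'exhausted' => $remaining <= 0.0,
    ];
}

/** Allowed downtime in minutes for a time-based SLO over a window of $days. */
function allowedDowntimeMinutes(float $slo, int $days = 30): float
{
    return (1.0 - $slo) * $days * 24 * 60;
}

printf("%.1f minutes\n", allowedDowntimeMinutes(0.999)); // 43.2 minutes
```

With one million requests in the window and a 99.9% SLO, roughly 1,000 failures are allowed; 400 observed failures would leave about 600 in the budget.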
Quick Reference Tables
Observability Tool Selection
| Need | Tool/Library | PHP Integration |
|---|---|---|
| Structured logging | Monolog | monolog/monolog |
| Log aggregation | ELK Stack, Loki | Monolog handlers |
| Metrics collection | Prometheus | promphp/prometheus_client_php |
| Metrics visualization | Grafana | Prometheus data source |
| Distributed tracing | Jaeger, Zipkin | OpenTelemetry PHP SDK |
| APM | Datadog, New Relic | PHP extensions/agents |
| Error tracking | Sentry | sentry/sentry-php |
| Health checks | Custom endpoint | PSR-15 middleware |
Alerting Thresholds
| Alert | Condition | Severity |
|---|---|---|
| High error rate | > 1% of requests 5xx | Critical |
| High latency | p99 > 2s for 5 min | Warning |
| Service down | Health check fails 3x | Critical |
| Disk usage | > 85% used | Warning |
| Queue backlog | > 10k unprocessed | Warning |
| Memory usage | > 90% for 10 min | Critical |
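Several conditions in the table only fire when a threshold is exceeded for a sustained period ("for 5 min", "for 10 min"). Alerting systems normally evaluate this themselves; the sketch below only illustrates the logic, assuming one measurement per minute with the newest value last. `breachedFor()` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/**
 * Return true when the most recent $minutes values all exceed $threshold.
 * Assumes one sample per minute, ordered oldest to newest.
 *
 * @param list<float|int> $perMinuteValues
 */
function breachedFor(array $perMinuteValues, float $threshold, int $minutes): bool
{
    // Not enough history yet means the condition cannot have held long enough.
    if ($minutes < 1 || count($perMinuteValues) < $minutes) {
        return false;
    }

    foreach (array_slice($perMinuteValues, -$minutes) as $value) {
        if ($value <= $threshold) {
            return false; // one healthy minute resets the condition
        }
    }

    return true;
}

// p99 latency in seconds over the last six minutes, checked against "p99 > 2s for 5 min".
var_dump(breachedFor([1.2, 2.4, 2.6, 3.1, 2.2, 2.9], 2.0, 5)); // bool(true)
```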
Common Violations Quick Reference
| Violation | Where to Look | Severity |
|---|---|---|
| No structured logging (plain text) | Logger config, log output | Warning |
| Missing correlation IDs | Middleware, log processors | Critical |
| No metrics endpoint | Routes, health controllers | Warning |
| Untraced external calls | HTTP clients, adapters | Warning |
| Swallowed exceptions without logging | Catch blocks | Critical |
| No health check endpoint | Routes, controllers | Warning |
| Missing request/response logging | Middleware | Warning |
| No alerting rules defined | Monitoring config | Warning |
Detection Patterns
# Logging setup
Grep: "Monolog|LoggerInterface|Psr\\Log\\LoggerInterface" --glob "**/*.php"
Grep: "monolog" --glob "**/composer.json"
Grep: "structured|json_formatter|JsonFormatter" --glob "**/*.php"
# Correlation IDs
Grep: "correlation.id|correlationId|X-Correlation-ID|X-Request-ID" --glob "**/*.php"
# Metrics
Grep: "Prometheus|CollectorRegistry|Counter|Histogram|Gauge" --glob "**/*.php"
Grep: "prometheus|promphp" --glob "**/composer.json"
Grep: "/metrics|metricsEndpoint" --glob "**/*.php"
# Tracing
Grep: "OpenTelemetry|Tracer|Span|SpanBuilder" --glob "**/*.php"
Grep: "open-telemetry|opentelemetry" --glob "**/composer.json"
Grep: "traceparent|tracestate|W3C" --glob "**/*.php"
# Health checks
Grep: "health|healthcheck|readiness|liveness" --glob "**/*.php"
Grep: "/health|/ready|/live" --glob "**/routes*.php"
# Error tracking
Grep: "Sentry|sentry|Bugsnag|Rollbar" --glob "**/*.php"
Grep: "sentry/sentry" --glob "**/composer.json"
# Log levels and context
Grep: "->error\(|->critical\(|->warning\(|->info\(" --glob "**/*.php"
Grep: "LogLevel::" --glob "**/*.php"
References
For detailed information, load these reference files:
references/logging-patterns.md — Structured logging, Monolog setup, context processors, log aggregation patterns
references/metrics-patterns.md — Counter/Gauge/Histogram types, Prometheus PHP client, RED metrics, alerting rules
references/tracing-patterns.md — OpenTelemetry PHP SDK, span creation, context propagation, sampling strategies