From posthog
Helps author low-noise log alerts on PostHog services: triage candidates, characterize baselines, draft thresholds, back-test via simulate, and ship with notification destinations.
npx claudepluginhub anthropics/claude-plugins-official --plugin posthog
Slash command: /posthog:authoring-log-alerts
Authoring an alert is a measurement problem, not a guessing problem. You are not trying to be exhaustive — you are trying to land thresholds that fire 0–3 times per week on real production patterns, on services that matter.
- To tune an existing alert, use posthog:logs-alerts-events-list to inspect fire/resolve cadence and posthog:logs-alerts-partial-update to adjust.
- During an active incident, investigate with posthog:query-logs; don't author an alert mid-incident.

| Tool | Job | Where it fits |
|---|---|---|
| posthog:logs-services | Top-25 services in window with log_count, error_count, error_rate, sparkline. | Step 1: triage. |
| posthog:logs-attributes-list / posthog:logs-attribute-values-list | Discover keys/values for narrower filters. | Step 2, optional. |
| posthog:logs-count-ranges | Adaptive time-bucketed counts for a filter. | Step 3: baseline. |
| posthog:logs-alerts-simulate-create | Replay a draft config against -7d history with full state machine. | Step 4: validate. |
| posthog:logs-alerts-create | Persist the alert. | Step 5: ship. |
| posthog:logs-alerts-destinations-create | Wire the alert to Slack or webhook. | Step 5: ship. |

Do not call posthog:query-logs during authoring. You need distributions, not rows. Reserve posthog:query-logs for
the very end if the user asks "show me a sample of what would have fired" — limit: 10 is plenty.
Call posthog:logs-services for the last 24h with no filters. The response is capped at 25 services and includes a
sparkline, so it is small and bounded.
A service is a candidate when both are true:
- log_count is non-trivial (≥ ~1k in 24h; quieter services produce too little signal to alert on).
- error_rate is non-zero, or the user has named the service explicitly.

Skip services with high volume but error_rate == 0 unless the user wants a volume-shape alert (e.g. "warn me
if api-gateway suddenly stops producing logs"). Volume-floor alerts use threshold_operator: below and need
different reasoning; see references/volume-floor-alerts.md.
If the user names a service, treat it as a candidate even without error signal.
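The candidate rule above can be sketched as a small filter. This is a minimal sketch, assuming the posthog:logs-services response can be reduced to a list of dicts with name, log_count, and error_rate keys. The name key, the pick_candidates helper, and the MIN_LOG_COUNT constant are illustrative and not part of the tool's documented contract.

```python
MIN_LOG_COUNT = 1_000  # the "≥ ~1k in 24h" cutoff from the rule above


def pick_candidates(services, named_by_user=()):
    """Return names of services worth drafting an error alert for.

    `services` is assumed to be a list of dicts with hypothetical keys
    `name`, `log_count`, and `error_rate`; the real response may differ.
    """
    named = set(named_by_user)
    candidates = []
    for svc in services:
        volume_ok = svc["log_count"] >= MIN_LOG_COUNT
        has_signal = svc["error_rate"] > 0 or svc["name"] in named
        if volume_ok and has_signal:
            candidates.append(svc["name"])
    return candidates


services = [
    {"name": "api-gateway", "log_count": 250_000, "error_rate": 0.0},
    {"name": "billing", "log_count": 40_000, "error_rate": 0.012},
    {"name": "cron-misc", "log_count": 300, "error_rate": 0.2},
]
print(pick_candidates(services))                                  # ['billing']
print(pick_candidates(services, named_by_user=["api-gateway"]))   # ['api-gateway', 'billing']
```

In this sample, api-gateway only becomes a candidate once the user names it, and cron-misc is dropped for low volume despite its high error rate.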
If a service has many error sub-types, an alert on "all errors" is usually too broad. Use
posthog:logs-attributes-list (try attribute_type: log) and posthog:logs-attribute-values-list to find a discriminator —
common ones are http.status_code, error.type, k8s.container.name. Add the narrowing filter to your draft.
Keep it simple: one severity filter + one or two attribute filters is plenty. Multi-clause filters are harder to reason about and rarely improve precision.
Call posthog:logs-count-ranges with the candidate's filters, dateRange: { date_from: "-7d" }, and
targetBuckets: 24 (one bucket ≈ 7h). The response gives you bucket counts.
Do not eyeball the percentiles or scale the threshold to the alert window manually. Pipe the count-ranges response into the helper script:
echo '<count-ranges JSON>' | python3 scripts/baseline_stats.py --window-minutes 5
The script returns:
{
"n_buckets": 12,
"bucket_minutes": 420.0,
"alert_window_minutes": 5,
"stats": { "p50": 12.0, "p95": 71.25, "p99": 126.25, "max": 140 },
"suggested_threshold_count": 5,
"rationale": "max(p99=126.25, median*3=36.0, floor=5) scaled from 420m bucket to 5m window",
"health": []
}
Use suggested_threshold_count as your starting threshold. Read health:
| Health flag | What it means | What to do |
|---|---|---|
| sparse:N_of_M_buckets | Too few non-empty buckets for a 7d baseline. | Widen filter, extend to -30d, or skip. |
| empty | All buckets are zero. | Skip: no signal. |
| spiky | max is 10×+ p95. | Count-threshold alerts work well. Proceed. |
| flat | p95 ≈ p50. | Be cautious: either no incidents in lookback, or the metric is too smooth. Try a longer lookback or skip. |
| [] (empty list) | Healthy distribution. | Proceed. |

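The rationale string in the sample output can be read as the arithmetic below. This is one plausible reading, not the source of scripts/baseline_stats.py: the suggest_threshold name, rounding up with ceil, and applying the floor after scaling are assumptions, chosen because they reproduce the sample's suggested_threshold_count of 5.

```python
import math


def suggest_threshold(p50, p99, bucket_minutes, window_minutes, floor=5):
    """Scale a per-bucket baseline down to the alert window.

    Assumed reading of the rationale: take the larger of p99 and 3x the
    median, scale it by window/bucket, round up, and never go below `floor`.
    Linear scaling assumes events are spread evenly within a bucket.
    """
    per_bucket = max(p99, 3 * p50)
    per_window = per_bucket * window_minutes / bucket_minutes
    return max(math.ceil(per_window), floor)


# Values taken from the sample script output above.
print(suggest_threshold(p50=12.0, p99=126.25, bucket_minutes=420.0, window_minutes=5))  # 5
```

With the sample numbers the scaled value is roughly 1.5 logs per 5 minutes, so under this reading the floor, not the percentile, determines the result.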
Pick a starter draft from these defaults — see references/threshold-defaults.md for the reasoning:
| Setting | Default | Notes |
|---|---|---|
| threshold_count | suggested_threshold_count from the script | Already scaled to the alert window. |
| threshold_operator | above | Use below only for volume-floor alerts. |
| window_minutes | 5 | Allowed: 5, 10, 15, 30, 60. Must match what you passed to the script. |
| evaluation_periods | 3 | M in N-of-M. |
| datapoints_to_alarm | 2 | N in N-of-M. 2-of-3 reduces flap from a single noisy bucket. |
| cooldown_minutes | 30 | Minimum time between repeat fires. |

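To see why 2-of-3 suppresses a single noisy bucket, the N-of-M rule can be illustrated on a short series of window counts. This is a simplified illustration, not the simulate endpoint's state machine: it ignores cooldown and resolve logic, and it assumes "above" means strictly greater than the threshold. The n_of_m_breaching helper and the sample counts are hypothetical.

```python
def n_of_m_breaching(counts, threshold, datapoints_to_alarm=2, evaluation_periods=3):
    """For each window, report whether at least N of the last M counts exceed the threshold."""
    states = []
    for i in range(len(counts)):
        # Look at the most recent M windows, or fewer at the start of the series.
        recent = counts[max(0, i - evaluation_periods + 1): i + 1]
        breaches = sum(1 for c in recent if c > threshold)
        states.append(breaches >= datapoints_to_alarm)
    return states


counts = [1, 9, 1, 1, 9, 9, 1]  # one isolated spike, then a sustained one

print(n_of_m_breaching(counts, threshold=5))
# [False, False, False, False, False, True, True]

print(n_of_m_breaching(counts, threshold=5, datapoints_to_alarm=1, evaluation_periods=1))
# [False, True, False, False, True, True, False]
```

Under 2-of-3 the isolated spike at index 1 never alarms, while the two consecutive breaches at indexes 4 and 5 do. Under 1-of-1 every breaching window alarms on its own.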
Call posthog:logs-alerts-simulate-create with these settings and date_from: "-7d". The response gives you fire_count
and resolve_count.
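Before calling simulate, it can help to assemble the draft in one place and reject values the defaults table rules out. The sketch below uses the setting names from that table; the build_draft helper is hypothetical, and how these settings sit alongside filters and date_from in the actual request body is not shown here.

```python
ALLOWED_WINDOWS = {5, 10, 15, 30, 60}


def build_draft(threshold_count, window_minutes=5, evaluation_periods=3,
                datapoints_to_alarm=2, cooldown_minutes=30,
                threshold_operator="above"):
    """Assemble a starter draft from the defaults and validate it locally."""
    if window_minutes not in ALLOWED_WINDOWS:
        raise ValueError(f"window_minutes must be one of {sorted(ALLOWED_WINDOWS)}")
    if threshold_operator not in ("above", "below"):
        raise ValueError("threshold_operator must be 'above' or 'below'")
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm (N) must be between 1 and evaluation_periods (M)")
    return {
        "threshold_count": threshold_count,
        "threshold_operator": threshold_operator,
        "window_minutes": window_minutes,
        "evaluation_periods": evaluation_periods,
        "datapoints_to_alarm": datapoints_to_alarm,
        "cooldown_minutes": cooldown_minutes,
    }


draft = build_draft(threshold_count=5)
print(draft["window_minutes"], draft["datapoints_to_alarm"], draft["evaluation_periods"])  # 5 2 3
```

Reusing the same dict for the simulate call and the later create call keeps the two configs from drifting apart.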
Target: fire_count between 0 and ~3 over -7d. If outside the band:
| Outcome | Adjustment |
|---|---|
| fire_count = 0 over 7d and the baseline was spiky | Lower threshold_count toward stats.p95 from the script, or drop to 1-of-2. |
| fire_count = 0 and the baseline was flat | The service has no alertable signal. Skip it; log why. |
| fire_count > 5 | Raise threshold_count toward stats.max from the script, or move to 3-of-5 for a smoother window. |
| fire_count is fine but resolve_count never matches fire_count | Cooldown is too long, or the underlying state is genuinely sticky. Acceptable for now. |

When adjusting the threshold, read values from the script's stats block — never recompute percentiles
by hand.
Cap iteration at 3 simulate calls per candidate. If you can't land in the band in 3 rounds, the metric is wrong — either the filter is too broad, the window is wrong, or the service genuinely doesn't have a threshold-shape signal. Note it and move on.
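The adjustment table and the iteration cap can be combined into one decision function. This is a sketch under stated assumptions: the next_step name and its return labels are hypothetical, health is taken to be the list of flag strings from the script, and a fire_count of 4 or 5 is treated as acceptable because the table only asks for a change above 5.

```python
MAX_SIMULATE_CALLS = 3  # iteration cap per candidate


def next_step(fire_count, health, simulate_calls_made):
    """Map one simulate result to the next action.

    Returns one of: "accept", "lower", "raise", "skip", "abandon".
    """
    if fire_count == 0 and "flat" in health:
        return "skip"  # no alertable signal; log why
    if fire_count == 0 and "spiky" in health:
        action = "lower"  # toward stats.p95, or drop to 1-of-2
    elif fire_count > 5:
        action = "raise"  # toward stats.max, or move to 3-of-5
    else:
        return "accept"
    if simulate_calls_made >= MAX_SIMULATE_CALLS:
        return "abandon"  # could not land in the band within the cap
    return action


print(next_step(0, ["spiky"], simulate_calls_made=1))  # lower
print(next_step(9, [], simulate_calls_made=3))         # abandon
print(next_step(2, [], simulate_calls_made=1))         # accept
```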
Once a draft simulates cleanly:
1. Call posthog:logs-alerts-create with the validated config. Use a name like <service> error rate (auto) so the user can see at a glance which alerts came from this skill.
2. Call posthog:logs-alerts-destinations-create to wire it to a notification target. An alert with no destination is silent. Always confirm the channel name or webhook URL with the user before attaching; never wire an auto-generated alert to a production channel without explicit confirmation. If the user is unsure, suggest a low-traffic testing channel for the first few alerts.

If the user wants alerts created in enabled: false state for review-then-flip, pass enabled: false to
posthog:logs-alerts-create and tell them how many drafts you produced.
The filters field on posthog:logs-alerts-create takes a subset of LogsViewerFilters and must contain at
least one of:
- severityLevels: list of ["trace","debug","info","warn","error","fatal"]
- serviceNames: list of service name strings
- filterGroup: property filter group

The same shape goes into posthog:logs-alerts-simulate-create's filters field. Match the simulate filters to the alert filters
exactly; otherwise the simulation is testing a different alert than the one you ship.
Example minimum:
{
"severityLevels": ["error", "fatal"],
"serviceNames": ["api-gateway"]
}
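Both filter rules (at least one of the three keys, and identical filters for simulate and create) can be checked locally before any call. The check_filters helper below is a hypothetical sketch; treating an empty list as "not present" is an assumption, since the rule above does not say how empty values are handled.

```python
REQUIRED_ANY = ("severityLevels", "serviceNames", "filterGroup")
SEVERITIES = {"trace", "debug", "info", "warn", "error", "fatal"}


def check_filters(alert_filters, simulate_filters):
    """Return a list of problems; an empty list means the filters look usable."""
    problems = []
    # Assumption: an empty list or dict counts as missing.
    if not any(alert_filters.get(key) for key in REQUIRED_ANY):
        problems.append("filters need at least one of: " + ", ".join(REQUIRED_ANY))
    unknown = set(alert_filters.get("severityLevels", [])) - SEVERITIES
    if unknown:
        problems.append(f"unknown severity levels: {sorted(unknown)}")
    if alert_filters != simulate_filters:
        problems.append("simulate filters differ from alert filters")
    return problems


filters = {"severityLevels": ["error", "fatal"], "serviceNames": ["api-gateway"]}
print(check_filters(filters, dict(filters)))                      # []
print(check_filters(filters, {"serviceNames": ["api-gateway"]}))  # ['simulate filters differ from alert filters']
```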
- One posthog:logs-services call at the start, not per-candidate.
- One posthog:logs-count-ranges call per candidate at targetBuckets: 24. Don't go above 30 during authoring.
- At most 3 posthog:logs-alerts-simulate-create calls per candidate.
- No posthog:query-logs calls during the authoring loop.

Report what you did, in this shape:
The user should be able to read this and decide whether to disable any drafts before they go live.