From posthog
Investigates PostHog MCP tool call quality using HogQL — error rates, latency, and failure patterns. Activated when users ask which MCP tool errors most or paste a tool-quality dashboard URL.
Slash command: /posthog:exploring-mcp-tool-quality
Any MCP server instrumented with PostHog's MCP analytics SDK emits a
$mcp_tool_call event on the shared events table every time an agent invokes a
tool. There is no dedicated ClickHouse table — every field lives as a
$mcp_* property on events, and every tool-quality metric (error rate, latency
percentiles, reach) is an aggregation over this one event. This is the data
behind the MCP analytics dashboard and tool-quality screens.
HogQL via posthog:execute-sql is the primary path. There are no typed
tools for tool quality — it is all SQL. The full property schema and the
canonical query recipes live in the shared MCP data reference:
products/posthog_ai/skills/querying-posthog-data/references/models-mcp.md.
That reference is the single source of truth for the $mcp_* schema and the
effective-tool-name idiom used below — this skill inlines only the headline
"which tool errors most" query for convenience; pull the matrix, latency, and
harness recipes from the reference rather than re-deriving them. Read it before
writing queries.
Always use the effective tool name. New-SDK events wrap the real tool in
a single-exec call, so grouping on raw $mcp_tool_name collapses everything
under the wrapper. Use:
coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name))
Always read $mcp_is_error via toBool(...) and cast
$mcp_duration_ms via toFloat(...). The properties are strings.
Always set a time range. Without one, these queries scan the entire events table.
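To see the wrapper collapse in your own data, a quick check along these lines may help. It is only a sketch built from the properties named above; the 7-day window and the row limit are arbitrary illustrative choices, not values taken from the reference.

```sql
-- Sketch: compare the raw tool name with the effective tool name.
-- Rows where raw_tool differs from effective_tool are wrapped (new-SDK) calls.
SELECT
    toString(properties.$mcp_tool_name) AS raw_tool,
    coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) AS effective_tool,
    count() AS calls
FROM events
WHERE event = '$mcp_tool_call'
    AND timestamp >= now() - INTERVAL 7 DAY  -- illustrative window
GROUP BY raw_tool, effective_tool
ORDER BY calls DESC
LIMIT 50
```

If one raw_tool value fans out into many effective_tool values, that value is the wrapper, and grouping on it alone would hide the real tools.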
This is the canonical "which tool errors most" question. Rank tools by error
rate, but guard against small-sample noise with a HAVING floor on call volume:
posthog:execute-sql
SELECT
    coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) AS tool,
    count() AS total_calls,
    countIf(toBool(properties.$mcp_is_error)) AS errors,
    round(countIf(toBool(properties.$mcp_is_error)) * 100.0 / count(), 1) AS error_rate_pct
FROM events
WHERE event = '$mcp_tool_call'
    AND coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) != ''
    AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY tool
HAVING total_calls >= 20
ORDER BY error_rate_pct DESC, total_calls DESC
LIMIT 20
Report both rate and volume — a 100% error rate over 3 calls is rarely the
real story; a 12% rate over 50,000 calls is. Offer to pull the top
$mcp_error_message values for the worst tool (see below).
One row per tool with error rate, latency percentiles, and reach — mirrors the tool-quality screen. The ready-to-run query is in models-mcp.md under "Tool-quality matrix".
Pull the most common error messages for a tool, then correlate to richer
exception detail ($exception events carry $exception_message, joined by
$session_id and timestamp):
posthog:execute-sql
SELECT
    toString(properties.$mcp_error_message) AS error,
    count() AS n
FROM events
WHERE event = '$mcp_tool_call'
    AND toBool(properties.$mcp_is_error)
    AND coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) = '<tool>'
    AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY error
ORDER BY n DESC
LIMIT 10
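One way to start the correlation is to list $exception events from sessions that contain a failing call to the tool. This is a sketch under assumptions: it matches on $session_id only (no timestamp proximity), and it assumes both event types carry $session_id as a property, as the description above implies. The row limit is arbitrary.

```sql
-- Sketch: exceptions from sessions that had a failing call to '<tool>'.
-- Assumption: session-level match only; add a timestamp condition to tighten it.
SELECT
    timestamp,
    toString(properties.$session_id) AS session_id,
    toString(properties.$exception_message) AS exception_message
FROM events
WHERE event = '$exception'
    AND timestamp >= now() - INTERVAL 30 DAY
    AND toString(properties.$session_id) IN (
        SELECT toString(properties.$session_id)
        FROM events
        WHERE event = '$mcp_tool_call'
            AND toBool(properties.$mcp_is_error)
            AND coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) = '<tool>'
            AND toString(properties.$session_id) != ''  -- skip events without a session
            AND timestamp >= now() - INTERVAL 30 DAY
    )
ORDER BY timestamp DESC
LIMIT 50
```

If this returns exceptions unrelated to the tool, narrowing by timestamp around the failing call is the natural next step.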
To rank tools by latency instead, swap the error-rate aggregate in the headline
query for latency percentiles
(quantile(0.95)(toFloat(properties.$mcp_duration_ms)) AS p95_ms) and order by p95_ms.
The matrix query already returns p50_ms / p95_ms.
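As a rough sketch, the swap could look like the query below. It reuses the effective-tool-name expression, the 30-day window, and the 20-call floor from the error-rate query; the rounding and the extra p50_ms column are additions for illustration, and the canonical latency recipe remains the one in models-mcp.md.

```sql
-- Sketch: rank tools by p95 latency instead of error rate.
SELECT
    coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) AS tool,
    count() AS total_calls,
    round(quantile(0.5)(toFloat(properties.$mcp_duration_ms)), 1) AS p50_ms,
    round(quantile(0.95)(toFloat(properties.$mcp_duration_ms)), 1) AS p95_ms
FROM events
WHERE event = '$mcp_tool_call'
    AND coalesce(nullIf(toString(properties.$mcp_exec_tool_call_name), ''), toString(properties.$mcp_tool_name)) != ''
    AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY tool
HAVING total_calls >= 20  -- same small-sample floor as the error-rate query
ORDER BY p95_ms DESC
LIMIT 20
```

Keeping the call-volume floor matters here too, since a single slow call can dominate the p95 of a rarely used tool.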
https://app.posthog.com/project/<project_id>/mcp-analytics/dashboard
https://app.posthog.com/project/<project_id>/mcp-analytics/tool-quality
Always surface a UI link so the user can verify visually.
The HAVING total_calls >= N floor stops tools with very few calls from topping the list spuriously.
$mcp_client_name lets you cut quality by harness (Claude Code vs Cursor vs …); the canonical bucketing multiIf is in models-mcp.md.
exploring-mcp-sessions: drill into a single agent run and its tool sequence
exploring-mcp-intent-clusters: group agent goals and see which intents drive the errors