Debugs and inspects LLM/AI agent traces using PostHog MCP tools. Fetches traces by ID, analyzes spans/generations/tool calls, verifies context/subagents, and checks token usage/costs.
npx claudepluginhub anthropics/claude-plugins-official --plugin posthog

This skill uses the workspace's default tool permissions.
PostHog captures LLM/AI agent activity as traces. Each trace is a tree of events representing a single AI interaction — from the top-level agent invocation down to individual LLM API calls.
| Tool | Purpose |
|---|---|
| `posthog:query-llm-traces-list` | Search and list traces (compact — no large content) |
| `posthog:query-llm-trace` | Get a single trace by ID with full event tree |
| `posthog:execute-sql` | Ad-hoc SQL for complex trace analysis |
See the event reference for the full schema.
$ai_trace (top-level container)
└── $ai_span (logical groupings, e.g. "RAG retrieval", "tool execution")
├── $ai_generation (individual LLM API call)
└── $ai_embedding (embedding creation)
Events are linked via $ai_parent_id → parent's $ai_span_id or $ai_trace_id.
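The parent-linking rule above can be sketched as a small tree builder. This is a hedged example, not part of the skill's scripts: it assumes each event is a dict with a `properties` key exposing `$ai_trace_id`, `$ai_span_id`, and `$ai_parent_id` as shown in the schema, and the helper names are hypothetical.

```python
from collections import defaultdict

def build_tree(events):
    """Group events by parent: $ai_parent_id points at the parent's
    $ai_span_id, or falls back to $ai_trace_id for top-level events."""
    children = defaultdict(list)
    for ev in events:
        props = ev["properties"]
        parent = props.get("$ai_parent_id") or props.get("$ai_trace_id")
        children[parent].append(ev)
    return children

def print_tree(children, node_id, depth=0):
    """Walk the grouped events depth-first, indenting by nesting level."""
    for ev in children.get(node_id, []):
        name = ev["properties"].get("$ai_span_name", ev["event"])
        print("  " * depth + f"{ev['event']}: {name}")
        print_tree(children, ev["properties"].get("$ai_span_id"), depth + 1)

# Minimal example: one span with a nested generation under trace "t1"
events = [
    {"event": "$ai_span", "properties": {
        "$ai_trace_id": "t1", "$ai_span_id": "s1",
        "$ai_span_name": "RAG retrieval"}},
    {"event": "$ai_generation", "properties": {
        "$ai_trace_id": "t1", "$ai_parent_id": "s1",
        "$ai_span_id": "g1"}},
]
tree = build_tree(events)
print_tree(tree, "t1")
```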
posthog:query-llm-trace
{
"traceId": "<trace_id>",
"dateRange": {"date_from": "-7d"}
}
The result contains the full event tree with all properties. The response may be large — when it exceeds the inline limit, Claude Code auto-persists it to a file.
From the result you get:
From the result you get:

- Event types (`$ai_span`, `$ai_generation`, etc.)
- Span names (`$ai_span_name`) — these are the tool/step names
- Parent links (`$ai_parent_id`)
- `_posthogUrl` — always include this in your response so the user can click through to the UI

When the result is persisted to a file (large traces with full `$ai_input`/`$ai_output_choices`), use the parsing scripts to explore it.
Start with the summary to get the full picture, then drill into specifics:
# 1. Overview: metadata, tool calls, final output, errors
python3 scripts/print_summary.py /path/to/persisted-file.json
# 2. Timeline: chronological event list with truncated I/O
python3 scripts/print_timeline.py /path/to/persisted-file.json
# 3. Drill into a specific span's full input/output
SPAN="tool_name" python3 scripts/extract_span.py /path/to/persisted-file.json
# 4. Full conversation with thinking blocks and tool calls
python3 scripts/extract_conversation.py /path/to/persisted-file.json
# 5. Search for a keyword across all properties
SEARCH="keyword" python3 scripts/search_traces.py /path/to/persisted-file.json
All scripts support MAX_LEN=N env var to control truncation (0 = unlimited).
To debug a tool call:
- Find the `$ai_span` for the tool call (look at `$ai_span_name`)
- `$ai_input_state` — what arguments were passed to the tool?
- `$ai_output_state` — what did the tool return?
- `$ai_is_error` — did the tool call fail?

To understand why the LLM made a decision:
- Find the `$ai_generation` event where the LLM made the decision
- `$ai_input` — this is the full message history the LLM saw

To verify retrieved context:
- Find the `$ai_span` events for retrieval/search steps
- Check `$ai_output_state` — what content was retrieved and fed to the LLM?

To inspect subagents:
- Subagents appear as child spans (linked via `$ai_parent_id`)
- Check their `$ai_output_state` and `$ai_is_error`
- Nested `$ai_generation` events are the subagent's LLM calls

To find where the model said X:
- Use `search_traces.py` to find where the text appears: `SEARCH="the text" python3 scripts/search_traces.py FILE`
- Inspect the `$ai_input` of that generation to see what the LLM was told before it said X

The trace tools return `_posthogUrl` — always surface this to the user.
You can also construct links manually:
https://app.posthog.com/llm-observability/traces/<trace_id>?timestamp=<url_encoded_timestamp>&event=<optional_event_id>

Alternatively, use the `_posthogUrl` from `query-llm-traces-list`. The `timestamp` query param is required — use the `createdAt` of the earliest event in the trace, URL-encoded (e.g. `timestamp=2026-04-01T19%3A39%3A20Z`).
When presenting findings, always include the relevant PostHog URL so the user can verify.
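Building such a link by hand can be sketched as follows — a minimal example using only the stdlib, assuming `created_at` is an ISO-8601 string as returned in `createdAt` (the `trace_url` helper itself is hypothetical, not part of the skill):

```python
from urllib.parse import quote

def trace_url(trace_id, created_at, event_id=None):
    """Build a PostHog trace deep link. The timestamp param must be the
    URL-encoded createdAt of the earliest event in the trace."""
    url = (f"https://app.posthog.com/llm-observability/traces/{trace_id}"
           f"?timestamp={quote(created_at, safe='')}")
    if event_id:
        url += f"&event={event_id}"
    return url

print(trace_url("trace_123", "2026-04-01T19:39:20Z"))
# https://app.posthog.com/llm-observability/traces/trace_123?timestamp=2026-04-01T19%3A39%3A20Z
```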
Use posthog:query-llm-traces-list to search and filter traces.
CRITICAL: Never assume event names, property names, or property values from training data.
Every project instruments different custom properties. Always call posthog:read-data-schema first
to discover what properties and values actually exist in the project's data before constructing filters.
Before filtering traces, discover what's available:
1. Call `posthog:read-data-schema` with `kind: "events"` and look for `$ai_*` events
2. Call `posthog:read-data-schema` with `kind: "event_properties"` and `event_name: "$ai_generation"` (or another AI event) to see what properties are captured
3. Call `posthog:read-data-schema` with `kind: "event_property_values"`, `event_name: "$ai_generation"`, and `property_name: "$ai_model"` to see real model names in use

Only then construct the `query-llm-traces-list` call with property filters.
This is especially important for custom properties like project_id, conversation_id, user_tier, etc. — these vary per project and cannot be guessed.
Standard `$ai_*` properties do not need schema confirmation, but confirm any other property, such as a person's `email`.
posthog:query-llm-traces-list
{
"dateRange": {"date_from": "-1h"},
"filterTestAccounts": true,
"limit": 20,
"properties": [
{"type": "event", "key": "$ai_model", "value": "gpt-4o", "operator": "exact"}
]
}
Multiple filters are AND-ed together:
posthog:query-llm-traces-list
{
"dateRange": {"date_from": "-1h"},
"filterTestAccounts": true,
"properties": [
{"type": "event", "key": "$ai_provider", "value": "anthropic", "operator": "exact"},
{"type": "event", "key": "$ai_is_error", "value": ["true"], "operator": "exact"}
]
}
You can also filter by person properties (discover them via read-data-schema with kind: "entity_properties" and entity: "person"):
posthog:query-llm-traces-list
{
"dateRange": {"date_from": "-1h"},
"filterTestAccounts": true,
"properties": [
{"type": "person", "key": "email", "value": "@company.com", "operator": "icontains"}
]
}
Customers often store their own IDs as event or person properties.
Use posthog:read-data-schema to discover what custom properties exist, then filter:
1. Call `posthog:read-data-schema` with `kind: "event_properties"` and `event_name: "$ai_trace"` to find custom properties
2. Then filter:

posthog:query-llm-traces-list
{
"dateRange": {"date_from": "-7d"},
"properties": [
{"type": "event", "key": "project_id", "value": "proj_abc123", "operator": "exact"}
]
}
Use SQL when you need something query-llm-traces-list can't express — typically full-text search across message content or custom aggregations.
SELECT
properties.$ai_trace_id AS trace_id,
properties.$ai_model AS model,
timestamp
FROM events
WHERE
event = '$ai_generation'
AND timestamp >= now() - INTERVAL 1 HOUR
AND properties.$ai_input ILIKE '%search term%'
ORDER BY timestamp DESC
LIMIT 20
For more complex SQL patterns, see the TraceQuery and HogQL references.

Trace tool results are JSON. When too large to read inline, Claude Code persists them to a file.
[{ "type": "text", "text": "{\"results\": [...], \"_posthogUrl\": \"...\"}" }]
results (array for list, object for single trace)
├── id, traceName, createdAt, totalLatency, totalCost
├── inputState, outputState (trace-level state)
└── events[]
├── event ($ai_span | $ai_generation | $ai_embedding | $ai_metric | $ai_feedback)
├── id, createdAt
└── properties
├── $ai_span_name, $ai_latency, $ai_is_error
├── $ai_input_state, $ai_output_state (span tool I/O)
├── $ai_input, $ai_output_choices (generation messages)
├── $ai_model, $ai_provider
└── $ai_input_tokens, $ai_output_tokens, $ai_total_cost_usd
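The token and cost properties at the bottom of this structure can be aggregated per model with a short helper. This is a hedged sketch, assuming a single trace object shaped as above (an `events` list of dicts with `event` and `properties` keys); the `usage_by_model` function is illustrative, not part of the skill's scripts.

```python
from collections import defaultdict

def usage_by_model(trace):
    """Sum input/output tokens and cost per model across a trace's
    $ai_generation events, using the property names shown above."""
    totals = defaultdict(lambda: {"in": 0, "out": 0, "cost": 0.0})
    for ev in trace["events"]:
        if ev["event"] != "$ai_generation":
            continue  # spans/embeddings carry no generation token counts
        p = ev["properties"]
        t = totals[p.get("$ai_model", "unknown")]
        t["in"] += p.get("$ai_input_tokens", 0)
        t["out"] += p.get("$ai_output_tokens", 0)
        t["cost"] += p.get("$ai_total_cost_usd", 0.0)
    return dict(totals)

# Minimal example: one generation plus a span that is skipped
trace = {"events": [
    {"event": "$ai_generation", "properties": {
        "$ai_model": "gpt-4o", "$ai_input_tokens": 1200,
        "$ai_output_tokens": 300, "$ai_total_cost_usd": 0.012}},
    {"event": "$ai_span", "properties": {"$ai_span_name": "tool"}},
]}
print(usage_by_model(trace))
```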
| Script | Purpose | Usage |
|---|---|---|
| `print_summary.py` | Trace metadata, tool calls, errors, and final LLM output | `python3 scripts/print_summary.py FILE` |
| `print_timeline.py` | Chronological event timeline with I/O summaries | `python3 scripts/print_timeline.py FILE` |
| `extract_span.py` | Full input/output of a specific span by name | `SPAN="name" python3 scripts/extract_span.py FILE` |
| `extract_conversation.py` | LLM messages with thinking blocks and tool calls | `python3 scripts/extract_conversation.py FILE` |
| `search_traces.py` | Find a keyword across all event properties | `SEARCH="keyword" python3 scripts/search_traces.py FILE` |
| `show_structure.py` | Show JSON keys and types without values | `cat blob.json \| python3 scripts/show_structure.py` |
- Always pass a `dateRange` — queries without a time range are slow. Use narrow windows (`-30m`, `-1h`) for broad listing queries; wider windows (`-7d`, `-30d`) are fine for narrow queries filtered by trace ID or specific property values
- Include the `_posthogUrl` in your response so the user can click through
- `$ai_input_state` / `$ai_output_state` on spans contain tool call inputs and outputs
- `$ai_input` / `$ai_output_choices` on generations contain the full LLM conversation — can be megabytes; when the result is persisted to a file, use the parsing scripts
- Set `filterTestAccounts: true` to exclude internal/test traffic when searching
- `$ai_trace` events are NOT in the events array — their data is surfaced via trace-level `inputState`, `outputState`, and `traceName`