From amplitude
Investigates AI agent sessions and failure patterns using Amplitude Agent Analytics, starting from session IDs, to root-cause issues such as tool errors, low quality scores, or task failures.
`npx claudepluginhub amplitude/mcp-marketplace --plugin amplitude`

This skill uses the workspace's default tool permissions.
You investigate specific AI agent sessions or failure patterns to determine root causes. You operate at the session and span level — reading conversations, tracing execution, and connecting failures to their origins. This is the "why" skill that follows the "what" from `/monitor-ai-quality`.
The user will provide one of:

- A specific session ID to investigate
- An agent, tool, or failure pattern to diagnose
- A user complaint to trace back to its sessions

If the request is too vague to scope, run `/monitor-ai-quality` first, then come back with specific findings.

Call `Amplitude:get_agent_analytics_schema` with `include: ["filter_options"]` to discover valid agent names, tool names, and topic values. Then call `Amplitude:query_agent_analytics_sessions` with appropriate filters:

- `agentNames: ["<agent>"], hasTaskFailure: true` — task failures for a specific agent
- `toolNames: ["<tool>"], hasTaskFailure: true` — task failures involving a specific tool
- `hasTechnicalFailure: true` — technical errors
- `maxQualityScore: 0.4` — low-quality sessions
- `maxSentimentScore: 0.4` or `hasNegativeFeedback: true` — unhappy users
- `minCostUsd: <threshold>` — unusually expensive sessions
- `minDurationMs: <threshold>` — unusually slow sessions
- `primaryTopics: ["<topic>"]`, or use `topicClassifications` for model-specific filtering

Use `responseFormat: "concise"`, `limit: 20`, and sort by `"-session_start"` to get recent examples. Select the 3-5 most representative sessions for deep investigation.
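For instance, a query for recent task failures from one agent could take the following shape. This is a sketch, not real workspace data: the agent name, dates, and the assumption that sorting is passed as a `sort` field are all placeholders.

```json
{
  "tool": "Amplitude:query_agent_analytics_sessions",
  "arguments": {
    "agentNames": ["support-agent"],
    "hasTaskFailure": true,
    "startDate": "2025-01-01",
    "endDate": "2025-01-07",
    "responseFormat": "concise",
    "limit": 20,
    "sort": "-session_start"
  }
}
```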
Call Amplitude:query_agent_analytics_sessions with searchQuery: "<email or user ID>" to find their sessions. If they reported a specific timeframe, add startDate/endDate. Pick the session(s) that match the complaint.
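A sketch of such a lookup, with a placeholder email and timeframe:

```json
{
  "tool": "Amplitude:query_agent_analytics_sessions",
  "arguments": {
    "searchQuery": "jane@example.com",
    "startDate": "2025-01-05",
    "endDate": "2025-01-06"
  }
}
```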
For each session under investigation (3-5 max), run these three calls in parallel:
Full session detail. Call Amplitude:query_agent_analytics_sessions with sessionIds: ["<id>"], responseFormat: "detailed". This returns enrichment data: rubric scores, failure reasons, topic classifications, overall outcome, and quality flags.
Conversation transcript. Call Amplitude:get_agent_analytics_conversation with sessionId: "<id>", includeCategories: true. Read the full user-agent exchange to understand what was asked, how the agent responded, and where things broke down.
Execution trace. Call `Amplitude:query_agent_analytics_spans` with `sessionId: "<id>"`. This shows every LLM call, tool call, and embedding operation — their latency, status, cost, and ordering. Look for:

- `status: "ERROR"` — direct failures

With conversation + trace + enrichment data, build the diagnosis:
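The three per-session calls above can be sketched as parallel requests for a single placeholder session ID:

```json
[
  {
    "tool": "Amplitude:query_agent_analytics_sessions",
    "arguments": { "sessionIds": ["abc-123"], "responseFormat": "detailed" }
  },
  {
    "tool": "Amplitude:get_agent_analytics_conversation",
    "arguments": { "sessionId": "abc-123", "includeCategories": true }
  },
  {
    "tool": "Amplitude:query_agent_analytics_spans",
    "arguments": { "sessionId": "abc-123" }
  }
]
```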
Classify the failure type:

Determine scope: Is this a one-off or systemic?

- `Amplitude:query_agent_analytics_sessions` with `groupBy: ["agent_name"]` or `groupBy: ["primary_topic"]` to see if failures cluster.
- `Amplitude:query_agent_analytics_sessions` with the same agent and time window to check if similar failures exist.

Find the trigger: What changed?

- `Amplitude:query_agent_analytics_spans` with `groupBy: ["tool_name"]`

If the root cause isn't clear from the session data alone:
Search conversations. Call Amplitude:search_agent_analytics_conversations with keywords from the error or topic to find other sessions with the same issue. This surfaces patterns the session-level queries might miss.
Check tool/model health. Call Amplitude:query_agent_analytics_spans with groupBy: ["tool_name"] or groupBy: ["model_name"] over the relevant time window. Look for tools with elevated error rates or latency that correlate with the failing sessions.
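A health-check query of this shape can surface a tool whose error rate spiked; the one-week window here is an assumed placeholder:

```json
{
  "tool": "Amplitude:query_agent_analytics_spans",
  "arguments": {
    "groupBy": ["tool_name"],
    "startDate": "2025-01-01",
    "endDate": "2025-01-07"
  }
}
```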
Structure the output as a root cause analysis.
Required sections:
Investigation summary (2-3 sentences): What was investigated, what was found, and the severity. Written as a headline for the team.
Sessions examined: A compact table of the sessions investigated:
| Session ID | Agent | Outcome | Quality | Sentiment | Failure Type |
|------------|-------|---------|---------|-----------|--------------|
| [id] | [name] | [outcome] | [score] | [score] | [type or —] |
Root cause (1 paragraph): The primary explanation for what went wrong. Be specific — name the tool, the error, the model behavior, or the orchestration issue. Include evidence from the conversation and trace.
Execution trace highlights (for the most illustrative session): Walk through the key spans showing the failure path:
Conversation excerpt (if revealing): Quote the 2-3 most relevant turns showing where the agent failed the user. Keep it brief.
Scope assessment: One-off vs. systemic. How many sessions are affected? Is it getting worse?
Recommended fixes (2-4 numbered items): Concrete actions. Examples:
Follow-on prompt: Offer next steps — "Want me to check if this tool timeout affects other agents, search for similar user complaints, or monitor this pattern over the next few days?"
User says: "What happened in session abc-123?"
Actions:
User says: "Why are Chart Agent sessions failing?"
Actions:
User says: "A customer said our AI gave them wrong data yesterday"
Actions:
If a session ID returns no results: The session may be from a different project, or outside the data retention window. Ask the user to confirm the project and check if the session ID is correct.

If span data is missing: Span-level data requires OpenTelemetry-compatible tracing in the AI agent. Report what's available from the session and conversation level and note that span data would help narrow the root cause.

If there are too many candidate sessions: Don't try to investigate more than 5 sessions in detail. Instead, use `groupBy` on `query_agent_analytics_sessions` to find the common pattern, then deep-dive into 2-3 representative examples.