From ai-plugins
Investigate a running production application for bugs, performance issues, stability problems, and optimization opportunities by analyzing pod logs and distributed traces. Spawns parallel specialist sub-agents, cross-references memory for recurring issues, and produces a prioritized improvement plan.
Slash command: /ai-plugins:investigate-app
Analyze a running production application by collecting pod logs and Jaeger traces, spawning parallel specialist sub-agents to review multiple dimensions simultaneously, and producing a prioritized, actionable improvement plan.
Use this skill when:
/investigate-app <kubectl-log-commands> [--jaeger <endpoint>] [--services <service-list>]
| Argument | Required | Description |
|---|---|---|
| kubectl-log-commands | Yes | One or more kubectl commands to retrieve pod logs (e.g., kubectl logs -n mynamespace deploy/myapp --since=1h). Multiple commands can be separated by newlines or semicolons. |
| --jaeger <endpoint> | No | Jaeger UI endpoint for trace analysis (e.g., localhost:16686). If omitted, tracing analysis runs in log-only mode. |
| --services <service-list> | No | Comma-separated list of service names registered in Jaeger. If omitted, the skill discovers services from Jaeger's API. |
Examples:
/investigate-app kubectl logs -n production deploy/api-server --since=2h \
--jaeger localhost:16686 --services api-server,worker,gateway
/investigate-app kubectl logs -n default -l app=myapp --tail=5000
/investigate-app kubectl logs -n prod deploy/frontend --since=30m ; \
kubectl logs -n prod deploy/backend --since=30m \
--jaeger http://jaeger.monitoring:16686
┌───────────────────────────────────────────────────────────────────────┐
│                       INVESTIGATE-APP WORKFLOW                        │
├───────────────────────────────────────────────────────────────────────┤
│                                                                       │
│   ┌──────────────┐                                                    │
│   │ Phase 1:     │  Read AGENTS.md, parse arguments, understand       │
│   │ Setup &      │  the project architecture and conventions          │
│   │ Context      │                                                    │
│   └──────┬───────┘                                                    │
│          │                                                            │
│   ┌──────▼───────┐                                                    │
│   │ Phase 2:     │  Execute kubectl commands, query Jaeger API,       │
│   │ Data         │  collect raw logs and trace data                   │
│   │ Collection   │                                                    │
│   └──────┬───────┘                                                    │
│          │                                                            │
│   ┌──────▼───────┐                                                    │
│   │ Phase 3:     │  Recall known issues from memory to provide        │
│   │ Memory       │  context to sub-agents and detect recurrences      │
│   │ Recall       │                                                    │
│   └──────┬───────┘                                                    │
│          │                                                            │
│   ┌──────▼────────────────────────────────────────┐                   │
│   │ Phase 4: Parallel Analysis                    │                   │
│   │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐  │                   │
│   │  │  Bug   │ │  Perf  │ │ Stable │ │Tracing │  │                   │
│   │  │ & Error│ │ & Rsrc │ │ & Rely │ │Analysis│  │                   │
│   │  └────────┘ └────────┘ └────────┘ └────────┘  │                   │
│   │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐  │                   │
│   │  │Logging │ │  UX &  │ │Token & │ │Security│  │                   │
│   │  │ & Obsrv│ │  API   │ │  Cost  │ │ & Compl│  │                   │
│   │  └────────┘ └────────┘ └────────┘ └────────┘  │                   │
│   └──────┬────────────────────────────────────────┘                   │
│          │                                                            │
│   ┌──────▼───────┐                                                    │
│   │ Phase 5:     │  Merge findings, deduplicate, cross-reference      │
│   │ Aggregation  │  with memory for recurring issues                  │
│   │ & Dedup      │                                                    │
│   └──────┬───────┘                                                    │
│          │                                                            │
│   ┌──────▼───────┐                                                    │
│   │ Phase 6:     │  Prioritized list of improvements with             │
│   │ Improvement  │  effort estimates, grouped by theme                │
│   │ Plan         │                                                    │
│   └──────┬───────┘                                                    │
│          │                                                            │
│   ┌──────▼───────┐                                                    │
│   │ Phase 7:     │  Store all findings and plan in memory             │
│   │ Memory       │  for future investigation sessions                 │
│   │ Persistence  │                                                    │
│   └──────────────┘                                                    │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
Phase 1: Setup & Context
Goal: Understand the project, its architecture, and its conventions before analyzing data.
Read AGENTS.md (or equivalent project guidance file) in the repository root.
Read README.md for high-level project context.
Check for existing observability configuration.
Extract kubectl commands from the user input. Each extracted command should be a kubectl logs command.
Extract the Jaeger endpoint (if provided).
Extract the service list (if provided).
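The extraction steps above can be sketched in Python as follows. The sketch assumes the raw slash-command text arrives as a single string. The InvestigateArgs and parse_args names are hypothetical. The function joins backslash line continuations, pulls out --jaeger and --services wherever they appear, and then splits the remaining text on semicolons and newlines, matching the examples above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class InvestigateArgs:
    """Hypothetical container for the parsed /investigate-app arguments."""
    kubectl_commands: list[str]
    jaeger: str | None = None
    services: list[str] = field(default_factory=list)


def parse_args(raw: str) -> InvestigateArgs:
    # Join shell-style line continuations (backslash + newline) into one logical line.
    text = re.sub(r"\\[ \t]*\n", " ", raw)

    # Pull out the optional flags wherever they appear, then drop them from the text.
    jaeger = re.search(r"--jaeger\s+(\S+)", text)
    services = re.search(r"--services\s+(\S+)", text)
    text = re.sub(r"--(?:jaeger|services)\s+\S+", " ", text)

    # What remains is one or more kubectl commands separated by newlines or semicolons.
    commands = [" ".join(part.split()) for part in re.split(r"[;\n]", text)]
    commands = [cmd for cmd in commands if cmd]
    for cmd in commands:
        if not cmd.startswith("kubectl logs"):
            raise ValueError(f"expected a kubectl logs command, got {cmd!r}")

    return InvestigateArgs(
        kubectl_commands=commands,
        jaeger=jaeger.group(1) if jaeger else None,
        services=[s for s in services.group(1).split(",") if s] if services else [],
    )
```

The flags are removed before the split, so a trailing --jaeger on the last continuation line does not end up inside a kubectl command.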
Not every one of the 8 analysis perspectives applies to every application. Determine applicability:
| Perspective | When to include |
|---|---|
| Bug & Error Analysis | Always |
| Performance & Resource Optimization | Always |
| Stability & Reliability | Always |
| Distributed Tracing Analysis | Only when Jaeger endpoint is provided |
| Logging & Observability | Always |
| User Experience & API Quality | When the application serves user-facing APIs or UIs |
| Token & Cost Optimization | When the application integrates with LLM/AI APIs |
| Security & Compliance | Always |
If you cannot determine applicability from project context, include all perspectives and let each sub-agent report "no findings" if the perspective does not apply.
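The applicability table and the fallback rule can be expressed as a small selection function. This is a sketch. The select_perspectives name and its boolean inputs are hypothetical, and None stands for "could not determine from project context". It follows the table literally, so tracing is included only when a Jaeger endpoint exists. The log-only tracing mode mentioned in the argument table is not modeled here.

```python
from __future__ import annotations

# Same order as the applicability table above.
PERSPECTIVES = [
    "Bug & Error Analysis",
    "Performance & Resource Optimization",
    "Stability & Reliability",
    "Distributed Tracing Analysis",
    "Logging & Observability",
    "User Experience & API Quality",
    "Token & Cost Optimization",
    "Security & Compliance",
]


def select_perspectives(
    has_jaeger: bool,
    user_facing: bool | None = None,
    uses_llm: bool | None = None,
) -> list[str]:
    """Return applicable perspectives; None means 'unknown from project context'."""
    conditional = {
        "Distributed Tracing Analysis": has_jaeger,
        # Unknown (None) counts as applicable: the sub-agent can report "no findings".
        "User Experience & API Quality": user_facing is not False,
        "Token & Cost Optimization": uses_llm is not False,
    }
    # Perspectives missing from `conditional` are marked "Always" in the table.
    return [p for p in PERSPECTIVES if conditional.get(p, True)]
```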
Phase 2: Data Collection
Goal: Gather raw log data and trace information for analysis.
Query the Jaeger API to discover available services:
GET http://{jaeger-endpoint}/api/services
For each service, retrieve recent traces:
GET http://{jaeger-endpoint}/api/traces?service={service-name}&limit=100&lookback=2h
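Retrieve the service dependency graph. Here endTs is the current time in epoch milliseconds, and lookback is also in milliseconds, so 7200000 covers 2 hours: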
GET http://{jaeger-endpoint}/api/dependencies?endTs={now}&lookback=7200000
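The three queries above can be built in Python as a sketch. It assumes the JSON HTTP API served by jaeger-query, which is the API the Jaeger UI calls. Jaeger does not document that API as a stable public interface.

The sketch makes two choices of its own:
- For /api/traces it passes explicit start/end bounds in epoch microseconds instead of the lookback parameter. This keeps the time window independent of how the backend treats lookback.
- It normalizes the endpoint, because the examples pass it both without a scheme (localhost:16686) and with one (http://jaeger.monitoring:16686). A plain http://{jaeger-endpoint} template would otherwise produce a doubled scheme.

All function names are hypothetical.

```python
from __future__ import annotations

import time
from urllib.parse import urlencode


def base_url(endpoint: str) -> str:
    """Accept both 'localhost:16686' and 'http://jaeger.monitoring:16686'."""
    endpoint = endpoint.rstrip("/")
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    return f"http://{endpoint}"


def services_url(endpoint: str) -> str:
    return f"{base_url(endpoint)}/api/services"


def traces_url(endpoint: str, service: str, lookback_s: int = 7200,
               limit: int = 100, now_s: float | None = None) -> str:
    # start/end are sent as epoch microseconds; the window defaults to 2 hours.
    now = time.time() if now_s is None else now_s
    end_us = int(now * 1_000_000)
    start_us = end_us - lookback_s * 1_000_000
    query = urlencode({"service": service, "start": start_us, "end": end_us, "limit": limit})
    return f"{base_url(endpoint)}/api/traces?{query}"


def dependencies_url(endpoint: str, lookback_ms: int = 7_200_000,
                     now_s: float | None = None) -> str:
    # endTs and lookback are epoch/duration milliseconds (7200000 ms = 2 h).
    now = time.time() if now_s is None else now_s
    query = urlencode({"endTs": int(now * 1000), "lookback": lookback_ms})
    return f"{base_url(endpoint)}/api/dependencies?{query}"
```

Any HTTP client can fetch the resulting URLs. The responses wrap their results in a top-level data field.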
Identify high-latency and error traces for deeper inspection:
If the user provided a service list, verify all listed services appear in Jaeger. Missing services are an immediate finding for the Tracing Analysis perspective.
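The two checks above (flagging error and high-latency traces, and spotting services missing from Jaeger) can be sketched as follows. The sketch makes these assumptions:
- Each trace is an object from the data array of /api/traces, with spans carrying startTime and duration in microseconds.
- Errors are marked with the conventional error=true span tag.
- The 1-second slow threshold is a placeholder; choose a value suited to the service.

The helper names are hypothetical.

```python
from __future__ import annotations


def trace_duration_us(trace: dict) -> int:
    """Wall-clock span of a trace: earliest span start to latest span end (microseconds)."""
    spans = trace["spans"]
    start = min(s["startTime"] for s in spans)
    end = max(s["startTime"] + s["duration"] for s in spans)
    return end - start


def has_error(trace: dict) -> bool:
    """True if any span carries the conventional error=true tag."""
    return any(
        tag.get("key") == "error" and tag.get("value") in (True, "true")
        for span in trace["spans"]
        for tag in span.get("tags", [])
    )


def flag_traces(traces: list[dict], slow_threshold_us: int = 1_000_000) -> dict[str, list[str]]:
    """Split trace IDs into error traces and slow traces (slowest first)."""
    errors = [t["traceID"] for t in traces if has_error(t)]
    slow = [t for t in traces if trace_duration_us(t) >= slow_threshold_us]
    slow.sort(key=trace_duration_us, reverse=True)
    return {"error": errors, "slow": [t["traceID"] for t in slow]}


def missing_services(requested: list[str], discovered: list[str]) -> list[str]:
    """User-listed services that Jaeger does not report (a Tracing Analysis finding)."""
    return sorted(set(requested) - set(discovered))
```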
Before proceeding to analysis, produce a brief summary:
## Data Collection Summary
- Log sources: {count} deployments across {count} namespaces
- Total log lines: {count}
- Time range: {earliest timestamp} to {latest timestamp}
- Jaeger: {available/not available}
- Services in Jaeger: {list or "N/A"}
- Error traces found: {count}
- Slowest trace duration: {duration or "N/A"}
Phase 3: Memory Recall
Goal: Retrieve previously stored findings to provide context to sub-agents and detect recurring issues.
Search memory for previous investigation results related to this project:
Recall queries:
- "investigate-app findings {project-name}"
- "production bugs {project-name}"
- "performance issues {project-name}"
- "stability issues {project-name}"
- "tracing gaps {project-name}"
From memory results, compile a structured list:
## Known Issues from Previous Investigations
1. [{severity}] {title} — Found on {date}, status: {fixed/recurring/unresolved}
2. ...
If no prior findings exist, note: "No prior investigation results found in memory."
This list is passed to every sub-agent so they can flag recurrences.
Phase 4: Parallel Analysis
Goal: Spawn specialist sub-agents, one per applicable perspective, to analyze the collected data concurrently.
For each applicable perspective, construct a task prompt using the Sub-Agent Prompt Template defined in analysis-perspectives.md.
Each sub-agent receives:
Spawn all applicable sub-agents in a single call so they run in parallel. Use read-only sub-agents since they only analyze data, not modify code.
Important considerations:
Wait for all sub-agents to complete and collect their structured findings.
Phase 5: Aggregation & Dedup
Goal: Merge findings from all perspectives, remove duplicates, and cross-reference with memory.
Collect all findings from all sub-agents into a single list.
When two or more perspectives flag the same underlying issue:
Deduplication signals:
For each finding, check against the known issues list from Phase 3:
If a finding matches a known issue, mark it ⚠️ RECURRING: it was found before, which may indicate the fix was incomplete or reverted.
Sort the deduplicated findings:
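The merge, recurrence check, and sort can be sketched in Python. The specific deduplication signals and sort criteria are not spelled out here, so the sketch makes three assumptions:
- Two findings are duplicates when they share a service and a normalized signature (an error message or operation name with IDs and numbers masked).
- A finding recurs when its title matches a known issue.
- Sorting puts the most severe findings first, recurring before new, then findings flagged by more perspectives.

The Finding fields and function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, "INFO": 4}


@dataclass
class Finding:
    title: str
    severity: str
    service: str
    signature: str  # hypothetical: error message or slow operation behind the finding
    perspectives: set[str] = field(default_factory=set)
    recurring: bool = False


def normalize(signature: str) -> str:
    """Mask volatile tokens so the same error from different requests compares equal."""
    sig = re.sub(r"\b[0-9a-f]{8,}\b", "<id>", signature.lower())  # hex IDs
    return re.sub(r"\d+", "<n>", sig)  # counts, durations, ports


def merge_findings(findings: list[Finding], known_titles: list[str]) -> list[Finding]:
    merged: dict[tuple[str, str], Finding] = {}
    for f in findings:
        key = (f.service, normalize(f.signature))
        kept = merged.get(key)
        if kept is None:
            merged[key] = Finding(f.title, f.severity, f.service, f.signature, set(f.perspectives))
            continue
        kept.perspectives |= f.perspectives
        if SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[kept.severity]:
            kept.severity = f.severity  # keep the most severe rating
    known = {title.lower() for title in known_titles}
    for f in merged.values():
        f.recurring = f.title.lower() in known  # naive title match against the Phase 3 list
    # Most severe first, recurring before new, then issues flagged by more perspectives.
    return sorted(
        merged.values(),
        key=lambda f: (SEVERITY_ORDER[f.severity], not f.recurring, -len(f.perspectives)),
    )
```

Title matching is the weakest part of this sketch. In practice, the recurrence check would compare the same normalized signature that was stored in memory.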
Phase 6: Improvement Plan
Goal: Transform findings into a prioritized, actionable improvement plan.
For each finding (or group of related findings), create an improvement item:
## Improvement Plan
### Priority 1: {theme} — {title}
**Severity**: {CRITICAL/HIGH/MEDIUM/LOW}
**Recurring**: {Yes — seen N times / No}
**Findings**: {list of finding IDs that contribute to this item}
**Problem:**
{concise description of the issue}
**Evidence:**
{key log lines, trace data, or metrics}
**Proposed fix:**
{specific, actionable steps to resolve — file paths, code changes, config changes}
**Effort estimate:** {small / medium / large}
**Risk if not fixed:** {description of impact if left unaddressed}
---
Order the improvement plan using this priority matrix:
| Priority | Criteria |
|---|---|
| P0 — Immediate | CRITICAL severity OR recurring CRITICAL/HIGH issues |
| P1 — This sprint | HIGH severity, non-recurring |
| P2 — Next sprint | MEDIUM severity with measurable impact |
| P3 — Backlog | LOW severity, nice-to-have improvements |
| P4 — Monitor | INFO-level observations, track but no action needed now |
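The priority matrix maps directly onto a function. In this sketch the priority name is hypothetical. The matrix does not say where a MEDIUM finding without measurable impact belongs, so the sketch assumes every MEDIUM finding lands in P2, recurring or not.

```python
def priority(severity: str, recurring: bool) -> str:
    """Map a finding to P0-P4 following the priority matrix (sketch)."""
    sev = severity.upper()
    if sev == "CRITICAL" or (recurring and sev == "HIGH"):
        return "P0"  # Immediate
    if sev == "HIGH":
        return "P1"  # This sprint
    if sev == "MEDIUM":
        return "P2"  # Next sprint (assumes the impact is measurable)
    if sev == "LOW":
        return "P3"  # Backlog
    if sev == "INFO":
        return "P4"  # Monitor
    raise ValueError(f"unknown severity: {severity!r}")
```

Because the labels "P0" through "P4" sort lexically in priority order, the plan can be ordered with sorted(items, key=lambda i: priority(i.severity, i.recurring)).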
End the plan with aggregate statistics:
## Investigation Summary
| Metric | Value |
|--------|-------|
| Total findings | {count} |
| Critical | {count} |
| High | {count} |
| Medium | {count} |
| Low | {count} |
| Info | {count} |
| Recurring issues | {count} |
| New issues | {count} |
| Resolved issues (no longer seen) | {count} |
| Perspectives analyzed | {count} of 8 |
| Services covered | {list} |
| Time range analyzed | {range} |
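The summary table can be filled in from the merged findings. The sketch assumes each finding is available as a hypothetical (title, severity, recurring) tuple. It treats "resolved" as any known issue from Phase 3 whose title does not appear among this run's findings. The summary_table name is hypothetical.

```python
from __future__ import annotations

from collections import Counter

SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]


def summary_table(
    findings: list[tuple[str, str, bool]],  # (title, severity, recurring), hypothetical shape
    known_titles: list[str],
    perspectives_analyzed: int,
    services: list[str],
    time_range: str,
) -> str:
    counts = Counter(severity for _, severity, _ in findings)
    recurring = sum(1 for _, _, is_recurring in findings if is_recurring)
    seen_now = {title.lower() for title, _, _ in findings}
    # "Resolved" here means: known from memory but not observed in this run.
    resolved = sum(1 for title in known_titles if title.lower() not in seen_now)

    rows = [("Total findings", len(findings))]
    rows += [(sev.title(), counts[sev]) for sev in SEVERITIES]
    rows += [
        ("Recurring issues", recurring),
        ("New issues", len(findings) - recurring),
        ("Resolved issues (no longer seen)", resolved),
        ("Perspectives analyzed", f"{perspectives_analyzed} of 8"),
        ("Services covered", ", ".join(services)),
        ("Time range analyzed", time_range),
    ]
    lines = ["| Metric | Value |", "|--------|-------|"]
    lines += [f"| {metric} | {value} |" for metric, value in rows]
    return "\n".join(lines)
```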
Phase 7: Memory Persistence
Goal: Store all findings and the improvement plan in memory for future investigation sessions.
For each finding at MEDIUM severity or above, store it in memory with:
Store a summary of the improvement plan including:
For any issues previously in memory that were NOT found in this investigation:
Store a brief record of this investigation run:
investigate-app run: {date}
Project: {project name}
Services analyzed: {list}
Time range: {range}
Findings: {critical}/{high}/{medium}/{low}/{info}
Recurring: {count}
New: {count}
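Two parts of Phase 7 can be sketched in Python: selecting which findings to persist individually (MEDIUM or above) and rendering the run record in the format above. The sketch assumes findings are hypothetical dicts with title, severity, and an optional recurring flag. The actual write goes through whatever memory tool the session provides and is not shown.

```python
from __future__ import annotations

from datetime import date

SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO")
PERSIST = {"CRITICAL", "HIGH", "MEDIUM"}  # "MEDIUM severity or above"


def findings_to_store(findings: list[dict]) -> list[dict]:
    """Keep only the findings Phase 7 persists individually."""
    return [f for f in findings if f["severity"] in PERSIST]


def run_record(run_date: date, project: str, services: list[str],
               time_range: str, findings: list[dict]) -> str:
    """Render the per-run record in the format shown above."""
    counts = [sum(f["severity"] == sev for f in findings) for sev in SEVERITIES]
    recurring = sum(bool(f.get("recurring")) for f in findings)
    return "\n".join([
        f"investigate-app run: {run_date.isoformat()}",
        f"Project: {project}",
        f"Services analyzed: {', '.join(services)}",
        f"Time range: {time_range}",
        "Findings: " + "/".join(str(c) for c in counts),
        f"Recurring: {recurring}",
        f"New: {len(findings) - recurring}",
    ])
```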