Build observability interfaces for multi-agent systems. Use when monitoring multi-agent execution, tracking agent metrics, implementing logging for parallel agents, or debugging agent workflows.
Design real-time observability interfaces for multi-agent systems. Use when building monitoring dashboards, implementing event logging, or tracking agent metrics and costs during multi-agent orchestration.
`/plugin marketplace add melodic-software/claude-code-plugins`

`/plugin install google-ecosystem@melodic-software`
Build observability interfaces for monitoring and measuring multi-agent systems.
Guide the design and implementation of observability layers that provide real-time visibility into multi-agent execution.
Implementation Note: Full observability requires the Claude Agent SDK with custom MCP tools and UI components. This skill provides the design patterns, not a ready-made implementation.
"If you can't measure it, you can't improve it. If you can't measure it, you can't scale it."

Per-agent metrics:

| Metric | Purpose | How to Track |
|---|---|---|
| Status | Know state | Agent state enum |
| Context usage | Token consumption | API response |
| Cost | Financial impact | API usage data |
| Tool calls | What it's doing | Hook logging |
| Results | Output verification | Result parsing |
| Duration | Execution time | Timestamps |

Aggregate metrics:

| Metric | Purpose | Calculation |
|---|---|---|
| Total agents | Scale | Count of agents created |
| Total duration | End-to-end time | First start to last completion |
| Total cost | Financial total | Sum per-agent |
| Success rate | Reliability | Success / total |
| Coverage | Scope | Files touched |
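
The aggregate metrics in the table can be derived from per-agent records. The following is a minimal sketch, assuming a hypothetical `AgentRecord` shape; its field names are illustrative and not part of any SDK.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRecord:
    # Hypothetical per-agent record; all field names are assumptions.
    agent_id: str
    started_at: float   # seconds since the orchestration started
    ended_at: float
    cost_usd: float
    succeeded: bool
    files_touched: set[str] = field(default_factory=set)


def aggregate(records: list[AgentRecord]) -> dict:
    """Compute the aggregate metrics from per-agent records."""
    if not records:
        return {"total_agents": 0, "total_duration_s": 0.0,
                "total_cost_usd": 0.0, "success_rate": 0.0, "coverage": 0}
    first_start = min(r.started_at for r in records)
    last_end = max(r.ended_at for r in records)
    files = set().union(*(r.files_touched for r in records))
    return {
        "total_agents": len(records),
        "total_duration_s": last_end - first_start,   # first to last
        "total_cost_usd": round(sum(r.cost_usd for r in records), 2),
        "success_rate": sum(r.succeeded for r in records) / len(records),
        "coverage": len(files),                       # distinct files touched
    }


records = [
    AgentRecord("scout_1", 0.0, 14.0, 0.05, True, {"src/auth/login.ts"}),
    AgentRecord("scout_2", 1.0, 13.0, 0.04, True, {"src/auth/middleware.ts"}),
    AgentRecord("builder_1", 15.0, 60.0, 0.35, False, {"src/auth/middleware.ts"}),
]
summary = aggregate(records)
```

Coverage counts distinct files, so a file touched by two agents is counted once.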
Real-time status for each agent:

    ┌─────────────────────────────────────┐
    │ scout_1                 [EXECUTING] │
    ├─────────────────────────────────────┤
    │ Template: scout-fast                │
    │ Model: haiku                        │
    │ Context: 12,500 / 100,000 tokens    │
    │ Cost: $0.05                         │
    │ Duration: 45s                       │
    │ Tool calls: 15                      │
    └─────────────────────────────────────┘

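Required fields: agent name, status, template, model, context usage, cost, duration, and tool-call count, as shown in the card above.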
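
A card like the one above can be rendered from a small state record. The sketch below assumes a hypothetical `AgentCard` dataclass and `AgentStatus` enum; the names, the set of states, and the fixed card width are illustrative choices rather than anything defined by this skill.

```python
from dataclasses import dataclass
from enum import Enum


class AgentStatus(Enum):
    # Assumed set of states; extend to match your orchestrator.
    PENDING = "PENDING"
    EXECUTING = "EXECUTING"
    COMPLETE = "COMPLETE"
    ERROR = "ERROR"


@dataclass
class AgentCard:
    name: str
    status: AgentStatus
    template: str
    model: str
    context_used: int
    context_limit: int
    cost_usd: float
    duration_s: int
    tool_calls: int

    def render(self, width: int = 37) -> str:
        """Return the card as a fixed-width text box."""
        def row(text: str) -> str:
            return "│ " + text.ljust(width - 2) + " │"

        badge = f"[{self.status.value}]"
        header = self.name + badge.rjust(width - 2 - len(self.name))
        lines = [
            "┌" + "─" * width + "┐",
            row(header),
            "├" + "─" * width + "┤",
            row(f"Template: {self.template}"),
            row(f"Model: {self.model}"),
            row(f"Context: {self.context_used:,} / {self.context_limit:,} tokens"),
            row(f"Cost: ${self.cost_usd:.2f}"),
            row(f"Duration: {self.duration_s}s"),
            row(f"Tool calls: {self.tool_calls}"),
            "└" + "─" * width + "┘",
        ]
        return "\n".join(lines)


card = AgentCard("scout_1", AgentStatus.EXECUTING, "scout-fast", "haiku",
                 12_500, 100_000, 0.05, 45, 15)
print(card.render())
```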
Real-time log of all activities:

    [10:30:00] scout_1 created (template: scout-fast)
    [10:30:01] scout_1 commanded: "Analyze auth module"
    [10:30:05] scout_1 Read: src/auth/login.ts
    [10:30:08] scout_1 Grep: "password" in src/auth/
    [10:30:15] scout_1 completed (duration: 14s)
    [10:30:16] scout_1 deleted

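Event types: create, command, tool, status, and error (the `event_type` values used by `AgentEvent` in this document).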
Track spend per agent and total:

    Cost Summary
    ────────────────────────────────────
    scout_1 (haiku)                $0.05
    scout_2 (haiku)                $0.04
    builder_1 (sonnet)             $0.35
    reviewer_1 (sonnet)            $0.12
    ────────────────────────────────────
    Total                          $0.56
    Budget remaining         $4.44 (89%)

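Cost components: at minimum, input-token cost and output-token cost per agent (see `calculate_cost`), summed into the total.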
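
The summary above follows from per-agent costs and a budget. A minimal sketch that reproduces its totals is shown here; `cost_summary` is a hypothetical helper, and the $5.00 budget is taken from the figures in this document.

```python
def cost_summary(costs: dict[str, float], budget: float) -> dict:
    """Total spend, remaining budget, and remaining share of the budget."""
    total = round(sum(costs.values()), 2)
    remaining = round(budget - total, 2)
    return {
        "total": total,
        "remaining": remaining,
        "remaining_pct": round(100 * remaining / budget) if budget else 0,
    }


costs = {"scout_1": 0.05, "scout_2": 0.04, "builder_1": 0.35, "reviewer_1": 0.12}
summary = cost_summary(costs, budget=5.00)
print(f"Total ${summary['total']:.2f}")
print(f"Budget remaining ${summary['remaining']:.2f} ({summary['remaining_pct']}%)")
```

Rounding to cents at each step keeps floating-point noise out of the displayed values; a production system would more likely store costs as integer micro-dollars or `Decimal`.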
View consumed and produced assets:

    Agent: builder_1
    Consumed Assets:
    ├── Scout report (summary)
    ├── src/auth/middleware.ts
    └── package.json
    Produced Assets:
    ├── src/auth/rate-limit.ts (created)
    ├── src/auth/middleware.ts (modified)
    └── tests/rate-limit.test.ts (created)
    Summary: "Implemented rate limiting middleware"
    Status: completed

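
One way to back the inspector view is a small result record per agent. This is a sketch under assumptions: `AgentResult` and its fields are hypothetical names chosen to mirror the view above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    agent: str
    consumed: list[str] = field(default_factory=list)
    # Maps each produced path to the action taken: "created" or "modified".
    produced: dict[str, str] = field(default_factory=dict)
    summary: str = ""
    status: str = "pending"

    def render(self) -> str:
        """Return the consumed/produced tree as text."""
        def tree(items: list[str]) -> list[str]:
            last = len(items) - 1
            return [("└── " if i == last else "├── ") + item
                    for i, item in enumerate(items)]

        produced = [f"{path} ({action})" for path, action in self.produced.items()]
        return "\n".join([
            f"Agent: {self.agent}",
            "Consumed Assets:", *tree(self.consumed),
            "Produced Assets:", *tree(produced),
            f'Summary: "{self.summary}"',
            f"Status: {self.status}",
        ])


result = AgentResult(
    agent="builder_1",
    consumed=["Scout report (summary)", "src/auth/middleware.ts", "package.json"],
    produced={
        "src/auth/rate-limit.ts": "created",
        "src/auth/middleware.ts": "modified",
        "tests/rate-limit.test.ts": "created",
    },
    summary="Implemented rate limiting middleware",
    status="completed",
)
text = result.render()
```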
Filterable activity history:

    Filters: [agent: all] [level: all] [tool: all]
    10:30:00  INFO   scout_1  Created from template
    10:30:01  INFO   scout_1  Received command
    10:30:05  DEBUG  scout_1  Read: src/auth/login.ts (1,200 tokens)
    10:30:08  DEBUG  scout_1  Grep: found 5 matches
    10:30:12  WARN   scout_1  Context at 80% capacity
    10:30:15  INFO   scout_1  Completed successfully

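
The agent and level filters can be expressed as a simple predicate over log entries. In this sketch, `LogEntry`, `filter_logs`, and the numeric level ordering are assumptions; in particular, it treats the level filter as a minimum severity, which the source does not specify.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed severity ordering for the level filter.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}


@dataclass
class LogEntry:
    time: str
    level: str
    agent: str
    message: str


def filter_logs(entries, agent: Optional[str] = None, min_level: str = "DEBUG"):
    """Keep entries for one agent (or all agents) at or above a minimum level."""
    threshold = LEVELS[min_level]
    return [
        e for e in entries
        if (agent is None or e.agent == agent) and LEVELS[e.level] >= threshold
    ]


entries = [
    LogEntry("10:30:00", "INFO", "scout_1", "Created from template"),
    LogEntry("10:30:05", "DEBUG", "scout_1", "Read: src/auth/login.ts (1,200 tokens)"),
    LogEntry("10:30:12", "WARN", "scout_1", "Context at 80% capacity"),
    LogEntry("10:30:15", "INFO", "scout_1", "Completed successfully"),
]
important = filter_logs(entries, agent="scout_1", min_level="INFO")
```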
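Event logging:

    from dataclasses import dataclass, field
    from datetime import datetime

    # Event types
    @dataclass
    class AgentEvent:
        timestamp: datetime
        agent_id: str
        event_type: str  # create, command, tool, status, error
        details: dict = field(default_factory=dict)

    # Log collector. db, ws, and metrics are application-provided handles.
    def log_event(event: AgentEvent) -> None:
        # Store to database
        db.events.insert(event)
        # Emit to WebSocket
        ws.broadcast(event)
        # Update metrics
        metrics.update(event)
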
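
`log_event` fans each event out to storage, live subscribers, and metrics. The following self-contained sketch shows that fan-out with in-memory stand-ins; `EventCollector` and its callback-based subscribers are assumptions that take the place of a real database and WebSocket layer.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AgentEvent:
    timestamp: datetime
    agent_id: str
    event_type: str  # create, command, tool, status, error
    details: dict = field(default_factory=dict)


class EventCollector:
    """In-memory stand-in for the database, WebSocket, and metrics sinks."""

    def __init__(self) -> None:
        self.events: list[AgentEvent] = []   # stands in for db.events
        self.subscribers: list[Callable[[AgentEvent], None]] = []
        self.counts: Counter = Counter()     # events per (agent, type)

    def subscribe(self, callback: Callable[[AgentEvent], None]) -> None:
        self.subscribers.append(callback)

    def log_event(self, event: AgentEvent) -> None:
        self.events.append(event)            # store
        for notify in self.subscribers:      # broadcast to live listeners
            notify(event)
        self.counts[(event.agent_id, event.event_type)] += 1  # update metrics


collector = EventCollector()
received = []
collector.subscribe(received.append)
now = datetime.now(timezone.utc)
collector.log_event(AgentEvent(now, "scout_1", "create", {"template": "scout-fast"}))
collector.log_event(AgentEvent(now, "scout_1", "tool", {"name": "Read"}))
collector.log_event(AgentEvent(now, "scout_1", "tool", {"name": "Grep"}))
```

Storing before broadcasting means a subscriber that fails cannot cause an event to be lost from the log.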
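Real-time updates:

    import asyncio

    # WebSocket for live updates: poll once per second while the agent runs.
    # agent_active and get_agent_status are application-provided helpers.
    async def agent_status_stream(agent_id):
        while agent_active(agent_id):
            status = get_agent_status(agent_id)
            yield status
            await asyncio.sleep(1)
        # Emit the terminal state so clients see the final status
        yield get_agent_status(agent_id)
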
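
The polling generator can be exercised without a WebSocket server. This sketch assumes hypothetical `get_status` and `is_active` callables in place of real agent lookups, and it yields one final snapshot after the agent stops so that consumers observe the terminal state.

```python
import asyncio


async def status_stream(get_status, is_active, interval: float = 1.0):
    """Yield a status snapshot each interval, then one final snapshot."""
    while is_active():
        yield get_status()
        await asyncio.sleep(interval)
    yield get_status()  # terminal state, e.g. "complete" or "error"


async def demo() -> list[str]:
    # Hypothetical agent that finishes after three polls.
    states = iter(["executing", "executing", "executing", "complete"])
    current = {"status": next(states)}

    def get_status() -> str:
        return current["status"]

    def is_active() -> bool:
        return current["status"] == "executing"

    seen = []
    async for status in status_stream(get_status, is_active, interval=0.01):
        seen.append(status)
        if status == "executing":
            current["status"] = next(states)  # simulate progress between polls
    return seen


seen = asyncio.run(demo())
```

Polling is the simplest design; an event-driven push from the logging layer would avoid the up-to-one-interval delay.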
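Cost calculation:

    # MODEL_INPUT_PRICE and MODEL_OUTPUT_PRICE are per-token prices for the
    # agent's model, supplied by the application.
    def calculate_cost(usage):
        input_cost = usage.input_tokens * MODEL_INPUT_PRICE
        output_cost = usage.output_tokens * MODEL_OUTPUT_PRICE
        return input_cost + output_cost
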
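
Because agents in one orchestration may run on different models, the price constants usually become a per-model lookup. In this sketch the price table holds placeholder numbers, not real rates, and `Usage` is a hypothetical stand-in for whatever usage object your API client returns.

```python
from dataclasses import dataclass

# Placeholder prices in USD per million tokens. These are NOT real rates;
# replace them with your provider's current pricing.
PRICES_PER_MTOK = {
    "haiku": {"input": 1.00, "output": 5.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def calculate_cost(model: str, usage: Usage) -> float:
    """Cost in USD for one agent's token usage under the price table."""
    price = PRICES_PER_MTOK[model]
    input_cost = usage.input_tokens * price["input"] / 1_000_000
    output_cost = usage.output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


cost = calculate_cost("haiku", Usage(input_tokens=12_500, output_tokens=2_000))
```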
Compact status display:

    Orchestration: Add rate limiting
    ────────────────────────────────────
    Agents: 1 active | 2 complete | 0 error
    Cost: $0.56 / $5.00 budget
    Progress: ████████░░ 80%
    [scout_1]  ✓ complete (14s)
    [scout_2]  ✓ complete (12s)
    [builder]  ⚡ executing (45s)

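
The progress bar and agent-count line of such a display are straightforward to generate. The sketch below assumes progress is measured as completed steps out of a known total, which the source does not define; `progress_bar` and `status_line` are hypothetical helper names.

```python
def progress_bar(done: int, total: int, width: int = 10) -> str:
    """Render a filled/empty block bar followed by a percentage."""
    fraction = done / total if total else 0.0
    filled = round(fraction * width)
    return "█" * filled + "░" * (width - filled) + f" {round(fraction * 100)}%"


def status_line(active: int, complete: int, error: int) -> str:
    """Render the one-line agent count summary."""
    return f"Agents: {active} active | {complete} complete | {error} error"


bar = progress_bar(4, 5)
line = status_line(active=1, complete=2, error=0)
print(f"Progress: {bar}")
print(line)
```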
Full dashboard layout:

    ┌─────────────────────────────────────────────────────────────┐
    │ Orchestration Dashboard                                     │
    ├─────────────────────────────────────────────────────────────┤
    │ Task: Add rate limiting to authentication                   │
    │ Started: 10:30:00 | Duration: 2m 15s | Cost: $0.56          │
    ├─────────────────────────────────────────────────────────────┤
    │ Agent Fleet                         │ Event Stream          │
    │ ┌─────────────────────────────────┐ │ [10:32:15] builder    │
    │ │ scout_1   [✓ complete]          │ │   Write: rate-limit   │
    │ │ scout_2   [✓ complete]          │ │ [10:32:10] builder    │
    │ │ builder   [⚡ executing]         │ │   Read: middleware    │
    │ │ reviewer  [○ pending]           │ │ [10:30:15] scout_2    │
    │ └─────────────────────────────────┘ │   completed           │
    ├─────────────────────────────────────────────────────────────┤
    │ Cost Breakdown                      │ Results Summary       │
    │ haiku: $0.09                        │ Files read: 8         │
    │ sonnet: $0.47                       │ Files written: 3      │
    │ Total: $0.56                        │ Tests: 5/5 passing    │
    └─────────────────────────────────────────────────────────────┘

When designing observability, provide:

    ## Observability Design
    ### Metrics
    **Per-Agent:**
    [List with tracking method]
    **Aggregate:**
    [List with calculation]
    ### Components
    **Agent Cards:** [fields and update frequency]
    **Event Stream:** [event types and storage]
    **Cost Tracking:** [breakdown and budgets]
    **Result Inspector:** [consumed/produced format]
    **Log Viewer:** [filters and retention]
    ### Implementation
    **Logging:** [architecture]
    **Real-Time:** [WebSocket design]
    **Storage:** [database schema]
    **UI:** [component specifications]

| Anti-Pattern | Problem | Solution |
|---|---|---|
| No metrics | Flying blind | Track per-agent and aggregate metrics |
| Delayed updates | Stale status | Real-time WebSocket |
| No cost tracking | Budget overruns | Per-agent costs |
| Missing logs | Can't debug | Log all events |
| No aggregation | Can't summarize | Calculate totals |
Date: 2025-12-26 Model: claude-opus-4-5-20251101