Structured event logging for agent team orchestration with non-blocking writes to .claude/telemetry-[timestamp].log
npx claudepluginhub issacchaos/local-marketplace --plugin test-engineering
This skill uses the workspace's default tool permissions.
Version: 1.0.0
Category: Infrastructure
Purpose: Non-blocking structured event logging for agent team observability
Used By: Team Orchestration SKILL, team coordinators, specialist agents
The Telemetry Skill provides observability into agent team orchestration through structured event logging. It captures agent lifecycle, coordination decisions, progress metrics, test execution, and resource utilization in machine-parseable log files. Telemetry is disabled by default (opt-in) and is designed to be completely non-blocking: failures NEVER halt execution.
Five Event Types (REQ-F-23 to REQ-F-27):
Non-Blocking Design (REQ-NF-6):
Real-Time Logging (REQ-NF-3):
Opt-In Configuration (REQ-F-22):
Function Signature:
def log_telemetry(event_type: str, agent_id: str, status: str, metadata: dict,
project_root: str, team_name: str = None) -> dict
Parameters:
- event_type (string): Event type, one of: lifecycle, coordination, progress, test, resource
- agent_id (string): Unique agent identifier (e.g., write-agent-1, testing-coordinator)
- status (string): Event-specific status (e.g., spawned, completed, plan_approved)
- metadata (dict): Event-specific metadata (see event-types.md for schemas)
- project_root (string): Absolute path to project root
- team_name (string, optional): Team name for log file initialization
Returns:
{
"success": boolean, # Always True (even if write failed)
"log_file": string | None, # Path to log file (if write succeeded)
"warning": string | None # Warning message (if write failed)
}
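The parameter and return contract above can also be written down as Python types. This is only a sketch: the names `EventType`, `TelemetryResult`, and `is_known_event_type` are hypothetical and do not come from the skill itself, and the warning text is a made-up illustration.

```python
from typing import Literal, Optional, TypedDict, get_args

# The five event types named in the parameter list above.
EventType = Literal["lifecycle", "coordination", "progress", "test", "resource"]


class TelemetryResult(TypedDict):
    """Shape of the dict returned by log_telemetry() (hypothetical type name)."""
    success: bool            # always True, even when the write failed
    log_file: Optional[str]  # path to the log file, or None
    warning: Optional[str]   # warning text, or None


def is_known_event_type(event_type: str) -> bool:
    """Hypothetical helper: check a string against the five event types."""
    return event_type in get_args(EventType)


# What a caller would see after a failed write: still success=True.
failed_write: TelemetryResult = {
    "success": True,
    "log_file": None,
    "warning": "Telemetry logging failed (non-critical): disk full",
}
```

Typing the result this way makes it visible to a type checker that `success` carries no failure information and that callers who care must inspect `warning`.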
Guarantees:
success: True to prevent halting execution
# Coordinator spawns write-agent
agent_id = "write-agent-1"
parent_id = "testing-coordinator"
depth = 2
# Log telemetry
log_telemetry(
event_type="lifecycle",
agent_id=agent_id,
status="spawned",
metadata={
"parent": parent_id,
"depth": depth,
"agent_type": "agents/write-agent.md",
"team_id": "testing-parallel"
},
project_root=project_root,
team_name="testing-parallel"
)
# Continue execution (telemetry failure doesn't halt)
Logged Event:
2026-02-13T14:30:16.000Z | lifecycle | write-agent-1 | spawned | {"parent":"testing-coordinator","depth":2,"agent_type":"agents/write-agent.md","team_id":"testing-parallel"}
# Coordinator proposes execution plan
plan_summary = "3 parallel write-agents for 12 test targets"
batches = 3
total_targets = 12
# Log telemetry
log_telemetry(
event_type="coordination",
agent_id="testing-coordinator",
status="plan_proposed",
metadata={
"plan_summary": plan_summary,
"batches": batches,
"total_targets": total_targets,
"approval_required": True
},
project_root=project_root,
team_name="testing-parallel"
)
# Present plan to user (telemetry logged before approval gate)
Logged Event:
2026-02-13T14:30:01.456Z | coordination | testing-coordinator | plan_proposed | {"plan_summary":"3 parallel write-agents for 12 test targets","batches":3,"total_targets":12,"approval_required":true}
# Write-agent reports intermediate progress
tests_written = 2
current_file = "test_user_service.py"
percent_complete = 40
# Log telemetry
log_telemetry(
event_type="progress",
agent_id="write-agent-1",
status="update",
metadata={
"tests_written": tests_written,
"current_file": current_file,
"percent_complete": percent_complete
},
project_root=project_root
)
# Continue writing tests (telemetry is background activity)
Logged Event:
2026-02-13T14:30:20.000Z | progress | write-agent-1 | update | {"tests_written":2,"current_file":"test_user_service.py","percent_complete":40}
# Execute-agent finishes running tests
passed = 12
failed = 3
skipped = 0
duration_seconds = 8.5
# Log telemetry
log_telemetry(
event_type="test",
agent_id="execute-agent",
status="execution_complete",
metadata={
"passed": passed,
"failed": failed,
"skipped": skipped,
"duration_seconds": duration_seconds,
# Guard against division by zero when no tests passed or failed
"pass_rate": passed / (passed + failed) if (passed + failed) else 0.0
},
project_root=project_root
)
# Continue with validation phase
Logged Event:
2026-02-13T14:30:48.500Z | test | execute-agent | execution_complete | {"passed":12,"failed":3,"skipped":0,"duration_seconds":8.5,"pass_rate":0.8}
# Resource manager updates queue status
active_agents = 5
queued_agents = 2
max_agents = 5
# Log telemetry
log_telemetry(
event_type="resource",
agent_id="testing-coordinator",
status="queue_status",
metadata={
"active_agents": active_agents,
"queued_agents": queued_agents,
"max_agents": max_agents
},
project_root=project_root
)
# Continue spawning agents from queue
Logged Event:
2026-02-13T14:31:05.789Z | resource | testing-coordinator | queue_status | {"active_agents":5,"queued_agents":2,"max_agents":5}
# Agent completes and reports token usage
agent_id = "write-agent-1"
input_tokens = 1500
output_tokens = 800
cache_read_tokens = 500
cache_write_tokens = 200
# Log telemetry
log_telemetry(
event_type="resource",
agent_id=agent_id,
status="token_usage",
metadata={
"agent_id": agent_id,
"input_tokens": input_tokens,
"output_tokens": output_tokens,
"total_tokens": input_tokens + output_tokens,
"cache_read_tokens": cache_read_tokens,
"cache_write_tokens": cache_write_tokens,
"model": "sonnet"
},
project_root=project_root
)
# Continue with next agent
Logged Event:
2026-02-13T14:30:30.000Z | resource | write-agent-1 | token_usage | {"agent_id":"write-agent-1","input_tokens":1500,"output_tokens":800,"total_tokens":2300,"cache_read_tokens":500,"cache_write_tokens":200,"model":"sonnet"}
Telemetry is disabled by default. Users must opt in using one of these methods:
Method 1: Environment Variable
# Enable telemetry
export DANTE_TELEMETRY=1
# Run command with telemetry
/team-run testing-parallel src/
Method 2: Configuration File
// .claude/dante-config.json
{
"telemetry_enabled": true
}
Method 3: Team Definition Override
# teams/testing-parallel.md
---
telemetry_enabled: true
# ... other team fields
---
Configuration Precedence (highest to lowest):
1. Team definition override (telemetry_enabled: true)
2. Environment variable (DANTE_TELEMETRY=1)
3. Configuration file (.claude/dante-config.json)
Path: {project_root}/.claude/telemetry-{timestamp}.log
Timestamp Format: YYYY-MM-DDTHH-MM-SS (ISO 8601 with the colons in the time replaced by dashes, because colons are not allowed in Windows file names)
Example:
D:/dev/dante_plugin/.claude/telemetry-2026-02-13T14-30-00.log
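The opt-in precedence and the log file naming rule described above could be sketched as follows. The function names `resolve_telemetry_enabled` and `telemetry_log_path` are hypothetical. The sketch also assumes that a team definition value of `false` overrides the other sources, that any `DANTE_TELEMETRY` value other than `1` means disabled, and that the file name timestamp is taken in UTC; the source does not state any of these.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def resolve_telemetry_enabled(
    team_setting: Optional[bool],
    env: Mapping[str, str],
    config: Mapping[str, object],
) -> bool:
    """Apply the precedence: team definition, then env var, then config file."""
    if team_setting is not None:      # 1. team definition override
        return team_setting
    if "DANTE_TELEMETRY" in env:      # 2. environment variable
        return env["DANTE_TELEMETRY"] == "1"
    # 3. config file; telemetry stays off when nothing is set
    return bool(config.get("telemetry_enabled", False))


def telemetry_log_path(project_root: str, started_at: datetime) -> str:
    """Build {project_root}/.claude/telemetry-{timestamp}.log."""
    # Dashes instead of colons keep the file name valid on Windows.
    stamp = started_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    return os.path.join(project_root, ".claude", f"telemetry-{stamp}.log")
```

Passing the environment and config in as mappings, rather than reading `os.environ` and the JSON file inside the function, keeps the precedence logic easy to test in isolation.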
Session-Based Logging:
This skill is composed of two sub-skills:
event-types.md (REQ-F-23 to REQ-F-27, REQ-F-27a):
log-writer.md (REQ-F-21, REQ-F-22, REQ-NF-3, REQ-NF-6):
Format:
[timestamp] | [event_type] | [agent_id] | [status] | [metadata_json]
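Because the format is pipe-delimited with JSON in the last field, a log line can be read back with a bounded split. The sketch below is a minimal reader, not part of the skill: `TelemetryEvent` and `parse_event_line` are hypothetical names, and it assumes the first four fields never contain the ` | ` separator.

```python
import json
from typing import NamedTuple


class TelemetryEvent(NamedTuple):
    timestamp: str
    event_type: str
    agent_id: str
    status: str
    metadata: dict


def parse_event_line(line: str) -> TelemetryEvent:
    """Parse one pipe-delimited event line into its five fields."""
    # Split at most four times so " | " inside the JSON metadata is preserved.
    parts = line.rstrip("\n").split(" | ", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    timestamp, event_type, agent_id, status, metadata_json = parts
    return TelemetryEvent(timestamp, event_type, agent_id, status,
                          json.loads(metadata_json))


event = parse_event_line(
    '2026-02-13T14:30:20.000Z | progress | write-agent-1 | update | '
    '{"tests_written":2,"current_file":"test_user_service.py","percent_complete":40}'
)
print(event.agent_id, event.metadata["percent_complete"])  # write-agent-1 40
```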
Implementation:
import json
from datetime import datetime, timezone

# is_telemetry_enabled(), write_telemetry_event() and log_warning() are not
# shown here; they are assumed to be defined elsewhere in the skill.

def log_telemetry(event_type: str, agent_id: str, status: str, metadata: dict,
                  project_root: str, team_name: str = None) -> dict:
    """
    Log telemetry event with structured format.
    This is the main entry point for all telemetry logging.
    """
    try:
        # Check if telemetry is enabled
        if not is_telemetry_enabled():
            # Silent no-op if disabled
            return {'success': True, 'log_file': None, 'warning': None}
        # Generate timestamp (ISO 8601 UTC with milliseconds). isoformat() emits
        # a "+00:00" offset, so swap it for the "Z" suffix used in the log format.
        timestamp = datetime.now(timezone.utc).isoformat(timespec='milliseconds').replace('+00:00', 'Z')
        # Serialize metadata to compact JSON (no spaces after separators)
        metadata_json = json.dumps(metadata, separators=(',', ':'))
        # Format event (pipe-delimited)
        event = f"{timestamp} | {event_type} | {agent_id} | {status} | {metadata_json}"
        # Write event to log file (non-blocking)
        result = write_telemetry_event(event, project_root, team_name)
        return result
    except Exception as e:
        # CRITICAL: Never raise exception from telemetry
        warning_msg = f"Telemetry logging failed (non-critical): {e}"
        log_warning(warning_msg)
        return {'success': True, 'log_file': None, 'warning': warning_msg}
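The write step itself is documented in log-writer.md and is not reproduced here. As a rough illustration of the same guarantee, an append-only writer that reports failures as warnings instead of raising might look like the sketch below. The name `write_event_safely` and its two-argument signature are hypothetical and differ from `write_telemetry_event`, which takes `project_root` and `team_name`.

```python
import os


def write_event_safely(event: str, log_file: str) -> dict:
    """Append one event line; report failure as a warning, never raise."""
    try:
        os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
        # Append mode plus an explicit flush hands each event to the OS
        # as soon as it is logged.
        with open(log_file, "a", encoding="utf-8") as handle:
            handle.write(event + "\n")
            handle.flush()
        return {"success": True, "log_file": log_file, "warning": None}
    except OSError as exc:
        # File-system problems (permissions, bad path, full disk) are
        # downgraded to a warning so the caller keeps running.
        return {"success": True, "log_file": None,
                "warning": f"Telemetry write failed (non-critical): {exc}"}
```

Opening and closing the file for every event costs a little throughput, but it avoids holding a handle across agent boundaries and keeps the log readable while the session is still running.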
All telemetry operations are non-blocking:
# Example: Coordinator spawns agent with telemetry
try:
    # Spawn agent
    agent_id = spawn_agent_via_task_tool(agent_spec)
    # Log lifecycle event (non-blocking)
    log_telemetry(
        event_type="lifecycle",
        agent_id=agent_id,
        status="spawned",
        metadata={"parent": parent_id, "depth": depth},
        project_root=project_root
    )
    # Note: We don't check result - telemetry failure doesn't affect execution
    # Continue with coordination logic
    assign_task_to_agent(agent_id, task)
except Exception as e:
    # If spawning fails, handle that error
    # But telemetry errors are already caught and logged internally
    handle_spawn_failure(e)
Key Principle: Telemetry is observability, not control flow. Execution proceeds regardless of telemetry success/failure.
Lifecycle Events:
- spawned: When agent is created and registered
- start: When agent begins execution
- completed: When agent successfully finishes
- failed: When agent fails after retries
Coordination Events:
- plan_proposed: Before approval gate (show plan to user)
- plan_approved/plan_rejected: After user decision
- task_assigned: When coordinator assigns task to agent
- dependencies_resolved: When agent's dependencies complete
Progress Events:
- update: Periodically during long-running tasks (e.g., every 5 tests)
- milestone_reached: At significant milestones (e.g., all parallel agents complete)
Test Events:
- execution_start: When execute-agent starts running tests
- execution_complete: When all tests finish
Resource Events:
- queue_status: When agents spawn/complete (queue changes)
- timeout_warning: 5 minutes before timeout
- timeout: When timeout exceeded
- limit_reached: When agent/team limit hit
- token_usage: When agent completes (if token data available)
Complete Workflow (testing-parallel team):
1. Coordinator spawned → lifecycle | testing-coordinator | spawned
2. Plan proposed → coordination | testing-coordinator | plan_proposed
3. User approves → coordination | testing-coordinator | plan_approved
4. Agent 1 spawned → lifecycle | write-agent-1 | spawned
5. Agent 2 spawned → lifecycle | write-agent-2 | spawned
6. Agent 3 spawned → lifecycle | write-agent-3 | spawned
7. Parallel execution starts → coordination | testing-coordinator | parallel_execution_start
8. Agent 1 progress update → progress | write-agent-1 | update
9. Agent 1 completes → lifecycle | write-agent-1 | completed
10. Agent 1 token usage → resource | write-agent-1 | token_usage
11. Agent 2 completes → lifecycle | write-agent-2 | completed
12. Agent 2 token usage → resource | write-agent-2 | token_usage
13. Agent 3 completes → lifecycle | write-agent-3 | completed
14. Agent 3 token usage → resource | write-agent-3 | token_usage
15. Parallel execution complete → coordination | testing-coordinator | parallel_execution_complete
16. Execute agent spawned → lifecycle | execute-agent | spawned
17. Test execution starts → test | execute-agent | execution_start
18. Test execution complete → test | execute-agent | execution_complete
19. Execute agent completes → lifecycle | execute-agent | completed
20. Coordinator completes → lifecycle | testing-coordinator | completed
# If telemetry disabled, log_telemetry() returns immediately
result = log_telemetry(...)
# No warning, no error - just silent no-op
# If telemetry write fails, warning is logged but execution continues
result = log_telemetry(...)
if result['warning']:
    # Warning was logged internally, coordinator can optionally check it
    log_debug(f"Telemetry warning: {result['warning']}")
# Execution continues regardless
# Telemetry NEVER raises exceptions
try:
    log_telemetry(...)
except Exception:
    # This will never happen - log_telemetry() catches all exceptions internally
    pass
For TASK-006 acceptance:
Event Schema (event-types.md):
- [timestamp] | [event_type] | [agent_id] | [status] | [metadata_json]
Log Writer (log-writer.md):
- .claude/telemetry-[timestamp].log in project root (REQ-F-21)
- .claude/dante-config.json (REQ-F-22)
Entry Point (SKILL.md):
File: .claude/telemetry-2026-02-13T14-30-00.log
# Dante Team Orchestration Telemetry Log
# Generated: 2026-02-13T14:30:00.000Z
# Team: testing-parallel
# Telemetry Version: 1.0.0
# Format: [timestamp] | [event_type] | [agent_id] | [status] | [metadata_json]
---
2026-02-13T14:30:01.123Z | lifecycle | testing-coordinator | spawned | {"parent":"test-loop-orchestrator","depth":1,"team_id":"testing-parallel"}
2026-02-13T14:30:01.456Z | coordination | testing-coordinator | plan_proposed | {"plan_summary":"3 parallel write-agents","batches":3,"approval_required":true}
2026-02-13T14:30:15.789Z | coordination | testing-coordinator | plan_approved | {"user_response":"approve","batches":3}
2026-02-13T14:30:16.000Z | lifecycle | write-agent-1 | spawned | {"parent":"testing-coordinator","depth":2}
2026-02-13T14:30:16.100Z | lifecycle | write-agent-2 | spawned | {"parent":"testing-coordinator","depth":2}
2026-02-13T14:30:16.200Z | lifecycle | write-agent-3 | spawned | {"parent":"testing-coordinator","depth":2}
2026-02-13T14:30:16.300Z | coordination | testing-coordinator | parallel_execution_start | {"active_agents":3,"max_agents":5}
2026-02-13T14:30:20.000Z | progress | write-agent-1 | update | {"tests_written":2,"percent_complete":40}
2026-02-13T14:30:30.000Z | lifecycle | write-agent-1 | completed | {"duration_seconds":13.7,"tests_generated":5}
2026-02-13T14:30:30.100Z | resource | write-agent-1 | token_usage | {"agent_id":"write-agent-1","input_tokens":1500,"output_tokens":800,"total_tokens":2300,"model":"sonnet"}
2026-02-13T14:30:32.000Z | lifecycle | write-agent-3 | completed | {"duration_seconds":15.8,"tests_generated":4}
2026-02-13T14:30:32.100Z | resource | write-agent-3 | token_usage | {"agent_id":"write-agent-3","input_tokens":1400,"output_tokens":700,"total_tokens":2100,"model":"sonnet"}
2026-02-13T14:30:35.000Z | lifecycle | write-agent-2 | completed | {"duration_seconds":18.9,"tests_generated":6}
2026-02-13T14:30:35.100Z | resource | write-agent-2 | token_usage | {"agent_id":"write-agent-2","input_tokens":1600,"output_tokens":900,"total_tokens":2500,"model":"sonnet"}
2026-02-13T14:30:35.500Z | coordination | testing-coordinator | parallel_execution_complete | {"total_tests":15,"successful_agents":3,"failed_agents":0}
2026-02-13T14:30:40.000Z | test | execute-agent | execution_start | {"framework":"pytest","test_count":15}
2026-02-13T14:30:48.500Z | test | execute-agent | execution_complete | {"passed":15,"failed":0,"skipped":0,"duration_seconds":8.5}
2026-02-13T14:30:49.500Z | lifecycle | testing-coordinator | completed | {"duration_seconds":48.4}
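A log in this shape can be summarized with a few lines of scripting. The sketch below totals `total_tokens` per agent from `resource | token_usage` events. `total_tokens_by_agent` is a hypothetical helper, not something the skill ships, and it assumes header lines start with `#` and the separator line is exactly `---`, as in the example above.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def total_tokens_by_agent(lines: Iterable[str]) -> Dict[str, int]:
    """Sum total_tokens from resource/token_usage events, keyed by agent."""
    totals: Dict[str, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        # Skip blank lines, '#' header comments, and the '---' separator.
        if not line or line.startswith("#") or line == "---":
            continue
        parts = line.split(" | ", 4)
        if len(parts) != 5:
            continue  # tolerate malformed lines
        _, event_type, agent_id, status, metadata_json = parts
        if event_type == "resource" and status == "token_usage":
            totals[agent_id] += json.loads(metadata_json).get("total_tokens", 0)
    return dict(totals)


sample = [
    "# Dante Team Orchestration Telemetry Log",
    "---",
    '2026-02-13T14:30:30.000Z | lifecycle | write-agent-1 | completed | {"duration_seconds":13.7,"tests_generated":5}',
    '2026-02-13T14:30:30.100Z | resource | write-agent-1 | token_usage | {"agent_id":"write-agent-1","input_tokens":1500,"output_tokens":800,"total_tokens":2300,"model":"sonnet"}',
    '2026-02-13T14:30:35.100Z | resource | write-agent-2 | token_usage | {"agent_id":"write-agent-2","input_tokens":1600,"output_tokens":900,"total_tokens":2500,"model":"sonnet"}',
]
print(total_tokens_by_agent(sample))  # {'write-agent-1': 2300, 'write-agent-2': 2500}
```

To run it against a real session, pass an open file object instead of the `sample` list, since iterating a text file yields one line at a time.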
Spec: .sdd/specs/2026-02-12-agent-team-orchestration.md
Plan: .sdd/plans/2026-02-12-agent-team-orchestration-plan.md
Sub-Skills:
- skills/telemetry/event-types.md (Event schemas)
- skills/telemetry/log-writer.md (Write implementation)
Pattern: skills/team-orchestration/team-loader.md (Markdown skill structure)
Last Updated: 2026-02-13
Status: Implementation (TASK-006)