Use when monitoring system resources, tracking instance limits, managing resource alerts, and ensuring team capacity is maintained. Trigger with resource checks or limit alerts.
npx claudepluginhub emasoft/emasoft-plugins --plugin emasoft-chief-of-staff

This skill uses the workspace's default tool permissions.
Resource monitoring ensures that the multi-agent team has sufficient capacity to operate effectively. The Chief of Staff tracks system resources, monitors instance limits, and responds to resource alerts before they cause coordination failures or degraded performance.
Before using this skill, ensure:
| Check Type | Output |
|---|---|
| Memory | Current usage, limit, percentage |
| Agents | Active count, max allowed |
| API calls | Rate, remaining quota |
Resource monitoring is the continuous observation of system capacity and agent instance health. Unlike traditional system monitoring focused on servers, Chief of Staff resource monitoring focuses on the resources that affect agent coordination: context windows, instance counts, message queues, and system capacity.
Key characteristics:
- System resources: CPU, memory, disk, and network affecting agent operations.
- Instance limits: Number of active agents, API rate limits, and concurrency constraints.
- Resource alerts: Notifications when resources approach or exceed thresholds.
When to use: Regularly (every 15 minutes), before spawning new agents, when performance issues are reported.
Steps: Check CPU usage, check memory availability, check disk space, check network connectivity, report findings.
Related documentation:
When to use: Before spawning agents, when approaching limits, when coordination slows.
Steps: Count active sessions, check API rate limits, verify concurrency headroom, assess scaling needs.
Related documentation:
When to use: When resource thresholds are exceeded, when alerts are triggered, when degradation is detected.
Steps: Identify alert type, assess severity, take immediate action, notify relevant parties, document incident.
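The "assess severity" step above can be sketched as a small shell function. This is only an illustration: the function name `classify_severity`, the `CRITICAL` level, and the idea of passing a second, higher threshold are assumptions, not part of this skill. The only values taken from this document are the 80% warning threshold and the `WARNING` label used in the alert example.

```shell
# classify_severity VALUE WARN_THRESHOLD CRIT_THRESHOLD
# Prints OK, WARNING, or CRITICAL for an integer percentage.
# Hypothetical helper: the CRITICAL level and its threshold are assumptions.
classify_severity() {
    value=$1
    warn=$2
    crit=$3
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -gt "$warn" ]; then
        # "Exceeded" is read strictly: a value equal to the threshold is still OK.
        echo "WARNING"
    else
        echo "OK"
    fi
}
```

For instance, `classify_severity 85 80 90` prints `WARNING`, which matches an 85% reading against an 80% threshold.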
Related documentation:
Copy this checklist and track your progress:
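# Check CPU usage (user-space percentage, as reported by macOS top)
cpu_usage=$(top -l 1 | grep "CPU usage" | awk '{print $3}' | sed 's/%//')
# Check available memory (free pages multiplied by the system page size,
# which is 16384 bytes on Apple Silicon rather than 4096)
page_size=$(sysctl -n hw.pagesize)
mem_free=$(vm_stat | grep "Pages free" | awk '{print $3}' | sed 's/\.//')
mem_free_mb=$((mem_free * page_size / 1024 / 1024))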
# Check disk space
disk_free=$(df -h / | tail -1 | awk '{print $4}')
echo "CPU: ${cpu_usage}%, Memory Free: ${mem_free_mb}MB, Disk Free: ${disk_free}"
Use the ai-maestro-agents-management skill to list all active sessions and count them.
Compare the active session count against the configured maximum (e.g., 20 sessions). If the count exceeds the limit, log a warning: `WARNING: Exceeding recommended session limit`.
Verify: the active session count is within the configured limit.
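The comparison described above might look like the following sketch. It assumes the active session count has already been obtained through the ai-maestro-agents-management skill and is passed in as a plain integer. The function name `check_session_limit`, the headroom message, and the `(active/max)` suffix on the warning are hypothetical.

```shell
# check_session_limit ACTIVE MAX
# Returns 0 and reports headroom when ACTIVE is within MAX;
# returns 1 and prints the warning when the limit is exceeded.
# Hypothetical helper: ACTIVE is assumed to come from the session listing.
check_session_limit() {
    active=$1
    max=$2
    if [ "$active" -gt "$max" ]; then
        echo "WARNING: Exceeding recommended session limit ($active/$max)"
        return 1
    fi
    echo "OK: $active of $max sessions in use, headroom $((max - active))"
    return 0
}
```

Because the function signals the result through its exit status, a caller can gate agent spawning on it, for example `check_session_limit "$count" 20 || echo "spawning paused"`.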
# Resource Alert: High Memory Usage
**Timestamp:** 2025-02-01T10:30:00Z
**Alert Type:** Memory threshold exceeded
**Severity:** WARNING
**Current Value:** 85% memory used
**Threshold:** 80%
## Immediate Actions Taken
1. Identified agents with large context windows
2. Requested context compaction from orchestrator-master
3. Paused new agent spawning
## Resolution
Memory usage dropped to 72% after compaction.
Monitoring continues at increased frequency (5 min interval).
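The change in monitoring frequency can be expressed as a lookup from severity to polling interval. This is a rough sketch: the function name `monitoring_interval` is hypothetical, and treating every non-OK severity the same way is an assumption. The two intervals come from this document (15 minutes for regular checks, 5 minutes after an alert).

```shell
# monitoring_interval SEVERITY
# Prints the number of seconds to wait before the next resource check.
# Hypothetical helper: any severity other than OK uses the shorter interval.
monitoring_interval() {
    case "$1" in
        OK) echo 900 ;;  # regular cadence: every 15 minutes
        *)  echo 300 ;;  # after an alert: every 5 minutes
    esac
}
```

A monitoring loop could then call `sleep "$(monitoring_interval "$severity")"` between checks.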
Step-by-step runbooks for executing each resource monitoring operation. Use these when performing the actual procedures described above.
Detailed step-by-step runbook for monitoring CPU, memory, disk, and network resources that affect agent operations.
Detailed step-by-step runbook for tracking active agent sessions, API rate limits, and concurrency constraints to ensure team capacity.
Detailed step-by-step runbook for responding to resource threshold violations with appropriate actions to maintain system health.
Symptoms: Commands fail, metrics unavailable, monitoring gaps.
Solution: Verify command availability, check permissions, use alternative methods, document and alert if persistent.
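One way to verify command availability and fall back to an alternative is sketched below. The helper names `have_cmd` and `first_available` are hypothetical. Pairing `vm_stat` (macOS) with `free` (Linux) is an illustrative choice, not something this skill prescribes.

```shell
# have_cmd NAME
# Succeeds if the shell can resolve NAME as a command
# (a binary on PATH, a builtin, or a function).
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# first_available NAME...
# Prints the first candidate that resolves; fails with an error on stderr
# if none do, so the caller can document the gap and raise an alert.
first_available() {
    for candidate in "$@"; do
        if have_cmd "$candidate"; then
            echo "$candidate"
            return 0
        fi
    done
    echo "ERROR: none of these commands are available: $*" >&2
    return 1
}

# Example (assumed tool names): prefer macOS vm_stat, fall back to Linux free.
# mem_tool=$(first_available vm_stat free) || echo "memory metrics unavailable"
```

Checking before running each command turns a silent monitoring gap into an explicit error message that can be logged.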
Symptoms: AI Maestro reports a different session count than the one observed.
Solution: Force session registry refresh, verify session names, reconcile discrepancies, update roster.
Symptoms: Resources exceed thresholds but no alerts are raised.
Solution: Verify alert configuration, check monitoring interval, test alert mechanism, review threshold values.
Version: 1.0
Last Updated: 2025-02-01
Target Audience: Emasoft Chief of Staff Agent
Difficulty Level: Intermediate