Tests multi-agent systems by running sample tasks and displaying real-time handoffs, performance metrics, and quality scores.
/plugin marketplace add jeremylongshore/claude-code-plugins-plus-skills
/plugin install alerting-rule-creator@claude-code-plugins-plus
Model: sonnet
You are an expert in multi-agent system testing and observability.
Test a multi-agent orchestration system by:
User invokes: /ai-agents-test "Task description"
Examples:
/ai-agents-test "Build a REST API with authentication"
/ai-agents-test "Research best practices for React performance"
/ai-agents-test "Debug this authentication error"
First, check that the multi-agent project exists:
# Check for required files
if [ -f "index.ts" ] && [ -d "agents" ]; then
  echo "✅ Multi-agent project found"
else
  echo "❌ Multi-agent project not found"
  echo "💡 Run /ai-agents-setup first to create the project"
  exit 1
fi
Extract the task from user input:
Default tasks by category:
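The extraction step can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual parser: `extractTask` and `DEFAULT_TASKS` are hypothetical names, and the category defaults are simply borrowed from the usage examples above.

```typescript
// Hypothetical category defaults, reusing the example tasks shown earlier.
const DEFAULT_TASKS: Record<string, string> = {
  build: "Build a REST API with authentication",
  research: "Research best practices for React performance",
  debug: "Debug this authentication error",
};

// Pull the task text out of a raw invocation such as:
//   /ai-agents-test "Build a REST API with authentication"
// Falls back to a category default when no task text is supplied.
function extractTask(input: string, fallbackCategory: string = "build"): string {
  // Drop the command name if present, then trim surrounding whitespace.
  const rest = input.replace(/^\s*\/ai-agents-test\b/, "").trim();
  // Strip one pair of matching surrounding quotes, if any.
  const match = rest.match(/^(["'])([\s\S]*)\1$/);
  const task = (match ? match[2] : rest).trim();
  if (task.length > 0) {
    return task;
  }
  return DEFAULT_TASKS[fallbackCategory] ?? DEFAULT_TASKS.build;
}
```

Returning a default instead of failing keeps the command usable when it is invoked with no arguments; a stricter variant could raise an error instead.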
Create a test runner script:
import { runMultiAgentTask } from './index';

interface TestMetrics {
  startTime: number;
  endTime?: number;
  handoffs: Array<{
    from: string;
    to: string;
    reason: string;
    timestamp: number;
  }>;
  agentsInvolved: Set<string>;
  totalDuration?: number;
}

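async function testMultiAgentSystem(task: string) {
  console.log('Multi-Agent System Test\n');
  console.log('═'.repeat(60));
  console.log(`Task: ${task}`);
  console.log('═'.repeat(60));
  console.log('');

  const metrics: TestMetrics = {
    startTime: Date.now(),
    handoffs: [],
    agentsInvolved: new Set()
  };

  try {
    // Record every handoff as it happens; without this callback the
    // handoff list and agent set would stay empty in the report.
    const result = await runMultiAgentTask(task, {
      onHandoff: (event) => {
        metrics.handoffs.push({
          from: event.from,
          to: event.to,
          reason: event.reason,
          timestamp: Date.now()
        });
        metrics.agentsInvolved.add(event.from);
        metrics.agentsInvolved.add(event.to);
      }
    });
    metrics.endTime = Date.now();
    metrics.totalDuration = metrics.endTime - metrics.startTime;

    // Display results
    displayResults(result, metrics);
    return { success: true, result, metrics };
  } catch (error) {
    console.error('❌ Test failed:', error);
    return { success: false, error, metrics };
  }
}
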
function displayResults(result: any, metrics: TestMetrics) {
  console.log('\n' + '═'.repeat(60));
  console.log('Test Results');
  console.log('═'.repeat(60));
  console.log('');

  // Success indicator
  console.log('✅ Status: Task completed successfully\n');

  // Metrics
  console.log('⏱️ Performance Metrics:');
  console.log(` Total duration: ${metrics.totalDuration}ms (${(metrics.totalDuration! / 1000).toFixed(2)}s)`);
  console.log(` Handoff count: ${metrics.handoffs.length}`);
  console.log(` Agents involved: ${metrics.agentsInvolved.size}`);
  console.log(` Avg time per handoff: ${(metrics.totalDuration! / Math.max(metrics.handoffs.length, 1)).toFixed(0)}ms`);
  console.log('');

  // Agent flow (each agent is listed once, in order of first appearance)
  if (metrics.handoffs.length > 0) {
    console.log('Agent Flow:');
    const agentFlow = ['coordinator'];
    metrics.handoffs.forEach(h => {
      if (!agentFlow.includes(h.to)) {
        agentFlow.push(h.to);
      }
    });
    console.log(` ${agentFlow.join(' → ')}`);
    console.log('');
  }

  // Handoff details
  if (metrics.handoffs.length > 0) {
    console.log('Handoff Details:');
    metrics.handoffs.forEach((handoff, i) => {
      // A handoff lasts until the next one starts, or until the run ends
      const duration = i < metrics.handoffs.length - 1
        ? metrics.handoffs[i + 1].timestamp - handoff.timestamp
        : metrics.endTime! - handoff.timestamp;
      console.log(` ${i + 1}. ${handoff.from} → ${handoff.to}`);
      console.log(` Reason: ${handoff.reason}`);
      console.log(` Duration: ${duration}ms`);
      console.log('');
    });
  }

  // Output summary (long output is truncated to the first and last 10 lines)
  console.log('Output Summary:');
  const output = typeof result.output === 'string' ? result.output : JSON.stringify(result.output, null, 2);
  const lines = output.split('\n');
  if (lines.length > 20) {
    console.log(lines.slice(0, 10).join('\n'));
    console.log(` ... (${lines.length - 20} more lines) ...`);
    console.log(lines.slice(-10).join('\n'));
  } else {
    console.log(output);
  }
  console.log('');

  // Quality assessment
  console.log('🎯 Quality Assessment:');
  const qualityScore = assessQuality(result, metrics);
  console.log(` Overall score: ${qualityScore.score}/100`);
  console.log(` Completeness: ${qualityScore.completeness}`);
  console.log(` Efficiency: ${qualityScore.efficiency}`);
  console.log(` Coordination: ${qualityScore.coordination}`);
  console.log('');
}

function assessQuality(result: any, metrics: TestMetrics) {
  let score = 100;
  let completeness = '✅ Excellent';
  let efficiency = '✅ Excellent';
  let coordination = '✅ Excellent';

  // Check completeness (JSON.stringify returns undefined for a missing
  // output, so fall back to an empty string before measuring length)
  const outputLength = (JSON.stringify(result.output) ?? '').length;
  if (outputLength < 100) {
    score -= 30;
    completeness = '⚠️ Incomplete';
  } else if (outputLength < 500) {
    score -= 10;
    completeness = '✅ Good';
  }

  // Check efficiency
  const avgHandoffTime = metrics.totalDuration! / Math.max(metrics.handoffs.length, 1);
  if (avgHandoffTime > 5000) {
    score -= 20;
    efficiency = '⚠️ Slow';
  } else if (avgHandoffTime > 3000) {
    score -= 10;
    efficiency = '✅ Good';
  }

  // Check coordination
  if (metrics.handoffs.length === 0) {
    score -= 20;
    coordination = '⚠️ No handoffs';
  } else if (metrics.handoffs.length > 10) {
    score -= 10;
    coordination = '⚠️ Too many handoffs';
  }

  return {
    score: Math.max(0, score),
    completeness,
    efficiency,
    coordination
  };
}

// CLI interface
const task = process.argv[2];
if (!task) {
  console.error('❌ Error: Please provide a task to test');
  console.log('');
  console.log('Usage: ts-node test-runner.ts "Your task description"');
  console.log('');
  console.log('Examples:');
  console.log(' ts-node test-runner.ts "Build a REST API with authentication"');
  console.log(' ts-node test-runner.ts "Research React performance best practices"');
  console.log('');
  process.exit(1);
}

testMultiAgentSystem(task)
  .then(({ success }) => {
    process.exit(success ? 0 : 1);
  })
  .catch(error => {
    console.error('Fatal error:', error);
    process.exit(1);
  });
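To see how the deductions in assessQuality combine, the thresholds can be restated as a small pure function and applied to two made-up runs. This is only an illustrative sketch: `scoreRun` and `ScoreInputs` are hypothetical names, and the input numbers are invented for the example.

```typescript
// Hypothetical inputs mirroring what assessQuality reads from the run.
interface ScoreInputs {
  outputLength: number;     // length of the serialized output
  totalDurationMs: number;  // wall-clock time for the whole run
  handoffCount: number;     // number of recorded handoffs
}

// Same thresholds as assessQuality, reduced to the numeric score.
function scoreRun({ outputLength, totalDurationMs, handoffCount }: ScoreInputs): number {
  let score = 100;

  // Completeness: very short output is penalised most heavily.
  if (outputLength < 100) score -= 30;
  else if (outputLength < 500) score -= 10;

  // Efficiency: average time per handoff (a run with no handoffs counts as one).
  const avgHandoffMs = totalDurationMs / Math.max(handoffCount, 1);
  if (avgHandoffMs > 5000) score -= 20;
  else if (avgHandoffMs > 3000) score -= 10;

  // Coordination: no handoffs at all, or more than ten.
  if (handoffCount === 0) score -= 20;
  else if (handoffCount > 10) score -= 10;

  return Math.max(0, score);
}

// 1200 chars of output, 2000ms per handoff, 4 handoffs: no deductions.
const healthy = scoreRun({ outputLength: 1200, totalDurationMs: 8000, handoffCount: 4 });
// 50 chars of output, 30000ms with no handoffs: 100 - 30 - 20 - 20.
const degraded = scoreRun({ outputLength: 50, totalDurationMs: 30000, handoffCount: 0 });
console.log(healthy, degraded); // prints "100 30"
```

With these thresholds the score always moves in steps of 10 and cannot fall below 30, so the `Math.max(0, ...)` guard only matters if further deductions are added later.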
Update index.ts to emit events for testing:
// Shape of the handoff events forwarded to callers
export interface HandoffEvent {
  from: string;
  to: string;
  reason: string;
}

export async function runMultiAgentTask(task: string, options?: {
  onHandoff?: (event: HandoffEvent) => void;
  onComplete?: (result: any) => void;
  verbose?: boolean;
}) {
  const verbose = options?.verbose ?? true;
  if (verbose) {
    console.log(`\n🤖 Starting multi-agent task: ${task}\n`);
  }

  const handoffs: Array<{
    from: string;
    to: string;
    reason: string;
    timestamp: number;
  }> = [];

  // orchestrate, agents, and coordinator come from the existing index.ts
  const result = await orchestrate({
    agents,
    task,
    coordinator,
    maxDepth: 10,
    timeout: 300000,
    onHandoff: (event: HandoffEvent) => {
      const handoffData = {
        from: event.from,
        to: event.to,
        reason: event.reason,
        timestamp: Date.now()
      };
      handoffs.push(handoffData);
      if (verbose) {
        console.log(`\nHandoff: ${event.from} → ${event.to}`);
        console.log(` Reason: ${event.reason}\n`);
      }
      options?.onHandoff?.(event);
    },
    onComplete: (result) => {
      if (verbose) {
        console.log(`\n✅ Task complete!`);
        console.log(` Total handoffs: ${handoffs.length}`);
        console.log(` Agents: ${new Set(handoffs.flatMap(h => [h.from, h.to])).size}\n`);
      }
      options?.onComplete?.(result);
    }
  });

  return {
    ...result,
    metrics: {
      handoffs,
      agentCount: new Set(handoffs.flatMap(h => [h.from, h.to])).size
    }
  };
}
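The callback options above also make it possible to exercise reporting code without calling any model. The sketch below assumes a hypothetical stand-in, `fakeRunMultiAgentTask`, that replays a fixed script of handoffs through the same `onHandoff` and `onComplete` hooks; the script contents and the `collectHandoffs` helper are invented for illustration.

```typescript
interface HandoffEvent {
  from: string;
  to: string;
  reason: string;
}

interface RunOptions {
  onHandoff?: (event: HandoffEvent) => void;
  onComplete?: (result: unknown) => void;
}

// Hypothetical offline replacement for runMultiAgentTask: it emits a
// scripted sequence of handoffs instead of orchestrating real agents.
async function fakeRunMultiAgentTask(task: string, options?: RunOptions) {
  const script: HandoffEvent[] = [
    { from: "coordinator", to: "researcher", reason: "Need background research" },
    { from: "researcher", to: "coder", reason: "Research complete" },
    { from: "coder", to: "reviewer", reason: "Implementation complete" },
    { from: "reviewer", to: "coordinator", reason: "Review complete" },
  ];
  for (const event of script) {
    options?.onHandoff?.(event);
  }
  const result = { output: `Stubbed result for: ${task}` };
  options?.onComplete?.(result);
  return result;
}

// Collect handoffs and the distinct agents seen, the same way a test
// runner would when subscribing to onHandoff.
async function collectHandoffs(task: string): Promise<{ handoffs: HandoffEvent[]; agents: string[] }> {
  const handoffs: HandoffEvent[] = [];
  const agents = new Set<string>();
  await fakeRunMultiAgentTask(task, {
    onHandoff: (event) => {
      handoffs.push(event);
      agents.add(event.from);
      agents.add(event.to);
    },
  });
  return { handoffs, agents: Array.from(agents) };
}
```

Because the stub takes the same options as the real function, it can be swapped in during development to check formatting and metrics logic quickly and without API usage.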
Run the test:
# Using ts-node
ts-node test-runner.ts "Build a REST API with authentication"
# Or using npm script
npm run test:agents "Build a REST API with authentication"
Show live updates during execution:
Multi-Agent System Test
════════════════════════════════════════════════════════════
Task: Build a REST API with authentication
════════════════════════════════════════════════════════════

Handoff: coordinator → researcher
Reason: Need to research authentication best practices

Handoff: researcher → coder
Reason: Research complete, ready to implement

Handoff: coder → reviewer
Reason: Implementation complete, needs review

Handoff: reviewer → coordinator
Reason: Review complete, all checks passed

════════════════════════════════════════════════════════════
Test Results
════════════════════════════════════════════════════════════

✅ Status: Task completed successfully

⏱️ Performance Metrics:
Total duration: 47823ms (47.82s)
Handoff count: 4
Agents involved: 4
Avg time per handoff: 11956ms

Agent Flow:
coordinator → researcher → coder → reviewer → coordinator

Handoff Details:
1. coordinator → researcher
Reason: Need to research authentication best practices
Duration: 8234ms

2. researcher → coder
Reason: Research complete, ready to implement
Duration: 23456ms

3. coder → reviewer
Reason: Implementation complete, needs review
Duration: 12389ms

4. reviewer → coordinator
Reason: Review complete, all checks passed
Duration: 3744ms

Output Summary:
{
  "api": "REST API with JWT authentication",
  "features": [
    "User registration",
    "User login",
    "JWT token generation",
    "Protected routes",
    "Token refresh"
  ],
  "security": {
    "passwordHashing": "bcrypt",
    "tokenExpiry": "1h",
    "refreshToken": "7d"
  },
  "endpoints": [
    "POST /api/auth/register",
    "POST /api/auth/login",
    "POST /api/auth/refresh",
    "GET /api/users/me (protected)"
  ],
  "tests": "95% coverage"
}

🎯 Quality Assessment:
Overall score: 95/100
Completeness: ✅ Excellent
Efficiency: ✅ Excellent
Coordination: ✅ Excellent

Add the npm script to package.json:
{
  "scripts": {
    "test:agents": "ts-node test-runner.ts"
  }
}
Create tests/scenarios.json:
{
  "scenarios": [
    {
      "name": "Code Generation",
      "task": "Build a REST API with authentication and CRUD operations",
      "expectedAgents": ["coordinator", "researcher", "coder", "reviewer"],
      "expectedHandoffs": 4,
      "maxDuration": 60000
    },
    {
      "name": "Research Task",
      "task": "Research best practices for microservices architecture",
      "expectedAgents": ["coordinator", "researcher"],
      "expectedHandoffs": 2,
      "maxDuration": 20000
    },
    {
      "name": "Debug Task",
      "task": "Debug JWT authentication failing with 401 errors",
      "expectedAgents": ["coordinator", "researcher", "security-auditor"],
      "expectedHandoffs": 3,
      "maxDuration": 30000
    },
    {
      "name": "Complex Pipeline",
      "task": "Design, implement, test, and document a payment processing API",
      "expectedAgents": ["coordinator", "api-designer", "coder", "test-writer", "reviewer"],
      "expectedHandoffs": 6,
      "maxDuration": 120000
    }
  ]
}
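One way to use these scenarios is to compare each entry against what a run actually produced. The sketch below is an assumption about how the fields might be checked, not a documented behaviour: `checkScenario` and `Observed` are hypothetical names, `expectedHandoffs` is treated as an exact count, and `maxDuration` is read as milliseconds.

```typescript
// Mirrors one entry of tests/scenarios.json.
interface Scenario {
  name: string;
  task: string;
  expectedAgents: string[];
  expectedHandoffs: number;
  maxDuration: number; // assumed to be milliseconds
}

// Hypothetical summary of a finished run.
interface Observed {
  agents: string[];
  handoffCount: number;
  durationMs: number;
}

// Returns a list of human-readable failures; an empty list means the
// run matched the scenario's expectations.
function checkScenario(scenario: Scenario, observed: Observed): string[] {
  const failures: string[] = [];

  const missing = scenario.expectedAgents.filter(
    (agent) => observed.agents.indexOf(agent) === -1
  );
  if (missing.length > 0) {
    failures.push(`missing agents: ${missing.join(", ")}`);
  }

  if (observed.handoffCount !== scenario.expectedHandoffs) {
    failures.push(
      `expected ${scenario.expectedHandoffs} handoffs, saw ${observed.handoffCount}`
    );
  }

  if (observed.durationMs > scenario.maxDuration) {
    failures.push(
      `took ${observed.durationMs}ms, limit is ${scenario.maxDuration}ms`
    );
  }

  return failures;
}

// The "Research Task" scenario from the file above.
const researchScenario: Scenario = {
  name: "Research Task",
  task: "Research best practices for microservices architecture",
  expectedAgents: ["coordinator", "researcher"],
  expectedHandoffs: 2,
  maxDuration: 20000,
};
```

If agent routing is expected to vary between runs, the exact handoff comparison could be relaxed to an upper bound.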
If the test fails, check:
# 1. Environment variables
if [ -z "$ANTHROPIC_API_KEY" ]; then
  echo "❌ Error: ANTHROPIC_API_KEY not set"
  echo "💡 Add your API key to .env file"
  exit 1
fi

# 2. Dependencies installed
if [ ! -d "node_modules/@ai-sdk-tools/agents" ]; then
  echo "❌ Error: Dependencies not installed"
  echo "💡 Run: npm install"
  exit 1
fi

# 3. Agents registered
if ! grep -q "researcher" index.ts; then
  echo "⚠️ Warning: Not all agents registered in index.ts"
fi
After test completion, show:
✅ Multi-agent test complete!

Results:
Status: Success
Duration: 47.8s
Agents: 4 (coordinator, researcher, coder, reviewer)
Handoffs: 4
Quality: 95/100

🎯 Assessment:
✅ All agents coordinated successfully
✅ Task completed within expected time
✅ Output quality meets standards

💡 Recommendations:
- System is functioning optimally
- Consider adding more specialized agents for complex tasks
- Average handoff time is excellent (11.9s)

Full test output saved to: test-results-[timestamp].json
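Saving the results file takes one extra step, because the metrics hold a `Set`, which `JSON.stringify` serializes as an empty object. A minimal sketch of preparing the file contents is shown below; `buildResultsFile` is a hypothetical helper, and the ISO-based timestamp format is an assumption, since the exact format behind `[timestamp]` is not specified.

```typescript
interface Handoff {
  from: string;
  to: string;
  reason: string;
  timestamp: number;
}

interface TestMetrics {
  startTime: number;
  endTime?: number;
  handoffs: Handoff[];
  agentsInvolved: Set<string>;
  totalDuration?: number;
}

// Builds the filename and JSON body for a results file. Writing the
// file is left to the caller so this function stays free of I/O.
function buildResultsFile(
  metrics: TestMetrics,
  qualityScore: number,
  now: Date = new Date()
): { filename: string; json: string } {
  // Assumed format: ISO timestamp with ":" and "." replaced so the
  // name is safe on common filesystems.
  const timestamp = now.toISOString().replace(/[:.]/g, "-");

  const payload = {
    ...metrics,
    // Convert the Set to an array; a Set would otherwise serialize as {}.
    agentsInvolved: Array.from(metrics.agentsInvolved),
    qualityScore,
  };

  return {
    filename: `test-results-${timestamp}.json`,
    json: JSON.stringify(payload, null, 2),
  };
}
```

The returned `filename` and `json` can then be passed to Node's `fs.writeFileSync` at the end of the test run.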
A successful test should have:
Expected performance ranges:
If actual performance exceeds these by 2x, investigate: