Multi-agent performance profiling for pipeline bottlenecks. TRIGGERS - performance profiling, bottleneck analysis, pipeline optimization.
From quality-tools: `npx claudepluginhub terrylica/cc-skills --plugin quality-tools`

References:
- `references/evolution-log.md`
- `references/impact_quantification_guide.md`
- `references/integration_report_template.md`
- `references/profiling_template.py`
Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.
Prescriptive workflow for spawning parallel profiling agents to comprehensively identify performance bottlenecks across multiple system layers. Successfully discovered that QuestDB ingests at 1.1M rows/sec (11x faster than target), proving the database was NOT the bottleneck: CloudFront download accounted for 90% of pipeline time.
Use this skill when:
Key outcomes:
Agent 1: Profiling (Instrumentation)
Agent 2: Database Configuration Analysis
Agent 3: Client Library Analysis
Agent 4: Batch Size Analysis
Agent 5: Integration & Synthesis
Parallel Execution (all 5 agents run simultaneously):
Agent 1 (Profiling) → [PARALLEL]
Agent 2 (DB Config) → [PARALLEL]
Agent 3 (Client Library) → [PARALLEL]
Agent 4 (Batch Size) → [PARALLEL]
Agent 5 (Integration) → [PARALLEL - reads tmp/ outputs from others]
Key Principle: No dependencies between investigation agents (1-4). Integration agent synthesizes findings.
Dynamic Todo Management:
Each agent produces a written report in its own tmp/ subdirectory (Agent 1 also produces the instrumented script, `profile_pipeline.py`)
Example Profiling Code:
```python
import time

# Profile a multi-stage pipeline by timing each phase boundary.
# download_from_cdn, extract_zip, parse_csv, and ingest_to_db are the
# project's own pipeline functions.
def profile_pipeline(url):
    results = {}

    # Phase 1: Download
    start = time.perf_counter()
    data = download_from_cdn(url)
    results["download"] = time.perf_counter() - start

    # Phase 2: Extract
    start = time.perf_counter()
    csv_data = extract_zip(data)
    results["extract"] = time.perf_counter() - start

    # Phase 3: Parse
    start = time.perf_counter()
    df = parse_csv(csv_data)
    results["parse"] = time.perf_counter() - start

    # Phase 4: Ingest
    start = time.perf_counter()
    ingest_to_db(df)
    results["ingest"] = time.perf_counter() - start

    # Analysis: report each phase's share of total wall time
    total = sum(results.values())
    for phase, duration in results.items():
        pct = (duration / total) * 100
        print(f"{phase}: {duration:.3f}s ({pct:.1f}%)")
    return results
```
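Once per-phase timings exist, the bottleneck is simply the phase with the largest share of wall time. A minimal sketch, using illustrative numbers consistent with the case study's 857 ms download figure:

```python
# Illustrative per-phase timings (seconds); download dominates, as in the case study
results = {"download": 0.857, "extract": 0.021, "parse": 0.048, "ingest": 0.026}

total = sum(results.values())
bottleneck = max(results, key=results.get)
share = results[bottleneck] / total * 100
print(f"Bottleneck: {bottleneck} ({share:.0f}% of pipeline)")
# Bottleneck: download (90% of pipeline)
```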
Priority Levels:
Impact Reporting Format:
### Recommendation: [Optimization Name] (P0/P1/P2) - [IMPACT LEVEL]
**Impact**: 🔴/🟠/🟡 **Nx improvement**
**Effort**: High/Medium/Low (N days)
**Expected Improvement**: CurrentK → TargetK rows/sec
**Rationale**:
- [Why this matters]
- [Supporting evidence from profiling]
- [Comparison to alternatives]
**Implementation**:
[Code snippet or architecture description]
Integration Agent Responsibilities:
Consensus Criteria:
Input: Performance metric below SLO
Output: Problem statement with baseline metrics
Example Problem Statement:
Performance Issue: BTCUSDT 1m ingestion at 47K rows/sec
Target SLO: >100K rows/sec
Gap: 53% below target
Pipeline: CloudFront download → ZIP extract → CSV parse → QuestDB ILP ingest
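The gap figure in the statement above follows directly from the baseline and SLO:

```python
current = 47_000   # rows/sec, measured baseline
target = 100_000   # rows/sec, SLO
gap_pct = (target - current) / target * 100
print(f"Gap: {gap_pct:.0f}% below target")  # Gap: 53% below target
```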
Directory Structure:
```
tmp/perf-optimization/
├── profiling/                    # Agent 1
│   ├── profile_pipeline.py
│   └── PROFILING_REPORT.md
├── questdb-config/               # Agent 2
│   └── CONFIG_ANALYSIS.md
├── python-client/                # Agent 3
│   └── CLIENT_ANALYSIS.md
├── batch-size/                   # Agent 4
│   └── BATCH_ANALYSIS.md
└── MASTER_INTEGRATION_REPORT.md  # Agent 5
```
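The layout above can be created up front so agents never race on directory creation; a sketch (`mkdir` with `exist_ok=True` is idempotent, so re-running is safe):

```python
from pathlib import Path

# Pre-create one working subdirectory per investigation agent
root = Path("tmp/perf-optimization")
for sub in ("profiling", "questdb-config", "python-client", "batch-size"):
    (root / sub).mkdir(parents=True, exist_ok=True)
```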
Agent Assignment:
IMPORTANT: Use single message with multiple Task tool calls for true parallelism
Example:
I'm going to spawn 5 parallel investigation agents:
[Uses Task tool 5 times in a single message]
- Agent 1: Profiling
- Agent 2: QuestDB Config
- Agent 3: Python Client
- Agent 4: Batch Size
- Agent 5: Integration (depends on others completing)
Execution:
# All agents run simultaneously (user observes 5 parallel tool calls)
# Each agent writes to its own tmp/ subdirectory
# Integration agent polls for completed reports
Progress Tracking:
Completion Criteria:
Report Structure:
# Master Performance Optimization Integration Report
## Executive Summary
- Critical discovery (what is/isn't the bottleneck)
- Key findings from each agent (1-sentence summary)
## Top 3 Recommendations (Consensus)
1. [P0 Optimization] - HIGHEST IMPACT
2. [P1 Optimization] - HIGH IMPACT
3. [P2 Optimization] - QUICK WIN
## Agent Investigation Summary
### Agent 1: Profiling
### Agent 2: Database Config
### Agent 3: Client Library
### Agent 4: Batch Size
## Implementation Roadmap
### Phase 1: P0 Optimizations (Week 1)
### Phase 2: P1 Optimizations (Week 2)
### Phase 3: P2 Quick Wins (As time permits)
For each recommendation:
Example Implementation:
```shell
# Before optimization
uv run python tmp/perf-optimization/profiling/profile_pipeline.py
# Output: 47K rows/sec, download=857ms (90%)

# Implement P0 recommendation (concurrent downloads)
# [Make code changes]

# After optimization
uv run python tmp/perf-optimization/profiling/profile_pipeline.py
# Output: 450K rows/sec, download=90ms per symbol * 10 concurrent
```
Context: Pipeline achieving 47K rows/sec, target 100K rows/sec (53% below SLO)
Assumptions Before Investigation:
Findings After 5-Agent Investigation:
Top 3 Recommendations:
Impact: Discovered database ingests at 1.1M rows/sec (11x faster than target) - proving database was never the bottleneck
Outcome: Avoided wasting 2-3 weeks optimizing database when download was the real bottleneck
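The P0 concurrent-download fix from the case study can be sketched with a thread pool. Names here are hypothetical; the real per-URL fetch function is project-specific:

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=10):
    """Fetch many CDN objects concurrently.

    'fetch' is the project's per-URL download function. Downloads are
    I/O-bound, so threads overlap the network wait almost perfectly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order even though fetches complete out of order
        return list(pool.map(fetch, urls))
```

With 10 workers at ~90 ms per object, ten symbols complete in roughly the time of one, which is the mechanism behind the 47K → 450K rows/sec jump reported above.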
❌ Bad: Profile database only, assume it's the bottleneck
✅ Good: Profile entire pipeline (download → extract → parse → ingest)

❌ Bad: Run Agent 1, wait, then run Agent 2, wait, etc.
✅ Good: Spawn all 5 agents in parallel using a single message with multiple Task calls

❌ Bad: "Let's optimize the database config first" (assumption-driven)
✅ Good: Profile first, discover database is only 4% of time, optimize download instead

❌ Bad: Only implement P0 (highest impact, highest effort)
✅ Good: Implement P2 quick wins (1.3x for 4-8 hours effort) while planning P0

❌ Bad: Implement optimization, assume it worked
✅ Good: Re-run profiling script, verify expected improvement achieved
Not applicable - profiling scripts are project-specific (stored in tmp/perf-optimization/)
- `profiling_template.py` - Template for phase-boundary instrumentation
- `integration_report_template.md` - Template for master integration report
- `impact_quantification_guide.md` - How to assess P0/P1/P2 priorities

Not applicable - profiling artifacts are project-specific
| Issue | Cause | Solution |
|---|---|---|
| Agents running sequentially | Using separate messages | Spawn all agents in single message with multi-Task |
| Integration report empty | Agent reports not written | Wait for all 4 investigation agents to complete |
| Wrong bottleneck identified | Single-layer profiling | Profile entire pipeline, not just assumed layer |
| Profiling results vary | No warmup runs | Run 3-5 warmup iterations before measuring |
| Memory not profiled | Missing tracemalloc | Add tracemalloc instrumentation to profiling script |
| P0/P1 priority unclear | No impact quantification | Include expected Nx improvement for each finding |
| Consensus missing | Agents not compared | Integration agent must synthesize all 4 reports |
| Re-profile shows no change | Caching effects | Clear caches, restart services before re-profiling |
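Two of the rows above (warmup runs, missing tracemalloc) can be addressed by one measurement harness; a sketch, with the warmup and run counts as tunable assumptions:

```python
import time
import tracemalloc

def measure(fn, warmup=3, runs=5):
    """Run warmup iterations first (warms caches, pools, JITs), then
    return the median wall time (seconds) and peak traced memory (bytes)
    over the measured runs."""
    for _ in range(warmup):
        fn()
    tracemalloc.start()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return sorted(timings)[runs // 2], peak
```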
After this skill completes, reflect before closing the task:
Do NOT defer. The next invocation inherits whatever you leave behind.