```bash
npx claudepluginhub boshu2/agentops --plugin agentops
```
This skill uses the workspace's default tool permissions.
> Quick Ref: `/perf profile <target>` | `/perf bench <target>` | `/perf compare <baseline> <candidate>` | `/perf optimize <target>`
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
Performance profiling, benchmarking, regression detection, and optimization recommendations for any language runtime. Produces actionable metrics, not vague advice.
| Mode | Command | Purpose |
|---|---|---|
| Profile | /perf profile <target> | Profile execution, find hotspots |
| Benchmark | /perf bench <target> | Create or run benchmarks |
| Compare | /perf compare <baseline> <candidate> | Compare two runs for regression |
| Optimize | /perf optimize <target> | Analyze and apply optimizations |
If no mode is specified, default to profile.
Identify the language/runtime from file extensions, go.mod, package.json, pyproject.toml, Cargo.toml, or explicit user input. Select the profiling stack:
| Language | Benchmarking | CPU Profile | Memory Profile | Comparison |
|---|---|---|---|---|
| Go | go test -bench | go tool pprof (cpu) | go tool pprof (alloc) | benchstat |
| Python | pytest-benchmark, timeit | cProfile, py-spy | memory_profiler, tracemalloc | manual diff |
| Node | benchmark.js, vitest bench | --prof, clinic.js | --heap-prof, 0x | manual diff |
| Rust | criterion, cargo bench | cargo flamegraph | heaptrack, DHAT | critcmp |
| Shell | hyperfine | time, strace | N/A | hyperfine built-in |
Check which tools are actually installed. If a preferred tool is missing, fall back to standard-library alternatives before asking the user to install anything.
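A minimal sketch of that availability check (tool names taken from the table above; adjust the list to the detected language):

```bash
# Probe for preferred tools; report what's missing so a stdlib
# fallback (cProfile, `time`, go test -bench) can be chosen instead.
for tool in benchstat hyperfine py-spy critcmp; do
  command -v "$tool" >/dev/null 2>&1 \
    && echo "found:   $tool" \
    || echo "missing: $tool (use a stdlib fallback)"
done
```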
Run existing benchmarks first. If none exist, create them.
```bash
# Go
grep -r "func Benchmark" --include="*_test.go" -l .
# Python
find . -name "test_*" -exec grep -l "benchmark\|@pytest.mark.benchmark" {} +
# Rust
grep -r "#\[bench\]" --include="*.rs" -l .
# Node
find . -name "*.bench.*" -o -name "*.benchmark.*"
```
If benchmarks exist for the target, run them and capture output. If none exist, write benchmarks covering the target function or module.
Benchmark requirements:
- Run enough samples for stable statistics (e.g., Go: `-benchtime=3s -count=5`).
- Save raw baseline output to `.agents/perf/baseline-YYYY-MM-DD.txt`.
```go
func BenchmarkTargetFunction(b *testing.B) {
	// Setup outside the timed loop
	input := prepareInput()
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		TargetFunction(input)
	}
}
```
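To run it per the requirements above and capture the baseline (substitute the actual date; `-benchmem` is optional here since the benchmark already calls `b.ReportAllocs()`):

```bash
go test -bench=BenchmarkTargetFunction -benchtime=3s -count=5 -benchmem ./... \
  | tee .agents/perf/baseline-YYYY-MM-DD.txt
```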
```python
import pytest

@pytest.mark.benchmark(group="target")
def test_target_benchmark(benchmark):
    input_data = prepare_input()
    result = benchmark(target_function, input_data)
    assert result is not None
```
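And to run it, pytest-benchmark's `--benchmark-only` and JSON-export flags work; the output path below follows this skill's baseline convention:

```bash
pytest --benchmark-only --benchmark-json=.agents/perf/baseline-YYYY-MM-DD.json
```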
Find functions consuming the most CPU time.
Go:
```bash
go test -bench=BenchmarkTarget -cpuprofile=cpu.prof ./...
go tool pprof -top cpu.prof
go tool pprof -text -cum cpu.prof  # cumulative view
```
Python:
```bash
python -m cProfile -s cumulative target_script.py
# Or for running processes:
py-spy top --pid <PID>
py-spy record -o profile.svg --pid <PID>
```
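Node (per the table above; V8's built-in profiler, with clinic.js as an alternative if installed — `target_script.js` is a placeholder):

```bash
node --prof target_script.js
node --prof-process isolate-*.log > cpu-profile.txt
```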
Find allocation hotspots and potential leaks.
Go:
```bash
go test -bench=BenchmarkTarget -memprofile=mem.prof ./...
go tool pprof -top -alloc_space mem.prof
```
Python:
```bash
python -m memory_profiler target_script.py
# Or with tracemalloc in code:
# tracemalloc.start(); ...; snapshot = tracemalloc.take_snapshot()
```
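The tracemalloc approach spelled out (stdlib only; `run_workload` is a hypothetical placeholder for the code under test):

```python
import tracemalloc

tracemalloc.start()
run_workload()  # placeholder for the code under test
snapshot = tracemalloc.take_snapshot()
# Rank allocation sites by source line; print the top 10.
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
```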
Identify blocking operations in hot paths.
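One way to surface blocking, assuming Go's block profiler or, for any runtime on Linux, `strace` syscall accounting:

```bash
# Go: capture channel waits and mutex contention during the benchmark.
go test -bench=BenchmarkTarget -blockprofile=block.prof ./...
go tool pprof -top block.prof
# Any runtime (Linux): summarize time spent in syscalls for a running process.
strace -c -p <PID>
```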
After profiling, produce a ranked list:
```
HOTSPOTS (by cumulative CPU time):
1. pkg/engine.Process      42.3% (1.2s)  — main processing loop
2. pkg/engine.parseRecord  28.1% (0.8s)  — record deserialization
3. pkg/io.ReadBatch        15.7% (0.45s) — disk reads
```
Classify each finding by estimated impact:
| Impact | Criteria | Action |
|---|---|---|
| High | >20% of total time or >50% of allocations | Fix immediately |
| Medium | 5-20% of total time or notable allocation waste | Fix in this session |
| Low | <5% of total time, minor inefficiency | Log for later |
Check the profiled code against these known performance killers:
- String concatenation in loops (use `strings.Builder` / `[]byte` / `io.StringWriter`; see the sketch below)

For each finding, state the location, the estimated impact, and the proposed fix.
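For illustration, the concatenation fix named above, using a hypothetical `join` helper:

```go
package perfdemo

import "strings"

// join concatenates with +=; every iteration copies the whole
// accumulated string, so total work is O(n^2) in the output size.
func join(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinFast appends into one growing buffer via strings.Builder,
// amortized O(n) with far fewer allocations.
func joinFast(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}
```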
Critical rule: ONE optimization at a time.
For each optimization:
1. Re-run the benchmark and compare against the baseline with `benchstat` (Go) or a manual diff.
2. Keep the change only if the improvement is significant (statistically significant per `benchstat`, or a >5% consistent change for manual comparison); otherwise revert.
3. Commit as `perf(<scope>): <description> (+X% throughput)` or `perf(<scope>): <description> (-X% latency)`.

Apply optimizations in order of expected impact, highest first.
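A Go-flavored sketch of that loop (`benchstat` assumed installed; the commit message is an example):

```bash
go test -bench=BenchmarkTarget -count=5 ./... > before.txt
# ...apply exactly ONE optimization...
go test -bench=BenchmarkTarget -count=5 ./... > after.txt
benchstat before.txt after.txt
# Keep and commit only on a significant win, e.g.:
git commit -am "perf(engine): reuse parse buffer (-12% latency)"
```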
Write the report to .agents/perf/YYYY-MM-DD-perf-<target>.md.
```markdown
# Performance Report: <target>
Date: YYYY-MM-DD
Mode: <profile|bench|compare|optimize>
Language: <detected>

## Summary
<1-2 sentence summary of findings>

## Baseline Metrics
| Metric | Value |
|--------|-------|
| ops/sec | ... |
| ns/op | ... |
| B/op | ... |
| allocs/op | ... |
| p50 latency | ... |
| p95 latency | ... |
| p99 latency | ... |

## Hotspots
<ranked list from Step 2>

## Findings
<classified findings from Step 3>

## Optimizations Applied (if optimize mode)
| Change | Before | After | Improvement |
|--------|--------|-------|-------------|
| ... | ... | ... | +X% |

## After Metrics (if optimize mode)
<same table as baseline, with new values>

## Recommendations
<remaining opportunities not addressed in this session>
```
When running /perf compare <baseline> <candidate>:
- Go: `benchstat baseline.txt candidate.txt`
- Rust: `critcmp baseline candidate`

Output a summary table:
COMPARISON: baseline vs candidate
| Benchmark | Baseline | Candidate | Delta | Verdict |
|-----------|----------|-----------|-------|---------|
| BenchmarkProcess | 1.2ms | 0.9ms | -25% | IMPROVEMENT |
| BenchmarkParse | 450ns | 480ns | +6.7% | REGRESSION |
| BenchmarkIO | 3.1ms | 3.0ms | -3.2% | NOISE |
Tips:
- Run /complexity first to identify hot paths, then benchmark those.
- For stable numbers, pin concurrency (e.g., `GOMAXPROCS=1`) and close competing processes.
- With no profilers available, fall back to `time` for wall-clock and manual instrumentation for allocation counts.
- Use `hyperfine` for wall-clock benchmarking across any language.
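For the hyperfine route (real flags; the compared commands are placeholders):

```bash
hyperfine --warmup 3 --runs 10 './baseline-binary' './candidate-binary'
hyperfine --export-markdown compare.md 'python old.py' 'python new.py'
```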