From agentops
Profiles, benchmarks, and optimizes performance hotspots across Go, Python, Node, Rust, and shell runtimes. Detects regressions and produces actionable metrics.
npx claudepluginhub boshu2/agentops --plugin agentops
How this skill is triggered (by the user, by Claude, or both): Slash command
/agentops:perf
The summary Claude sees in its skill listing (used to decide when to auto-load this skill):
> Quick Ref: `/perf profile <target>` | `/perf bench <target>` | `/perf compare <baseline> <candidate>` | `/perf optimize <target>`
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
Performance profiling, benchmarking, regression detection, and optimization recommendations for any language runtime. Produces actionable metrics, not vague advice.
| Mode | Command | Purpose |
|---|---|---|
| Profile | /perf profile <target> | Profile execution, find hotspots |
| Benchmark | /perf bench <target> | Create or run benchmarks |
| Compare | /perf compare <baseline> <candidate> | Compare two runs for regression |
| Optimize | /perf optimize <target> | Analyze and apply optimizations |
If no mode is specified, default to profile.
Identify the language/runtime from file extensions, go.mod, package.json, pyproject.toml, Cargo.toml, or explicit user input.
If the symptom may be host pressure rather than target-code performance, read references/system-pressure-triage.md before benchmarking.
For the repeatable measurement loop, profiler selection, and report metrics, read references/profiling-playbook.md.
Select the profiling stack:
| Language | Benchmarking | CPU Profile | Memory Profile | Comparison |
|---|---|---|---|---|
| Go | go test -bench | go tool pprof (cpu) | go tool pprof (alloc) | benchstat |
| Python | pytest-benchmark, timeit | cProfile, py-spy | memory_profiler, tracemalloc | manual diff |
| Node | benchmark.js, vitest bench | --prof, clinic.js, 0x | --heap-prof | manual diff |
| Rust | criterion, cargo bench | cargo flamegraph | heaptrack, DHAT | critcmp |
| Shell | hyperfine | time, strace | N/A | hyperfine built-in |
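The detection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the marker files come from the text, while the extension mapping, the upward directory walk, and the function name `detect_runtime` are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Marker files named in the text, mapped to the runtime they indicate.
MARKERS = {
    "go.mod": "go",
    "package.json": "node",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
}

# Extension mapping: an assumption for illustration, extend as needed.
EXTENSIONS = {
    ".go": "go", ".py": "python", ".js": "node",
    ".ts": "node", ".rs": "rust", ".sh": "shell",
}


def detect_runtime(target: str) -> Optional[str]:
    """Return the runtime for a file or directory, or None if unknown."""
    path = Path(target).resolve()
    # A file with a known extension is the most specific signal.
    if not path.is_dir() and path.suffix in EXTENSIONS:
        return EXTENSIONS[path.suffix]
    # Otherwise walk upward from the target looking for a marker file.
    start = path if path.is_dir() else path.parent
    for directory in [start, *start.parents]:
        for marker, runtime in MARKERS.items():
            if (directory / marker).is_file():
                return runtime
    return None
```

When detection returns `None`, the text's last option applies: ask the user for the runtime explicitly.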
Check which tools are actually installed. If a preferred tool is missing, fall back to standard-library alternatives before asking the user to install anything.
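One way to check tool availability is `shutil.which`, which looks an executable up on `PATH`. The sketch below assumes a hypothetical helper name, `pick_tool`, and uses `py-spy` with a `cProfile` fallback purely as an example pairing taken from the table.

```python
import shutil
from typing import Callable, Optional, Sequence


def pick_tool(
    preferred: Sequence[str],
    fallback: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the first preferred executable found on PATH, else the fallback."""
    for name in preferred:
        if which(name) is not None:
            return name
    return fallback


if __name__ == "__main__":
    # py-spy is an external binary; cProfile ships with CPython.
    print(pick_tool(["py-spy"], fallback="python -m cProfile"))
```

The `which` parameter exists only so the lookup can be replaced in tests; normal callers can leave it at the default.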
Run existing benchmarks first. If none exist, create them.
# Go
grep -r "func Benchmark" --include="*_test.go" -l .
# Python
find . -name "test_*" -exec grep -l "benchmark\|@pytest.mark.benchmark" {} +
# Rust
grep -r "#\[bench\]" --include="*.rs" -l .
# Node
find . -name "*.bench.*" -o -name "*.benchmark.*"
If benchmarks exist for the target, run them and capture output. If none exist, write benchmarks covering the target function or module.
Benchmark requirements:
- Enough iterations and repetitions for stable results (Go: -benchtime=3s -count=5)
Save raw baseline output to .agents/perf/baseline-YYYY-MM-DD.txt.
import "testing"

// prepareInput and TargetFunction are placeholders for the code under test.
func BenchmarkTargetFunction(b *testing.B) {
	// Setup outside the loop
	input := prepareInput()
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		TargetFunction(input)
	}
}
import pytest

@pytest.mark.benchmark(group="target")
def test_target_benchmark(benchmark):
    input_data = prepare_input()
    result = benchmark(target_function, input_data)
    assert result is not None
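If pytest-benchmark is not installed, the standard-library `timeit` module listed in the table can serve as a fallback. The following is a rough sketch under that assumption; the helper name `bench` and the choice to report minimum and median are illustrative, not part of the skill.

```python
import statistics
import timeit


def bench(func, *args, repeat: int = 5, number: int = 1000) -> dict:
    """Time func(*args) and return per-call statistics in nanoseconds."""
    # timeit.repeat returns one total (in seconds) per repeat,
    # each covering `number` calls.
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call_ns = [total / number * 1e9 for total in totals]
    return {
        "min_ns": min(per_call_ns),
        "median_ns": statistics.median(per_call_ns),
        "repeat": repeat,
        "number": number,
    }


if __name__ == "__main__":
    print(bench(sorted, list(range(1000))))
```

The minimum is usually the least noisy figure for a microbenchmark, while the median gives a sense of typical runs; saving both to the baseline file keeps later comparisons honest.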
Find functions consuming the most CPU time.
Go:
# Profile flags work with one package at a time, so name the package instead of ./...
go test -bench=BenchmarkTarget -cpuprofile=cpu.prof ./<package>
go tool pprof -top cpu.prof
go tool pprof -text -cum cpu.prof # cumulative view
Python:
python -m cProfile -s cumulative target_script.py
# Or for running processes:
py-spy top --pid <PID>
py-spy record -o profile.svg --pid <PID>
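When the target is a single function rather than a whole script, `cProfile` can also be driven from code. A minimal sketch, assuming a hypothetical wrapper named `profile_top`:

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit: int = 10) -> str:
    """Run func(*args) under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    # Render the stats into a string instead of printing to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    def work():
        return sum(i * i for i in range(100_000))

    print(profile_top(work))
```

Returning the report as text makes it easy to save next to the baseline output.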
Find allocation hotspots and potential leaks.
Go:
# As with -cpuprofile, -memprofile needs a single package rather than ./...
go test -bench=BenchmarkTarget -memprofile=mem.prof ./<package>
go tool pprof -top -alloc_space mem.prof
Python:
python -m memory_profiler target_script.py
# Or with tracemalloc in code:
# tracemalloc.start(); ...; snapshot = tracemalloc.take_snapshot()
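The tracemalloc comment above can be expanded into a small helper that compares snapshots taken before and after the call. This is a sketch only; the name `top_allocations` and the decision to group by line number are assumptions.

```python
import tracemalloc


def top_allocations(func, *args, limit: int = 5):
    """Run func(*args) and return the allocation sites with the largest size change."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = func(*args)  # keep the result alive so its memory is still counted
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to returns StatisticDiff entries, largest absolute size change first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    def build():
        return [bytes(1000) for _ in range(1000)]

    for stat in top_allocations(build):
        print(stat)
```

Each returned entry carries `size_diff` and `count_diff`, which correspond loosely to the B/op and allocs/op figures that the Go tooling reports.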
Identify blocking operations in hot paths.
After profiling, produce a ranked list:
HOTSPOTS (by cumulative CPU time):
1. pkg/engine.Process 42.3% (1.2s) — main processing loop
2. pkg/engine.parseRecord 28.1% (0.8s) — record deserialization
3. pkg/io.ReadBatch 15.7% (0.45s) — disk reads
Classify each finding by estimated impact:
| Impact | Criteria | Action |
|---|---|---|
| High | >20% of total time or >50% of allocations | Fix immediately |
| Medium | 5-20% of total time or notable allocation waste | Fix in this session |
| Low | <5% of total time, minor inefficiency | Log for later |
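The numeric thresholds in the table translate directly into a small classifier. The sketch below is illustrative: the function name is hypothetical, and the "notable allocation waste" criterion for Medium is left to human judgment because the table gives no number for it.

```python
def classify_impact(time_pct: float, alloc_pct: float = 0.0) -> str:
    """Map a finding's share of total time and allocations to High, Medium, or Low."""
    # High: >20% of total time or >50% of allocations.
    if time_pct > 20 or alloc_pct > 50:
        return "High"
    # Medium: 5-20% of total time (allocation waste is judged by hand).
    if time_pct >= 5:
        return "Medium"
    # Low: <5% of total time.
    return "Low"


if __name__ == "__main__":
    # Percentages taken from the sample hotspot list above.
    for name, pct in [("Process", 42.3), ("parseRecord", 28.1), ("ReadBatch", 15.7)]:
        print(name, classify_impact(pct))
```

Applied to the sample hotspot list, the first two entries classify as High and the third as Medium.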
Check the profiled code against these known performance killers:
- String concatenation in loops (prefer strings.Builder / []byte / io.StringWriter)
For each finding, state:
Critical rule: ONE optimization at a time.
For high-effort optimization work, load references/optimization-proof-loop.md before changing code. It defines the proof contract for isomorphic rewrites, benchmark deltas, and keep/revert decisions.
For each optimization:
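- Compare before and after with benchstat (Go) or manual diff
- Commit message: perf(<scope>): <description> (+X% throughput) or perf(<scope>): <description> (-X% latency)
- Keep the change only when the improvement is significant (as reported by benchstat, or >5% consistent change for manual comparison)
Apply optimizations in this order (highest expected impact first):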
Write the report to .agents/perf/YYYY-MM-DD-perf-<target>.md.
# Performance Report: <target>
Date: YYYY-MM-DD
Mode: <profile|bench|compare|optimize>
Language: <detected>
## Summary
<1-2 sentence summary of findings>
## Baseline Metrics
| Metric | Value |
|--------|-------|
| ops/sec | ... |
| ns/op | ... |
| B/op | ... |
| allocs/op | ... |
| p50 latency | ... |
| p95 latency | ... |
| p99 latency | ... |
## Hotspots
<ranked list from Step 2>
## Findings
<classified findings from Step 3>
## Optimizations Applied (if optimize mode)
| Change | Before | After | Improvement |
|--------|--------|-------|-------------|
| ... | ... | ... | +X% |
## After Metrics (if optimize mode)
<same table as baseline, with new values>
## Recommendations
<remaining opportunities not addressed in this session>
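The p50/p95/p99 rows of the metrics table can be filled from raw latency samples with the standard library. A minimal sketch, assuming samples are collected in milliseconds and that the "inclusive" interpolation method is acceptable; the function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def latency_percentiles(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return p50, p95, and p99 for a list of latency samples (needs >= 2 samples)."""
    # With n=100, quantiles returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    print(latency_percentiles([12.0, 15.5, 11.2, 30.1, 14.8, 13.3, 90.4, 12.9]))
```

With only a handful of samples the tail percentiles are interpolated and carry little meaning, so it helps to record the sample count next to them in the report.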
When running /perf compare <baseline> <candidate>:
- Go: benchstat baseline.txt candidate.txt
- Rust: critcmp baseline candidate
Output a summary table:
COMPARISON: baseline vs candidate
| Benchmark | Baseline | Candidate | Delta | Verdict |
|-----------|----------|-----------|-------|---------|
| BenchmarkProcess | 1.2ms | 0.9ms | -25% | IMPROVEMENT |
| BenchmarkParse | 450ns | 480ns | +6.7% | REGRESSION |
| BenchmarkIO | 3.1ms | 3.0ms | -3.2% | NOISE |
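For languages where the comparison is a manual diff, the Delta and Verdict columns can be derived as sketched below. The 5% threshold mirrors the manual-comparison rule stated earlier; the function name is hypothetical, the metric is assumed to be lower-is-better (such as ns/op), and a single pair of numbers says nothing about statistical significance, which is what benchstat adds for Go.

```python
from typing import Tuple


def verdict(baseline: float, candidate: float, threshold_pct: float = 5.0) -> Tuple[float, str]:
    """Return (delta_pct, label) for a lower-is-better metric."""
    delta_pct = (candidate - baseline) / baseline * 100
    if abs(delta_pct) < threshold_pct:
        label = "NOISE"
    elif delta_pct < 0:
        label = "IMPROVEMENT"
    else:
        label = "REGRESSION"
    return delta_pct, label


if __name__ == "__main__":
    # Values from the sample table above, each row in consistent units.
    for name, base, cand in [
        ("BenchmarkProcess", 1.2, 0.9),
        ("BenchmarkParse", 450, 480),
        ("BenchmarkIO", 3.1, 3.0),
    ]:
        delta, label = verdict(base, cand)
        print(f"{name}: {delta:+.1f}% {label}")
```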
- Run /complexity first to identify hot paths, then benchmark those.
- Reduce noise: pin parallelism (GOMAXPROCS=1), close competing processes.
- Use time for wall-clock and manual instrumentation for allocation counts.
- Use hyperfine for wall-clock benchmarking across any language.
- references/perf.feature: Executable spec: profile hotspots with metrics, bench, compare-regression, optimize (soc-qk4b)