Help us improve
Share bugs, ideas, or general feedback.
From odin
Multi-phase performance investigation workflow for establishing baselines, profiling, and making evidence-gated optimization decisions. Use when debugging latency, throughput regressions, or the question "why is this slow?"
npx claudepluginhub outlinedriven/odin-claude-plugin --plugin odinHow this skill is triggered — by the user, by Claude, or both
Slash command
/odin:perf-investigateThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Performance work is an extend op-cell: add measured capability without guessing, stacking changes, or laundering anecdotes into evidence. The invariant is: every hypothesis cites source evidence, every optimization has a baseline delta, and every decision is recorded in `.outline/perf/`.
Provides a language-neutral workflow for performance analysis: flamegraph interpretation, allocation tracking, latency profiling, and regression measurement. Activates when latency, throughput, or memory budgets are missed.
Orchestrates cross-language performance profiling and optimization: diagnoses symptoms, dispatches expert agents, benchmarks before/after changes.
Share bugs, ideas, or general feedback.
Performance work is an extend op-cell: add measured capability without guessing, stacking changes, or laundering anecdotes into evidence. The invariant is: every hypothesis cites source evidence, every optimization has a baseline delta, and every decision is recorded in .outline/perf/.
Apply: performance investigation; unexplained latency; throughput or memory regression; profiling a named scenario; creating a versioned baseline; deciding whether an optimization is worth keeping.
NOT: wrong-output bugs; speculative micro-optimizations; no reproducible command; no success metric; user only wants a quick code review; workloads too short to measure without variance control.
Use artifacts only under .outline/perf/:
.outline/perf/investigations/<id>.md.outline/perf/baselines/<version>.json.outline/perf/profiles/<id>/.outline/perf/experiments/<id>/Require all three before measuring:
Record the user's problem statement verbatim. Do not paraphrase it. Create the ledger immediately with the phase order and empty evidence slots.
mkdir -p .outline/perf/investigations .outline/perf/baselines .outline/perf/profiles .outline/perf/experiments
Ledger id: short stable slug such as 2026-06-05-checkout-latency or <version>-<scenario-slug>.
Run sequentially. No parallel benchmarks. Minimum measured duration is 60 seconds per baseline series. Prefer hyperfine; if the command emits structured metrics, preserve both hyperfine output and command metrics.
hyperfine --warmup 3 --min-runs 10 --export-json .outline/perf/baselines/<version>.hyperfine.json '<command>'
Then write exactly one consolidated baseline file:
.outline/perf/baselines/<version>.json
The JSON must contain version, scenario, command, duration/run policy, environment, samples, summary metrics, source artifacts, and the verbatim quote. Overwrite the same version only when intentionally refreshing that version's baseline.
Generate at most five hypotheses. Each MUST cite at least one concrete evidence item before the claim is allowed:
Git-history recipes:
git --no-pager log --since='90 days ago' --format='%h%x09%ad%x09%an%x09%s' --date=short -- <path>
git --no-pager log --format='%h%x09%ad%x09%s' --date=short --all --grep='fix\|perf\|slow\|latency\|timeout\|regress' -- <path>
git shortlog -sn -- <path>
Hypothesis shape:
- id: H1
claim: <one falsifiable sentence>
evidence:
- git:<commit-or-log-command-summary>
- file:<path:line> <observed mechanism>
confidence: low|med|high
predicted_delta: <metric and direction>
test: <single experiment or profiler check>
Certainty: HIGH = direct profile/baseline evidence plus code mechanism; MEDIUM = code mechanism plus correlated git or benchmark signal; LOW = plausible path with weak evidence. LOW can guide profiling, never justify code changes.
Locate entry points and candidate hot files before profiling.
When the repo is indexed, use codegraph first:
codegraph_explore "<scenario terms> entry points hot path handlers"
codegraph_search "<symbol-or-route>"
codegraph_callers "<suspect symbol>"
codegraph_callees "<entry symbol>"
codegraph_impact "<symbol or file>"
Fallback commands:
git grep -nE '<route|handler|command|scenario-keyword|metric-name>' -- ':!node_modules' ':!dist' ':!build'
ast-grep -p 'function $NAME($$$ARGS) { $$$BODY }' <path>
ast-grep -p 'async function $NAME($$$ARGS) { $$$BODY }' <path>
ast-grep -p '$OBJ.$METHOD($$$ARGS)' <path>
For each candidate path, record file:line, symbol, why it is on the path, and which hypothesis it supports or refutes. Keep the set to the 10-15 files most tied to the scenario.
Use the profiler that matches the workload. Capture the profile under .outline/perf/profiles/<id>/. A text summary alone is not enough; produce a flamegraph or flamegraph-compatible artifact and record path(s).
Node.js:
node --cpu-prof --cpu-prof-dir .outline/perf/profiles/<id> --cpu-prof-name cpu.cpuprofile <script-or-command-args>
npx -y speedscope .outline/perf/profiles/<id>/cpu.cpuprofile
Python:
py-spy record --duration 60 --rate 100 --output .outline/perf/profiles/<id>/flame.svg -- <command>
Go:
go test -run '^$' -bench '<bench>' -cpuprofile .outline/perf/profiles/<id>/cpu.pprof ./<pkg>
go tool pprof -svg <binary-or-test-binary> .outline/perf/profiles/<id>/cpu.pprof > .outline/perf/profiles/<id>/flame.svg
Java/JVM:
asprof -d 60 -f .outline/perf/profiles/<id>/flame.html <pid-or-java-command>
Native/Rust/C/C++ on Linux:
perf record -F 99 -g -o .outline/perf/profiles/<id>/perf.data -- <command>
perf script -i .outline/perf/profiles/<id>/perf.data > .outline/perf/profiles/<id>/perf.stacks
If frame pointers or symbols are missing, rebuild in the closest production-equivalent release mode with symbols enabled. Record the profiler command, artifact path, top frames, and file:line mappings.
Test exactly one change per experiment. No bundled optimizations. No follow-up tweak before measurement.
Discipline:
hyperfine comparisons.Comparison command:
hyperfine --warmup 3 --min-runs 10 --export-json .outline/perf/experiments/<id>/<experiment>.json '<baseline-command>' '<experiment-command>'
Revert discipline:
git restore -- <changed-files>
git diff --exit-code -- <changed-files>
Verdicts:
Do not end with raw data. Write a decision:
verdict: continue|keep|reject|stop|rerun
rationale: <why the evidence supports this>
evidence:
- baseline:<path>
- profile:<path>
- experiment:<path>
next: <exact next action or stop condition>
Stop is valid when evidence says the bottleneck is elsewhere, the optimization is not worth complexity, variance is too high, or the measured budget is already met.
Finalize the ledger. Ensure there is one baseline JSON for the version, profile artifacts are named, experiments are linked, and rejected hypotheses are not left ambiguous. If a change is kept, add or update the smallest durable regression guard available for the project; if no guard is feasible, record why.
Before yielding an investigation result:
.outline/perf/investigations/<id>.md with the verbatim quote..outline/perf/baselines/<version>.json and records a >=60s repeated series.file:line evidence.Reference contract: references/investigation.md carries the exact phase contract, hypothesis rules, ledger template, and baseline schema for this skill.