From code-foundations
Guides performance tuning for slow code, timeouts, OOM errors, and high CPU or memory use through mandatory profiling, a 7-step decision tree, and a reference table of expensive operations.
`npx claudepluginhub ryanthedev/code-foundations`

This skill uses the workspace's default tool permissions.
**Do not optimize based on intuition -- profile first.**
No measurement = no optimization. This gate is non-negotiable.
This skill covers single-threaded, single-process code tuning for general-purpose computing.
Not covered (need specialized guidance):
| Myth | Reality |
|---|---|
| "Performance requires complexity" | Simpler code usually runs faster |
| "Clean design sacrifices speed" | Clean design and high performance are compatible |
| "Optimization means adding code" | Optimization often means removing code |
Why: fewer special cases mean less code to check; deep classes do more work per call with fewer layer crossings; complicated code tends to do extraneous or redundant work.
| Operation | Cost | Context |
|---|---|---|
| Network (datacenter) | 10-50 µs | Tens of thousands of instructions |
| Network (wide-area) | 10-100 ms | Millions of instructions |
| Disk I/O | 5-10 ms | Millions of instructions |
| Flash storage | 10-100 µs | Thousands of instructions |
| Dynamic memory allocation | Significant | malloc/new, freeing, GC overhead |
| Cache miss | Few hundred cycles | Often determines overall performance |
| I/O vs memory | ~1000x difference | Batch I/O, avoid I/O in tight loops |
| Interpreted vs compiled | >100x slower | PHP/Python vs C++ |
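To make the "batch I/O, avoid I/O in tight loops" row concrete, here is a minimal Python sketch. The function names `write_unbatched` and `write_batched` are illustrative and not part of the original skill. Both produce the same file; the first issues one unbuffered write per record, the second builds the payload in memory and writes once. How much this saves depends on the OS, file system, and buffering, so measure it on the real workload.

```python
import os
import tempfile


def write_unbatched(path, records):
    """Hypothetical slow path: one unbuffered write per record."""
    with open(path, "wb", buffering=0) as f:
        for record in records:
            f.write(record + b"\n")


def write_batched(path, records):
    """Hypothetical fast path: build the payload in memory, then write once."""
    payload = b"".join(record + b"\n" for record in records)
    with open(path, "wb") as f:
        f.write(payload)


if __name__ == "__main__":
    records = [f"row-{i}".encode() for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        slow = os.path.join(tmp, "slow.bin")
        fast = os.path.join(tmp, "fast.bin")
        write_unbatched(slow, records)
        write_batched(fast, records)
        with open(slow, "rb") as a, open(fast, "rb") as b:
            print("identical output:", a.read() == b.read())
```

The batched version trades memory for fewer I/O operations; for very large outputs, writing in fixed-size chunks is a reasonable middle ground.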
Each step is a gate. Do NOT skip steps.
1. Is the program correct and complete?
   - NO -> Make it correct first. STOP optimization.
   - YES -> Continue
2. Have you measured to find the actual bottleneck?
   - NO -> Profile/measure first. Do NOT guess.
   - YES -> Continue
3. Can requirements be relaxed?
   - YES -> Relax requirements. Done.
   - NO -> Continue
4. Can design/architecture solve it? (Stage 2: Fundamental Fixes)
   - YES -> Fix design. Done.
   - NO -> Continue
5. Can algorithm/data structure solve it?
   - YES -> Change algorithm. Done.
   - NO -> Continue
6. Can compiler flags help? (40-59% improvement possible)
   - YES -> Enable optimizations. Measure.
   - NO -> Continue
7. Is it in the <4% that causes >50% of runtime?
   - NO -> Do NOT optimize this code. Find actual hot spot.
   - YES -> PROCEED with code tuning (see below)
What counts as valid measurement:
Identify WHICH dimension: throughput, latency, memory, or CPU. Different problems need different solutions.
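As one possible way to collect measurements for different dimensions, the sketch below uses only Python's standard library: `cProfile` for CPU hot spots, `time.perf_counter` for latency, and `tracemalloc` for peak memory. The workload `build_table` and the helper names are hypothetical, chosen for illustration; a real measurement should run the actual program on a representative workload.

```python
import cProfile
import io
import pstats
import time
import tracemalloc


def build_table(n):
    """Hypothetical workload: allocate n small tuples."""
    return [(i, i * i) for i in range(n)]


def profile_top(func, *args, limit=5):
    """CPU: run func under cProfile; return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


def measure_latency(func, *args, repeats=5):
    """Latency: best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def measure_peak_memory(func, *args):
    """Memory: peak bytes allocated by Python code while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(profile_top(build_table, 200_000))
    print(f"latency: {measure_latency(build_table, 200_000):.4f} s")
    print(f"peak memory: {measure_peak_memory(build_table, 200_000)} bytes")
```

Each helper answers a different question, so a change that improves one number can worsen another; record the dimension that matches the reported problem before and after every change.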
Before code-level changes, check for architectural fixes:
If a fundamental fix exists, implement it with standard design techniques. If not, continue down the tree.
When no fundamental fix is available, redesign the critical path:
Consolidation techniques:
| Technique | Example |
|---|---|
| Encode multiple conditions in single value | Variable that is 0 when any special case applies |
| Single test for multiple cases | Replace 6 individual checks with 1 combined check |
| Combine layers into single method | Critical path handled in one method, not three |
| Merge variables | Combine multiple values into single structure |
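The first two rows of the table can be sketched as follows, assuming a hypothetical `Record` type with three made-up special cases. All special cases are encoded in one integer that is 0 in the common case, so the critical path performs a single test and delegates everything else to a slower function.

```python
from dataclasses import dataclass

# Hypothetical flag bits; any nonzero value means "leave the fast path".
DELETED = 1
LOCKED = 2
NEEDS_MIGRATION = 4


@dataclass
class Record:
    value: int
    special: int = 0  # 0 when no special case applies


def read_slow(record):
    """Rare path: handle each special case explicitly."""
    if record.special & DELETED:
        raise KeyError("record deleted")
    if record.special & LOCKED:
        raise PermissionError("record locked")
    if record.special & NEEDS_MIGRATION:
        record.value *= 100  # hypothetical migration step
        record.special &= ~NEEDS_MIGRATION
    return record.value


def read(record):
    """Critical path: one test covers all special cases."""
    if record.special == 0:
        return record.value
    return read_slow(record)


if __name__ == "__main__":
    print(read(Record(5)))
    print(read(Record(5, NEEDS_MIGRATION)))
```

The cost is that every code path that sets or clears a special case must keep `special` up to date, which is why this belongs on a measured critical path rather than everywhere.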
Only reached after completing the 7-step decision tree.
1. Save working version (cannot revert without backup)
2. Make ONE change (multiple changes = unmeasurable)
3. Measure improvement (same workload, before/after)
4. Keep if faster, revert if not (no "close enough")
5. Repeat
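The loop above can be sketched as a small comparison harness. This is a minimal illustration using the standard `timeit` module; `baseline`, `candidate`, and `compare` are hypothetical names, and the workload is a toy. It first checks that the change preserves behavior, then times both versions on the same data.

```python
import timeit


def baseline(values):
    """Hypothetical saved working version."""
    result = []
    for v in values:
        result.append(v * v)
    return result


def candidate(values):
    """Hypothetical single change under test."""
    return [v * v for v in values]


def compare(old, new, data, number=200, repeat=5):
    """Check that both versions agree, then time them on the same workload."""
    if old(data) != new(data):
        raise ValueError("candidate changes behavior; revert it")
    old_time = min(timeit.repeat(lambda: old(data), number=number, repeat=repeat))
    new_time = min(timeit.repeat(lambda: new(data), number=number, repeat=repeat))
    return {"old": old_time, "new": new_time, "keep": new_time < old_time}


if __name__ == "__main__":
    outcome = compare(baseline, candidate, list(range(10_000)))
    print(outcome)
```

Taking the minimum over several repeats reduces noise from other processes. A real harness would likely also require the improvement to exceed the run-to-run variation before keeping the change.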
Logic:
Loops:
Data:
Expressions:
PREREQUISITE: Only apply after profiling confirms the code is in the <4% hot path.
```
// BEFORE: compound test on every iteration
found = false;
i = 0;
while (!found && i < count) {
    if (item[i] == target) found = true;
    i++;
}

// AFTER: single test per iteration
// Requires item[] to have one spare slot at index count for the sentinel.
item[count] = target;  // sentinel guarantees the loop terminates
i = 0;
while (item[i] != target) { i++; }
if (i < count) { /* found at position i */ }
```
```
// BEFORE: testing an invariant condition on every iteration
for (i = 0; i < count; i++) {
    if (type == TYPE_A) { processTypeA(item[i]); }
    else { processTypeB(item[i]); }
}

// AFTER: test once, outside the loop
if (type == TYPE_A) {
    for (i = 0; i < count; i++) { processTypeA(item[i]); }
} else {
    for (i = 0; i < count; i++) { processTypeB(item[i]); }
}
```
```
// BEFORE: expensive operation
if (Math.sqrt(x) < Math.sqrt(y)) { ... }

// AFTER: algebraically equivalent (when x, y >= 0)
if (x < y) { ... }
```
```
// BEFORE: column-by-column access to a row-major array
// jumps through memory, causing cache misses and page faults
for (column = 0; column < MAX_COLUMNS; column++)
    for (row = 0; row < MAX_ROWS; row++)
        table[row][column] = BlankTableElement();

// AFTER: row-by-row access touches memory sequentially
for (row = 0; row < MAX_ROWS; row++)
    for (column = 0; column < MAX_COLUMNS; column++)
        table[row][column] = BlankTableElement();
```
1. RE-MEASURE to verify measurable performance difference
2. EVALUATE the tradeoff:
   - Significant speedup (with data)? -> Keep
   - Simpler AND at least as fast? -> Keep
   - Neither? -> Back the change out
| Red Flag | Symptom |
|---|---|
| Premature Optimization | Optimizing without measurement |
| Death by Thousand Cuts | Many small inefficiencies, no single fix helps (5-10x slower) |
| Pass-Through Methods | Identical signature to caller, unnecessary layer crossing |
| Shallow Layers | Multiple layers providing same abstraction |
| Repeated Special Cases | Same conditions checked multiple times |
| Trading maintainability for <10% gain | Complex optimization for minor speedup |
| Threshold/Rule | Value | Source |
|---|---|---|
| Hot spot concentration | <4% causes >50% runtime | Knuth 1971 |
| Failed optimization rate | >50% negligible or negative | CC p.607 |
| Compiler optimization gains | 40-59% improvement possible | CC p.596 |
| I/O vs memory | ~1000x difference | CC p.591 |
PRIORITY ORDER:
1. Correct first
2. Measure (MANDATORY GATE)
3. Relax requirements
4. Design/architecture fix (cache, algorithm, bypass layers)
5. Critical path redesign (minimum code for common case)
6. Compiler flags
7. Code tuning (save -> one change -> measure -> keep/revert)
Never skip steps. Never assume.
Checklist: checklists.md
Output Format:
| Item | Status | Evidence | Location |
|---|---|---|---|
| Measured before tuning? | VIOLATION | No profiler/measurement found | N/A |
| Loop unswitching opportunity | WARNING | Invariant if (debug) inside loop | app.py:142 |
Severity: VIOLATION (clear anti-pattern), WARNING (needs measurement), PASS (no issues)
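For tooling that needs to emit this table, a minimal sketch follows. The `Finding` class and `render` function are hypothetical helpers, not part of the original skill; the column names and severity values are taken from the format above.

```python
from dataclasses import dataclass

SEVERITIES = ("VIOLATION", "WARNING", "PASS")


@dataclass
class Finding:
    item: str
    status: str
    evidence: str
    location: str = "N/A"


def render(findings):
    """Render findings as a markdown table in the output format above."""
    lines = ["| Item | Status | Evidence | Location |", "|---|---|---|---|"]
    for finding in findings:
        if finding.status not in SEVERITIES:
            raise ValueError(f"unknown severity: {finding.status}")
        lines.append(
            f"| {finding.item} | {finding.status} "
            f"| {finding.evidence} | {finding.location} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render([
        Finding("Measured before tuning?", "VIOLATION",
                "No profiler/measurement found"),
        Finding("Loop unswitching opportunity", "WARNING",
                "Invariant if (debug) inside loop", "app.py:142"),
    ]))
```

Rejecting unknown severities keeps the report limited to the three levels defined above.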
| After | Next |
|---|---|
| Optimization complete | Verify design not degraded |
| Structure degraded | cc-refactoring-guidance |