From rust-skills
Guides Rust performance optimization: profile hotspots with criterion/flamegraph, prioritize algorithms/allocations/cache, apply pre-allocation/rayon/Cow<T>.
npx claudepluginhub actionbook/rust-skills --plugin rust-skills
This skill uses the workspace's default tool permissions.
> **Layer 2: Design Choices**
What's the bottleneck, and is optimization worth it?
| Goal | Design Choice | Implementation |
|---|---|---|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
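The "Parallelize" row lists rayon or threads. As a minimal sketch of the threads option using only the standard library, the function below splits a slice into chunks and sums each chunk on its own scoped thread. The name `parallel_sum` and the `workers` parameter are hypothetical, chosen for illustration; with rayon the same idea is usually expressed through its parallel iterators instead.

```rust
use std::thread;

/// Sums `data` by splitting it into roughly equal chunks, one per worker thread.
/// `workers` is a hypothetical tuning knob, not something defined by this skill.
pub fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in a chunk; at least 1 so `chunks` is valid.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `data` because they are joined before `scope` returns.
    thread::scope(|scope| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Spawning threads has a fixed cost, so this is likely to pay off only when each chunk carries enough work; measuring both versions is the way to find out.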
Before optimizing:
1. Have you measured?
2. What's the priority?
3. What's the trade-off?
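For the first question, a rough baseline can be taken with the standard library alone. The helper below is a sketch, not a substitute for criterion: it reports a simple mean and does no warm-up or outlier handling. The name `mean_time` is hypothetical.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// Hypothetical helper for quick baselines only.
pub fn mean_time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from deleting the work being timed.
        black_box(f());
    }
    start.elapsed() / iters
}
```

A call such as `mean_time(1_000, || work(&input))`, where `work` and `input` stand in for the code under test, gives a number to compare before and after a change. Build with `--release`, otherwise the result says little about production behavior.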
To domain constraints (Layer 3):
"How fast does this need to be?"
↑ Ask: What's the performance SLA?
↑ Check: domain-* (latency requirements)
↑ Check: Business requirements (acceptable response time)
| Question | Trace To | Ask |
|---|---|---|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |
To implementation (Layer 1):
"Need to reduce allocations"
↓ m01-ownership: Use references, avoid clone
↓ m02-resource: Pre-allocate with_capacity
"Need to parallelize"
↓ m07-concurrency: Choose rayon or threads
↓ m07-concurrency: Consider async for I/O-bound
"Need cache efficiency"
↓ Data layout: Prefer Vec over HashMap when possible
↓ Access patterns: Sequential over random access
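To illustrate the data-layout point, the sketch below counts byte frequencies two ways. When keys are small dense integers, a `Vec` indexed by the key can stand in for a `HashMap`: no hashing, and all counters sit in one contiguous allocation. The function names are hypothetical, and whether the dense version is faster for a given workload is something to measure rather than assume.

```rust
use std::collections::HashMap;

/// General approach: every update hashes the key and may touch a scattered bucket.
pub fn byte_counts_map(input: &[u8]) -> HashMap<u8, usize> {
    let mut counts = HashMap::new();
    for &b in input {
        *counts.entry(b).or_insert(0) += 1;
    }
    counts
}

/// Dense approach: the byte value is the index into one contiguous table.
pub fn byte_counts_dense(input: &[u8]) -> Vec<usize> {
    let mut counts = vec![0usize; 256];
    for &b in input {
        counts[b as usize] += 1;
    }
    counts
}
```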
| Tool | Purpose |
|---|---|
| cargo bench | Micro-benchmarks |
| criterion | Statistical benchmarks |
| perf / flamegraph | CPU profiling |
| heaptrack | Allocation tracking |
| valgrind / cachegrind | Cache analysis |
1. Algorithm choice (10x - 1000x)
2. Data structure (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization (1.5x - 3x)
5. SIMD/Parallelism (2x - 8x)
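Algorithm choice tops the list because it changes how the cost grows with input size. A small sketch of that effect: two hypothetical duplicate-detection functions, one comparing every pair and one using a hash set.

```rust
use std::collections::HashSet;

/// O(n^2): compares every pair of elements.
pub fn has_duplicate_quadratic(items: &[u32]) -> bool {
    for i in 0..items.len() {
        for j in (i + 1)..items.len() {
            if items[i] == items[j] {
                return true;
            }
        }
    }
    false
}

/// O(n) expected: `insert` returns false when the value was already present.
pub fn has_duplicate_hashed(items: &[u32]) -> bool {
    let mut seen = HashSet::with_capacity(items.len());
    items.iter().any(|x| !seen.insert(*x))
}
```

For very short inputs the quadratic version may still win because it allocates nothing, which is one reason to benchmark with realistic sizes.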
| Technique | When | How |
|---|---|---|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
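The pre-allocation row can be sketched as follows, together with buffer reuse for repeated calls. Both function names are hypothetical. `Vec::with_capacity(n)` reserves room for at least `n` elements, so the pushes below never trigger a reallocation.

```rust
/// Builds the squares of 0..n, reserving all storage up front.
pub fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    let reserved = out.capacity();
    for i in 0..n as u64 {
        out.push(i * i);
    }
    // No push exceeded the reserved capacity, so the buffer was never reallocated.
    debug_assert_eq!(out.capacity(), reserved);
    out
}

/// Reuses a caller-owned buffer: `clear` drops the elements but keeps the allocation.
pub fn fill_squares(buf: &mut Vec<u64>, n: usize) {
    buf.clear();
    buf.reserve(n);
    buf.extend((0..n as u64).map(|i| i * i));
}
```

Calling `fill_squares` in a loop with the same buffer avoids a fresh allocation per iteration once the buffer has grown to the largest size needed.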
| Mistake | Why Wrong | Better |
|---|---|---|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
| Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
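One way to remove a hidden `.clone()` when a function only sometimes needs to modify its input is `Cow`. In the sketch below, a hypothetical `normalize_tabs` borrows the input unchanged in the common case and allocates only when a replacement is actually required.

```rust
use std::borrow::Cow;

/// Replaces tabs with single spaces, allocating only when the input contains a tab.
pub fn normalize_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // A new String is built only on this path.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Common case: hand back the caller's data without copying.
        Cow::Borrowed(input)
    }
}
```

Callers can treat the result as a `&str` through deref and call `into_owned()` only if they need ownership.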
| Anti-Pattern | Why Bad | Better |
|---|---|---|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | Repeated reallocation; O(n^2) if a new String is built each iteration | String::with_capacity + push_str, or write! into one buffer |
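For the string-building row, a sketch of two alternatives to concatenating in a loop. Both function names are hypothetical. The first pre-sizes one buffer and appends with `push_str`; the second formats directly into the buffer with `write!`, which avoids the temporary `String` that a per-item `format!` call would allocate.

```rust
use std::fmt::Write;

/// Joins `parts` with commas into one pre-sized buffer.
pub fn join_csv(parts: &[&str]) -> String {
    // Exact size: all bytes plus one comma between neighbours.
    let text_len: usize = parts.iter().map(|p| p.len()).sum();
    let total = text_len + parts.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            out.push(',');
        }
        out.push_str(part);
    }
    out
}

/// Formats numbers straight into the existing buffer, separated by spaces.
pub fn join_numbers(nums: &[u32]) -> String {
    let mut out = String::new();
    for (i, n) in nums.iter().enumerate() {
        if i > 0 {
            out.push(' ');
        }
        write!(out, "{n}").expect("writing to a String is infallible");
    }
    out
}
```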
| When | See |
|---|---|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |