Set up performance benchmarks and CodSpeed harness for a project. Use this skill whenever the user wants to create benchmarks, add performance tests, set up CodSpeed, configure codspeed.yml, integrate a benchmarking framework (criterion, divan, pytest-benchmark, vitest bench, go test -bench, google benchmark), or when the user says 'add benchmarks', 'set up perf tests', 'create a benchmark', 'benchmark this', or wants to measure performance of their code for the first time. Also trigger when the optimize skill needs benchmarks that don't exist yet.
You are a performance engineer helping set up benchmarks and CodSpeed integration for a project. Your goal is to create useful, representative benchmarks and wire them up so CodSpeed can measure and track performance.
Before writing any benchmark code, understand what you're working with:
- **Detect the language and build system:** Look at the project structure, package files (`Cargo.toml`, `package.json`, `pyproject.toml`, `go.mod`, `CMakeLists.txt`), and source files.
- **Identify existing benchmarks:** Check for benchmark files, `codspeed.yml`, and CI workflows mentioning CodSpeed or benchmarks.
- **Identify hot paths:** Look at the codebase to understand where the performance-critical code is. Public API functions, data processing pipelines, I/O-heavy operations, and algorithmic code are good candidates.
- **Check CodSpeed auth:** Ensure `codspeed auth login` has been run.
Based on the language and what the user wants to benchmark, pick the right harness:
Language-specific harnesses integrate deeply with CodSpeed and provide per-benchmark flamegraphs, fine-grained comparisons, and simulation-mode support.
| Language | Framework | How to set up |
|---|---|---|
| Rust | divan (recommended), criterion, bencher | Add `codspeed-<framework>-compat` as a dev dependency using `cargo add --rename` |
| Python | pytest-benchmark | Install `pytest-codspeed`; use the `@pytest.mark.benchmark` marker or the `benchmark` fixture |
| Node.js | vitest (recommended), tinybench v5, benchmark.js | Install `@codspeed/<framework>-plugin` and configure it in the vitest/test config |
| Go | go test -bench | No packages needed — CodSpeed instruments go test -bench directly |
| C/C++ | Google Benchmark | Build with CMake, CodSpeed instruments via valgrind-codspeed |
For any language, or when you want to benchmark a whole program rather than individual functions, use the exec harness:

- `codspeed exec -m <mode> -- <command>` for one-off benchmarks
- `codspeed.yml` with benchmark definitions for repeatable setups

The exec harness requires no code changes: it instruments the binary externally, which makes it ideal for whole-program and cross-language benchmarks.
For Rust with divan (recommended), add the CodSpeed compat crate as a dev dependency in place of divan itself (this replaces a plain `cargo add divan`):

```sh
cargo add codspeed-divan-compat --rename divan --dev
```
Create a benchmark file in `benches/`:

```rust
// benches/my_bench.rs
fn main() {
    divan::main();
}

#[divan::bench]
fn bench_my_function() {
    // Call the function you want to benchmark.
    // Use divan::black_box() to prevent the compiler from optimizing the work away.
    divan::black_box(my_crate::my_function());
}
```
Register it in `Cargo.toml`:

```toml
[[bench]]
name = "my_bench"
harness = false
```
Build and run:

```sh
cargo codspeed build -m simulation --bench my_bench
codspeed run -m simulation -- cargo codspeed run --bench my_bench
```
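If the benchmarked function needs input data, divan's `Bencher` lets you keep setup out of the measured section. A minimal sketch, assuming a hypothetical `my_crate::process(&[u64])`; adapt the names to your crate's real API:

```rust
// benches/my_bench.rs
use divan::{black_box, Bencher};

fn main() {
    divan::main();
}

#[divan::bench]
fn bench_process(bencher: Bencher) {
    bencher
        // Input construction runs here, outside the measured section.
        .with_inputs(|| (0..100_000u64).collect::<Vec<u64>>())
        // Only this closure is timed; black_box keeps the result from
        // being optimized away.
        .bench_refs(|data| black_box(my_crate::process(data)));
}
```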
Alternatively, for Rust with criterion, add the compat crate in place of criterion itself (this replaces a plain `cargo add criterion --dev`):

```sh
cargo add codspeed-criterion-compat --rename criterion --dev
```
Create a benchmark in `benches/`:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_my_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| {
        b.iter(|| my_crate::my_function())
    });
}

criterion_group!(benches, bench_my_function);
criterion_main!(benches);
```
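If setup would otherwise dominate the measurement, criterion's `iter_batched` keeps it outside the timed section. A sketch under the same assumption of a hypothetical `my_crate::process(&[u64])`:

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn bench_process(c: &mut Criterion) {
    c.bench_function("process_100k", |b| {
        b.iter_batched(
            // Setup: runs outside the timing and is excluded from the measurement.
            || (0..100_000u64).collect::<Vec<u64>>(),
            // Routine: the only part that is measured.
            |data| my_crate::process(&data),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_process);
criterion_main!(benches);
```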
Register the benchmark in `Cargo.toml` and build/run the same way as with divan.

For Python with pytest-benchmark, install `pytest-codspeed`:

```sh
pip install pytest-codspeed
# or
uv add --dev pytest-codspeed
```
Create a benchmark:

```python
# tests/test_benchmarks.py
import my_module

def test_my_function(benchmark):
    result = benchmark(my_module.my_function, arg1, arg2)
    # You can still assert on the result
    assert result is not None

# Or use the pedantic API for setup/teardown:
def test_with_setup(benchmark):
    data = prepare_data()
    benchmark.pedantic(my_module.process, args=(data,), rounds=100)
```
Run:

```sh
codspeed run -m simulation -- pytest --codspeed
```
For Node.js with vitest, install the plugin:

```sh
npm install -D @codspeed/vitest-plugin
# or
pnpm add -D @codspeed/vitest-plugin
```
Add the plugin to your vitest config (`vitest.config.ts`):

```ts
import { defineConfig } from "vitest/config";
import codspeed from "@codspeed/vitest-plugin";

export default defineConfig({
  plugins: [codspeed()],
});
```
Create a benchmark:

```ts
// bench/my.bench.ts
import { bench, describe } from "vitest";

describe("my module", () => {
  bench("my function", () => {
    myFunction();
  });
});
```
Run:

```sh
codspeed run -m simulation -- npx vitest bench
```
For Go, no packages are needed: CodSpeed instruments `go test -bench` directly.
Create a benchmark:

```go
// my_test.go
package mypkg

import "testing"

func BenchmarkMyFunction(b *testing.B) {
    for i := 0; i < b.N; i++ {
        MyFunction()
    }
}
```
Run:

```sh
codspeed run -m walltime -- go test -bench . ./...
```
For C/C++, install Google Benchmark (via CMake FetchContent or a system package), then create a benchmark:
```cpp
#include <benchmark/benchmark.h>

// MyFunction() stands in for the code under test.
static void BM_MyFunction(benchmark::State& state) {
  for (auto _ : state) {
    MyFunction();
  }
}

BENCHMARK(BM_MyFunction);
BENCHMARK_MAIN();
```
Build and run:

```sh
cmake -B build && cmake --build build
codspeed run -m simulation -- ./build/my_benchmark
```
For benchmarking whole programs without code changes:
Create a `codspeed.yml`:

```yaml
$schema: https://raw.githubusercontent.com/CodSpeedHQ/codspeed/refs/heads/main/schemas/codspeed.schema.json
options:
  warmup-time: "1s"
  max-time: 5s
benchmarks:
  - name: "My program - small input"
    exec: ./my_binary --input small.txt
  - name: "My program - large input"
    exec: ./my_binary --input large.txt
    options:
      max-time: 30s
```
Then run:

```sh
codspeed run -m walltime
```

Or, for a one-off:

```sh
codspeed exec -m walltime -- ./my_binary --input data.txt
```
Good benchmarks are representative, isolated, and stable. Guidelines (a short divan sketch illustrating several of them follows this list):

- **Benchmark real workloads:** Use realistic input data and sizes. A sort benchmark on 10 elements tells you nothing about how 10 million elements will perform.
- **Avoid benchmarking setup:** Use the framework's setup/teardown mechanisms to exclude initialization from the measurement.
- **Prevent dead code elimination:** Use `black_box()` (Rust), `benchmark.pedantic()` (Python), or the equivalent so the compiler/runtime doesn't optimize away the work you're measuring.
- **Cover the critical path:** Benchmark the functions that matter most to your users, the ones called frequently or on the hot path.
- **Test multiple scenarios:** Different input sizes, different data distributions, edge cases. Performance characteristics often change with scale.
- **Keep benchmarks fast:** Individual benchmarks should complete in milliseconds to low seconds. CodSpeed handles warmup and repetition; you provide the single iteration.
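As a sketch that ties several of these guidelines together (realistic sizes, setup excluded from the measurement, results kept alive with `black_box`), here is a hedged divan example; `my_crate::process` is a placeholder for your real hot-path function:

```rust
use divan::{black_box, Bencher};

fn main() {
    divan::main();
}

// One benchmark, several realistic input sizes: performance characteristics
// often change with scale, so measure more than one point.
#[divan::bench(args = [1_000, 100_000, 1_000_000])]
fn process_scaling(bencher: Bencher, n: usize) {
    bencher
        // Data generation is setup and stays outside the measurement.
        .with_inputs(move || (0..n as u64).rev().collect::<Vec<u64>>())
        // black_box prevents the compiler from optimizing the work away.
        .bench_refs(|data| black_box(my_crate::process(data)));
}
```

Each size appears as its own benchmark entry, so you can see where the scaling behavior changes instead of averaging it away.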
After setting up, run the benchmarks locally:

```sh
# For language-specific harnesses
cargo codspeed build -m simulation && codspeed run -m simulation -- cargo codspeed run
# or
codspeed run -m simulation -- pytest --codspeed
# or
codspeed run -m simulation -- npx vitest bench
# etc.

# For the exec harness
codspeed run -m walltime
```
- **Check the output:** You should see a results table and a link to the CodSpeed report.
- **Verify flamegraphs:** For simulation mode, check that flamegraphs are generated by visiting the report link or using the `query_flamegraph` MCP tool.
Tell the user what was set up, show the first results, and suggest next steps (e.g., adding CI integration, running the optimize skill).