From dotnet-skills
Writing benchmarks. BenchmarkDotNet setup, memory diagnosers, baselines, result analysis.
npx claudepluginhub wshaddix/dotnet-skills
This skill uses the workspace's default tool permissions.
Microbenchmarking guidance for .NET using BenchmarkDotNet v0.14+. Covers benchmark class setup, memory and disassembly diagnosers, exporters for CI artifact collection, baseline comparisons, and common pitfalls that invalidate measurements.
Version assumptions: BenchmarkDotNet v0.14+ on .NET 8.0+ baseline. Examples use current stable APIs.
Out of scope: Performance-oriented architecture patterns (Span<T>, ArrayPool<T>, sealed class devirtualization) are owned by this epic's companion skill -- see [skill:dotnet-performance-patterns]. C# syntax for modern patterns (records, primary constructors) -- see [skill:dotnet-csharp-modern-patterns]. Coding standards and style conventions -- see [skill:dotnet-csharp-coding-standards]. Native AOT compilation pipeline and performance characteristics -- see [skill:dotnet-native-aot]. Serialization format APIs and round-trip correctness -- see [skill:dotnet-serialization]. Profiling tools (dotnet-counters, dotnet-trace, dotnet-dump) are covered by [skill:dotnet-profiling]. CI benchmark regression detection is covered by [skill:dotnet-ci-benchmarking]. Architecture patterns (caching, resilience) -- see [skill:dotnet-architecture-patterns]. EF Core query optimization -- see [skill:dotnet-efcore-patterns].
Cross-references: [skill:dotnet-performance-patterns] for zero-allocation patterns measured by benchmarks, [skill:dotnet-csharp-modern-patterns] for Span/Memory syntax foundation, [skill:dotnet-csharp-coding-standards] for sealed class style conventions, [skill:dotnet-native-aot] for AOT performance characteristics and benchmark considerations, [skill:dotnet-serialization] for serialization format performance tradeoffs.
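<!-- Benchmarks.csproj -->
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <!-- The examples rely on implicit usings (System, System.Linq) and nullable annotations -->
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.14.*" />
  </ItemGroup>
</Project>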
Keep benchmark projects separate from production code. Use a benchmarks/ directory at the solution root.
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
[MemoryDiagnoser]
public class StringConcatBenchmarks
{
private readonly string[] _items = Enumerable.Range(0, 100)
.Select(i => i.ToString())
.ToArray();
[Benchmark(Baseline = true)]
public string StringConcat()
{
var result = string.Empty;
foreach (var item in _items)
result += item;
return result;
}
[Benchmark]
public string StringBuilder()
{
var sb = new System.Text.StringBuilder();
foreach (var item in _items)
sb.Append(item);
return sb.ToString();
}
[Benchmark]
public string StringJoin() => string.Join(string.Empty, _items);
}
// Program.cs
using BenchmarkDotNet.Running;
BenchmarkRunner.Run<StringConcatBenchmarks>();
Run in Release mode (mandatory for valid results):
dotnet run -c Release
[MemoryDiagnoser]
public class CollectionBenchmarks
{
[Params(10, 100, 1000)]
public int Size { get; set; }
private int[] _data = null!;
[GlobalSetup]
public void Setup()
{
_data = Enumerable.Range(0, Size).ToArray();
}
[Benchmark(Baseline = true)]
public int ForLoop()
{
var sum = 0;
for (var i = 0; i < _data.Length; i++)
sum += _data[i];
return sum;
}
[Benchmark]
public int LinqSum() => _data.Sum();
}
[MemoryDiagnoser] tracks GC allocations and collection counts per benchmark invocation. Apply it at class level to cover every benchmark in the class:
[MemoryDiagnoser]
public class AllocationBenchmarks
{
[Benchmark]
public byte[] AllocateArray() => new byte[1024];
[Benchmark]
public int UseStackalloc()
{
Span<byte> buffer = stackalloc byte[1024];
buffer[0] = 42;
return buffer[0];
}
}
Output columns:
| Column | Meaning |
|---|---|
| Allocated | Bytes allocated per operation |
| Gen0 | Gen 0 GC collections per 1000 operations |
| Gen1 | Gen 1 GC collections per 1000 operations |
| Gen2 | Gen 2 GC collections per 1000 operations |
A zero in the Allocated column (BenchmarkDotNet prints it as -) confirms a zero-allocation code path.
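To build intuition for what the Allocated column reports, the sketch below measures the bytes a delegate allocates on the current thread with GC.GetAllocatedBytesForCurrentThread(). This is not how BenchmarkDotNet collects its numbers, and AllocationProbe and Measure are hypothetical names introduced here. Treat it as a rough sanity check, not a replacement for [MemoryDiagnoser].

```csharp
using System;

public static class AllocationProbe
{
    // Hypothetical helper: bytes allocated on the current thread while running `action`.
    public static long Measure(Action action)
    {
        action(); // warm-up call so one-time JIT and static-init allocations are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        long after = GC.GetAllocatedBytesForCurrentThread();
        return after - before;
    }

    public static void Main()
    {
        // Heap allocation: the array (payload plus object header) is counted.
        long heap = Measure(() =>
        {
            var array = new byte[1024];
            GC.KeepAlive(array); // keep the allocation from being optimized away
        });

        // Stack allocation: nothing is allocated on the managed heap.
        long stack = Measure(() =>
        {
            Span<byte> buffer = stackalloc byte[1024];
            buffer[0] = 42;
        });

        Console.WriteLine($"new byte[1024]: {heap} B");
        Console.WriteLine($"stackalloc:     {stack} B");
    }
}
```

The heap case should report at least 1024 bytes and the stackalloc case should report zero, which mirrors the difference between AllocateArray and UseStackalloc in the Allocated column.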
[DisassemblyDiagnoser] inspects the JIT-compiled assembly so you can verify optimizations such as devirtualization and inlining:
[DisassemblyDiagnoser(maxDepth: 2)]
[MemoryDiagnoser]
public class DevirtualizationBenchmarks
{
    // Instances live in fields so the call site cannot see the concrete type
    // from a local allocation, and so no allocation lands in the measurement.
    private readonly SealedService _sealed = new();
    private readonly IService _viaInterface = new SealedService();

    // sealed enables JIT devirtualization -- verify in disassembly output
    // See [skill:dotnet-csharp-coding-standards] for sealed class conventions
    [Benchmark]
    public int SealedCall() => _sealed.Calculate(42);

    // Interface dispatch unless the JIT can prove or guess the concrete type
    [Benchmark]
    public int VirtualCall() => _viaInterface.Calculate(42);
}

public interface IService { int Calculate(int x); }

public sealed class SealedService : IService
{
    public int Calculate(int x) => x * 2;
}
Use DisassemblyDiagnoser to verify that sealed classes receive devirtualization from the JIT, confirming the performance rationale documented in [skill:dotnet-csharp-coding-standards].
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Exporters.Json;
[MemoryDiagnoser]
[JsonExporterAttribute.Full]
[HtmlExporter]
[MarkdownExporter]
public class CiBenchmarks
{
[Benchmark]
public void MyOperation()
{
// benchmark code
}
}
| Exporter | File | Use Case |
|---|---|---|
| JsonExporterAttribute.Full | BenchmarkDotNet.Artifacts/results/*-report-full.json | CI regression comparison (machine-readable) |
| HtmlExporter | BenchmarkDotNet.Artifacts/results/*-report.html | Human-readable PR review artifact |
| MarkdownExporter | BenchmarkDotNet.Artifacts/results/*-report-github.md | Paste into PR comments |
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters.Json;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running; // BenchmarkRunner lives here

var config = ManualConfig.Create(DefaultConfig.Instance)
    .AddJob(Job.ShortRun) // fewer iterations for CI speed
    .AddExporter(JsonExporter.Full)
    .WithArtifactsPath("./benchmark-results");

BenchmarkRunner.Run<CiBenchmarks>(config);
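- name: Run benchmarks
  run: dotnet run -c Release --project benchmarks/MyBenchmarks.csproj
- name: Upload benchmark results
  uses: actions/upload-artifact@v4
  with:
    name: benchmark-results
    # By default artifacts are written under the working directory of the run step,
    # which is the repository root here; adjust if you set a custom artifacts path
    path: BenchmarkDotNet.Artifacts/results/
    retention-days: 30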
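Once the full JSON report is available as an artifact, a later step can read it. The sketch below pulls each benchmark's name and mean time out of the report text with System.Text.Json. ReportReader and ReadMeans are hypothetical names, and the property names used (Benchmarks, FullName, Statistics, Mean) are an assumption about the report layout that should be checked against a report produced by your BenchmarkDotNet version.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class ReportReader
{
    // Returns (benchmark name, mean) pairs from the text of a full JSON report.
    // Assumed layout: { "Benchmarks": [ { "FullName": "...", "Statistics": { "Mean": 1.0 } } ] }
    public static List<(string Name, double Mean)> ReadMeans(string json)
    {
        var results = new List<(string Name, double Mean)>();
        using var document = JsonDocument.Parse(json);

        foreach (var benchmark in document.RootElement.GetProperty("Benchmarks").EnumerateArray())
        {
            var name = benchmark.GetProperty("FullName").GetString() ?? "(unnamed)";
            var mean = benchmark.GetProperty("Statistics").GetProperty("Mean").GetDouble();
            results.Add((name, mean));
        }

        return results;
    }
}
```

A caller would typically pass File.ReadAllText on one of the *-report-full.json files and compare the means against a stored baseline. Regression thresholds and tooling are covered by [skill:dotnet-ci-benchmarking].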
Mark one benchmark as the baseline for ratio comparison:
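using System.Text.Json; // JsonSerializerOptions, JsonNamingPolicy
using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]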
public class SerializationBenchmarks
{
// Serialization format choice -- see [skill:dotnet-serialization] for API details
private readonly JsonSerializerOptions _options = new()
{
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
};
private readonly WeatherForecast _data = new()
{
Date = DateOnly.FromDateTime(DateTime.Now),
TemperatureC = 25,
Summary = "Warm"
};
[Benchmark(Baseline = true)]
public string SystemTextJson()
=> System.Text.Json.JsonSerializer.Serialize(_data, _options);
[Benchmark]
public byte[] Utf8Serialization()
=> System.Text.Json.JsonSerializer.SerializeToUtf8Bytes(_data, _options);
}
public record WeatherForecast
{
public DateOnly Date { get; init; }
public int TemperatureC { get; init; }
public string? Summary { get; init; }
}
The Ratio column in output shows performance relative to the baseline (1.00). Values below 1.00 indicate faster than baseline; above 1.00 indicate slower.
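As a rough illustration of how to read the Ratio column, the arithmetic below divides a benchmark's mean time by the baseline's mean. The timings are made-up numbers rather than measurements, and RatioExample is a hypothetical name. BenchmarkDotNet derives its Ratio from the underlying measurements rather than from two rounded means, so treat this as an approximation of the idea.

```csharp
using System;

public static class RatioExample
{
    // Approximate ratio: candidate mean divided by baseline mean.
    public static double Ratio(double candidateMeanNs, double baselineMeanNs)
        => candidateMeanNs / baselineMeanNs;

    public static void Main()
    {
        // Hypothetical mean timings in nanoseconds, for illustration only.
        double baselineMeanNs = 200.0;
        double candidateMeanNs = 150.0;

        // 150 / 200 = 0.75: the candidate takes 75% of the baseline's time.
        double ratio = Ratio(candidateMeanNs, baselineMeanNs);
        Console.WriteLine($"Ratio: {ratio}");
    }
}
```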
Group benchmarks with [BenchmarkCategory] and filter at runtime:
[MemoryDiagnoser]
[GroupBenchmarksBy(BenchmarkLogicalGroupRule.ByCategory)]
public class CategorizedBenchmarks
{
[Benchmark, BenchmarkCategory("Serialization")]
public string JsonSerialize() => "...";
[Benchmark, BenchmarkCategory("Allocation")]
public byte[] ArrayAlloc() => new byte[1024];
}
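Run a specific category. Note that --filter matches benchmark names, not categories, so use the category options instead. Command-line arguments only take effect when Program.cs forwards args, as BenchmarkSwitcher.FromAssembly(...).Run(args) does:
dotnet run -c Release -- --anyCategories Serialization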
// Run a single benchmark class
BenchmarkRunner.Run<StringConcatBenchmarks>();
// Run all benchmarks in assembly
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
# Run benchmarks matching a pattern (quote the glob so the shell does not expand it)
dotnet run -c Release -- --filter '*StringBuilder*'
# List all available benchmarks without running
dotnet run -c Release -- --list flat
# Dry run (validates setup without full benchmark)
dotnet run -c Release -- --filter '*StringBuilder*' --job Dry
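When benchmarking Native AOT scenarios, JIT-based diagnosers such as DisassemblyDiagnoser are not available (there is no JIT at run time). Use wall-clock time and memory comparisons instead. Source-generated serialization needs a named type registered on the context, because the System.Text.Json source generator cannot describe anonymous types. See [skill:dotnet-native-aot] for AOT compilation setup:
using System.Text.Json;
using System.Text.Json.Serialization;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
// Do NOT use DisassemblyDiagnoser with AOT -- no JIT to disassemble
public class AotBenchmarks
{
    private readonly AotPayload _payload = new(42);

    [Benchmark]
    public string SourceGenSerialize()
        => JsonSerializer.Serialize(_payload, AppJsonContext.Default.AotPayload);
}

// AotPayload is an illustrative type; register your own types on the context
public record AotPayload(int Value);

[JsonSerializable(typeof(AotPayload))]
public partial class AppJsonContext : JsonSerializerContext { }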
The JIT may eliminate benchmark code whose result is unused. Always return or consume the result:
// BAD: JIT may eliminate the entire loop
[Benchmark]
public void DeadCode()
{
var sum = 0;
for (var i = 0; i < 1000; i++)
sum += i;
// sum is never used -- JIT removes the loop
}
// GOOD: return the value to prevent elimination
[Benchmark]
public int LiveCode()
{
var sum = 0;
for (var i = 0; i < 1000; i++)
sum += i;
return sum;
}
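When a benchmark produces several values, or cannot reasonably return one, BenchmarkDotNet's Consumer class (in BenchmarkDotNet.Engines) can keep each result alive. The following is a minimal sketch under that assumption; ConsumerBenchmarks and SquareEach are hypothetical names.

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Engines;

public class ConsumerBenchmarks
{
    private readonly Consumer _consumer = new Consumer();
    private readonly int[] _data = { 1, 2, 3, 4, 5 };

    [Benchmark]
    public void SquareEach()
    {
        foreach (var value in _data)
        {
            // Consume marks each result as used, so the JIT cannot drop the multiplication.
            _consumer.Consume(value * value);
        }
    }
}
```

Returning a single value remains the simpler option when the benchmark naturally has one result. Consumer is mainly useful inside loops or when enumerating lazy sequences.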
| Pitfall | Cause | Fix |
|---|---|---|
| Running in Debug mode | No JIT optimizations applied | Always use -c Release |
| Shared mutable state | Benchmarks interfere with each other | Use [IterationSetup] or immutable data |
| Cold-start measurement | First run includes JIT compilation | BenchmarkDotNet handles warmup automatically -- do not add manual warmup |
| Allocations in setup | Setup allocations inflate Allocated column | Use [GlobalSetup] (runs once) vs [IterationSetup] (runs per iteration) |
| Environment noise | Background processes skew results | BenchmarkDotNet detects and warns about environment issues; use Job.MediumRun for noisy environments |
[MemoryDiagnoser]
public class LifecycleBenchmarks
{
private byte[] _data = null!;
[GlobalSetup] // Runs once before all benchmark iterations
public void GlobalSetup() => _data = new byte[1024];
[IterationSetup] // Runs before each benchmark iteration
public void IterationSetup() => Array.Fill(_data, (byte)0);
[Benchmark]
public int Process()
{
// uses _data
return _data.Length;
}
[GlobalCleanup] // Runs once after all iterations
public void GlobalCleanup() { /* dispose resources */ }
}
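Prefer [GlobalSetup] over [IterationSetup] unless the benchmark mutates shared state. BenchmarkDotNet excludes [IterationSetup] time from the measurement, but the attribute limits each iteration to a single benchmark invocation, which makes very short benchmarks noisy, and any allocations it makes still add GC pressure around the measured code.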
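A case where [IterationSetup] is justified is a benchmark that mutates its input, such as an in-place sort: after the first invocation the array is already sorted, so later iterations would measure a different workload. The sketch below, with hypothetical names (SortBenchmarks, CreateSource, ResetWorkingCopy), builds the data once and restores an unsorted working copy before every iteration. The array is deliberately large so that one invocation per iteration is long enough to time.

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class SortBenchmarks
{
    [Params(100_000)]
    public int Size { get; set; }

    private int[] _source = Array.Empty<int>();
    private int[] _work = Array.Empty<int>();

    [GlobalSetup] // runs once: build the unsorted input and the reusable working buffer
    public void CreateSource()
    {
        var random = new Random(42); // fixed seed keeps runs comparable
        _source = Enumerable.Range(0, Size).Select(_ => random.Next()).ToArray();
        _work = new int[Size];
    }

    [IterationSetup] // runs before each iteration: restore the unsorted state without allocating
    public void ResetWorkingCopy() => Array.Copy(_source, _work, Size);

    [Benchmark]
    public int[] SortInPlace()
    {
        Array.Sort(_work); // mutates _work, which is why the reset above is needed
        return _work;
    }
}
```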
- Always run with dotnet run -c Release. Debug mode disables JIT optimizations and produces meaningless results.
- Use [GlobalSetup], not the constructor, for initialization -- BenchmarkDotNet creates benchmark instances multiple times during a run; constructor code runs repeatedly.
- Prefer [Params] over manual loops for parameterized benchmarks. BenchmarkDotNet runs each parameter combination independently with proper statistical analysis.
- Use [JsonExporterAttribute.Full] to produce machine-readable artifacts for regression detection, not just Markdown.