From dotnet-skills
Optimizing .NET allocations/throughput. Span, ArrayPool, ref struct, sealed, stackalloc.
npx claudepluginhub wshaddix/dotnet-skills
This skill uses the workspace's default tool permissions.
Performance-oriented architecture patterns for .NET applications. Covers zero-allocation coding with Span<T> and Memory<T>, buffer pooling with ArrayPool<T>, struct design for performance (readonly struct, ref struct, in parameters), sealed class devirtualization by the JIT, stack-based allocation with stackalloc, and string handling performance. Focuses on the why (performance rationale and measurement) rather than the how (language syntax).
Version assumptions: .NET 8.0+ baseline. Span<T> and Memory<T> are available from .NET Core 2.1+ but this skill targets modern usage patterns on .NET 8+.
Out of scope: C# language syntax for Span, records, pattern matching, and collection expressions -- see [skill:dotnet-csharp-modern-patterns]. Coding standards and naming conventions (including sealed class style guidance) -- see [skill:dotnet-csharp-coding-standards]. Microbenchmarking setup and measurement is owned by this epic's companion skill -- see [skill:dotnet-benchmarkdotnet]. Native AOT compilation pipeline and trimming -- see [skill:dotnet-native-aot]. Serialization format performance tradeoffs -- see [skill:dotnet-serialization]. Architecture patterns (caching, resilience, DI) -- see [skill:dotnet-architecture-patterns]. EF Core query optimization -- see [skill:dotnet-efcore-patterns].
Cross-references: [skill:dotnet-benchmarkdotnet] for measuring the impact of these patterns, [skill:dotnet-csharp-modern-patterns] for Span/Memory syntax foundation, [skill:dotnet-csharp-coding-standards] for sealed class style conventions, [skill:dotnet-native-aot] for AOT performance characteristics and trimming impact on pattern choices, [skill:dotnet-serialization] for serialization performance context.
Span<T> provides a safe, bounds-checked view over contiguous memory without allocating. It enables slicing arrays, strings, and stack memory without copying. For syntax details see [skill:dotnet-csharp-modern-patterns]; this section focuses on performance rationale.
// BAD: Substring allocates a new string on each call
public static (string Key, string Value) ParseHeader_Allocating(string header)
{
    var colonIndex = header.IndexOf(':');
    return (header.Substring(0, colonIndex), header.Substring(colonIndex + 1).Trim());
}

// GOOD: ReadOnlySpan<char> slicing avoids all allocations.
// Spans are ref structs and cannot be tuple elements, so the parts are returned via out parameters.
public static bool TryParseHeader_ZeroAlloc(
    ReadOnlySpan<char> header, out ReadOnlySpan<char> key, out ReadOnlySpan<char> value)
{
    key = value = default;
    var colonIndex = header.IndexOf(':');
    if (colonIndex < 0) return false; // no separator found
    key = header[..colonIndex];
    value = header[(colonIndex + 1)..].Trim();
    return true;
}
Performance impact: for high-throughput parsing (HTTP headers, log lines, CSV rows), Span-based parsing removes the per-call string allocations and the GC pressure they cause. Measure with [MemoryDiagnoser] in [skill:dotnet-benchmarkdotnet] -- the Allocated column should report zero bytes (recent BenchmarkDotNet versions print this as "-").
Span<T> is a ref struct, so it cannot be stored on the heap or kept alive across an await. Use Memory<T> when a buffer must be stored in a field or passed through async code:
public async Task<int> ReadAndProcessAsync(Stream stream, Memory<byte> buffer)
{
    var bytesRead = await stream.ReadAsync(buffer);
    var data = buffer[..bytesRead]; // Memory<T> slicing -- no allocation
    return ProcessData(data.Span);  // .Span for synchronous processing
}

private int ProcessData(ReadOnlySpan<byte> data)
{
    var sum = 0;
    foreach (var b in data)
        sum += b;
    return sum;
}
Large array allocations (>= 85,000 bytes) go directly to the Large Object Heap (LOH), which is only collected in Gen 2 GC -- expensive and causes pauses. Even smaller arrays add GC pressure in hot paths. ArrayPool<T> rents and returns buffers to avoid repeated allocations.
using System.Buffers;

public int ProcessLargeData(Stream source)
{
    var buffer = ArrayPool<byte>.Shared.Rent(minimumLength: 81920);
    try
    {
        // IMPORTANT: Rent may return a larger buffer than requested, so buffer.Length
        // is only the capacity to read into, not the amount of valid data.
        var bytesRead = source.Read(buffer, 0, buffer.Length);
        // Process only the bytes actually read; the rest of the array holds stale data.
        return ProcessChunk(buffer.AsSpan(0, bytesRead));
    }
    finally
    {
        // clearArray: true zeroes the buffer -- use when buffer held sensitive data
        ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
    }
}
| Mistake | Impact | Fix |
|---|---|---|
| Using buffer.Length instead of the actual data length | Processes stale bytes beyond the actual data | Track requested/actual size separately |
| Forgetting to return the buffer | Pool exhaustion, falls back to allocation | Use try/finally or a using wrapper |
| Returning a buffer twice | Corrupts pool state | Null out the reference after return |
| Not clearing sensitive data | Security leak from pooled buffers | Pass clearArray: true to Return |
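The "using wrapper" fix in the table above could look roughly like the sketch below. PooledBuffer and PooledBufferExample are hypothetical names, not BCL types; the wrapper exposes only the requested length and nulls its reference before returning the array, which covers three of the four mistakes for a single instance.

```csharp
using System;
using System.Buffers;
using System.IO;

// Hypothetical wrapper (not a BCL type): rents on construction, returns on Dispose.
public ref struct PooledBuffer
{
    private byte[]? _array;
    private readonly int _length;

    public PooledBuffer(int length)
    {
        _array = ArrayPool<byte>.Shared.Rent(length);
        _length = length;
    }

    // Exposes only the requested length, never the (possibly larger) rented array.
    public readonly Span<byte> Span => _array.AsSpan(0, _length);

    public void Dispose()
    {
        var array = _array;
        if (array is null)
            return;        // already returned: guards against a double Return
        _array = null;     // null out the reference before handing the array back
        ArrayPool<byte>.Shared.Return(array, clearArray: true);
    }
}

public static class PooledBufferExample
{
    // Reads once from the stream; the buffer goes back to the pool when the method exits.
    public static int ReadOnce(Stream source)
    {
        using var buffer = new PooledBuffer(4096);
        return source.Read(buffer.Span);
    }
}
```

Because PooledBuffer is a struct, copying it by value duplicates the array reference, so the double-return guard only protects the instance that was disposed. Treat it as a sketch for local, using-scoped buffers.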
The C# compiler must defensively copy a non-readonly struct when a member is invoked on it through an in parameter, a readonly field, or from within a readonly member, because that member might mutate the struct. Marking the struct readonly guarantees that no member mutates it, eliminating these copies:
// GOOD: readonly eliminates defensive copies on every member call
public readonly struct Point3D
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Point3D(double x, double y, double z) => (X, Y, Z) = (x, y, z);

    // readonly struct: the compiler knows this cannot mutate, no defensive copy needed
    public double DistanceTo(in Point3D other)
    {
        var dx = X - other.X;
        var dy = Y - other.Y;
        var dz = Z - other.Z;
        return Math.Sqrt(dx * dx + dy * dy + dz * dz);
    }
}
Without readonly, calling a method on a struct through an in parameter forces the compiler to copy the entire struct to protect against mutation. For large structs in tight loops, marking the struct readonly removes significant copying overhead.
ref struct types are constrained to the stack. They cannot be boxed, stored in fields of classes or ordinary structs, or kept alive across an await. This enables safe wrapping of Span<T>:
public ref struct SpanLineEnumerator
{
    private ReadOnlySpan<char> _remaining;

    public SpanLineEnumerator(ReadOnlySpan<char> text) => _remaining = text;

    public ReadOnlySpan<char> Current { get; private set; }

    public bool MoveNext()
    {
        if (_remaining.IsEmpty)
            return false;

        var newlineIndex = _remaining.IndexOf('\n');
        if (newlineIndex == -1)
        {
            Current = _remaining;
            _remaining = default;
        }
        else
        {
            Current = _remaining[..newlineIndex];
            _remaining = _remaining[(newlineIndex + 1)..];
        }
        return true;
    }
}
Use in for large readonly structs passed to methods. The in modifier passes by reference (avoids copying) and prevents mutation:
// in parameter: pass by reference, no copy, no mutation allowed
public static double CalculateDistance(in Point3D a, in Point3D b)
=> a.DistanceTo(in b);
When to use in:
| Struct Size | Recommendation |
|---|---|
| <= 16 bytes | Pass by value (register-friendly, no indirection overhead) |
| > 16 bytes | Use in to avoid copy overhead |
| Any size, readonly struct | in is safe (no defensive copies) |
| Any size, non-readonly struct | Avoid in (defensive copies negate the benefit) |
When a class is sealed, the JIT can replace virtual method calls on a receiver statically typed as that class with direct calls (devirtualization), because no subclass override is possible. This enables further inlining:
// Without sealed: virtual dispatch through vtable
public class OpenService : IProcessor
{
    public virtual int Process(int x) => x * 2;
}

// With sealed: the JIT can devirtualize and inline Process calls
public sealed class SealedService : IProcessor
{
    public int Process(int x) => x * 2;
}

public interface IProcessor { int Process(int x); }
Verify devirtualization with [DisassemblyDiagnoser] in [skill:dotnet-benchmarkdotnet]. See [skill:dotnet-csharp-coding-standards] for the project convention of defaulting to sealed classes.
Devirtualization and inlining eliminate the vtable lookup and the indirect call. In tight loops and hot paths, the cumulative effect is measurable. For framework/library types that are not designed for extension, always prefer sealed.
stackalloc allocates memory on the stack, avoiding GC entirely. Use for small, fixed-size buffers in hot paths:
public static string FormatGuid(Guid guid)
{
    // 68 chars = 136 bytes on the stack -- well within safe limits
    // (the "D" format itself needs only 36 chars)
    Span<char> buffer = stackalloc char[68];
    guid.TryFormat(buffer, out var charsWritten, "D");
    return new string(buffer[..charsWritten]);
}
| Guideline | Rationale |
|---|---|
| Keep allocations small (< 1 KB typical, < 4 KB absolute maximum) | Stack space is limited (~1 MB default on Windows); overflow crashes the process |
| Use constant or bounded sizes only | Runtime-variable sizes risk stack overflow from malicious/unexpected input |
| Prefer Span<T> assignment over raw pointer | Span provides bounds checking; raw pointers do not |
| Fall back to ArrayPool for large/variable sizes | Gracefully handle cases that exceed stack budget |
using System.Buffers;
using System.Text;

public static string ProcessData(ReadOnlySpan<byte> input)
{
    const int stackThreshold = 256;
    char[]? rented = null;
    // UTF-8 never decodes to more chars than input bytes, so input.Length chars suffice.
    Span<char> buffer = input.Length <= stackThreshold
        ? stackalloc char[stackThreshold]
        : (rented = ArrayPool<char>.Shared.Rent(input.Length));
    try
    {
        var written = Encoding.UTF8.GetChars(input, buffer);
        return new string(buffer[..written]);
    }
    finally
    {
        if (rented is not null)
            ArrayPool<char>.Shared.Return(rented);
    }
}
This pattern is used throughout the .NET runtime libraries and is the recommended approach for methods that handle both small and large inputs.
Ordinal comparisons are significantly faster than culture-aware comparisons because they compare UTF-16 code units directly instead of applying linguistic collation rules:
// FAST: ordinal comparison (code unit by code unit)
bool isMatch = str.Equals("expected", StringComparison.Ordinal);
bool containsKey = dict.ContainsKey(key); // Dictionary<string, T> uses ordinal by default
// FAST: case-insensitive ordinal (no culture overhead)
bool isMatchIgnoreCase = str.Equals("expected", StringComparison.OrdinalIgnoreCase);
// SLOW: culture-aware comparison (Unicode normalization, linguistic rules)
bool isMatchCulture = str.Equals("expected", StringComparison.CurrentCulture);
Default guidance: Use StringComparison.Ordinal or StringComparison.OrdinalIgnoreCase for internal identifiers, dictionary keys, file paths, and protocol strings. Reserve culture-aware comparison for user-visible text sorting and display.
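As a small illustration of the guidance above, the sketch below applies ordinal comparison to a dictionary and to a prefix check. OrdinalExamples, CreateHeaderMap, and IsHttps are hypothetical names chosen for this example; StringComparer.OrdinalIgnoreCase and the StartsWith overload taking a StringComparison are standard BCL members.

```csharp
using System;
using System.Collections.Generic;

public static class OrdinalExamples
{
    // Case-insensitive ordinal keys, e.g. for HTTP header names.
    public static Dictionary<string, string> CreateHeaderMap() =>
        new(StringComparer.OrdinalIgnoreCase);

    // Passing the comparison explicitly keeps StartsWith from using the current culture.
    public static bool IsHttps(string url) =>
        url.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
}
```

Passing the comparer at construction time means every later lookup on the dictionary is ordinal, so call sites do not need to normalize key casing themselves.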
The CLR interns compile-time string literals automatically. string.Intern() can reduce memory for runtime strings that repeat frequently:
// Intern frequently-repeated runtime strings to share a single instance
var normalized = string.Intern(headerName.ToLowerInvariant());
Caution: Interned strings are never garbage collected. Only intern strings from a bounded, known set (HTTP headers, XML element names). Never intern user input or unbounded data.
| Scenario | Recommended Approach | Why |
|---|---|---|
| 2-3 concatenations | String interpolation $"{a}{b}" | Compiler optimizes to string.Concat |
| Loop concatenation | StringBuilder | Avoids quadratic allocation |
| Known final length | string.Create | Single allocation, Span-based writing |
| High-throughput formatting | Span<char> + TryFormat | Zero-allocation formatting |
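// string.Create for single-allocation building.
// The length passed to string.Create must match the characters written exactly;
// unwritten positions stay '\0'. Both parts therefore use the fixed-width "D5"
// format (5 + 1 + 5 = 11 chars), which assumes values in the range 0..99999.
public static string FormatId(int category, int item)
{
    return string.Create(11, (category, item), static (span, state) =>
    {
        state.category.TryFormat(span, out var catWritten, "D5");
        span[catWritten] = '-';
        state.item.TryFormat(span[(catWritten + 1)..], out _, "D5");
    });
}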
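The "Span<char> + TryFormat" row of the table above can be sketched as follows. TemperatureFormatter and TryWriteCelsius are hypothetical names used only for illustration; the method writes into a caller-supplied buffer and reports failure instead of allocating when the buffer is too small.

```csharp
using System;
using System.Globalization;

public static class TemperatureFormatter
{
    // Hypothetical helper: writes "<value> C" into the destination without allocating.
    public static bool TryWriteCelsius(double value, Span<char> destination, out int charsWritten)
    {
        charsWritten = 0;

        // Format the number directly into the caller's buffer.
        if (!value.TryFormat(destination, out var written, "F1", CultureInfo.InvariantCulture))
            return false;

        // Append the unit suffix if there is room left.
        ReadOnlySpan<char> suffix = " C";
        if (!suffix.TryCopyTo(destination[written..]))
            return false;

        charsWritten = written + suffix.Length;
        return true;
    }
}
```

A caller would typically pass a small stackalloc buffer and consume destination[..charsWritten] directly (for example by writing it to an output stream), creating a string only if one is actually needed.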
Before applying any optimization pattern, measure first. Premature optimization without data leads to complex code with no measurable benefit.
- Measure allocations with [MemoryDiagnoser] and check the Allocated column.
- Mark structs readonly struct when they are immutable -- without readonly, the C# compiler generates defensive copies when members are invoked through in parameters and readonly fields, silently negating the performance benefit of using structs.
- Specify StringComparison.Ordinal for internal comparisons -- methods such as StartsWith, EndsWith, IndexOf(string), and string.Compare default to culture-aware comparison when the comparison parameter is omitted, which is slower and produces surprising results for technical strings (file paths, identifiers).
- Verify devirtualization with [DisassemblyDiagnoser].

Performance patterns in this skill are grounded in guidance from:
These sources inform the patterns and rationale presented above. This skill does not claim to represent or speak for any individual.