From grimoire
Diagnoses memory leaks in processes (JVM, Node.js) by profiling allocations, analyzing heap dumps, and tracing code paths. Use when RSS grows monotonically or OOM kills occur.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
How this skill is triggered: by the user, by Claude, or both
Slash command
/grimoire:diagnose-memory-leak
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Systematically identify the source of unbounded memory growth in a process by profiling allocations, analyzing heap contents, and tracing the code path responsible.
Adopted by: Netflix (Java heap analysis for streaming services), Google (memory profiling as standard performance practice), all production engineering teams that run long-lived processes
Impact: Memory leaks are the #3 cause of production incidents (after hardware and deployment failures) in long-running services; undiagnosed leaks cause 2-4 AM OOM kills that take 20+ minutes to recover; structured diagnosis reduces MTTI from hours to 30 minutes
Why best: Memory leaks have many causes (event listener accumulation, cache without eviction, retained closures, circular references, native memory leaks); only systematic profiling pinpoints the specific cause
Sources: Gregg "Systems Performance" 2nd ed. (2020) Ch. 7; Gregg "BPF Performance Tools" Pearson (2019); Oracle JVM GC documentation; Node.js memory profiling guide
Confirm the leak exists — Monitor process memory over time (at least 24-48 hours): ps -o pid,rss,vsz,comm -p <pid> every minute, or use Prometheus process_resident_memory_bytes. A genuine leak shows monotonically increasing RSS that doesn't return to baseline after load decreases. Natural heap growth that stabilizes is not a leak.
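A minimal sketch of the growth check described above, assuming RSS samples have already been collected (for example from the ps command or from process_resident_memory_bytes). The function name rssTrend, the sample shape, and the 0.9 threshold are illustrative assumptions, not part of any tool.

```javascript
// Hypothetical helper: summarize a series of RSS samples.
// samples: array of { t: seconds since start, rss: bytes }
function rssTrend(samples) {
  if (samples.length < 2) throw new Error('need at least two samples');
  const n = samples.length;
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanRss = samples.reduce((sum, p) => sum + p.rss, 0) / n;

  // Least-squares slope of RSS over time, in bytes per second.
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.t - meanT) * (p.rss - meanRss);
    den += (p.t - meanT) ** 2;
  }
  if (den === 0) throw new Error('samples must span more than one timestamp');
  const bytesPerSecond = num / den;

  // Fraction of consecutive samples where RSS went up.
  let rises = 0;
  for (let i = 1; i < n; i++) {
    if (samples[i].rss > samples[i - 1].rss) rises++;
  }
  const risingFraction = rises / (n - 1);

  // Assumed threshold: flag only growth that almost never falls back.
  const suspicious = bytesPerSecond > 0 && risingFraction >= 0.9;
  return { bytesPerSecond, risingFraction, suspicious };
}
```

A series that rises on nearly every sample is flagged; a series that oscillates around a baseline is not. The flag is only a prompt to continue with the next steps, since a long warmup can look the same over a short window.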
Determine the memory region — Distinguish heap vs. non-heap memory growth. JVM: heap (-Xmx bounded) vs. off-heap (Metaspace, direct buffers, native libraries). Node.js: V8 heap vs. native addons. A growing heap is a managed code leak; growing non-heap is a native memory or off-heap allocation leak. Diagnosis tools differ by region.
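For Node.js, a rough way to separate the regions is to compare two process.memoryUsage() readings taken some time apart. The sketch below is a heuristic under stated assumptions: the helper name classifyGrowth and the region labels are hypothetical, and the "other" figure is only an approximate remainder of RSS.

```javascript
// Hypothetical helper: compare two process.memoryUsage() readings and
// report which region grew the most between them.
function classifyGrowth(before, after) {
  // Growth of live objects on the V8 managed heap.
  const heap = after.heapUsed - before.heapUsed;
  // Growth of memory held by C++ objects bound to JS objects (includes ArrayBuffers).
  const external = after.external - before.external;
  // Rough remainder of RSS growth: native addons, allocator overhead, thread stacks.
  const other = (after.rss - before.rss) - (after.heapTotal - before.heapTotal) - external;

  const deltas = { 'v8-heap': heap, external, 'native-or-other': other };
  const region = Object.keys(deltas).reduce((a, b) => (deltas[b] > deltas[a] ? b : a));
  return { region, deltas };
}

// Usage: take one reading early, one later, then compare.
const baseline = process.memoryUsage();
// ... let the service run under load ...
const later = process.memoryUsage();
const report = classifyGrowth(baseline, later);
```

A dominant v8-heap delta points toward heap snapshots as the next step; a dominant native-or-other delta points toward native tooling instead.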
Enable detailed GC logging (JVM) — Add JVM flags: -Xlog:gc*:file=/tmp/gc.log:time,uptime:filecount=10,filesize=10m. Analyze with GCEasy.io or IBM GC analyzer. A rising old-gen baseline after full GC indicates objects surviving collection that should be collected — classic leak signature. GC logs reveal memory pressure before OOM.
Capture heap snapshots at intervals — JVM: jmap -dump:format=b,file=/tmp/heap1.hprof <pid> at two points 30 minutes apart. Node.js: v8.writeHeapSnapshot() or --inspect + Chrome DevTools heap snapshot. Compare the two snapshots to identify which object types grew. Objects that accumulate between snapshots without being released are leak candidates.
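On Node.js, the interval capture can be scripted inside the process. This is a sketch, not a prescribed method: the helper name captureSnapshots, the file naming scheme, and the injectable write parameter are assumptions. Only v8.writeHeapSnapshot() itself comes from the step above.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: write `count` heap snapshots, `intervalMs` apart.
// `write` defaults to v8.writeHeapSnapshot, which is synchronous and
// blocks the event loop while the snapshot is generated.
async function captureSnapshots({
  count = 2,
  intervalMs = 30 * 60 * 1000, // 30 minutes, as suggested above
  dir = os.tmpdir(),
  write = v8.writeHeapSnapshot,
} = {}) {
  const files = [];
  for (let i = 0; i < count; i++) {
    if (i > 0) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    const file = path.join(dir, `heap-${process.pid}-${i + 1}.heapsnapshot`);
    files.push(write(file)); // writeHeapSnapshot returns the filename it wrote
  }
  return files;
}
```

Calling captureSnapshots() with the defaults writes two files that can be loaded into Chrome DevTools for comparison. Because snapshot generation pauses the process and needs extra memory, it is safer to try this on a single instance first.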
Analyze heap with a profiler — JVM: open heap dump in Eclipse MAT (Memory Analyzer Tool). Use "Leak Suspects" report — MAT identifies objects with high retained heap. Look for: collections (HashMap, ArrayList) growing without bound, listener registries, thread-local variables. Node.js: Chrome DevTools "Comparison" between two heap snapshots shows which object types increased.
Identify the retaining reference chain — In Eclipse MAT: right-click the leaking object → "Path to GC Roots" → "Exclude soft/weak/phantom references". This shows the reference chain keeping the object alive. The root of the chain (a static field, a thread-local, a framework registry) is the leak source. In Node.js DevTools: "Retainers" panel shows the same chain.
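Profile allocations in real time: on the JVM, use async-profiler (./profiler.sh -e alloc -d 60 -f alloc.html <pid>) to generate an allocation flame graph. The output format depends on the release: 1.x wrote SVG, 2.x and later write HTML flame graphs, and 3.x ships the launcher as asprof instead of profiler.sh. The flame graph shows which call stacks allocate the most objects. High-frequency allocation of objects that should be short-lived but are retained points to the leak site. For native memory, memleak from bcc-tools (covered in Gregg's BPF book) traces allocations that have not been freed.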
Check common leak sources by runtime — JVM: static collections used as caches without eviction, ClassLoader leaks in hot-reload environments, ThreadLocal variables not removed, listeners registered but never deregistered. Node.js: event emitters with unlimited listeners (emitter.setMaxListeners(0) hides the warning), global caches with no TTL, closures capturing large objects. Go: goroutine leaks (goroutines blocked forever), map entries never deleted.
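To illustrate the Node.js listener case, the sketch below contrasts a handler that registers a listener on a long-lived emitter and never removes it with one that returns a cleanup function. The names bus, handleRequestLeaky, handleRequestFixed, and the config-change event are hypothetical.

```javascript
const { EventEmitter } = require('node:events');

// Long-lived emitter, e.g. a module-level singleton.
const bus = new EventEmitter();

// Leaky: every call adds a listener that is never removed, so each
// closure (and the payload it captures) stays reachable from `bus`.
function handleRequestLeaky(payload) {
  bus.on('config-change', () => payload.length);
}

// Fixed: keep a reference to the listener and return a cleanup
// function that the caller runs when the request is finished.
function handleRequestFixed(payload) {
  const onChange = () => payload.length;
  bus.on('config-change', onChange);
  return () => bus.off('config-change', onChange);
}
```

Watching bus.listenerCount('config-change') across requests is a cheap check: with the leaky version it grows by one per call, with the fixed version it returns to its previous value once the cleanup function has run.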
Reproduce in a controlled environment — Write a test that reproduces the leak: run the suspected code path in a loop and monitor heap. Confirmed if heap grows without bound. This test becomes the regression test — the fix must make this test pass (stable heap under load). Reproducing the leak is a prerequisite for verifying the fix.
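A reproduction harness for Node.js might look like the following sketch. The function measureHeapGrowth and the example leakyPath are hypothetical; the measurement is only approximate unless the process is started with node --expose-gc, which makes globalThis.gc available.

```javascript
// Hypothetical harness: run a suspected code path repeatedly and
// report how much heapUsed changed, in bytes.
function measureHeapGrowth(fn, iterations) {
  // globalThis.gc exists only when Node.js runs with --expose-gc;
  // without it the readings include uncollected garbage.
  const collect = typeof globalThis.gc === 'function' ? globalThis.gc : () => {};
  collect();
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) fn(i);
  collect();
  return process.memoryUsage().heapUsed - before;
}

// Example suspect: a module-level array that is never trimmed.
const retained = [];
function leakyPath(i) {
  retained.push(new Array(10000).fill(i));
}

const growthBytes = measureHeapGrowth(leakyPath, 200);
console.log(`heap grew by ${(growthBytes / 1024 / 1024).toFixed(1)} MiB`);
```

For a regression test, the same harness can be run against the fixed code path with an assertion that growth stays under a chosen budget. The budget is a judgment call and should leave headroom for normal warmup allocations.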
Implement and verify the fix — Apply the fix: remove the static reference, add eviction to the cache, deregister the listener on cleanup, use weak references where appropriate. Run the reproduction test for 30 minutes. Verify heap stabilizes at a flat baseline after warmup. Deploy to staging with extended monitoring before promoting to production.
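As one example of "add eviction to the cache", the sketch below bounds a cache by entry count and evicts the least recently used entry. The class name BoundedCache is hypothetical, and a production service might prefer an existing library or a TTL policy instead. The sketch relies on Map preserving insertion order.

```javascript
// Hypothetical bounded cache: evicts the least recently used entry
// once `maxEntries` is reached.
class BoundedCache {
  constructor(maxEntries) {
    if (!Number.isInteger(maxEntries) || maxEntries < 1) {
      throw new RangeError('maxEntries must be a positive integer');
    }
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    // Remove any existing entry first so an update never triggers eviction.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get size() {
    return this.entries.size;
  }
}
```

Replacing an unbounded module-level Map with a structure like this caps the number of retained entries, so the reproduction test should show heap leveling off once the cache is full.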