Activation checkpointing — recompute forward activations during backward instead of storing them, trading compute for memory. Activates when the user asks about "activation checkpointing", "recompute", "OOM during backward", or "gradient checkpointing", or needs to fit a larger model into memory.
```sh
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```

This skill uses the workspace's default tool permissions.
Trade ~1.3× forward compute for ~50–80% activation memory savings. The first thing to try when a model OOMs.
Wraps a module so its forward activations are not stored at forward time; on backward, the forward is re-executed to recompute them. This effectively segments activation memory into checkpoints: only the inputs at checkpoint boundaries stay live between forward and backward.
```python
import torch
import torch.nn as nn

from curry_train.primitives import Recompute

# Wrap a transformer block
block = Recompute(TransformerBlock(...), enabled=cfg.recompute)

# Or as a flag on the model itself
class MyModel(nn.Module):
    def __init__(self, ..., recompute: bool = False):
        ...
        self.recompute = recompute

    def forward(self, x):
        for layer in self.layers:
            if self.recompute and self.training:
                # Recompute this layer's activations during backward
                x = torch.utils.checkpoint.checkpoint(layer, x, use_reentrant=False)
            else:
                x = layer(x)
        return x
```
Notes:
- Wrap each block independently; this is the best memory/compute trade (see skills/stage4-parallel-primitive-intro).
- In .eval() mode it's a no-op (PyTorch checkpoint is forward-only).
- Disable it during stage2-grad-flow-viz runs; activation hooks fire again during recomputation and double-count.
- Pass use_reentrant=False. use_reentrant=True (PyTorch's default before 2.0) has subtle bugs with FSDP and BF16. Always use use_reentrant=False unless you know you need otherwise.
- V1: stub. The reference at template/curry_train/primitives/recompute.py exposes the interface and raises NotImplementedError until populated. PyTorch's torch.utils.checkpoint.checkpoint is the recommended implementation.
See also:
- skills/stage4-parallel-primitive-intro — the order in which to add primitives; recompute comes first.
- skills/stage2-grad-flow-viz — disable recompute when probing.
- torch.utils.checkpoint.checkpoint — PyTorch's recommended implementation.