Find a near-optimal learning rate by sweeping LR exponentially over a few hundred mini-batches and watching where the loss starts to diverge — the Leslie Smith "LR range test". Activate when the user asks "what learning rate should I use", "lr finder", "lr range test", "calibrate the learning rate before training", or after Stage 2 sanity checks pass and they're ready to commit compute.
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train

This skill uses the workspace's default tool permissions.
A 2–5 minute experiment that gives you a defensible LR before you launch a multi-hour run. Cheaper than asking Optuna for an LR, and produces a single number plus a curve you can reason about.
"What learning rate is too small (training won't move) and what learning rate is too large (training diverges) for this exact configuration?"
Answer with a curve, then pick from the curve.
Start at a tiny LR (e.g. 1e-7) and multiply it by a constant factor every step, so that the LR sweeps log-uniformly from 1e-7 to 1.0 over ~500–1000 steps. The resulting curve has three regions: a plateau where the LR is too small to move the loss, a descent where the optimizer makes real progress, and a blow-up where the LR is too large and the loss diverges.
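The exponential schedule is one line of arithmetic. A minimal sketch, using the same defaults as above (1e-7 to 1.0 over 500 steps):

```python
# Per-step multiplier chosen so that lr_start * mult**num_steps == lr_end,
# i.e. the LR moves in equal increments in log space.
lr_start, lr_end, num_steps = 1e-7, 1.0, 500
mult = (lr_end / lr_start) ** (1 / num_steps)

lr = lr_start
schedule = []
for _ in range(num_steps):
    schedule.append(lr)
    lr *= mult
```

With these defaults the multiplier is about 1.033 per step, and after num_steps multiplications the LR lands on lr_end up to floating-point rounding.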
Pick the LR at the point of steepest descent, then divide by 10 — that's a robust starting point.
A common alternative: pick the LR one order of magnitude below where the loss starts to climb again.
The "steepest descent" point is where the optimizer makes the most progress per step. Dividing by 10 gives margin for stochasticity (a single LR test is one realization; the true optimum has variance). The divide-by-10 rule is empirical but widely robust.
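One way to automate that pick — a sketch, not part of the reference implementation; `pick_lr` is a hypothetical helper that consumes the `(lrs, losses)` pair the range test returns:

```python
import math

def pick_lr(lrs, losses, divide_by=10.0):
    """Return the LR at the steepest descent of the smoothed loss curve,
    divided by `divide_by` for safety margin.

    The finite-difference gradient is taken w.r.t. log(lr), since the
    sweep is exponential and the curve is read on a log-x axis.
    """
    grads = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    steepest = min(range(len(grads)), key=lambda i: grads[i])  # most negative slope
    return lrs[steepest] / divide_by
```

On a noisy real curve you may want to smooth `grads` (or pick from the smoothed losses only, as the sketch above already does) before taking the argmin.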
Lives at template/curry_train/prevalidate/lr_range.py. Sketch:
def lr_range_test(model, loader, optimizer, loss_fn, *,
                  lr_start=1e-7, lr_end=1.0, num_steps=500,
                  smooth_beta=0.98):
    """Run training steps with an exponentially increasing LR.

    Returns (lrs, losses_smoothed). Caller plots and picks.
    Mutates model and optimizer state — run on a throwaway copy.
    """
    mult = (lr_end / lr_start) ** (1 / num_steps)
    lr = lr_start
    avg_loss = 0.0
    lrs, losses = [], []
    it = iter(loader)
    for step in range(num_steps):
        try:
            batch = next(it)
        except StopIteration:   # loader shorter than num_steps; restart it
            it = iter(loader)
            batch = next(it)
        for g in optimizer.param_groups:
            g["lr"] = lr
        optimizer.zero_grad()
        loss = loss_fn(model(batch.x), batch.y)
        loss.backward()
        optimizer.step()
        # Exponential moving average with bias correction, so early steps
        # aren't dragged toward the zero initialization.
        avg_loss = smooth_beta * avg_loss + (1 - smooth_beta) * loss.item()
        smoothed = avg_loss / (1 - smooth_beta ** (step + 1))
        if step > 0 and smoothed > 4 * min(losses):
            break  # diverged; stop early
        lrs.append(lr)
        losses.append(smoothed)
        lr *= mult
    return lrs, losses
Confirm the user is past Stage 2 (overfit batch passes, init loss check passes). If not, do that first — LR range test on a broken pipeline gives a useless curve.
Run the test on a representative slice of data (~500 batches). Use the same model, optimizer, loss, and dataset that the real run will use, including augmentation and weight decay.
Plot loss vs LR on a log-x axis. If the user is in a notebook, render the plot; if CLI, save to runs/lr_range/<timestamp>.png.
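A minimal plotting sketch for the CLI path, assuming matplotlib is available (`save_lr_curve` is a hypothetical helper, not part of the skill):

```python
import os
import time
import matplotlib
matplotlib.use("Agg")  # headless backend: safe when no display is attached
import matplotlib.pyplot as plt

def save_lr_curve(lrs, losses, out_dir="runs/lr_range"):
    """Plot smoothed loss vs LR on a log-x axis, save a timestamped PNG."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, time.strftime("%Y%m%d-%H%M%S") + ".png")
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(lrs, losses)
    ax.set_xscale("log")
    ax.set_xlabel("learning rate (log scale)")
    ax.set_ylabel("smoothed loss")
    fig.savefig(path, dpi=120)
    plt.close(fig)
    return path
```

In a notebook, drop the `Agg` backend and call `plt.show()` instead of saving.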
Pick the LR at the steepest descent and divide by 10. Sanity check: this should be within 2× of typical published LRs for the architecture (Adam: ~3e-4 for transformers, ~1e-3 for small nets). If wildly different, suspect a bug.
Tell the user the picked LR and warn: this is a starting point, not a final answer. After running with it for ~500 steps, they may want to fine-tune ± 2× via a small Optuna study (stage4-optuna-integration).
skills/stage4-optuna-integration — refine the LR around the picked starting point.
skills/stage5-warmup-cosine — wrap the picked LR in a schedule for the real run.
template/curry_train/prevalidate/lr_range.py — reference implementation.