Find a near-optimal learning rate by sweeping LR exponentially over a few hundred mini-batches and watching where the loss starts to diverge — the Leslie Smith "LR range test". Activate when the user asks "what learning rate should I use", "lr finder", "lr range test", "calibrate the learning rate before training", or after Stage 2 sanity checks pass and they're ready to commit compute.
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train

This skill uses the workspace's default tool permissions.
A 2–5 minute experiment that gives you a defensible LR before you launch a multi-hour run. Cheaper than asking Optuna for an LR, and produces a single number plus a curve you can reason about.
"What learning rate is too small (training won't move) and what learning rate is too large (training diverges) for this exact configuration?"
Answer with a curve, then pick from the curve.
Start at a tiny LR (e.g. 1e-7) and multiply it by a constant factor every step, so that the LR sweeps log-uniformly from 1e-7 to 1.0 over ~500–1000 steps. The resulting curve has three regions: a plateau where the LR is too small to move the loss, a descent where the optimizer makes real progress, and a blow-up where the LR is too large and the loss diverges.
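The exponential schedule is one line of arithmetic. A minimal sketch, using the same defaults as above (1e-7 to 1.0 over 500 steps):

```python
# Per-step multiplier chosen so that lr_start * mult**num_steps == lr_end,
# i.e. the LR moves in equal increments in log space.
lr_start, lr_end, num_steps = 1e-7, 1.0, 500
mult = (lr_end / lr_start) ** (1 / num_steps)

lr = lr_start
schedule = []
for _ in range(num_steps):
    schedule.append(lr)
    lr *= mult
```

With these defaults the multiplier is about 1.033 per step, and after num_steps multiplications the LR lands on lr_end up to floating-point rounding.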
Pick the LR at the point of steepest descent, then divide by 10 — that's a robust starting point.
A common alternative: pick the LR one order of magnitude below where the loss starts to climb again.
The "steepest descent" point is where the optimizer makes the most progress per step. Dividing by 10 gives margin for stochasticity (a single LR test is one realization; the true optimum has variance). The divide-by-10 rule is empirical but widely robust.
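One way to automate that pick — a sketch, not part of the reference implementation; `pick_lr` is a hypothetical helper that consumes the `(lrs, losses)` pair the range test returns:

```python
import math

def pick_lr(lrs, losses, divide_by=10.0):
    """Return the LR at the steepest descent of the smoothed loss curve,
    divided by `divide_by` for safety margin.

    The finite-difference gradient is taken w.r.t. log(lr), since the
    sweep is exponential and the curve is read on a log-x axis.
    """
    grads = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    steepest = min(range(len(grads)), key=lambda i: grads[i])  # most negative slope
    return lrs[steepest] / divide_by
```

On a noisy real curve you may want to smooth `grads` (or pick from the smoothed losses only, as the sketch above already does) before taking the argmin.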
Lives at template/curry_train/prevalidate/lr_range.py. Sketch:
def lr_range_test(model, loader, optimizer, loss_fn, *,
                  lr_start=1e-7, lr_end=1.0, num_steps=500,
                  smooth_beta=0.98):
    """Run training steps with an exponentially increasing LR.

    Returns (lrs, losses_smoothed). Caller plots and picks.
    Mutates model and optimizer state — run on a throwaway copy.
    """
    mult = (lr_end / lr_start) ** (1 / num_steps)
    lr = lr_start
    avg_loss = 0.0
    lrs, losses = [], []
    it = iter(loader)
    for step in range(num_steps):
        try:
            batch = next(it)
        except StopIteration:   # loader shorter than num_steps; restart it
            it = iter(loader)
            batch = next(it)
        for g in optimizer.param_groups:
            g["lr"] = lr
        optimizer.zero_grad()
        loss = loss_fn(model(batch.x), batch.y)
        loss.backward()
        optimizer.step()
        # Exponential moving average with bias correction, so early steps
        # aren't dragged toward the zero initialization.
        avg_loss = smooth_beta * avg_loss + (1 - smooth_beta) * loss.item()
        smoothed = avg_loss / (1 - smooth_beta ** (step + 1))
        if step > 0 and smoothed > 4 * min(losses):
            break  # diverged; stop early
        lrs.append(lr)
        losses.append(smoothed)
        lr *= mult
    return lrs, losses
Confirm the user is past Stage 2 (overfit batch passes, init loss check passes). If not, do that first — LR range test on a broken pipeline gives a useless curve.
Run the test on a representative slice of data (~500 batches). Use the same model, optimizer, loss, and dataset that the real run will use, including augmentation and weight decay.
Plot loss vs LR on a log-x axis. If the user is in a notebook, render the plot; if CLI, save to runs/lr_range/<timestamp>.png.
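A minimal plotting sketch for the CLI path, assuming matplotlib is available (`save_lr_curve` is a hypothetical helper, not part of the skill):

```python
import os
import time
import matplotlib
matplotlib.use("Agg")  # headless backend: safe when no display is attached
import matplotlib.pyplot as plt

def save_lr_curve(lrs, losses, out_dir="runs/lr_range"):
    """Plot smoothed loss vs LR on a log-x axis, save a timestamped PNG."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, time.strftime("%Y%m%d-%H%M%S") + ".png")
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(lrs, losses)
    ax.set_xscale("log")
    ax.set_xlabel("learning rate (log scale)")
    ax.set_ylabel("smoothed loss")
    fig.savefig(path, dpi=120)
    plt.close(fig)
    return path
```

In a notebook, drop the `Agg` backend and call `plt.show()` instead of saving.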
Pick the LR at the steepest descent and divide by 10. Sanity check: this should be within 2× of typical published LRs for the architecture (Adam: ~3e-4 for transformers, ~1e-3 for small nets). If wildly different, suspect a bug.
Tell the user the picked LR and warn: this is a starting point, not a final answer. After running with it for ~500 steps, they may want to fine-tune ± 2× via a small Optuna study (stage4-optuna-integration).
skills/stage4-optuna-integration — refine the LR around the picked starting point.
skills/stage5-warmup-cosine — wrap the picked LR in a schedule for the real run.
template/curry_train/prevalidate/lr_range.py — reference implementation.