A canonical set of low-cost assertions to run before any non-trivial training, catching the most common "silent" bugs (zero_grad missed, train/eval mode wrong, wrong tensor shape, label leakage in transforms). Activate when the user asks "what should I check before training", "preflight checks", "is my training set up correctly", or any time a fresh model is about to be trained.
Install:

```
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```

This skill uses the workspace's default tool permissions.
A short, deterministic set of assertions that runs in a few seconds and catches the most common bugs that would otherwise waste hours of compute. These run **before the first real training step**, not as part of normal training.
"If something is silently wrong with my pipeline, will I know within 10 seconds?"
Each assert is one function that either passes silently or raises a structured `PreflightError` with a remediation hint. Implementations live in `template/curry_train/infra/preflight.py`.
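A minimal sketch of what such a structured error type could look like. This is hypothetical: the real class lives in `template/curry_train/infra/preflight.py`, and the field names here are inferred from the printing snippet later in this document (`e.code`, `e.message`, `e.remediation`).

```python
from dataclasses import dataclass

# Hypothetical sketch of the error type; fields are inferred from how
# errors are printed later in this document, not from the real source.
@dataclass
class PreflightError(Exception):
    code: str
    message: str
    remediation: str

err = PreflightError(
    code="P001",
    message="parameter fc.weight received no gradient",
    remediation="check for .detach() or requires_grad=False in the forward pass",
)
```

Subclassing `Exception` lets a check either raise the error directly or hand it back to a collector.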
1. `assert_zero_grad_idempotent(model, optimizer)`: Run two consecutive `optimizer.zero_grad(set_to_none=True)` calls and confirm no parameter has a non-None `.grad`. Catches optimizers that hold dangling state.
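A minimal sketch of this check, assuming a standard PyTorch model and optimizer (the real implementation is in `template/curry_train/infra/preflight.py`):

```python
import torch
import torch.nn as nn

def assert_zero_grad_idempotent(model, optimizer):
    # Two consecutive zero_grad(set_to_none=True) calls; afterwards
    # every parameter's .grad must be None.
    optimizer.zero_grad(set_to_none=True)
    optimizer.zero_grad(set_to_none=True)
    stale = [n for n, p in model.named_parameters() if p.grad is not None]
    if stale:
        raise RuntimeError(f"gradients not cleared for: {stale}")

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
model(torch.randn(3, 4)).sum().backward()  # populate .grad first
assert_zero_grad_idempotent(model, opt)    # passes silently
```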
2. `assert_train_eval_mode_clean(model)`: Toggle `model.train()` and `model.eval()` once each, confirming `model.training` ends up where the caller asked. Catches modules that override `.train()`/`.eval()` incorrectly (common in custom modules with sub-optimizers).
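A sketch of the idea, with a deliberately buggy module whose `.train()` override forgets to set the flag. This is an illustration, not the real implementation:

```python
import torch.nn as nn

def assert_train_eval_mode_clean(model):
    model.train()
    if not all(m.training for m in model.modules()):
        raise RuntimeError("a submodule ignored .train()")
    model.eval()
    if any(m.training for m in model.modules()):
        raise RuntimeError("a submodule ignored .eval()")

class BadModule(nn.Module):
    def train(self, mode=True):
        return self  # bug: never updates self.training or its children

model = nn.Sequential(nn.Linear(2, 2), BadModule())
try:
    assert_train_eval_mode_clean(model)
    caught = False
except RuntimeError:
    caught = True  # BadModule stays stuck in train mode after .eval()
```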
3. `assert_input_shape_contract(model, dummy_batch)`: Run one forward pass on `dummy_batch` and confirm the output shape matches the model's documented contract. Catches silent broadcasting bugs (the most insidious source of wrong-but-not-erroring training).
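A sketch of the shape check. The explicit `expected_shape` argument is an illustrative assumption; the real signature takes only `(model, dummy_batch)` and reads the contract from the model package:

```python
import torch
import torch.nn as nn

def assert_input_shape_contract(model, dummy_batch, expected_shape):
    # expected_shape is hypothetical here; the real check reads the
    # model's documented contract instead of taking it as an argument.
    out = model(dummy_batch)
    if tuple(out.shape) != tuple(expected_shape):
        raise RuntimeError(
            f"output shape {tuple(out.shape)} != contract {tuple(expected_shape)}"
        )

model = nn.Linear(8, 3)
dummy = torch.randn(5, 8)
assert_input_shape_contract(model, dummy, (5, 3))  # passes silently
```

Why it matters: a `(5,)` target against a `(5, 1)` prediction silently broadcasts to `(5, 5)` in many elementwise losses, training on garbage without erroring.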
4. `assert_init_loss_is_reasonable(model, dummy_batch, loss_fn)`: For a classification head with C classes, the initial loss should be close to `-log(1/C)` (cross-entropy at a uniform prediction). Reject if the initial loss is more than 2× that value — usually means the final-layer bias was not initialized correctly, or a softmax was applied before the loss.
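A quick worked sketch of the `-log(1/C)` heuristic, using an illustrative 10-class head with near-zero logits:

```python
import math
import torch
import torch.nn as nn

C = 10  # illustrative class count
head = nn.Linear(32, C)
nn.init.zeros_(head.bias)

x = torch.randn(64, 32) * 0.01           # near-zero features -> near-uniform logits
y = torch.randint(0, C, (64,))
loss = nn.CrossEntropyLoss()(head(x), y).item()

expected = -math.log(1.0 / C)            # = log(C), about 2.303 for C = 10
assert loss < 2 * expected, f"init loss {loss:.3f}, expected about {expected:.3f}"
```

A loss far above `2 * expected` at step zero usually means a bad final-layer bias or a softmax applied before a loss that already expects raw logits.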
5. `assert_grad_flows_to_inputs(model, dummy_batch, loss_fn)`: Backward once on the loss computed from the output, then verify every `nn.Parameter` that requires gradient has a non-zero gradient. Catches dead layers and detached subgraphs.
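A sketch of this check, demonstrated against a module with a deliberate `.detach()` bug (an illustration, not the real implementation):

```python
import torch
import torch.nn as nn

def assert_grad_flows(model, x, y, loss_fn):
    model.zero_grad(set_to_none=True)
    loss_fn(model(x), y).backward()
    dead = [n for n, p in model.named_parameters()
            if p.requires_grad and (p.grad is None or p.grad.abs().sum() == 0)]
    if dead:
        raise RuntimeError(f"no gradient reached: {dead}")

class DetachedBug(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(4, 4)
        self.b = nn.Linear(4, 2)
    def forward(self, x):
        return self.b(self.a(x).detach())  # bug: .detach() cuts grads to self.a

x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
try:
    assert_grad_flows(DetachedBug(), x, y, nn.CrossEntropyLoss())
    caught = False
except RuntimeError:
    caught = True  # self.a's parameters receive no gradient
```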
6. `assert_no_leak_in_data_pipeline(train_pipeline, val_pipeline)`: Take 10 samples from train and val, and confirm the normalizer / tokenizer / feature-selector was fit on train only by checking that re-fitting on val would change it. See stage1-data-pipeline for the full procedure.
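A heavily simplified sketch of the probe idea, using a mean-only "normalizer" on plain floats. The real procedure (stage1-data-pipeline) probes the actual pipeline's fitted objects; everything below is illustrative:

```python
def fit_mean(samples):
    return sum(samples) / len(samples)

def assert_no_leak(fitted_mean, train_probe, val_probe, tol=1e-9):
    train_fit = fit_mean(train_probe)
    val_fit = fit_mean(val_probe)
    # The stored statistic must match a fresh fit on train...
    if abs(fitted_mean - train_fit) > tol:
        raise RuntimeError("statistic does not match a train-only fit")
    # ...and re-fitting on val must change it; if it does not, either val
    # leaked into the fit or the probe is inconclusive.
    if abs(fitted_mean - val_fit) <= tol:
        raise RuntimeError("re-fitting on val leaves the statistic unchanged")

train = [1.0, 2.0, 3.0]
val = [10.0, 20.0, 30.0]
assert_no_leak(fit_mean(train), train, val)  # passes: fit on train only
```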
7. `assert_dropout_off_in_eval(model)`: Forward the same input through `model.eval()` twice and confirm bit-exact equality of outputs. Catches stochastic layers that were left active in eval mode.
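A sketch of the bit-exact comparison, with a deliberately buggy module that hard-codes dropout on (illustrative, not the real implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def assert_dropout_off_in_eval(model, x):
    model.eval()
    with torch.no_grad():
        a, b = model(x), model(x)
    if not torch.equal(a, b):  # bit-exact comparison, not allclose
        raise RuntimeError("model output is stochastic in eval mode")

class StuckDropout(nn.Module):
    def forward(self, x):
        # bug: training=True keeps dropout active, ignoring .eval()
        return F.dropout(x, p=0.5, training=True)

x = torch.randn(4, 16)
assert_dropout_off_in_eval(nn.Sequential(nn.Linear(16, 16), nn.Dropout(0.5)), x)  # passes
try:
    assert_dropout_off_in_eval(StuckDropout(), x)
    caught = False
except RuntimeError:
    caught = True
```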
8. `assert_optimizer_groups_cover_all_params(model, optimizer)`: Iterate over `model.parameters()` and confirm each one appears in some `optimizer.param_groups[*].params`. Catches the "I added a new module but forgot to put its params in the optimizer" bug.
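A sketch of the coverage check, with the exact bug it targets reproduced on purpose (illustrative, not the real implementation):

```python
import torch
import torch.nn as nn

def assert_optimizer_groups_cover_all_params(model, optimizer):
    covered = {id(p) for g in optimizer.param_groups for p in g["params"]}
    missing = [n for n, p in model.named_parameters()
               if p.requires_grad and id(p) not in covered]
    if missing:
        raise RuntimeError(f"params in no optimizer group: {missing}")

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
opt = torch.optim.SGD(model[0].parameters(), lr=0.1)  # bug: forgot model[1]
try:
    assert_optimizer_groups_cover_all_params(model, opt)
    caught = False
except RuntimeError:
    caught = True  # 1.weight and 1.bias are missing from the optimizer
```

Comparing by `id()` handles the common case where the same tensor object is shared across groups.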
```python
import sys

from curry_train.infra.preflight import run_preflight

errors = run_preflight(
    model=model,
    optimizer=optimizer,
    loss_fn=loss_fn,
    dummy_batch=dummy_batch,
    train_pipeline=train_pipeline,
    val_pipeline=val_pipeline,
    asserts="all",  # or a list of names
)
if errors:
    for e in errors:
        print(f"[{e.code}] {e.message} → fix: {e.remediation}")
    sys.exit(1)
```
`run_preflight` returns a list of `PreflightError`s (empty on success) rather than raising on the first failure, so the user sees all problems at once.
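The collect-don't-raise behavior can be sketched in a few lines. This is a hypothetical simplification of what `run_preflight` does internally, not the real implementation:

```python
def run_preflight_sketch(checks):
    # Run every check; collect failures instead of raising on the first one.
    errors = []
    for check in checks:
        try:
            check()
        except Exception as e:  # the real code would catch PreflightError
            errors.append(e)
    return errors  # empty list on success

def ok():
    pass

def fails():
    raise ValueError("boom")

errors = run_preflight_sketch([ok, fails])  # one error collected, ok still ran
```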
If the user has not yet wired up `run_preflight()`, copy the boilerplate above into their training entry point, right before the optimizer-step loop.
If `dummy_batch` is missing, point them at stage1-scaffolder — every model package should expose a `dummy_batch()` factory.
After the first run, walk through any failed asserts in the order listed above (1 before 2 before 3, etc.). The order is intentional: later asserts assume earlier ones pass.
Treat preflight failures as blocking. Do not proceed to Stage 2 until all eight pass.
Common symptoms and the assert that catches them:

- Missed `optimizer.zero_grad()` between batches → assert 1.
- A `tensor.detach()` slipped in somewhere (detached subgraph) → assert 5.

Related:

- skills/stage1-scaffolder — what to scaffold so the asserts have something to check.
- skills/stage2-overfit-single-batch — the next sanity step once preflight passes.
- template/curry_train/infra/preflight.py — the implementations.