Define an abort condition before launching a run, so that a clearly broken or clearly under-performing run stops itself instead of consuming the full compute budget. Activate when the user asks "when should I kill a run", "abort condition", "early stop a bad run", "kill criterion", or before any expensive run.
`npx claudepluginhub curryfromuestc/curry-train --plugin curry-train`

This skill uses the workspace's default tool permissions.
Define **before launching** the rule that determines when a run is so unlikely to succeed that you should abort it. Stops good compute from being wasted on dead runs.
"Under what specific, measurable condition will I kill this run before its scheduled completion?"
If you cannot answer in one sentence with concrete numbers (e.g. "kill if the loss stays above 5x its rolling minimum for 100 consecutive steps"), the kill criterion is missing.
Without a predefined criterion, the user invariably rationalizes continuing ("but what if it recovers?") and the run consumes its full budget. Predefining the criterion gives the run an unambiguous green/red signal at any point in time.
Each criterion has three parts: a metric, a threshold, and a time window.
For example: kill immediately if loss, grad_norm, or the parameter norm becomes NaN or Inf.

A good run has several kill criteria registered, each catching a different failure mode.
A simple watchdog that runs in the same process or as a sidecar:
```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class RunState:
    """Minimal run state consulted by criteria (sketch; the real RunState
    lives alongside the run journal)."""
    step: int = 0
    recent_losses: list[float] = field(default_factory=list)

class KillCriteria:
    def __init__(self, criteria: list[Callable[[RunState], Optional[str]]]):
        self.criteria = criteria

    def check(self, state: RunState) -> Optional[str]:
        """Return the first firing criterion's kill reason, or None."""
        for c in self.criteria:
            reason = c(state)
            if reason:
                return reason
        return None

# Example criterion: a sustained loss spike.
def loss_spike(state: RunState) -> Optional[str]:
    if state.step < 1000 or not state.recent_losses:
        return None  # too early to judge
    rolling_min = min(state.recent_losses[-1000:])
    recent = state.recent_losses[-100:]
    if recent and all(l > 5 * rolling_min for l in recent):
        return "loss_spike: 100 consecutive steps > 5x rolling min"
    return None
```
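The NaN/Inf failure mode can be caught with a criterion of the same shape. A minimal sketch, assuming the run state also exposes the latest `loss` and `grad_norm` values (illustrative field names, not defined above):

```python
import math
from typing import Optional

def non_finite(state) -> Optional[str]:
    """Kill immediately if loss or grad norm is NaN/Inf.
    Assumes state exposes `loss` and `grad_norm` floats (illustrative)."""
    for name, value in (("loss", state.loss), ("grad_norm", state.grad_norm)):
        if not math.isfinite(value):
            return f"non_finite: {name} = {value} at step {state.step}"
    return None
```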
Lives at template/curry_train/infra/journal.py:KillCriteria (alongside the run journal).
The training loop calls criteria.check(state) once per step and exits cleanly with a journal entry recording the reason.
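A toy sketch of that integration. All names here are illustrative: `step_fn` stands in for the real training step, and the journal is modeled as a plain list of entries:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class RunState:
    step: int = 0
    recent_losses: list[float] = field(default_factory=list)

def train(steps: int,
          step_fn: Callable[[int], float],
          check: Callable[[RunState], Optional[str]],
          state: RunState,
          journal: list[dict]) -> str:
    """Toy loop: run one step, consult the kill check, and on a hit
    record a journal entry and exit cleanly."""
    for step in range(steps):
        loss = step_fn(step)  # stand-in for a real training step
        state.step = step
        state.recent_losses.append(loss)
        reason = check(state)
        if reason:
            journal.append({"event": "killed", "step": step, "reason": reason})
            return "killed"
    return "completed"
```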
Before they launch, write the kill criteria into the config. Make them version-controlled, not ad-hoc.
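For example, a version-controlled config fragment might look like this (illustrative schema and numbers, not the actual curry-train config format):

```yaml
# Illustrative only; field names and thresholds are examples.
kill_criteria:
  - name: non_finite
    metric: [loss, grad_norm, param_norm]
    threshold: "NaN or Inf"
    window: 1            # kill immediately
  - name: loss_spike
    metric: loss
    threshold: "5x rolling min over last 1000 steps"
    window: 100          # 100 consecutive offending steps
```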
For ablation runs vs a baseline, add a comparative criterion: kill the variant if a sibling baseline is winning by 2σ. This stops compute from being spent on a comparison whose answer is already in.
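A sketch of such a comparative criterion, assuming the watchdog can see the sibling baseline's recent losses and a σ estimate from multi-seed variance (all names illustrative):

```python
from typing import Optional

def baseline_winning(state, baseline_losses: list[float], sigma: float,
                     window: int = 100) -> Optional[str]:
    """Kill the variant if, over the last `window` steps, the baseline's
    mean loss beats the variant's by more than 2 sigma."""
    if len(state.recent_losses) < window or len(baseline_losses) < window:
        return None
    variant = sum(state.recent_losses[-window:]) / window
    baseline = sum(baseline_losses[-window:]) / window
    if variant - baseline > 2 * sigma:
        return (f"baseline_winning: variant mean {variant:.3f} vs "
                f"baseline {baseline:.3f} (gap > 2 sigma = {2 * sigma:.3f})")
    return None
```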
Make sure the kill produces a journal entry with the reason, the step, and the state. Killed runs are valuable data; they should not silently disappear.
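A minimal sketch of such an entry as one JSON line (the actual format in template/curry_train/infra/journal.py may differ):

```python
import json
import time

def kill_entry(reason: str, step: int, state_summary: dict) -> str:
    """Serialize a kill event as one JSON line for the run journal."""
    return json.dumps({
        "event": "killed",
        "reason": reason,
        "step": step,
        "time": time.time(),
        "state": state_summary,  # e.g. last losses, grad norm
    })
```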
After killing, the user should:

- Run the diagnose skill on the killed run to check what went wrong (if it was a real failure, not a comparison loss).

If the user resists predefining a kill criterion ("but what if it recovers?"), point out that runs that need to recover from a 5× loss spike for 100 steps approximately never produce competitive final losses. This is empirically robust.
Rollback mechanisms (stage5-loss-spike-rollback) are different from kill criteria — those try to save a run that briefly spiked, not classify it as dead. Use both: rollback first, then kill if rollback doesn't help within a window.

Related:
- skills/stage5-loss-spike-rollback — recover-then-kill, instead of kill-immediately.
- skills/stage3-compute-budget — defines the wall-time budget that one of the kill criteria enforces.
- skills/stage3-multi-seed-variance — provides the σ value used in the comparison criterion.
- skills/diagnose — run after a kill to understand the cause.
- template/curry_train/infra/journal.py.