Define an abort condition before launching a run, so that a clearly broken or clearly under-performing run stops itself instead of consuming the full compute budget. Activate when the user asks "when should I kill a run", "abort condition", "early stop a bad run", "kill criterion", or before any expensive run.
`npx claudepluginhub curryfromuestc/curry-train --plugin curry-train`

This skill uses the workspace's default tool permissions.
Define **before launching** the rule that determines when a run is so unlikely to succeed that you should abort it. Stops good compute from being wasted on dead runs.
"Under what specific, measurable condition will I kill this run before its scheduled completion?"
If you cannot answer in one sentence with concrete numbers (e.g. "kill if the loss stays above 5x its rolling minimum for 100 consecutive steps"), the kill criterion is missing.
Without a predefined criterion, the user invariably rationalizes continuing ("but what if it recovers?") and the run consumes its full budget. Predefining the criterion gives the run an unambiguous green/red signal at any point in time.
Each criterion has three parts: a metric, a threshold, and a time window.
For example: kill immediately if loss, grad_norm, or the parameter norm becomes NaN or Inf.

A good run has several kill criteria registered, each catching a different failure mode.
A simple watchdog that runs in the same process or as a sidecar:
```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class RunState:
    """Minimal run state consulted by criteria (sketch; the real RunState
    lives alongside the run journal)."""
    step: int = 0
    recent_losses: list[float] = field(default_factory=list)

class KillCriteria:
    def __init__(self, criteria: list[Callable[[RunState], Optional[str]]]):
        self.criteria = criteria

    def check(self, state: RunState) -> Optional[str]:
        """Return the first firing criterion's kill reason, or None."""
        for c in self.criteria:
            reason = c(state)
            if reason:
                return reason
        return None

# Example criterion: a sustained loss spike.
def loss_spike(state: RunState) -> Optional[str]:
    if state.step < 1000 or not state.recent_losses:
        return None  # too early to judge
    rolling_min = min(state.recent_losses[-1000:])
    recent = state.recent_losses[-100:]
    if recent and all(l > 5 * rolling_min for l in recent):
        return "loss_spike: 100 consecutive steps > 5x rolling min"
    return None
```
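The NaN/Inf failure mode can be caught with a criterion of the same shape. A minimal sketch, assuming the run state also exposes the latest `loss` and `grad_norm` values (illustrative field names, not defined above):

```python
import math
from typing import Optional

def non_finite(state) -> Optional[str]:
    """Kill immediately if loss or grad norm is NaN/Inf.
    Assumes state exposes `loss` and `grad_norm` floats (illustrative)."""
    for name, value in (("loss", state.loss), ("grad_norm", state.grad_norm)):
        if not math.isfinite(value):
            return f"non_finite: {name} = {value} at step {state.step}"
    return None
```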
Lives at template/curry_train/infra/journal.py:KillCriteria (alongside the run journal).
The training loop calls criteria.check(state) once per step and exits cleanly with a journal entry recording the reason.
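A toy sketch of that integration. All names here are illustrative: `step_fn` stands in for the real training step, and the journal is modeled as a plain list of entries:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class RunState:
    step: int = 0
    recent_losses: list[float] = field(default_factory=list)

def train(steps: int,
          step_fn: Callable[[int], float],
          check: Callable[[RunState], Optional[str]],
          state: RunState,
          journal: list[dict]) -> str:
    """Toy loop: run one step, consult the kill check, and on a hit
    record a journal entry and exit cleanly."""
    for step in range(steps):
        loss = step_fn(step)  # stand-in for a real training step
        state.step = step
        state.recent_losses.append(loss)
        reason = check(state)
        if reason:
            journal.append({"event": "killed", "step": step, "reason": reason})
            return "killed"
    return "completed"
```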
Before they launch, write the kill criteria into the config. Make them version-controlled, not ad-hoc.
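For example, a version-controlled config fragment might look like this (illustrative schema and numbers, not the actual curry-train config format):

```yaml
# Illustrative only; field names and thresholds are examples.
kill_criteria:
  - name: non_finite
    metric: [loss, grad_norm, param_norm]
    threshold: "NaN or Inf"
    window: 1            # kill immediately
  - name: loss_spike
    metric: loss
    threshold: "5x rolling min over last 1000 steps"
    window: 100          # 100 consecutive offending steps
```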
For ablation runs vs a baseline, add a comparative criterion: kill the variant if a sibling baseline is winning by 2σ. This stops compute from being spent on a comparison whose answer is already in.
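A sketch of such a comparative criterion, assuming the watchdog can see the sibling baseline's recent losses and a σ estimate from multi-seed variance (all names illustrative):

```python
from typing import Optional

def baseline_winning(state, baseline_losses: list[float], sigma: float,
                     window: int = 100) -> Optional[str]:
    """Kill the variant if, over the last `window` steps, the baseline's
    mean loss beats the variant's by more than 2 sigma."""
    if len(state.recent_losses) < window or len(baseline_losses) < window:
        return None
    variant = sum(state.recent_losses[-window:]) / window
    baseline = sum(baseline_losses[-window:]) / window
    if variant - baseline > 2 * sigma:
        return (f"baseline_winning: variant mean {variant:.3f} vs "
                f"baseline {baseline:.3f} (gap > 2 sigma = {2 * sigma:.3f})")
    return None
```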
Make sure the kill produces a journal entry with the reason, the step, and the state. Killed runs are valuable data; they should not silently disappear.
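A minimal sketch of such an entry as one JSON line (the actual format in template/curry_train/infra/journal.py may differ):

```python
import json
import time

def kill_entry(reason: str, step: int, state_summary: dict) -> str:
    """Serialize a kill event as one JSON line for the run journal."""
    return json.dumps({
        "event": "killed",
        "reason": reason,
        "step": step,
        "time": time.time(),
        "state": state_summary,  # e.g. last losses, grad norm
    })
```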
After killing, the user should:

- Run the diagnose skill on the killed run to check what went wrong (if it was a real failure, not a comparison loss).

If the user resists predefining a kill criterion ("but what if it recovers?"), point out that runs that need to recover from a 5× loss spike for 100 steps approximately never produce competitive final losses. This is empirically robust.
Rollback mechanisms (stage5-loss-spike-rollback) are different from kill criteria — those try to save a run that briefly spiked, not classify it as dead. Use both: rollback first, then kill if rollback doesn't help within a window.

Related:
- skills/stage5-loss-spike-rollback — recover-then-kill, instead of kill-immediately.
- skills/stage3-compute-budget — defines the wall-time budget that one of the kill criteria enforces.
- skills/stage3-multi-seed-variance — provides the σ value used in the comparison criterion.
- skills/diagnose — run after a kill to understand the cause.
- template/curry_train/infra/journal.py.