From curry-train
Sweep model capacity (width, depth, parameter count) at fixed compute to find the saturation point — where adding more parameters stops reducing the train loss. Activate when the user asks "how big should my model be", "capacity sweep", "is my model big enough", "find the right model size", or after Stage 3 pre-validation passes.
npx claudepluginhub curryfromuestc/curry-train --plugin curry-trainThis skill uses the workspace's default tool permissions.
A short series of runs at increasing model sizes to identify where the train loss saturates — the size beyond which more parameters don't help. Confirms the architecture's capacity is well-matched to the data and budget.
Guides Next.js Cache Components and Partial Prerendering (PPR): 'use cache' directives, cacheLife(), cacheTag(), revalidateTag() for caching, invalidation, static/dynamic optimization. Auto-activates on cacheComponents: true.
Processes PDFs: extracts text/tables/images, merges/splits/rotates pages, adds watermarks, creates/fills forms, encrypts/decrypts, OCRs scans. Activates on PDF mentions or output requests.
Share bugs, ideas, or general feedback.
A short series of runs at increasing model sizes to identify where the train loss saturates — the size beyond which more parameters don't help. Confirms the architecture's capacity is well-matched to the data and budget.
"Past which model size does train loss stop decreasing meaningfully?"
The answer tells you the smallest model worth running at full scale, and serves as a sanity check that the architecture is not silently bottlenecked.
stage3-mup-coord-check).If train loss is still decreasing at the largest size: the architecture has more capacity to use; consider running larger.
If train loss plateaus before the target size: the architecture is bottlenecked; investigate before scaling further.
A train-loss plateau at moderate size has two possible causes:
Distinguish by checking the gap between train and val loss at the plateau:
Confirm Stage 3 has passed: pre-validation showed the variant works at small scale with statistical significance.
Pick the size grid using the user's compute budget. Don't sweep at sizes the user couldn't afford to run for real — small enough to be cheap, large enough to have signal.
Run with stage5-warmup-cosine schedule and stage3-kill-criterion enabled.
Plot the saturation curve. Render the verdict:
Recommend next step:
stage3-compute-budget).skills/stage3-scaling-fit — fits a curve to the sweep results.skills/stage4-optuna-integration — once size is chosen, refine other hyperparameters.skills/stage4-parallel-primitive-intro — at large enough sizes, parallelism primitives become necessary.skills/stage3-mup-coord-check — must hold for the sweep to be interpretable.