stage1-scaffolder

Methodology for Stage 1 Skeleton: set up the minimum architecture (model, dataset adapter, config, registration) so the data flow can be traced end-to-end before any optimization. Activate when the user asks "how do I add a new model", "what files does a curryTrain model need", "set up the architecture skeleton", "where does my model.py go", or "how does registration work".

npx claudepluginhub curryfromuestc/curry-train --plugin curry-train

Tool Access

This skill uses the workspace's default tool permissions.

Stats

Stars: 0 · Forks: 0 · Last commit: May 4, 2026

Stage 1 · Skeleton · Scaffolder methodology

Before any tuning, before any sanity check, the architecture must exist as four small files with a clean, layered boundary. This skill describes the contract.

Stage question

"Does the architecture exist and does data flow through it from input to loss?"

If the answer is "I'm not sure", you're still in Stage 1.

The four-file contract

Every model in curryTrain lives at curry_train/models/<name>/ and has exactly these files:

| File | Job | Typical size |
|---|---|---|
| `config.py` | Architecture parameters as a frozen dataclass. HuggingFace-style. | ~50–90 lines |
| `model.py` | Layers and the model class, built from `curry_train.primitives.*`. | ~150–260 lines |
| `checkpoint.py` | HF weight ↔ internal weight conversion (uses `primitive-hf-bridge`). | ~120–180 lines |
| `protocol.py` | `register_model(...)` call with a build function. | ~30–50 lines |

If a file grows well past its typical size, you are likely mixing layers and should split it.
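The size column can double as a quick self-check. The sketch below is hypothetical (check_model_dir is not part of curryTrain): it lists the .py files in a model directory, flags missing or unexpected files, and warns when a file exceeds the upper end of its typical size. Tolerating an `__init__.py` is an assumption, since the contract itself lists only four files.

```python
from pathlib import Path

# Upper end of the "Typical size" column, used as a soft line budget.
SIZE_BUDGETS = {
    "config.py": 90,
    "model.py": 260,
    "checkpoint.py": 180,
    "protocol.py": 50,
}
# Assumption: a package marker is tolerated even though the contract lists four files.
ALLOWED_EXTRA = {"__init__.py"}


def check_model_dir(model_dir):
    """Return a list of problems found in a curry_train/models/<name>/ directory."""
    root = Path(model_dir)
    present = {p.name for p in root.glob("*.py")}
    expected = set(SIZE_BUDGETS)
    problems = []
    for name in sorted(expected - present):
        problems.append(f"missing {name}")
    for name in sorted(present - expected - ALLOWED_EXTRA):
        problems.append(f"unexpected file {name}")
    for name in sorted(present & expected):
        n_lines = len((root / name).read_text().splitlines())
        budget = SIZE_BUDGETS[name]
        if n_lines > budget:
            problems.append(f"{name} has {n_lines} lines (budget ~{budget}); consider splitting it")
    return problems
```

Calling check_model_dir("curry_train/models/<name>") returns an empty list when the directory matches the contract. An empty checkpoint.py (allowed before any HF weights exist) counts as present and passes.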

Layer boundaries (do not violate)

Runtime ↔ Primitive: the runtime sees only the TrainingRuntime protocol from curry_train.runtime. It never imports model code.
Primitive ↔ Model: model code imports its building blocks via from curry_train.primitives import ... and never imports a specific runtime backend.
Model ↔ User project: the user's configs/<name>.yaml and register_model(...) are the only public seams.

If an import crosses one of these boundaries in the wrong direction (for example, model code importing a runtime backend), the architecture is wrong. Surface this to the user immediately.
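Import direction can be checked mechanically. The following is a minimal sketch using Python's standard ast module, assuming model.py must not import torch.distributed or anything under curry_train.runtimes. That package name is inferred from the runtimes/local_torch path mentioned in the procedure, so treat it as an assumption. The function name boundary_violations is hypothetical. It reports the line number of each offending import without executing the file.

```python
import ast

# Assumed deny-list for model.py: runtime backends and torch.distributed.
FORBIDDEN_FOR_MODEL = ("curry_train.runtimes", "torch.distributed")


def _is_forbidden(qualified_name, forbidden):
    return any(qualified_name == p or qualified_name.startswith(p + ".") for p in forbidden)


def boundary_violations(source, forbidden=FORBIDDEN_FOR_MODEL):
    """Return (line number, imported name) for every import that crosses a boundary."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import torch.distributed as dist` -> "torch.distributed"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # `from torch import distributed` -> "torch.distributed"
            base = node.module or ""
            names = [f"{base}.{alias.name}" if base else alias.name for alias in node.names]
        else:
            continue
        hits.extend((node.lineno, name) for name in names if _is_forbidden(name, forbidden))
    return hits
```

The same function can check the other direction, runtime code importing model code, by passing forbidden=("curry_train.models",).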

Procedure when scaffolding a new model

Follow this exact order; every step is verifiable.

1. Decide the tensor shape contract before writing any code, and document it as a comment at the top of model.py.
   - Standard transformer: (B, N, D).
   - SNN model: (B, T, N, D), where T is the spike-time dimension.
   - CNN / vision: (B, C, H, W) until a flattening point.
2. Write config.py first: a frozen dataclass with __post_init__ validation. Avoid defaults that hide bugs (e.g. don't silently default n_layers=12; require it).
3. Write model.py second, using only curry_train.primitives.* for the building blocks. If a primitive is missing, write a stub in curry_train.primitives and add a one-line note to the relevant primitive-* skill. Do not inline the missing logic into model.py.
4. Write checkpoint.py only when an HF source exists. If the user is starting from scratch (no pre-trained weights), this file may start out empty; mark it with TODO: HF weight conversion not yet needed.
5. Write protocol.py last. Call register_model(ModelSpec(name=..., package=..., impls={...})). The impls dict points to functions that build a runtime; for V1, most users will register a single impl backed by runtimes/local_torch.LocalTorchRuntime.
6. Run the preflight asserts (stage1-preflight-asserts) and a one-step bench (bench --steps=1) before declaring Stage 1 complete.
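A config.py in this style might look like the following. The class name TinyTransformerConfig, its fields, and the n_kv_heads divisibility check (which assumes grouped-query attention) are illustrative assumptions, not curryTrain's actual schema. The point is the pattern: frozen, no defaults on structural fields, and validation in __post_init__.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TinyTransformerConfig:
    # Illustrative fields only. None has a default, so omitting one is a
    # TypeError at construction instead of a silently assumed value.
    vocab_size: int
    d_model: int
    n_layers: int
    n_heads: int
    n_kv_heads: int
    max_seq_len: int

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
                raise ValueError(f"{f.name} must be a positive int, got {value!r}")
        if self.d_model % self.n_heads:
            raise ValueError(f"d_model={self.d_model} is not divisible by n_heads={self.n_heads}")
        if self.n_heads % self.n_kv_heads:
            raise ValueError(f"n_heads={self.n_heads} is not divisible by n_kv_heads={self.n_kv_heads}")

    @property
    def head_dim(self) -> int:
        # Derived value: computed from fields, never stored as a separate magic number.
        return self.d_model // self.n_heads
```

Because the dataclass is frozen, model.py cannot mutate the config after construction, and every derived size (such as head_dim) traces back to a validated field.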

Hard rules to enforce on the user

No custom training loop in model.py. Loops live in curry_train.loop.
No import torch.distributed in model.py. Distributed concerns live in primitives like parallel-state and distributed-optimizer.
No silent shape coercions in model.py. If a tensor enters with the wrong shape, raise immediately.
No magic numbers in model.py that aren't sourced from config.py.
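One way to honor "raise immediately" is a small shape guard called at the top of each forward pass. The helper below is hypothetical (expect_shape is not a curryTrain primitive). It works on any shape tuple, so inside model.py you would pass tuple(x.shape) and take fixed dimensions such as D from config.py rather than hard-coding them.

```python
from typing import Dict, Mapping, Sequence


def expect_shape(name: str, shape: Sequence[int], spec: Sequence[str],
                 known: Mapping[str, int]) -> Dict[str, int]:
    """Check `shape` against a symbolic spec such as ("B", "N", "D").

    Dimensions in `known` (e.g. D from config.py) must match exactly; any other
    symbol is bound on first use and must agree wherever it repeats. Nothing is
    reshaped, padded, or broadcast: every mismatch raises ValueError.
    """
    if len(shape) != len(spec):
        raise ValueError(f"{name}: expected rank {len(spec)} {tuple(spec)}, got shape {tuple(shape)}")
    bound = dict(known)
    for axis, (symbol, size) in enumerate(zip(spec, shape)):
        if symbol in bound and bound[symbol] != size:
            raise ValueError(f"{name}: axis {axis} ({symbol}) should be {bound[symbol]}, got {size}")
        bound[symbol] = size
    return bound
```

A typical call would be expect_shape("hidden", tuple(x.shape), ("B", "N", "D"), {"D": cfg.d_model}). The returned dict also makes it easy to log the bound B and N once per step.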

Stage exit criteria

Stage 1 is done when:

bench --steps=1 produces a finite loss and finite grad-norm.
stage1-preflight-asserts passes all checks.
The user can explain in one sentence what shape goes in and what shape comes out at each layer.

If any of these fail, do not let the user advance to Stage 2.
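The first two exit criteria are numeric and easy to gate on. The sketch below assumes "grad-norm" means the global L2 norm over all parameter gradients (a common convention; how bench actually computes it is not specified here) and uses plain lists of floats in place of tensors. Both function names are hypothetical. The third criterion, explaining each layer's shapes in one sentence, remains a human check.

```python
import math
from typing import Iterable, List, Sequence


def global_grad_norm(grads: Iterable[Sequence[float]]) -> float:
    """Global L2 norm over every gradient entry of every parameter.

    Multi-argument math.hypot (Python 3.8+) scales internally to avoid overflow
    and returns a non-finite value if any entry is inf or nan.
    """
    return math.hypot(*(g for param_grad in grads for g in param_grad))


def unmet_exit_criteria(loss: float, grad_norm: float, preflight_passed: bool) -> List[str]:
    """Return the automatable Stage 1 exit criteria that are not met (empty means pass)."""
    unmet = []
    if not math.isfinite(loss):
        unmet.append(f"bench --steps=1 loss is not finite: {loss}")
    if not math.isfinite(grad_norm):
        unmet.append(f"bench --steps=1 grad-norm is not finite: {grad_norm}")
    if not preflight_passed:
        unmet.append("stage1-preflight-asserts did not pass")
    return unmet
```

Returning the list of failures, rather than a single boolean, gives the user something concrete to fix before moving to Stage 2.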

Common failure modes to flag

Mixing primitives across layer boundaries (e.g. doing an all_reduce inside model.py).
Using nn.LayerNorm directly when an RMSNorm primitive exists (or vice versa) — every choice should go through a primitive.
Hand-rolling attention instead of using primitive-gqattention.
Forgetting to register the model: create_runtime(<name>) then raises a KeyError, often before the user has noticed the missing registration.
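To see why a missing registration surfaces as a KeyError, here is a minimal stand-in registry. Only the call shape register_model(ModelSpec(name=..., package=..., impls={...})) and the KeyError from create_runtime(<name>) come from the text. The field types, the "default" impl key, and the zero-argument builder functions are hypothetical; the real protocol lives in template/curry_train/runtime.py.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical stand-ins, not curryTrain's actual implementation.


@dataclass(frozen=True)
class ModelSpec:
    name: str
    package: str
    impls: Dict[str, Callable[[], Any]] = field(default_factory=dict)


_REGISTRY: Dict[str, ModelSpec] = {}


def register_model(spec: ModelSpec) -> None:
    # In this sketch, protocol.py calls this at import time.
    if spec.name in _REGISTRY:
        raise ValueError(f"model {spec.name!r} is already registered")
    _REGISTRY[spec.name] = spec


def create_runtime(name: str, impl: str = "default") -> Any:
    try:
        spec = _REGISTRY[name]
    except KeyError:
        known = sorted(_REGISTRY)
        raise KeyError(f"model {name!r} is not registered (known: {known}); "
                       f"was its protocol.py imported?") from None
    return spec.impls[impl]()
```

In a registry like this, registration happens only when protocol.py is imported, so a protocol.py that never runs looks exactly like a forgotten register_model call. Listing the known names in the error shortens that diagnosis.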

Related

skills/stage1-preflight-asserts: the checks to run after scaffolding.
skills/stage1-data-pipeline: leakage-safe split + transform.
agents/scaffolder.md: the agent that actually writes the four files for you.
template/curry_train/runtime.py: the protocol you must satisfy.
