From curry-train
Run a structured grid of ablation experiments (multiple changes vs baseline, possibly combinations), report the matrix with variance-aware verdicts, isolate which changes actually contributed. Activate when the user asks "ablation study", "which changes matter", "ablation table", "isolate contribution of X", or after multiple variants have been evaluated.
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train

This skill uses the workspace's default tool permissions.
A grid of experiments designed to isolate the contribution of each change in a multi-change variant. Without this, "we improved by 5%" hides the fact that 4 of 5 changes did nothing.
"Of the K changes in my new model, which ones actually contribute to the improvement, and which are vestigial?"
Run an ablation when:
If the variant only has one change vs baseline, you don't need a matrix; one A/B is enough.
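For changes c_1, ..., c_K, there are two designs:

Leave-one-out: for each i, run V − c_i (the variant with c_i removed). This isolates the contribution of each change to the final model.

Sequential: run B, B + c_1, B + c_1 + c_2, ..., up to V. This isolates the value of adding each change incrementally (depends on order).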
The two designs answer slightly different questions; leave-one-out is more common for "which can I remove", sequential is more useful for "in what order should features be developed".
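As a minimal sketch of how the two grids could be enumerated, the snippet below maps each cell name to the set of changes enabled in that cell. The function names `leave_one_out` and `sequential` and the cell-naming scheme are illustrative assumptions, not part of the skill itself.

```python
from typing import Dict, FrozenSet, List


def leave_one_out(changes: List[str]) -> Dict[str, FrozenSet[str]]:
    """Baseline, full variant, and one cell per removed change."""
    full = frozenset(changes)
    cells = {"B": frozenset(), "V": full}
    for c in changes:
        cells[f"V - {c}"] = full - {c}
    return cells


def sequential(changes: List[str]) -> Dict[str, FrozenSet[str]]:
    """Add changes one at a time in the given order; the last cell equals V."""
    cells: Dict[str, FrozenSet[str]] = {"B": frozenset()}
    enabled: List[str] = []
    for c in changes:
        enabled.append(c)
        cells["B + " + " + ".join(enabled)] = frozenset(enabled)
    return cells


changes = ["c1", "c2", "c3"]
loo = leave_one_out(changes)
seq = sequential(changes)
for name, enabled in loo.items():
    print(name, sorted(enabled))
```

The sequential grid depends on the order of `changes`, while the leave-one-out grid does not, which mirrors the difference between the two questions.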
Each ablation cell needs multi-seed averaging; without it, the matrix is noise. The minimum is N = 3 seeds per cell. For a leave-one-out design with K = 5 changes, that's 5 × 3 = 15 leave-one-out runs at full scale plus 2 × 3 = 6 for the baseline and the full variant, or (K + 2) × 3 = 21 runs in total. Often this is the largest compute commitment in the project.
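The run budget can be counted cell by cell. The helper below is a small sketch (the name `ablation_runs` is hypothetical) that assumes a leave-one-out grid with one cell per removed change plus one cell each for the baseline and the full variant.

```python
def ablation_runs(k: int, seeds: int = 3) -> int:
    """Total training runs for a leave-one-out grid over k changes."""
    if k < 1 or seeds < 1:
        raise ValueError("k and seeds must be positive")
    cells = k + 2  # k leave-one-out cells + baseline B + full variant V
    return cells * seeds


print(ablation_runs(5))      # 21
print(ablation_runs(5, 5))   # 35
```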
If full-scale ablation is too expensive, consider running it at a smaller proxy size whose validity has been checked with stage3-mup-coord-check, then verifying the two largest contributors at full scale.
A markdown table:
| variant | mean | std | Δ vs B | sig (vs V) |
|---------------|--------|--------|---------|------------|
| baseline (B) | 2.512 | 0.008 | — | sig worse |
| V − c1 | 2.398 | 0.011 | -0.114 | sig worse |
| V − c2 | 2.376 | 0.009 | -0.136 | indist. |
| V − c3 | 2.378 | 0.010 | -0.134 | indist. |
| V − c4 | 2.401 | 0.012 | -0.111 | sig worse |
| V − c5 | 2.379 | 0.008 | -0.133 | indist. |
| variant (V) | 2.378 | 0.009 | -0.134 | — |
Reading: c2, c3, c5 are not contributing (V − c_i is indistinguishable from V). c1, c4 are contributing (V − c_i is worse than V).
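The reading above can be reproduced from the table's summary statistics. The sketch below is not the stage6-variance-aware-decision logic, which is not shown here; it swaps in a simple stand-in rule and makes three assumptions: each cell used 3 seeds, lower is better (the metric looks like a loss), and a difference counts as significant when it exceeds a hypothetical cutoff of 2 standard errors.

```python
import math

N_SEEDS = 3       # assumption: seeds per cell
THRESHOLD = 2.0   # assumption: cutoff in standard errors, not the skill's actual rule

# (mean, std) per cell, copied from the table
cells = {
    "B": (2.512, 0.008),
    "V - c1": (2.398, 0.011),
    "V - c2": (2.376, 0.009),
    "V - c3": (2.378, 0.010),
    "V - c4": (2.401, 0.012),
    "V - c5": (2.379, 0.008),
    "V": (2.378, 0.009),
}


def verdict_vs_v(mean, std, v_mean, v_std, n=N_SEEDS, threshold=THRESHOLD):
    """Compare one cell against V using the standard error of the difference."""
    se = math.sqrt(std ** 2 / n + v_std ** 2 / n)
    score = (mean - v_mean) / se  # positive means higher (worse) than V
    if score > threshold:
        return "sig worse"
    if score < -threshold:
        return "sig better"
    return "indist."


v_mean, v_std = cells["V"]
verdicts = {
    name: verdict_vs_v(m, s, v_mean, v_std)
    for name, (m, s) in cells.items()
    if name != "V"
}
# A change contributes if removing it makes the model significantly worse than V.
contributing = [
    name.split(" - ")[1]
    for name, v in verdicts.items()
    if name.startswith("V - ") and v == "sig worse"
]
print(verdicts)
print(contributing)
```

With only 3 seeds per cell the standard errors are themselves noisy, so borderline cells deserve more seeds before a change is declared vestigial.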
After the matrix:
1. Confirm the variant has K ≥ 2 changes. If K = 1, ablation matrix is overkill; just compare V vs B with stage6-variance-aware-decision.
2. Decide on leave-one-out vs sequential. Default is leave-one-out unless the user has a specific reason for sequential.
3. Estimate compute. If too expensive at full scale, drop to a proxy size with muP. Be explicit about the size used.
4. Wire up N ≥ 3 seeds per cell. Anything less is noise.
5. Render the matrix. For each cell, render: mean, std, Δ vs baseline, significance vs V (using stage6-variance-aware-decision logic).
6. Render the "minimal V": the variant with all non-contributing changes removed. Estimate its compute cost and complexity reduction. Recommend running this minimal variant as the actual final model.
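The last step can be sketched as a filter over the leave-one-out verdicts. The function name `minimal_variant` and the verdict strings are illustrative assumptions; the verdicts used here are the ones from the example matrix.

```python
from typing import Dict, List


def minimal_variant(changes: List[str], verdicts: Dict[str, str]) -> List[str]:
    """Keep only the changes whose removal made the model significantly worse."""
    return [c for c in changes if verdicts[c] == "sig worse"]


changes = ["c1", "c2", "c3", "c4", "c5"]
# Verdict of V - c_i versus V, taken from the example matrix
verdicts = {
    "c1": "sig worse",
    "c2": "indist.",
    "c3": "indist.",
    "c4": "sig worse",
    "c5": "indist.",
}

kept = minimal_variant(changes, verdicts)
dropped = [c for c in changes if c not in kept]
print("minimal V = B + " + " + ".join(kept))
print("dropped:", ", ".join(dropped))
```

Leave-one-out only tests removing one change at a time, so the minimal variant still needs its own multi-seed run to confirm that dropping all the non-contributing changes together does not hurt.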
- skills/stage6-variance-aware-decision: significance testing for each cell.
- skills/stage6-error-cluster: directs which changes to ablate (the ones aimed at specific clusters).
- skills/stage3-small-scale-ablation: small-scale precursor; full-scale matrix runs after.