Skill

stage6-ablation-matrix

Run a structured grid of ablation experiments (multiple changes vs baseline, possibly in combination), report the matrix with variance-aware verdicts, and isolate which changes actually contributed. Activate when the user asks for an "ablation study", "which changes matter", an "ablation table", or to "isolate contribution of X", or after multiple variants have been evaluated.

npx claudepluginhub curryfromuestc/curry-train --plugin curry-train

Tool Access

This skill uses the workspace's default tool permissions.

Stats

Stars: 0

Forks: 0

Last Commit: May 4, 2026

Stage 6 · Iterate · Ablation matrix

A grid of experiments designed to isolate the contribution of each change in a multi-change variant. Without this, "we improved by 5%" hides the fact that 4 of 5 changes did nothing.

Stage question

"Of the K changes in my new model, which ones actually contribute to the improvement, and which are vestigial?"

When to run an ablation matrix

Run an ablation when:

A variant ships with multiple changes vs the baseline.
The next decision is whether to keep all the changes or simplify.
Future maintenance cost depends on knowing which parts are essential.

If the variant only has one change vs baseline, you don't need a matrix; one A/B is enough.

The two designs

Leave-one-out ablation

For changes c_1, ..., c_K:

Variant V (baseline + all K changes).
For each i: run V − c_i (variant with c_i removed).
Compare each V − c_i against V: the change in loss or metric when c_i is removed is that change's contribution.

This isolates the contribution of each change to the final model.
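The leave-one-out cells can be enumerated mechanically. The sketch below is a minimal illustration, not part of the skill itself: the function name `leave_one_out_cells` and the change labels are hypothetical, and the baseline B is included as a reference cell alongside V.

```python
from typing import Dict, FrozenSet, List


def leave_one_out_cells(changes: List[str]) -> Dict[str, FrozenSet[str]]:
    """Map each cell name to the set of changes enabled in that cell."""
    full = frozenset(changes)
    cells: Dict[str, FrozenSet[str]] = {
        "B": frozenset(),  # baseline: no changes enabled
        "V": full,         # full variant: all K changes enabled
    }
    for change in changes:
        # V - c_i: every change enabled except this one
        cells[f"V - {change}"] = full - {change}
    return cells


if __name__ == "__main__":
    for name, enabled in leave_one_out_cells(["c1", "c2", "c3"]).items():
        print(f"{name:8s} {sorted(enabled)}")
```

Each cell is then trained once per seed, so the dictionary doubles as the list of configurations to launch.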

Sequential addition ablation

Baseline B.
B + c_1.
B + c_1 + c_2.
...
B + c_1 + ... + c_K = V.

This isolates the value of adding each change incrementally (depends on order).
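A sequential ladder can be generated the same way. This is a hedged sketch with a hypothetical helper name, `sequential_cells`; the order of the input list is the order in which changes are added, which is exactly why the result is order-dependent.

```python
from typing import List, Tuple


def sequential_cells(changes: List[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """Return (cell name, enabled changes) pairs from B up to V, in order."""
    cells: List[Tuple[str, Tuple[str, ...]]] = [("B", ())]
    for i in range(1, len(changes) + 1):
        enabled = tuple(changes[:i])  # the first i changes, in the given order
        cells.append(("B + " + " + ".join(enabled), enabled))
    return cells


if __name__ == "__main__":
    for name, _ in sequential_cells(["c1", "c2", "c3"]):
        print(name)
```

Reordering the input list changes every intermediate cell, so the per-step gains from one ordering should not be read as order-free contributions.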

The two designs answer slightly different questions; leave-one-out is more common for "which can I remove", sequential is more useful for "in what order should features be developed".

Variance handling

Each ablation cell needs multi-seed averaging; without it, the matrix is noise. The minimum is N = 3 seeds per cell. For a leave-one-out ablation with K = 5 changes, that is 5 × 3 = 15 full-scale runs for the V − c_i cells plus 2 × 3 = 6 for the baseline and the full variant, or (K + 2) × N = 21 runs in total. This is often the largest compute commitment in the project.

If full-scale ablation is too expensive, consider running it at a smaller proxy size whose validity has been checked with stage3-mup-coord-check, then verifying the two largest contributors at full scale.
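The run count for a leave-one-out matrix is easy to tally before committing compute. The helper below is a small sketch; the name `leave_one_out_runs` is hypothetical, and it assumes one cell per removed change plus one cell each for the baseline and the full variant.

```python
from typing import Dict


def leave_one_out_runs(num_changes: int, seeds_per_cell: int = 3) -> Dict[str, int]:
    """Count training runs for a leave-one-out ablation matrix."""
    if num_changes < 2:
        raise ValueError("a matrix needs at least 2 changes; use a single A/B otherwise")
    if seeds_per_cell < 1:
        raise ValueError("seeds_per_cell must be positive")
    ablation_runs = num_changes * seeds_per_cell  # the V - c_i cells
    reference_runs = 2 * seeds_per_cell           # baseline B and full variant V
    return {
        "ablation_runs": ablation_runs,
        "reference_runs": reference_runs,
        "total_runs": ablation_runs + reference_runs,
    }


if __name__ == "__main__":
    print(leave_one_out_runs(num_changes=5, seeds_per_cell=3))
```

Multiplying `total_runs` by the cost of one run at the chosen size gives a quick check on whether a proxy size is needed.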

Recommended output

A markdown table:

| variant       | mean   | std    | Δ vs B  | sig (vs V) |
|---------------|--------|--------|---------|------------|
| baseline (B)  | 2.512  | 0.008  |  —      | sig worse  |
| V − c1        | 2.398  | 0.011  | -0.114  | sig worse  |
| V − c2        | 2.376  | 0.009  | -0.136  | indist.    |
| V − c3        | 2.378  | 0.010  | -0.134  | indist.    |
| V − c4        | 2.401  | 0.012  | -0.111  | sig worse  |
| V − c5        | 2.379  | 0.008  | -0.133  | indist.    |
| variant (V)   | 2.378  | 0.009  | -0.134  |  —         |

Reading (lower is better): c2, c3, c5 are not contributing, because V − c_i is indistinguishable from V. c1, c4 are contributing, because V − c_i is significantly worse than V.
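A table of this shape can be produced from per-seed results with the standard library alone. The sketch below rests on several assumptions: the names `verdict_vs_full`, `render_matrix` and `t_crit` are hypothetical, lower values are treated as better, and the verdict uses a Welch-style t statistic with a fixed cutoff as a stand-in for the stage6-variance-aware-decision logic, which is not reproduced here. The default cutoff of 2.776 is the two-sided 5% Student-t critical value for 4 degrees of freedom, the most optimistic case for two 3-seed cells. The per-seed losses in the usage block are made-up numbers for illustration only.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def verdict_vs_full(cell: List[float], full: List[float], t_crit: float = 2.776) -> str:
    """Classify a cell against V, assuming lower is better (e.g. loss).

    Stand-in rule (an assumption): Welch-style t statistic vs a fixed cutoff.
    Each list needs at least 2 seeds, otherwise stdev() raises.
    """
    diff = mean(cell) - mean(full)
    std_err = sqrt(stdev(cell) ** 2 / len(cell) + stdev(full) ** 2 / len(full))
    if std_err == 0.0:
        # No seed-to-seed spread at all: fall back to the sign of the gap.
        return "indist." if diff == 0 else ("sig worse" if diff > 0 else "sig better")
    t_stat = diff / std_err
    if t_stat > t_crit:
        return "sig worse"
    if t_stat < -t_crit:
        return "sig better"
    return "indist."


def render_matrix(losses: Dict[str, List[float]], baseline: str = "B", full: str = "V") -> str:
    """Render one row per cell: mean, std, delta vs baseline, verdict vs V."""
    base_mean = mean(losses[baseline])
    lines = [
        "| variant | mean | std | Δ vs B | sig (vs V) |",
        "|---|---|---|---|---|",
    ]
    for name, values in losses.items():
        delta = "n/a" if name == baseline else f"{mean(values) - base_mean:+.3f}"
        sig = "n/a" if name == full else verdict_vs_full(values, losses[full])
        lines.append(f"| {name} | {mean(values):.3f} | {stdev(values):.3f} | {delta} | {sig} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up per-seed losses, three seeds per cell, for illustration only.
    example = {
        "B": [2.50, 2.51, 2.52],
        "V - c1": [2.41, 2.42, 2.43],
        "V - c2": [2.37, 2.38, 2.38],
        "V": [2.37, 2.38, 2.39],
    }
    print(render_matrix(example))
```

With only three seeds per cell, any fixed cutoff is coarse; the point of the sketch is the table layout and the V-versus-(V − c_i) comparison, not the specific test.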

Decision

After the matrix:

Drop changes whose removal is statistically indistinguishable from V. They are not earning their complexity.
Keep changes whose removal causes a real regression. They are paying for themselves.
Re-evaluate the resulting "minimal V" — it may differ from the original V both in performance and in maintenance cost.
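The keep-or-drop rule above can be written down directly. This is a minimal sketch with a hypothetical function name, `minimal_variant`; it takes the verdict of each V − c_i cell versus V as input and treats anything other than a significant regression as a reason to drop the change.

```python
from typing import Dict, List, Tuple


def minimal_variant(changes: List[str], verdicts: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split changes into (keep, drop) from the V - c_i verdicts versus V.

    A change is kept only if removing it made the model significantly worse.
    """
    keep: List[str] = []
    drop: List[str] = []
    for change in changes:
        if verdicts[change] == "sig worse":
            keep.append(change)  # removal hurts: the change is contributing
        else:
            drop.append(change)  # removal is indistinguishable (or better)
    return keep, drop


if __name__ == "__main__":
    verdicts = {
        "c1": "sig worse",
        "c2": "indist.",
        "c3": "indist.",
        "c4": "sig worse",
        "c5": "indist.",
    }
    keep, drop = minimal_variant(["c1", "c2", "c3", "c4", "c5"], verdicts)
    print("minimal V = B +", " + ".join(keep), "| dropped:", ", ".join(drop))
```

The output is only a candidate: the resulting minimal V still has to be trained and compared against V with multiple seeds, since leave-one-out verdicts do not guarantee that several changes can be removed at once.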

Procedure when assisting a user

Confirm the variant has K ≥ 2 changes. If K = 1, an ablation matrix is overkill; just compare V vs B with stage6-variance-aware-decision.
Decide on leave-one-out vs sequential. Default to leave-one-out unless the user has a specific reason for sequential.
Estimate compute. If full scale is too expensive, drop to a muP-validated proxy size. State the size used explicitly.
Wire up N ≥ 3 seeds per cell. Anything less is noise.
Render the matrix. For each cell, render: mean, std, Δ vs baseline, significance vs V (using stage6-variance-aware-decision logic).
Render the "minimal V" — the variant with all non-contributing changes removed. Estimate its compute cost and complexity reduction. Recommend running this minimal variant as the actual final model.

Boundaries

Ablations test whether changes are individually necessary at the chosen scale. Synergies (X helps only when Y is present) require interaction-design ablations, which cost far more: a 2-way grid adds K(K − 1)/2 pairwise cells, and a full factorial design needs 2^K cells.
Compute scales as O(K × N), so with K ≥ 6 the matrix becomes very expensive. Limit K by clustering related changes.
A change that "doesn't contribute" at the current size may matter at larger size. Note the size in the matrix and re-test if scaling up.
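To see how quickly interaction designs outgrow leave-one-out, the cell counts can be compared side by side. The sketch below uses a hypothetical helper, `cells_per_design`, and assumes "pairwise" means adding every V − c_i − c_j cell on top of the leave-one-out matrix; multiply each count by the number of seeds to get runs.

```python
from math import comb
from typing import Dict


def cells_per_design(num_changes: int) -> Dict[str, int]:
    """Number of distinct cells each ablation design needs for K changes."""
    k = num_changes
    return {
        "leave_one_out": k + 2,                         # B, V, and K single removals
        "plus_pairwise_removals": k + 2 + comb(k, 2),   # add every V - c_i - c_j
        "full_factorial": 2 ** k,                       # every on/off combination
    }


if __name__ == "__main__":
    for k in (3, 5, 8):
        print(k, cells_per_design(k))
```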

Common mistakes

One seed per cell → matrix is mostly noise.
Comparing each cell to baseline only → misses that V vs (V − c_i) is the right comparison.
Not reporting std → readers can't assess significance.
Removing a change because it didn't help, then forgetting to verify the minimal V actually trains stably (it may rely on some of the removed changes for stability).

Related

skills/stage6-variance-aware-decision: significance testing for each cell.
skills/stage6-error-cluster: directs which changes to ablate (the ones aimed at specific clusters).
skills/stage3-small-scale-ablation: small-scale precursor; the full-scale matrix runs after it.
Mikolov et al., "Distributed Representations of Words and Phrases and their Compositionality" appendix is an early example of clean ablation tables.
