From curry-train
Set up an Optuna hyperparameter study integrated with Hydra and the project's Logger protocol, supporting TPE/CMA-ES/PBT samplers and Hyperband pruning. Activate when the user asks "set up Optuna", "hyperparameter sweep", "tune hyperparameters", "Bayesian optimization for training", or wants to refine LR/batch/dropout after capacity is chosen.
```shell
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```

This skill uses the workspace's default tool permissions.
A canonical recipe for plugging Optuna into a curryTrain project so that hyperparameter studies are reproducible, parallelizable, and journal-recorded.
"Given the architecture and approximate hyperparameters from Stage 3, can I find the actual optimum within a defined budget?"
Use Optuna for:

- Refining hyperparameters around a known-good starting point from `stage3-lr-range-test`.

Do NOT use Optuna for:

- Finding an initial LR: run `stage3-lr-range-test` first; it's cheaper.
- Measuring run-to-run variance: use `stage3-multi-seed-variance`; Optuna assumes deterministic objectives.

```
project/
├── configs/
│   ├── train.yaml          # base training config
│   └── search/
│       └── lr_wd.yaml      # search-space config (Hydra structured)
├── search/
│   ├── objective.py        # the function Optuna calls
│   └── run_study.py        # the launcher
└── runs/
    └── studies/<study-name>/
```
```yaml
# configs/search/lr_wd.yaml
defaults:
  - override hydra/sweeper: optuna
  - override hydra/sweeper/sampler: tpe

hydra:
  sweeper:
    direction: minimize
    n_trials: 32
    n_jobs: 4
    study_name: ${experiment.name}
    storage: sqlite:///runs/studies/${experiment.name}.db
    params:
      training.lr: tag(log, interval(1e-5, 1e-2))
      training.weight_decay: tag(log, interval(1e-4, 1e-1))
      training.warmup_steps: int(interval(0, 2000))
```
```python
# search/objective.py
import optuna
from omegaconf import DictConfig

from curry_train.infra.journal import Run


def objective(cfg: DictConfig) -> float:
    """Run one trial; return the headline metric to minimize."""
    with Run(cfg) as run:  # journal entry per trial
        result = train_and_evaluate(cfg)  # uses Lightning Fabric
        run.record_final(result.metrics)
    return float(result.headline_metric)  # e.g., last-10%-mean val loss


# Optuna pruning hooks (optional but recommended)
def report_intermediate(trial: optuna.Trial, step: int, value: float):
    trial.report(value, step)
    if trial.should_prune():
        raise optuna.TrialPruned()
```
| Sampler | When |
|---|---|
| TPE (default) | Most cases; robust, low setup. |
| CMA-ES | Many continuous params, smooth landscape. Use when n_trials > 50. |
| GridSampler | Small finite combinations; for sanity reproduction, not optimization. |
| PartialFixedSampler | Resume a study with new params fixed. |
| GPSampler (BoTorch) | Smooth landscape, ≤ 10 params; expensive evaluations justify the GP cost. |
Use Hyperband or MedianPruner for short trials with cheap intermediate reports:
```yaml
hydra:
  sweeper:
    sampler:
      _target_: optuna.samplers.TPESampler
    pruner:
      _target_: optuna.pruners.HyperbandPruner
      min_resource: 200
      max_resource: 5000
```
Pruning kills clearly-losing trials early. In practice this typically reduces total compute by 2–4×.
1. Confirm a Stage 3 starting point exists. If not, point the user to `stage3-lr-range-test` first.
2. Ask which 1–4 hyperparameters they actually want to search over. Do not let them search 10 parameters at once: too much variance, not enough trials.
3. Sketch the search space: log scale for LR and weight decay, linear for integer counts, categorical for choices.
4. Help them write `objective.py` so each trial is wrapped in the run-journal context. Failed trials must still produce a journal entry.
5. Recommend a trial count based on dimensionality: ~20 trials per parameter as a rule of thumb, so 4 parameters → 64–80 trials.
6. After the study, walk through `optuna.visualization.plot_param_importances(study)` and `plot_optimization_history(study)`. Surface anomalies (a parameter pegged to a bound, importance ≈ 0).
- `skills/stage3-lr-range-test`: produces the LR starting point.
- `skills/stage3-multi-seed-variance`: wrap each trial in N seeds when variance is large.
- `skills/infra-hydra-config`: Hydra structure that hosts the search config.
- `skills/infra-optuna-sweep`: lower-level Optuna integration details.