From curry-train
Set up an Optuna hyperparameter study integrated with Hydra and the project's Logger protocol, supporting TPE/CMA-ES/PBT samplers and Hyperband pruning. Activate when the user asks "set up Optuna", "hyperparameter sweep", "tune hyperparameters", "Bayesian optimization for training", or wants to refine LR/batch/dropout after capacity is chosen.
```shell
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```

This skill uses the workspace's default tool permissions.
A canonical recipe for plugging Optuna into a curryTrain project so that hyperparameter studies are reproducible, parallelizable, and journal-recorded.
"Given the architecture and approximate hyperparameters from Stage 3, can I find the actual optimum within a defined budget?"
Use Optuna for:

- Refining hyperparameters around a known-good starting point from `stage3-lr-range-test`.

Do NOT use Optuna for:

- Finding an initial LR: run `stage3-lr-range-test` first; it's cheaper.
- Measuring run-to-run variance: use `stage3-multi-seed-variance`; Optuna assumes deterministic objectives.

```
project/
├── configs/
│   ├── train.yaml          # base training config
│   └── search/
│       └── lr_wd.yaml      # search-space config (Hydra structured)
├── search/
│   ├── objective.py        # the function Optuna calls
│   └── run_study.py        # the launcher
└── runs/
    └── studies/<study-name>/
```
```yaml
# configs/search/lr_wd.yaml
defaults:
  - override hydra/sweeper: optuna
  - override hydra/sweeper/sampler: tpe

hydra:
  sweeper:
    direction: minimize
    n_trials: 32
    n_jobs: 4
    study_name: ${experiment.name}
    storage: sqlite:///runs/studies/${experiment.name}.db
    params:
      training.lr: tag(log, interval(1e-5, 1e-2))
      training.weight_decay: tag(log, interval(1e-4, 1e-1))
      training.warmup_steps: int(interval(0, 2000))
```
```python
# search/objective.py
import optuna
from omegaconf import DictConfig

from curry_train.infra.journal import Run


def objective(cfg: DictConfig) -> float:
    """Run one trial; return the headline metric to minimize."""
    with Run(cfg) as run:  # journal entry per trial
        result = train_and_evaluate(cfg)  # uses Lightning Fabric
        run.record_final(result.metrics)
    return float(result.headline_metric)  # e.g., last-10%-mean val loss


# Optuna pruning hooks (optional but recommended)
def report_intermediate(trial: optuna.Trial, step: int, value: float):
    trial.report(value, step)
    if trial.should_prune():
        raise optuna.TrialPruned()
```
| Sampler | When |
|---|---|
| TPE (default) | Most cases; robust, low setup. |
| CMA-ES | Many continuous params, smooth landscape. Use when n_trials > 50. |
| GridSampler | Small finite combinations; for sanity reproduction, not optimization. |
| PartialFixedSampler | Resume a study with new params fixed. |
| GPSampler (BoTorch) | Smooth landscape, ≤ 10 params; expensive evaluations justify the GP cost. |
Use Hyperband or MedianPruner for short trials with cheap intermediate reports:
```yaml
hydra:
  sweeper:
    sampler:
      _target_: optuna.samplers.TPESampler
    pruner:
      _target_: optuna.pruners.HyperbandPruner
      min_resource: 200
      max_resource: 5000
```
Pruning kills clearly-losing trials early. In practice this typically reduces total compute by 2–4×.
1. Confirm a Stage 3 starting point exists. If not, point the user to `stage3-lr-range-test` first.
2. Ask which 1–4 hyperparameters they actually want to search over. Do not let them search 10 parameters at once: too much variance, not enough trials.
3. Sketch the search space: log scale for LR and weight decay, linear for integer counts, categorical for choices.
4. Help them write `objective.py` so each trial is wrapped in the run-journal context. Failed trials must still produce a journal entry.
5. Recommend a trial count based on dimensionality: ~20 trials per parameter as a rule of thumb, so 4 parameters → 64–80 trials.
6. After the study, walk through `optuna.visualization.plot_param_importances(study)` and `plot_optimization_history(study)`. Surface anomalies (a parameter pegged to a bound, importance ≈ 0).
- `skills/stage3-lr-range-test`: produces the LR starting point.
- `skills/stage3-multi-seed-variance`: wrap each trial in N seeds when variance is large.
- `skills/infra-hydra-config`: Hydra structure that hosts the search config.
- `skills/infra-optuna-sweep`: lower-level Optuna integration details.