Maintain a structured per-run journal capturing seed, config diff, git SHA, full training curves, kill events, rollbacks, and resumes — so that any run is fully reproducible and comparable later. Activate when the user asks "experiment tracking", "reproducibility", "run journal", "what should I record", or shows runs without traceable metadata.
```
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```

This skill uses the workspace's default tool permissions.
A small, opinionated record per run that makes everything else (`runs-diff`, `diagnose`, scaling fit, ablation matrix) possible. Without a journal, post-hoc comparison is guesswork.
"Can I, six weeks from now, fully reconstruct what this run was, why I ran it, and what happened?"
Per run, in `runs/<run-id>/journal/`:
| File | Content |
|---|---|
| `config.yaml` | Hydra-resolved full config (every defaulted value materialized). |
| `config_diff.txt` | Diff vs the most recent baseline config the user identified. |
| `git_sha.txt` | Git SHA of the code at run start. |
| `git_dirty.diff` | Uncommitted diff at run start (if any). |
| `seed.txt` | Random seed(s) used. |
| `env.txt` | Python version, key package versions, CUDA version, GPU model and count. |
| `metrics.jsonl` | Per-step metrics: loss, grad_norm, lr, tokens, peak_mem, throughput. |
| `events.jsonl` | Discrete events: rollbacks, kills, resumes, checkpoints, dev evals. |
| `notes.md` | Free-form notes the user adds (one-liners about the hypothesis, observations). |
| `final.json` | Summary at run end: terminal status, final loss, best metric, total tokens. |
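Each line of `metrics.jsonl` and `events.jsonl` is one self-contained JSON object, so the journal can be tailed and grepped without any tooling. A representative metrics record (all values illustrative):

```json
{"ts": 1718107381.4, "step": 12000, "loss": 2.41, "grad_norm": 0.83, "lr": 0.0003, "tokens": 98304000, "peak_mem": 61.2, "throughput": 182000.0}
```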
The `Run` context manager lives at `template/curry_train/infra/journal.py`. A sketch (the helper bodies shown are minimal stand-ins for the template's versions):
```python
import json
import subprocess
import sys
import time
from datetime import datetime
from pathlib import Path

from omegaconf import DictConfig, OmegaConf

def now_id() -> str:
    return datetime.now().strftime("%Y%m%d-%H%M%S")  # e.g. 20240611-142301

def _git_sha() -> str:
    return subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True).stdout.strip()

def _git_diff() -> str:
    return subprocess.run(["git", "diff"], capture_output=True, text=True).stdout

def _env_summary() -> str:
    return f"python {sys.version.split()[0]}\n"  # real helper adds packages, CUDA, GPUs

class Run:
    """Context manager that creates and finalizes a run journal."""

    def __init__(self, cfg: DictConfig, run_dir: Path | None = None):
        self.cfg = cfg
        self.run_dir = run_dir or Path("runs") / cfg.experiment.name / now_id()
        self.metrics_writer = None
        self.events_writer = None
        self._final = {}

    def __enter__(self):
        # Snapshot everything needed to reproduce the run at start time.
        self.run_dir.mkdir(parents=True, exist_ok=True)
        (self.run_dir / "config.yaml").write_text(OmegaConf.to_yaml(self.cfg))
        (self.run_dir / "git_sha.txt").write_text(_git_sha())
        (self.run_dir / "git_dirty.diff").write_text(_git_diff())
        (self.run_dir / "seed.txt").write_text(str(self.cfg.seed))
        (self.run_dir / "env.txt").write_text(_env_summary())
        # Append mode so a resume keeps writing to the same journal.
        self.metrics_writer = (self.run_dir / "metrics.jsonl").open("a")
        self.events_writer = (self.run_dir / "events.jsonl").open("a")
        return self

    def log_metric(self, **fields):
        self.metrics_writer.write(json.dumps({"ts": time.time(), **fields}) + "\n")

    def log_event(self, kind: str, **fields):
        self.events_writer.write(
            json.dumps({"ts": time.time(), "kind": kind, **fields}) + "\n")

    def record_final(self, summary: dict):
        self._final = summary

    def __exit__(self, exc_type, exc, tb):
        # final.json is written even on failure, so every run ends with a verdict.
        status = "ok" if exc is None else f"failed:{exc_type.__name__}"
        (self.run_dir / "final.json").write_text(json.dumps(
            {"status": status, **self._final}, indent=2))
        self.metrics_writer.close()
        self.events_writer.close()
```
Event kinds to record, at minimum:
- `start`: run beginning.
- `checkpoint`: every save, with path and step.
- `dev_eval`: every dev evaluation, with metrics.
- `rollback`: `stage5-loss-spike-rollback` triggered.
- `kill`: `stage3-kill-criterion` triggered.
- `resume`: resumed from checkpoint.
- `end`: run finished (success or failure).

Each event has a timestamp, kind, step (if applicable), and event-specific fields.
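Emitted through the `Run` API, such events might look like this (field names beyond `ts`, `kind`, and `step` are illustrative):

```python
run.log_event("checkpoint", step=12000, path="ckpts/step_12000.pt")
run.log_event("dev_eval", step=12000, metrics={"dev_loss": 2.38})
run.log_event("rollback", step=12480, to_step=12000, reason="loss spike")
run.log_event("kill", step=15000, reason="kill criterion hit: grad_norm runaway")
```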
The journal is canonical. W&B, MLflow, and TensorBoard are all great viewers, but they should sync from the journal, not be the source of truth: wire backends as additional sinks, not as the primary store (see `infra-tracking-backend`).
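A minimal sketch of that sync direction, assuming W&B as the viewer; `sync_to_wandb` is a hypothetical helper, not part of the template:

```python
import json
from pathlib import Path

import wandb
import yaml

def sync_to_wandb(run_dir: Path, project: str = "curry-train") -> None:
    """Replay a journal into W&B; the journal on disk stays canonical."""
    cfg = yaml.safe_load((run_dir / "config.yaml").read_text())
    wandb.init(project=project, name=run_dir.name, config=cfg)
    with (run_dir / "metrics.jsonl").open() as f:
        for line in f:
            record = json.loads(line)
            wandb.log(record, step=record.get("step"))
    wandb.finish()
```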
In practice:

- Wrap their training entry point in `with Run(cfg) as run:` so the journal is automatic (as in the usage sketch above), and show them the file structure that gets created.
- Audit their existing logging: anything they currently send to W&B should also go to the journal. Don't strip W&B; just don't *only* use W&B.
- Verify the journal is committed before declaring the run "done": confirm `final.json` exists and contains a clear status.
- For long runs, periodically tail `events.jsonl`; events are the high-signal record of what happened (see the reader sketch after this list).
- When the user later asks "did experiment X help", the journal is what powers the answer (the `runs-diff` skill auto-activates on that phrasing).
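A sketch of such a reader; `tail_events` and the run path are hypothetical:

```python
import json
from pathlib import Path

def tail_events(run_dir: Path, kinds: set[str] | None = None) -> list[dict]:
    """Load events.jsonl, optionally filtered to specific event kinds."""
    with (run_dir / "events.jsonl").open() as f:
        events = [json.loads(line) for line in f]
    return [e for e in events if kinds is None or e["kind"] in kinds]

# Did anything alarming happen so far in this run?
for e in tail_events(Path("runs/exp1/20240611-142301"), {"rollback", "kill"}):
    print(e["ts"], e["kind"], e.get("step"), e.get("reason"))
```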
What breaks without it:

- No recorded configs and metrics → post-hoc comparison tools have nothing to work with (`runs-diff`, `stage6-ablation-matrix`).
- No recorded seeds → seed-variance analysis is impossible (`stage3-multi-seed-variance`).
- No `git_dirty.diff` → can't reproduce uncommitted changes.
- No `env.txt` → "works on my machine" surprises later.

Related:

- `skills/runs-diff`: consumes the journal directly.
- `skills/stage3-kill-criterion`: emits `kill` events.
- `skills/stage5-loss-spike-rollback`: emits `rollback` events.
- `skills/infra-tracking-backend`: additional sinks (W&B, MLflow, TensorBoard) layered on top.
- `template/curry_train/infra/journal.py`: the `Run` implementation.