Maintain a structured per-run journal capturing seed, config diff, git SHA, full training curves, kill events, rollbacks, and resumes — so that any run is fully reproducible and comparable later. Activate when the user asks "experiment tracking", "reproducibility", "run journal", "what should I record", or shows runs without traceable metadata.
```
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```

This skill uses the workspace's default tool permissions.
A small, opinionated record per run that makes everything else (`runs-diff`, `diagnose`, scaling fit, ablation matrix) possible. Without a journal, post-hoc comparison is guesswork.
"Can I, six weeks from now, fully reconstruct what this run was, why I ran it, and what happened?"
Per run, in `runs/<run-id>/journal/`:
| File | Content |
|---|---|
| `config.yaml` | Hydra-resolved full config (every defaulted value materialized). |
| `config_diff.txt` | Diff vs the most recent baseline config the user identified. |
| `git_sha.txt` | Git SHA of the code at run start. |
| `git_dirty.diff` | Uncommitted diff at run start (if any). |
| `seed.txt` | Random seed(s) used. |
| `env.txt` | Python version, key package versions, CUDA version, GPU model and count. |
| `metrics.jsonl` | Per-step metrics: loss, grad_norm, lr, tokens, peak_mem, throughput. |
| `events.jsonl` | Discrete events: rollbacks, kills, resumes, checkpoints, dev evals. |
| `notes.md` | Free-form notes the user adds (one-liners about the hypothesis, observations). |
| `final.json` | Summary at run end: terminal status, final loss, best metric, total tokens. |
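Each line of `metrics.jsonl` and `events.jsonl` is one self-contained JSON object, so the journal can be tailed and grepped without any tooling. A representative metrics record (all values illustrative):

```json
{"ts": 1718107381.4, "step": 12000, "loss": 2.41, "grad_norm": 0.83, "lr": 0.0003, "tokens": 98304000, "peak_mem": 61.2, "throughput": 182000.0}
```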
The `Run` context manager lives at `template/curry_train/infra/journal.py`. A sketch (the helper bodies shown are minimal stand-ins for the template's versions):
```python
import json
import subprocess
import sys
import time
from datetime import datetime
from pathlib import Path

from omegaconf import DictConfig, OmegaConf

def now_id() -> str:
    return datetime.now().strftime("%Y%m%d-%H%M%S")  # e.g. 20240611-142301

def _git_sha() -> str:
    return subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True).stdout.strip()

def _git_diff() -> str:
    return subprocess.run(["git", "diff"], capture_output=True, text=True).stdout

def _env_summary() -> str:
    return f"python {sys.version.split()[0]}\n"  # real helper adds packages, CUDA, GPUs

class Run:
    """Context manager that creates and finalizes a run journal."""

    def __init__(self, cfg: DictConfig, run_dir: Path | None = None):
        self.cfg = cfg
        self.run_dir = run_dir or Path("runs") / cfg.experiment.name / now_id()
        self.metrics_writer = None
        self.events_writer = None
        self._final = {}

    def __enter__(self):
        # Snapshot everything needed to reproduce the run at start time.
        self.run_dir.mkdir(parents=True, exist_ok=True)
        (self.run_dir / "config.yaml").write_text(OmegaConf.to_yaml(self.cfg))
        (self.run_dir / "git_sha.txt").write_text(_git_sha())
        (self.run_dir / "git_dirty.diff").write_text(_git_diff())
        (self.run_dir / "seed.txt").write_text(str(self.cfg.seed))
        (self.run_dir / "env.txt").write_text(_env_summary())
        # Append mode so a resume keeps writing to the same journal.
        self.metrics_writer = (self.run_dir / "metrics.jsonl").open("a")
        self.events_writer = (self.run_dir / "events.jsonl").open("a")
        return self

    def log_metric(self, **fields):
        self.metrics_writer.write(json.dumps({"ts": time.time(), **fields}) + "\n")

    def log_event(self, kind: str, **fields):
        self.events_writer.write(
            json.dumps({"ts": time.time(), "kind": kind, **fields}) + "\n")

    def record_final(self, summary: dict):
        self._final = summary

    def __exit__(self, exc_type, exc, tb):
        # final.json is written even on failure, so every run ends with a verdict.
        status = "ok" if exc is None else f"failed:{exc_type.__name__}"
        (self.run_dir / "final.json").write_text(json.dumps(
            {"status": status, **self._final}, indent=2))
        self.metrics_writer.close()
        self.events_writer.close()
```
Event kinds to record, at minimum:
- `start`: run beginning.
- `checkpoint`: every save, with path and step.
- `dev_eval`: every dev evaluation, with metrics.
- `rollback`: `stage5-loss-spike-rollback` triggered.
- `kill`: `stage3-kill-criterion` triggered.
- `resume`: resumed from checkpoint.
- `end`: run finished (success or failure).

Each event has a timestamp, kind, step (if applicable), and event-specific fields.
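Emitted through the `Run` API, such events might look like this (field names beyond `ts`, `kind`, and `step` are illustrative):

```python
run.log_event("checkpoint", step=12000, path="ckpts/step_12000.pt")
run.log_event("dev_eval", step=12000, metrics={"dev_loss": 2.38})
run.log_event("rollback", step=12480, to_step=12000, reason="loss spike")
run.log_event("kill", step=15000, reason="kill criterion hit: grad_norm runaway")
```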
The journal is canonical. W&B, MLflow, and TensorBoard are all great viewers, but they should sync from the journal, not be the source of truth: wire backends as additional sinks, not as the primary store (see `infra-tracking-backend`).
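A minimal sketch of that sync direction, assuming W&B as the viewer; `sync_to_wandb` is a hypothetical helper, not part of the template:

```python
import json
from pathlib import Path

import wandb
import yaml

def sync_to_wandb(run_dir: Path, project: str = "curry-train") -> None:
    """Replay a journal into W&B; the journal on disk stays canonical."""
    cfg = yaml.safe_load((run_dir / "config.yaml").read_text())
    wandb.init(project=project, name=run_dir.name, config=cfg)
    with (run_dir / "metrics.jsonl").open() as f:
        for line in f:
            record = json.loads(line)
            wandb.log(record, step=record.get("step"))
    wandb.finish()
```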
In practice:

- Wrap their training entry point in `with Run(cfg) as run:` so the journal is automatic (as in the usage sketch above), and show them the file structure that gets created.
- Audit their existing logging: anything they currently send to W&B should also go to the journal. Don't strip W&B; just don't *only* use W&B.
- Verify the journal is committed before declaring the run "done": confirm `final.json` exists and contains a clear status.
- For long runs, periodically tail `events.jsonl`; events are the high-signal record of what happened (see the reader sketch after this list).
- When the user later asks "did experiment X help", the journal is what powers the answer (the `runs-diff` skill auto-activates on that phrasing).
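A sketch of such a reader; `tail_events` and the run path are hypothetical:

```python
import json
from pathlib import Path

def tail_events(run_dir: Path, kinds: set[str] | None = None) -> list[dict]:
    """Load events.jsonl, optionally filtered to specific event kinds."""
    with (run_dir / "events.jsonl").open() as f:
        events = [json.loads(line) for line in f]
    return [e for e in events if kinds is None or e["kind"] in kinds]

# Did anything alarming happen so far in this run?
for e in tail_events(Path("runs/exp1/20240611-142301"), {"rollback", "kill"}):
    print(e["ts"], e["kind"], e.get("step"), e.get("reason"))
```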
What breaks without it:

- No recorded configs and metrics → post-hoc comparison tools have nothing to work with (`runs-diff`, `stage6-ablation-matrix`).
- No recorded seeds → seed-variance analysis is impossible (`stage3-multi-seed-variance`).
- No `git_dirty.diff` → can't reproduce uncommitted changes.
- No `env.txt` → "works on my machine" surprises later.

Related:

- `skills/runs-diff`: consumes the journal directly.
- `skills/stage3-kill-criterion`: emits `kill` events.
- `skills/stage5-loss-spike-rollback`: emits `rollback` events.
- `skills/infra-tracking-backend`: additional sinks (W&B, MLflow, TensorBoard) layered on top.
- `template/curry_train/infra/journal.py`: the `Run` implementation.