A Logger protocol decoupling the training code from any specific tracking backend (W&B, MLflow, Aim, TensorBoard) — with TensorBoard as the zero-dependency default. Activate when the user asks about "experiment tracking", "W&B integration", "TensorBoard setup", "MLflow", or "switch tracking backend", or wants tracking without lock-in.
```
npx claudepluginhub curryfromuestc/curry-train --plugin curry-train
```
curryTrain does **not** lock into any one tracking backend. The reasons (subagent B's research):
stage5-run-journal) — backends are display layers, not source of truth.

```python
from pathlib import Path
from typing import Protocol


class TrackingBackend(Protocol):
    """A minimal interface for any experiment-tracking sink."""

    def log_metrics(self, metrics: dict[str, float], step: int) -> None: ...
    def log_artifact(self, name: str, path: str | Path) -> None: ...
    def log_config(self, cfg: dict) -> None: ...
    def finish(self) -> None: ...
```
Every backend implements this Protocol. Training code calls only Protocol methods. New backends are drop-in.
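To illustrate how drop-in a new backend can be, here is a hypothetical `JsonlBackend` (not part of curryTrain) that satisfies the Protocol by appending every event to a JSON-lines file:

```python
import json
from pathlib import Path


class JsonlBackend:
    """Hypothetical backend: append every tracking event to a .jsonl file."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def _write(self, record: dict) -> None:
        with self.path.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def log_metrics(self, metrics: dict[str, float], step: int) -> None:
        self._write({"kind": "metrics", "step": step, **metrics})

    def log_artifact(self, name: str, path) -> None:
        self._write({"kind": "artifact", "name": name, "path": str(path)})

    def log_config(self, cfg: dict) -> None:
        self._write({"kind": "config", "config": cfg})

    def finish(self) -> None:
        pass  # nothing to flush; each call opens, writes, and closes the file
```

Training code never learns it exists — it only ever calls the four Protocol methods.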
```python
import json
from pathlib import Path


class TensorBoardBackend:
    def __init__(self, log_dir: Path):
        from torch.utils.tensorboard import SummaryWriter
        self.writer = SummaryWriter(log_dir=str(log_dir))

    def log_metrics(self, metrics, step):
        for k, v in metrics.items():
            self.writer.add_scalar(k, v, step)

    def log_artifact(self, name, path):
        # TB doesn't really do artifacts; record the path as text instead
        self.writer.add_text(f"artifact/{name}", str(path))

    def log_config(self, cfg):
        self.writer.add_text("config", json.dumps(cfg, indent=2))

    def finish(self):
        self.writer.close()
```
```python
class WandbBackend:
    def __init__(self, project: str, name: str, config: dict):
        import wandb
        self.run = wandb.init(project=project, name=name, config=config)

    def log_metrics(self, metrics, step):
        self.run.log(metrics, step=step)

    def log_artifact(self, name, path):
        import wandb
        art = wandb.Artifact(name=name, type="checkpoint")
        art.add_file(str(path))
        self.run.log_artifact(art)

    def log_config(self, cfg):
        self.run.config.update(cfg)

    def finish(self):
        self.run.finish()
```
```python
class CompositeBackend:
    def __init__(self, backends: list[TrackingBackend]):
        self.backends = backends

    def log_metrics(self, metrics, step):
        for b in self.backends:
            b.log_metrics(metrics, step)

    # ... and so on
```
This is what enables "TB locally + W&B for sharing" without changing training code.
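The `# ... and so on` elides the remaining fan-out methods; spelled out, a sketch (under the same Protocol, not necessarily the real class) looks like:

```python
class CompositeBackend:
    """Fan every Protocol call out to each child backend."""

    def __init__(self, backends):
        self.backends = list(backends)

    def log_metrics(self, metrics, step):
        for b in self.backends:
            b.log_metrics(metrics, step)

    def log_artifact(self, name, path):
        for b in self.backends:
            b.log_artifact(name, path)

    def log_config(self, cfg):
        for b in self.backends:
            b.log_config(cfg)

    def finish(self):
        for b in self.backends:
            b.finish()
```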
```yaml
# configs/logging/tb_only.yaml
backend:
  _target_: curry_train.infra.tracking.TensorBoardBackend
  log_dir: ${paths.runs}/${experiment.name}/${now:%Y-%m-%d_%H-%M-%S}/tb
```
```yaml
# configs/logging/tb_plus_wandb.yaml
backend:
  _target_: curry_train.infra.tracking.CompositeBackend
  backends:
    - _target_: curry_train.infra.tracking.TensorBoardBackend
      log_dir: ${paths.runs}/${experiment.name}/${now:%Y-%m-%d_%H-%M-%S}/tb
    - _target_: curry_train.infra.tracking.WandbBackend
      project: curry-train
      name: ${experiment.name}
```
Switch with `python train.py logging=tb_plus_wandb`.
```python
# After fabric and Run setup:
backend = hydra.utils.instantiate(cfg.logging.backend)
backend.log_config(OmegaConf.to_container(cfg))

with Run(cfg) as run:
    for step, batch in enumerate(loader):
        ...
        loss_value = loss.item()
        run.log_metric(step=step, loss=loss_value)
        backend.log_metrics({"train/loss": loss_value}, step=step)

backend.finish()
```
The journal always gets the data (canonical record); the backend is a viewer.
Default to TensorBoard. It has zero installation friction and survives any service shutdown.
If the user wants W&B for sharing, suggest the composite — TB and W&B, not W&B alone.
Don't walk users through W&B account setup; that's outside curryTrain's scope. Do tell them to set `WANDB_API_KEY` as an environment variable.
Confirm the journal is writing parallel data — if W&B goes down mid-run, the journal still has everything.
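One way to guarantee a flaky remote backend can never take the run down is a guarding wrapper (a hypothetical sketch, not part of curryTrain) that swallows backend errors while the journal keeps writing:

```python
import logging


class SafeBackend:
    """Hypothetical wrapper: log and swallow any backend failure so
    training (and the separately written journal) keeps going."""

    def __init__(self, inner):
        self.inner = inner

    def _guard(self, method, *args):
        try:
            getattr(self.inner, method)(*args)
        except Exception:
            logging.getLogger(__name__).warning("tracking call %s failed", method)

    def log_metrics(self, metrics, step): self._guard("log_metrics", metrics, step)
    def log_artifact(self, name, path): self._guard("log_artifact", name, path)
    def log_config(self, cfg): self._guard("log_config", cfg)
    def finish(self): self._guard("finish")
```

Wrapping only the `WandbBackend` inside the composite keeps local TensorBoard logging strict while making the remote leg best-effort.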
For runs-diff, point infra-tracking-backend users at the journal, not at W&B's UI — runs-diff reads the journal, never a backend.
The `log_metrics` call is synchronous. Per-step overhead is small (< 1 ms for TB).

Related skills:
- skills/stage5-run-journal — the canonical local record.
- skills/infra-hydra-config — the logging config group.
- skills/runs-diff — reads the journal, not the backend.
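The synchronous dispatch cost can be sanity-checked with a stub backend (a rough sketch; `NullBackend` and `per_call_overhead` are illustrative names, and real TensorBoard overhead also includes SummaryWriter's disk writes):

```python
import time


class NullBackend:
    """No-op backend: isolates the cost of the Protocol call itself."""
    def log_metrics(self, metrics, step): pass
    def log_artifact(self, name, path): pass
    def log_config(self, cfg): pass
    def finish(self): pass


def per_call_overhead(backend, n: int = 10_000) -> float:
    """Average seconds per log_metrics call."""
    start = time.perf_counter()
    for step in range(n):
        backend.log_metrics({"train/loss": 0.0}, step)
    return (time.perf_counter() - start) / n

# per_call_overhead(NullBackend())  # dispatch alone is typically microseconds
```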