From evsys-sdk
Scaffolds or migrates a repo to the evsys-sdk research-project layout (data/, src/, experiments/, .evsys/). Use when starting a new project or bringing an ad-hoc project into the standard shape.
Slash command
/evsys-sdk:set-up-research-project
This skill stands up a new repo against the evsys-sdk research-project layout, or migrates an existing one into it. Use it when:
- The user is starting a new project on evsys-sdk.
- The user has an existing ad-hoc project (training/ scripts, scattered data/*.json(l) files, custom helper packages, ad-hoc output/ or checkpoints/ dirs, no experiments/ or src/ structure) and wants to bring it into the standard shape.

The target layout is the one documented in evsys-sdk/docs/DESIGN.md → "Researcher-project layout":
<project>/
├── pyproject.toml # declares src/ as importable
├── README.md
├── data/
│ ├── raw/ # untouched source dumps (gitignored)
│ ├── fetch/ # python scripts that populate raw/
│ ├── process/ # raw → datasets/<name>/v<N>/
│ ├── datasets/<name>/v1/{train,test}.jsonl + metadata.yaml
│ └── benchmark/<name>/tasks.jsonl + metadata.yaml [+ images/ + raw/]
├── src/ # project-specific SDK extensions
│ ├── __init__.py # imports verifiers/metrics/transforms
│ ├── verifiers.py
│ ├── metrics.py
│ └── transforms.py
├── experiments/<yyyymmdd>_<slug>/
│ ├── config.yaml # ExperimentConfig
│ └── run.py # `Experiment.from_yaml("config.yaml").run()`
└── .evsys/ # gitignored runtime mirror + outputs
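The tree above can be checked mechanically before deciding what still needs scaffolding. The following is a minimal sketch, not part of evsys-sdk: the helper name `missing_layout_entries` and the choice of which entries to check are assumptions made for illustration, and the gitignored `.evsys/` runtime directory is deliberately left out.

```python
from pathlib import Path

# Entries taken from the layout tree above. Per-dataset and per-experiment
# subdirectories are omitted because their names are project-specific.
# (Hypothetical helper for illustration; not an evsys-sdk API.)
EXPECTED_ENTRIES = [
    "pyproject.toml",
    "README.md",
    "data/raw",
    "data/fetch",
    "data/process",
    "data/datasets",
    "data/benchmark",
    "src/__init__.py",
    "src/verifiers.py",
    "src/metrics.py",
    "src/transforms.py",
    "experiments",
]


def missing_layout_entries(root: str) -> list[str]:
    """Return the expected entries that do not exist under root, in tree order."""
    base = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (base / entry).exists()]
```

Calling `missing_layout_entries(".")` from the repo root gives a list that can be shown to the user as the set of files the scaffold would still need to fill in.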
First, classify the working dir.
ls -A | head -20
- Empty or near-empty dir (no data/, no training/, no experiments/, only .git/, README.md, pyproject.toml, etc.) → Bootstrap path.
- Existing ad-hoc project (training/run_*.py, a custom extension package under training/, loose data/*.json(l), evals/, results/) → Migration path.

If unsure, ask the user. Do not make destructive changes until you've confirmed which path.
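The classification rules above could be approximated in code as a first pass. This is a sketch under stated assumptions: the function name `classify_working_dir`, the exact marker lists, and the three-way result ("bootstrap", "migration", "unsure") are illustrative choices, not evsys-sdk behavior. An "unsure" result corresponds to the "ask the user" case.

```python
from pathlib import Path

# Marker names come from the classification rules above; treating any single
# marker as enough to suggest migration is an assumption of this sketch.
MIGRATION_DIRS = ("training", "evals", "results", "output", "checkpoints")
MIGRATION_GLOBS = ("data/*.json", "data/*.jsonl", "run_*.py", "train_*.py")


def classify_working_dir(root: str) -> str:
    """Suggest "bootstrap", "migration", or "unsure" for the directory at root."""
    base = Path(root)
    has_marker_dir = any((base / name).is_dir() for name in MIGRATION_DIRS)
    has_marker_file = any(
        next(base.glob(pattern), None) is not None for pattern in MIGRATION_GLOBS
    )
    if has_marker_dir or has_marker_file:
        return "migration"
    # No ad-hoc markers and no data/ or experiments/ yet: looks like a fresh repo.
    if not any((base / name).exists() for name in ("data", "experiments")):
        return "bootstrap"
    # Anything else (for example an already-scaffolded repo) needs a human call.
    return "unsure"
```

The result is only a suggestion to present alongside the `ls -A` output; the confirmation step with the user still applies.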
Bootstrap path
evsys init-project . --name <project_name>
# or, if scaffolding into a separate dir:
evsys init-project <path> --name <project_name>
The CLI refuses non-empty dirs unless you pass --force. --force only fills in missing files; it never overwrites a file the user already wrote.
Create a uv-managed .venv and install the project into it:
uv venv # creates .venv/ using requires-python from pyproject.toml
uv pip install -e . # installs the project (pulls in evsys-sdk) editable
Run uv venv only when .venv/ does not exist yet: depending on the uv version, running uv venv over an existing .venv/ may replace it or ask for confirmation (--clear), so skip this step on a repo that's already set up. evsys init-project already gitignores .venv/, so nothing to add there.
Tell the user to activate it with source .venv/bin/activate (or just prefix commands with uv run, e.g. uv run evsys new-experiment ...).
The scaffold includes:
- data/ lineage convention (raw → fetch → process → datasets/<name>/v<N>/)
- src/{verifiers,metrics,transforms}.py with commented examples
- pyproject.toml declaring src as the importable package
- .gitignore (.evsys/, data/raw/, .venv/)
Create the first experiment:
evsys new-experiment first_check
Tell the user to edit the generated config.yaml, then run python experiments/<dir>/run.py (or uv run python experiments/<dir>/run.py).
Point the user at:
- evsys-sdk/docs/DESIGN.md: layout rationale.
- using-evsys-sdk skill: how Experiment.from_yaml(...).run() works end-to-end (dashboard records, sweep expansion, eval, conclusion).

Migration path
Aim for a small, reviewable migration. Never delete user files without confirmation. Use git mv for everything you can so history is preserved.
Survey the repo and propose mappings — present the full list to the user before touching anything.
| Current shape | Target |
|---|---|
| One-off sweep / training scripts (e.g. run_*.py, train_*.py at the repo root or under training/) | one experiments/<yyyymmdd>_<slug>/{config.yaml,run.py} per script: extract the inline hypothesis + hyperparameters into config.yaml, replace the per-arm Python loop with a matrix: block, leave the OOP entrypoint in run.py. |
| Project-specific Python helpers (a verifiers.py, metrics.py, transforms.py, or a custom extension package) | move the relevant module(s) into src/{verifiers,metrics,transforms}.py; merge any existing extension-package __init__.py into src/__init__.py. |
| Custom metric-backfill scripts (e.g. backfill_*.py) | delete; Experiment auto-forwards metrics.jsonl step rows to the dashboard. |
| Custom batch-generate / score helpers | folded into Benchmark.score(); drop. |
| Standalone eval scripts (e.g. eval_*.py) | replace with a metadata.benchmark block in config.yaml + the SDK's Benchmark. |
| Flat JSON eval set (e.g. data/<name>.json) | data/benchmark/<name>/tasks.jsonl (harbor-format JSONL, one task per line) + metadata.yaml; convert via a small data/process/<name>_to_harbor.py script. |
| Loose training-data JSONL (e.g. data/<name>_v<N>.jsonl) | data/datasets/<name>/v<N>/train.jsonl + metadata.yaml (source, parent version, row count). |
| output/, checkpoints/, scattered log dirs | .evsys/ (gitignored). |
| analysis/, notebooks/ | leave in place; not part of the layout. |
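One way to present the proposed mappings for review is to render them as commands without running anything. The sketch below assumes a plain source-to-target dictionary; the function name `propose_git_moves` and the example paths are hypothetical. It emits a `mkdir -p` before each `git mv` because `git mv` does not create a missing destination directory.

```python
import shlex
from pathlib import PurePosixPath


def propose_git_moves(mappings: dict[str, str]) -> list[str]:
    """Render each source -> target mapping as shell command strings.

    Nothing is executed here; the strings are meant to be shown to the user
    for confirmation before any file is touched.
    """
    commands = []
    for source, target in sorted(mappings.items()):
        parent = PurePosixPath(target).parent
        # git mv needs the destination directory to exist already.
        commands.append(f"mkdir -p {shlex.quote(str(parent))}")
        commands.append(f"git mv {shlex.quote(source)} {shlex.quote(target)}")
    return commands


# Hypothetical example paths, chosen only to illustrate two table rows.
example = {
    "training/verifiers.py": "src/verifiers.py",
    "data/qa_v2.jsonl": "data/datasets/qa/v2/train.jsonl",
}
proposed = propose_git_moves(example)
```

Printing `proposed` one command per line gives the user a full list to approve, which keeps the migration reviewable.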
Run evsys init-project . --force to fill in any missing standard files
without clobbering existing ones. Then create the data subdirs that didn't
exist yet (data/datasets/<name>/v1/, data/benchmark/<name>/).
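For the flat JSON eval set row in the table, the conversion script mostly reshapes one JSON array into one task per line. The sketch below covers only that reshaping and assumes the source file holds a top-level JSON array; the function name `json_array_to_jsonl` is hypothetical, and mapping field names onto the harbor task schema is not shown because that schema is not described here.

```python
import json
from pathlib import Path


def json_array_to_jsonl(source: str, target: str) -> int:
    """Write each element of a top-level JSON array as one JSONL line.

    Returns the number of tasks written. Fields are copied unchanged; any
    renaming needed for the harbor format is left to the project.
    """
    tasks = json.loads(Path(source).read_text(encoding="utf-8"))
    if not isinstance(tasks, list):
        raise ValueError(f"{source} must contain a top-level JSON array")
    out = Path(target)
    # Create data/benchmark/<name>/ (or whatever parent is given) if missing.
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as handle:
        for task in tasks:
            handle.write(json.dumps(task, ensure_ascii=False) + "\n")
    return len(tasks)
```

The returned count can go into metadata.yaml by hand, and the original JSON file stays where it is until the user confirms it can be moved or removed.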
If the project has no uv-managed environment yet, create one and install the project editable: run uv venv, then uv pip install -e . (the trailing dot is the project path). If .venv/ already exists, skip uv venv, since depending on the uv version it may replace the environment or ask for confirmation (--clear). uv pip install -e . reconciles the project's existing pyproject.toml dependencies (which the migration preserves). Make sure .venv/ is gitignored.
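The gitignore check can be done additively so that the user's existing entries are never rewritten. This is a minimal sketch: the helper name `ensure_gitignore_entries` is hypothetical, the three entries are the ones this document lists for the scaffolded .gitignore, and matching is by exact line, so a variant such as `.venv` without the trailing slash is treated as missing.

```python
from pathlib import Path

# Entries this document lists for the scaffolded .gitignore.
REQUIRED_IGNORES = (".evsys/", "data/raw/", ".venv/")


def ensure_gitignore_entries(root: str) -> list[str]:
    """Append any missing required entries to root/.gitignore.

    Existing lines are left untouched. Returns the entries that were added.
    """
    path = Path(root) / ".gitignore"
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [entry for entry in REQUIRED_IGNORES if entry not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Running it twice is harmless: the second call finds every entry present and returns an empty list.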
For each mapping, propose the git mv (or write a small conversion script
when the target shape differs from the source). Pause for user confirmation
on anything that:
- changes pyproject.toml (existing dependencies must be preserved).

After moves:
- Imports from the old extension package (<projectname>_ext.verifiers) → import from src instead (from src.verifiers import … or import src at the top of run.py to fire the registration decorators).
- Calls to a backfill_* helper → delete the call; Experiment forwards metrics.jsonl automatically.
- from evsys_sdk import … for the OOP path (Experiment, Sweep, Benchmark) is now top-level; no from evsys_sdk.experiment import Experiment needed.

Pick one prior training script and port it end-to-end to the new layout:
1. Run evsys new-experiment <slug_matching_old_script>.
2. Move the script's hypothesis and hyperparameters into config.yaml's metadata + matrix: blocks.
3. Reduce run.py to Experiment.from_yaml("config.yaml").run().
4. Run evsys benchmark upload data/benchmark/<name>, then paste the printed id into config.yaml's metadata.benchmark.id.
5. Run it once with the mock backend (backend.kind: mock) to confirm the wiring works without spending compute.

Only after that smoke run succeeds do you propose porting the remaining scripts, one at a time, each as its own PR if possible.
- Never git rm or rm a file without explicit user confirmation.
- Never overwrite existing user files (pyproject.toml, README.md, config files); scaffold around them.
- Prefer git mv over plain mv.
- Start with backend.kind: mock, get a passing smoke run, then enable the real backend.

- Use evsys new-experiment <slug> for every subsequent experiment.
- Run evsys benchmark upload data/benchmark/<name> whenever a benchmark (the TEST set, scored after training) changes content (idempotent re-upload returns "unchanged").
- For validation during training, add a metadata.benchmark entry tagged [val] with run_every: <N>; it's scored every N steps off the live model. Keep the final TEST benchmark a separate entry (no run_every, tagged [test]) so model selection never keys off the test set.
- Point the user at the using-evsys-sdk skill for the day-to-day patterns (Experiment.from_yaml(...).run(), sweep / matrix syntax, scoring).
- Point the user at the getting-experiment-context skill if they want to recall prior results before designing a new experiment.