Skill

Set up research project

Scaffolds or migrates a repo to the evsys-sdk research-project layout (data/, src/, experiments/, .evsys/). Use when starting a new project or bringing an ad-hoc project into the standard shape.

Python

developer-tools

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/evsys-sdk:set-up-research-project

User invocable

Model invocable

Inline context

Default effort

SKILL.md

197 lines · ~2.5k tokens

Stats

Language: Python

Stars: 27

Forks: 2

Maintenance: Excellent

Last Commit: Jun 23, 2026

Set up research project

This skill stands up a new repo against the evsys-sdk research-project layout, or migrates an existing one into it. Use it when:

- The user is starting a new project that will use evsys-sdk.
- The user has an existing ad-hoc project (loose training/ scripts, scattered data/*.json(l) files, custom helper packages, ad-hoc output/ or checkpoints/ dirs, no experiments/ or src/ structure) and wants to bring it into the standard shape.

The target layout is the one documented in evsys-sdk/docs/DESIGN.md → "Researcher-project layout":

<project>/
├── pyproject.toml                  # declares src/ as importable
├── README.md
├── data/
│   ├── raw/                        # untouched source dumps (gitignored)
│   ├── fetch/                      # python scripts that populate raw/
│   ├── process/                    # raw → datasets/<name>/v<N>/
│   ├── datasets/<name>/v1/{train,test}.jsonl + metadata.yaml
│   └── benchmark/<name>/tasks.jsonl + metadata.yaml [+ images/ + raw/]
├── src/                            # project-specific SDK extensions
│   ├── __init__.py                 # imports verifiers/metrics/transforms
│   ├── verifiers.py
│   ├── metrics.py
│   └── transforms.py
├── experiments/<yyyymmdd>_<slug>/
│   ├── config.yaml                 # ExperimentConfig
│   └── run.py                      # `Experiment.from_yaml("config.yaml").run()`
└── .evsys/                         # gitignored runtime mirror + outputs
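
To see how far a repo is from this layout, a minimal sketch along the following lines may help. The helper name `missing_layout_paths` and the `EXPECTED_PATHS` list are illustrative assumptions, not part of evsys-sdk; the list only covers the fixed entries of the tree above and skips `.evsys/` (runtime output) and the per-dataset and per-experiment subdirectories, whose names vary by project.

```python
from pathlib import Path

# Fixed entries copied from the layout tree; variable-name subdirectories
# (datasets/<name>/, experiments/<yyyymmdd>_<slug>/) are intentionally omitted.
EXPECTED_PATHS = [
    "pyproject.toml",
    "README.md",
    "data/raw",
    "data/fetch",
    "data/process",
    "data/datasets",
    "data/benchmark",
    "src/__init__.py",
    "src/verifiers.py",
    "src/metrics.py",
    "src/transforms.py",
    "experiments",
]


def missing_layout_paths(root: Path) -> list[str]:
    """Return the expected layout entries that do not exist under root."""
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

Calling `missing_layout_paths(Path("."))` from the repo root returns an empty list once every fixed entry is in place.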

Decide: new or existing?

First, classify the working dir.

ls -A | head -20

- New / empty repo (no data/, no training/, no experiments/, only .git/, README.md, pyproject.toml, etc.) → Bootstrap path.
- Existing project with training scripts or data (you see anything like training/run_*.py, a custom extension package under training/, loose data/*.json(l), evals/, results/) → Migration path.

If unsure, ask the user. Do not make destructive changes until you've confirmed which path.
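
The classification above can be sketched as a small heuristic. This is only an illustration under stated assumptions: `classify_workdir`, `MIGRATION_DIRS`, and `SCAFFOLD_ONLY` are hypothetical names, and the marker lists are drawn from the two cases described here rather than from evsys-sdk itself. Anything the heuristic cannot place is reported as `unsure`, which maps to asking the user.

```python
from pathlib import Path

# Markers taken from the two cases described above; extend for your repo.
MIGRATION_DIRS = ("training", "evals", "results")
SCAFFOLD_ONLY = {".git", ".gitignore", "README.md", "pyproject.toml", "LICENSE"}


def classify_workdir(root: Path) -> str:
    """Return 'bootstrap', 'migration', or 'unsure' for the given directory."""
    entries = {p.name for p in root.iterdir()}
    has_legacy_dir = any((root / d).is_dir() for d in MIGRATION_DIRS)
    data_dir = root / "data"
    has_loose_data = data_dir.is_dir() and any(
        p.suffix in {".json", ".jsonl"} for p in data_dir.iterdir() if p.is_file()
    )
    if has_legacy_dir or has_loose_data:
        return "migration"
    if entries <= SCAFFOLD_ONLY:
        return "bootstrap"
    return "unsure"  # ask the user rather than guessing
```

The function only reads directory listings, so it is safe to run before the path has been confirmed.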

Bootstrap path (new / empty repo)

Confirm the project name with the user (default to the directory basename).
Run:
```
evsys init-project . --name <project_name>
# or, if scaffolding into a separate dir:
evsys init-project <path> --name <project_name>
```
The CLI refuses non-empty dirs unless you pass --force. --force only fills in missing files — it never overwrites a file the user already wrote.
Create a uv-managed .venv and install the project into it:
```
uv venv                    # creates .venv/ using requires-python from pyproject.toml
uv pip install -e .        # installs the project (pulls in evsys-sdk) editable
```
uv venv is idempotent — if .venv/ already exists it leaves it alone, so it's safe to run on a repo that's already set up. evsys init-project already gitignores .venv/, so nothing to add there. Tell the user to activate it with source .venv/bin/activate (or just prefix commands with uv run, e.g. uv run evsys new-experiment ...).
Walk the user through what landed:
- data/ lineage convention (raw → fetch → process → datasets/<name>/v<N>/)
- src/{verifiers,metrics,transforms}.py with commented examples
- pyproject.toml declaring src as the importable package
- .gitignore (.evsys/, data/raw/, .venv/)
Show how to add the first experiment:
```
evsys new-experiment first_check
```
Tell the user to edit the generated config.yaml, then run python experiments/<dir>/run.py (or uv run python experiments/<dir>/run.py).
Point them at:
- evsys-sdk/docs/DESIGN.md — layout rationale.
- using-evsys-sdk skill — how Experiment.from_yaml(...).run() works end-to-end (dashboard records, sweep expansion, eval, conclusion).
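
To fill in `<dir>` in the run command, one option is to look up the newest directory under `experiments/`. The sketch below assumes the `<yyyymmdd>_<slug>` naming from the layout and further assumes slugs are lowercase letters, digits, and underscores; `latest_experiment` is a hypothetical helper, not an evsys-sdk function.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming: eight digits, an underscore, then a lowercase slug.
EXPERIMENT_DIR = re.compile(r"^\d{8}_[a-z0-9_]+$")


def latest_experiment(root: Path) -> Optional[Path]:
    """Return the newest experiments/<yyyymmdd>_<slug>/ directory, if any."""
    exp_root = root / "experiments"
    if not exp_root.is_dir():
        return None
    # The date prefix makes lexicographic order match chronological order.
    candidates = sorted(
        p for p in exp_root.iterdir()
        if p.is_dir() and EXPERIMENT_DIR.match(p.name)
    )
    return candidates[-1] if candidates else None
```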

Migration path (existing project)

Aim for a small, reviewable migration. Never delete user files without confirmation. Use git mv for everything you can so history is preserved.

Step 1: map current → target

Survey the repo and propose mappings — present the full list to the user before touching anything.

| Current shape | Target |
| --- | --- |
| One-off sweep / training scripts (e.g. `run_*.py`, `train_*.py` at the repo root or under `training/`) | One `experiments/<yyyymmdd>_<slug>/{config.yaml,run.py}` per script: extract the inline hypothesis + hyperparameters into `config.yaml`, replace the per-arm Python loop with a `matrix:` block, leave the OOP entrypoint in `run.py`. |
| Project-specific Python helpers (a `verifiers.py`, `metrics.py`, `transforms.py`, or a custom extension package) | Move the relevant module(s) into `src/{verifiers,metrics,transforms}.py`; merge any existing extension-package `__init__.py` into `src/__init__.py`. |
| Custom metric-backfill scripts (e.g. `backfill_*.py`) | Delete: `Experiment` auto-forwards `metrics.jsonl` step rows to the dashboard. |
| Custom batch-generate / score helpers | Drop: folded into `Benchmark.score()`. |
| Standalone eval scripts (e.g. `eval_*.py`) | Replace with a `metadata.benchmark` block in `config.yaml` + the SDK's `Benchmark`. |
| Flat JSON eval set (e.g. `data/<name>.json`) | `data/benchmark/<name>/tasks.jsonl` (harbor-format JSONL, one task per line) + `metadata.yaml`; convert via a small `data/process/<name>_to_harbor.py` script. |
| Loose training-data JSONL (e.g. `data/<name>_v<N>.jsonl`) | `data/datasets/<name>/v<N>/train.jsonl` + `metadata.yaml` (source, parent version, row count). |
| `output/`, `checkpoints/`, scattered log dirs | `.evsys/` (gitignored). |
| `analysis/`, `notebooks/` | Leave in place; not part of the layout. |
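
A read-only survey that produces this kind of proposal list could look roughly like the sketch below. It is an assumption-laden illustration: `propose_mappings`, `NAME_RULES`, and `SKIP_DIRS` are hypothetical names, the patterns cover only the filename-based rows of the table, and the output is meant to be shown to the user, not acted on automatically.

```python
from fnmatch import fnmatch
from pathlib import Path

# Filename patterns paraphrasing the table above (illustrative, not exhaustive).
NAME_RULES = [
    ("run_*.py", "port to experiments/<yyyymmdd>_<slug>/"),
    ("train_*.py", "port to experiments/<yyyymmdd>_<slug>/"),
    ("backfill_*.py", "delete (confirm with user first)"),
    ("eval_*.py", "replace with metadata.benchmark in config.yaml"),
]
SKIP_DIRS = {".git", ".venv", ".evsys", "experiments"}


def propose_mappings(root: Path) -> list[tuple[str, str]]:
    """Return (relative path, proposed action) pairs without touching files."""
    proposals = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS & set(rel.parts):
            continue
        if rel.parent == Path("data") and path.suffix == ".json":
            action = f"convert to data/benchmark/{path.stem}/tasks.jsonl"
            proposals.append((rel.as_posix(), action))
        elif rel.parent == Path("data") and path.suffix == ".jsonl":
            action = "move to data/datasets/<name>/v<N>/train.jsonl"
            proposals.append((rel.as_posix(), action))
        else:
            for pattern, action in NAME_RULES:
                if fnmatch(path.name, pattern):
                    proposals.append((rel.as_posix(), action))
                    break
    return proposals
```

Files that match no rule (for example anything under `analysis/` or `notebooks/`) are simply left out of the proposal list.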

Step 2: scaffold the target dirs

Run evsys init-project . --force to fill in any missing standard files without clobbering existing ones. Then create the data subdirs that didn't exist yet (data/datasets/<name>/v1/, data/benchmark/<name>/).

If the project has no uv-managed environment yet, create one and install it editable — uv venv && uv pip install -e .. uv venv won't disturb an existing .venv/, and uv pip install -e . reconciles the project's existing pyproject.toml dependencies (which the migration preserves). Make sure .venv/ is gitignored.
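
Checking the ignore entries can be sketched as follows. The three required entries come from the scaffolded .gitignore described in the bootstrap path; the helper name `missing_ignores` is hypothetical, and the match is a plain line comparison that does not understand glob patterns or negation rules.

```python
REQUIRED_IGNORES = (".evsys/", "data/raw/", ".venv/")


def _norm(entry: str) -> str:
    """Strip whitespace and leading/trailing slashes so '.venv' matches '.venv/'."""
    return entry.strip().strip("/")


def missing_ignores(gitignore_text: str) -> list[str]:
    """Return required entries that have no matching line in .gitignore."""
    present = {
        _norm(line)
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [entry for entry in REQUIRED_IGNORES if _norm(entry) not in present]
```

Pass it the contents of `.gitignore` (or an empty string if the file is missing) and append whatever it returns.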

Step 3: move files

For each mapping, propose the git mv (or write a small conversion script when the target shape differs from the source). Pause for user confirmation on anything that:

- deletes any user file (especially custom helper scripts the migration table marks as obsolete),
- converts a JSON eval set into harbor JSONL (verify a couple of rows round-trip correctly first),
- touches pyproject.toml (existing dependencies must be preserved).
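
For the JSON-to-JSONL conversion, the round-trip check can be sketched as below. The harbor task schema is not spelled out here, so this example makes no attempt to rename or restructure fields: it assumes the flat eval set is a top-level JSON array and writes each element unchanged, one per line. Both function names are hypothetical.

```python
import json
from pathlib import Path


def json_list_to_jsonl(src: Path, dst: Path) -> int:
    """Write each element of a top-level JSON array as one line; return row count."""
    tasks = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(tasks, list):
        raise ValueError(f"{src} must contain a top-level JSON array")
    dst.parent.mkdir(parents=True, exist_ok=True)
    with dst.open("w", encoding="utf-8") as fh:
        for task in tasks:
            fh.write(json.dumps(task, ensure_ascii=False) + "\n")
    return len(tasks)


def round_trips(src: Path, dst: Path) -> bool:
    """Check that the JSONL rows decode to exactly the original array."""
    original = json.loads(src.read_text(encoding="utf-8"))
    with dst.open(encoding="utf-8") as fh:
        rows = [json.loads(line) for line in fh if line.strip()]
    return rows == original
```

If the real harbor format needs different field names, the mapping would go inside the write loop, and the round-trip check would compare against the mapped rows instead.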

Step 4: rewrite imports

After moves:

- Files that imported the prior extension package (e.g. <projectname>_ext.verifiers) → import from src instead (from src.verifiers import … or import src at the top of run.py to fire the registration decorators).
- Scripts that called a backfill_* helper → delete the call; Experiment forwards metrics.jsonl automatically.
- The OOP entry points (Experiment, Sweep, Benchmark) are now importable from the top level (from evsys_sdk import …); from evsys_sdk.experiment import Experiment is no longer needed.
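
The first rewrite can be previewed with a small regex-based sketch. It is a rough illustration under assumptions: `rewrite_ext_imports` is a hypothetical helper, it only touches `from ... import` and `import ...` lines, and it will not catch dotted references to the old package elsewhere in the file, so review the diff before committing.

```python
import re


def rewrite_ext_imports(source: str, old_package: str) -> str:
    """Point `from <old>[.mod] import ...` and `import <old>[.mod]` at src instead."""
    pattern = re.compile(
        # Match the package name only when it is a whole dotted component.
        rf"^([ \t]*(?:from|import)[ \t]+){re.escape(old_package)}(?=[\s.]|$)",
        flags=re.MULTILINE,
    )
    return pattern.sub(r"\1src", source)
```

For example, with a hypothetical old package named `myproj_ext`, the line `from myproj_ext.verifiers import check` becomes `from src.verifiers import check`, while an unrelated package such as `myproj_ext_other` is left alone.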

Step 5: rebuild one experiment as a smoke test

Pick one prior training script and port it end-to-end to the new layout:

- Run evsys new-experiment <slug_matching_old_script>.
- Translate the script's hypothesis / hyperparameters / sweep axis into the new config.yaml's metadata + matrix: blocks.
- Reduce run.py to Experiment.from_yaml("config.yaml").run().
- Upload any benchmark it referenced with evsys benchmark upload data/benchmark/<name>, then paste the printed id into config.yaml's metadata.benchmark.id.
- Run it with the mock backend first (set backend.kind: mock) to confirm the wiring works without spending compute.

Only after that smoke test succeeds should you propose porting the remaining scripts, one at a time, each as its own PR if possible.

Hard rules

- Never git rm or rm a file without explicit user confirmation.
- Never overwrite a user-authored file (pyproject.toml, README.md, config files); scaffold around them.
- Preserve git history: prefer git mv over plain mv.
- One migration per PR if possible: don't bundle "port script A" and "convert benchmark B" and "delete utility C" into one mass move.
- Mock first, real backend second: port one experiment with backend.kind: mock, get a passing smoke run, then enable the real backend.

What to point the user at after you're done

- evsys new-experiment <slug> for every subsequent experiment.
- evsys benchmark upload data/benchmark/<name> whenever a benchmark (the TEST set, scored after training) changes content. The upload is idempotent: re-uploading unchanged content returns "unchanged".
- For an in-loop VALIDATION signal during training, add a metadata.benchmark entry tagged [val] with run_every: <N>; it is scored every N steps against the live model. Keep the final TEST benchmark a separate entry (no run_every, tagged [test]) so model selection never keys off the test set.
- using-evsys-sdk skill for the day-to-day patterns (Experiment.from_yaml(...).run(), sweep / matrix syntax, scoring).
- getting-experiment-context skill if they want to recall prior results before designing a new experiment.
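
The val/test separation described above can be sanity-checked with a sketch like the following. The in-memory shape is an assumption: it treats the `metadata.benchmark` entries as a list of dicts with a `tags` list and an optional `run_every` key, which may not match how evsys-sdk actually parses the config. `check_benchmark_entries` is a hypothetical helper.

```python
def check_benchmark_entries(entries: list[dict]) -> list[str]:
    """Return human-readable problems with the val/test split; empty if fine."""
    problems = []
    for i, entry in enumerate(entries):
        tags = set(entry.get("tags", []))
        has_interval = entry.get("run_every") is not None
        if "test" in tags and has_interval:
            problems.append(f"entry {i}: test benchmark must not set run_every")
        if "val" in tags and not has_interval:
            problems.append(f"entry {i}: val benchmark needs run_every")
        if {"val", "test"} <= tags:
            problems.append(f"entry {i}: split val and test into separate entries")
    return problems
```

An empty result means every in-loop entry is tagged for validation and the test benchmark is only scored after training.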
