From calkit
Defines Calkit conventions for research projects: calkit.yaml structure, environments (uv-venv, conda, docker, renv), pipeline stages, CLI commands, version control. Loads automatically for Calkit projects.
Slash command: /calkit:conventions
Calkit is a tool for research project management focused on automation and
reproducibility—enabling continuous delivery for research.
It provides a unified
interface over Git (source control) and DVC (data versioning), and adds
environment management and pipeline orchestration on top. The central artifact
is calkit.yaml, the project's metadata database.
calkit.yaml file

calkit.yaml lives at the repo root and contains:

- environments: computational environments (Python venvs, Conda, Docker, R, Julia, MATLAB, etc.)
- pipeline.stages: the reproducible pipeline
- notebooks: registered Jupyter notebooks
- datasets, figures, publications: versioned project outputs
- procedures, calculations, references: supporting metadata
- showcase: elements shown on the project's Calkit Cloud homepage

A minimal example:

    environments:
      main:
        kind: uv-venv
        path: requirements.txt
        python: "3.13"
    pipeline:
      stages:
        process-data:
          kind: python-script
          script_path: scripts/process.py
          environment: main
          inputs:
            - data/raw.csv
          outputs:
            - data/processed.csv

Every pipeline stage must reference a named environment defined in
environments. Calkit enforces this to ensure reproducibility. Supported
kinds:

| Kind | Spec file |
|---|---|
| uv-venv | requirements.txt or pyproject.toml |
| venv | requirements.txt |
| conda | environment.yml |
| pixi | pixi.toml |
| docker | (image name, no local spec file needed) |
| renv | renv.lock |
| julia | Project.toml |
| matlab | (no spec file) |
| ssh | (remote host config) |
| slurm | (HPC cluster config) |

Example environment definitions:

    environments:
      main:
        kind: uv-venv
        path: requirements.txt
        python: "3.13"
      texlive:
        kind: docker
        image: texlive/texlive:latest-full
      r-env:
        kind: renv
        path: renv.lock

Calkit generates a lock file for each environment under .calkit/env-locks/
and uses it as a DVC dependency. If the environment changes, affected stages
are automatically flagged for re-run.
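
The re-run rule above can be pictured as a content-hash comparison. The following is a minimal sketch, not Calkit's or DVC's actual implementation: `content_hash`, `stale_stages`, and the in-memory dictionaries are hypothetical names chosen for illustration, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def content_hash(text):
    """Hash the text of an environment lock file (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_stages(stage_envs, recorded_hashes, current_locks):
    """Return stages whose environment lock differs from the recorded hash.

    stage_envs: stage name -> environment name
    recorded_hashes: environment name -> hash stored at the last run
    current_locks: environment name -> current lock file text
    """
    stale = []
    for stage_name, env_name in stage_envs.items():
        if content_hash(current_locks[env_name]) != recorded_hashes.get(env_name):
            stale.append(stage_name)
    return stale


# Hypothetical project state: "main" gained a dependency since the last run.
recorded = {
    "main": content_hash("numpy==2.1.0\n"),
    "texlive": content_hash("texlive/texlive:latest-full\n"),
}
current = {
    "main": "numpy==2.1.0\npandas==2.2.3\n",
    "texlive": "texlive/texlive:latest-full\n",
}
stage_envs = {"process-data": "main", "build-paper": "texlive"}

print(stale_stages(stage_envs, recorded, current))  # ['process-data']
```

Only the stage tied to the changed environment is flagged; the stage using the unchanged texlive environment is left alone.
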
Stages live under pipeline.stages in calkit.yaml. Every stage requires
kind and environment. Most stages also declare inputs and outputs.

| Parameter | Type | Notes |
|---|---|---|
| kind | string | Required. See stage kinds below. |
| environment | string | Required. Must match a key in environments. |
| inputs | list | Files this stage reads. Changes trigger re-run. |
| outputs | list | Files this stage writes. Stored in Git or DVC. |
| wdir | string | Working directory (relative to repo root). |
| always_run | bool | Force re-run even if nothing changed. |
| iterate_over | list | Parameterize the stage over a list of values. |
| description | string | Human-readable description. |

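
The two required keys and the environment-reference rule lend themselves to a quick pre-flight check. Below is a minimal sketch that operates on an already-parsed calkit.yaml (a plain dict); `check_stages` is a hypothetical helper written for illustration, not part of Calkit, which performs its own validation.

```python
def check_stages(config):
    """Return a list of problems found in a parsed calkit.yaml-like dict."""
    environments = config.get("environments", {})
    stages = config.get("pipeline", {}).get("stages", {})
    problems = []
    for name, stage in stages.items():
        # Every stage needs both `kind` and `environment`.
        for key in ("kind", "environment"):
            if key not in stage:
                problems.append(f"{name}: missing required key '{key}'")
        # The environment must be a key under `environments`.
        env = stage.get("environment")
        if env is not None and env not in environments:
            problems.append(f"{name}: unknown environment '{env}'")
    return problems


# Hypothetical config with two deliberate mistakes.
config = {
    "environments": {"main": {"kind": "uv-venv", "path": "requirements.txt"}},
    "pipeline": {
        "stages": {
            "process-data": {"kind": "python-script", "environment": "main"},
            "make-figure": {"kind": "python-script", "environment": "plots"},
            "build-paper": {"environment": "main"},
        }
    },
}

for problem in check_stages(config):
    print(problem)
```

Here make-figure is reported for referencing an undefined environment and build-paper for omitting kind.
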
python-script: Run a Python script

    kind: python-script
    script_path: scripts/run.py
    args: ["--flag", "value"]  # optional

jupyter-notebook: Execute a Jupyter notebook

    kind: jupyter-notebook
    notebook_path: notebooks/analysis.ipynb
    html_storage: git  # optional, default: dvc
    executed_ipynb_storage: git  # optional, default: dvc
    parameters: { key: value }  # optional, papermill parameters

shell-command: Run an arbitrary shell command

    kind: shell-command
    command: "python -m mymodule --arg val"
    shell: bash  # optional, default: bash

shell-script: Run a shell script file

    kind: shell-script
    script_path: scripts/run.sh

latex: Compile a LaTeX document to PDF

    kind: latex
    target_path: paper/paper.tex
    pdf_storage: git  # optional, default: dvc

r-script: Run an R script

    kind: r-script
    script_path: scripts/analysis.R

julia-script / julia-command: Run Julia code

    kind: julia-script
    script_path: scripts/run.jl

matlab-script / matlab-command: Run MATLAB code

    kind: matlab-script
    script_path: scripts/run.m

docker-command: Run a command inside a Docker container

    kind: docker-command
    command: "docker run --rm myimage mycommand"

command: Generic command (for tools that don't fit other kinds)

    kind: command
    command: "mytool --input data/raw.csv --output data/out.csv"

By default, outputs are stored with DVC (large file storage). Use
storage: git for small files that belong in version control:

    outputs:
      - data/processed.csv  # DVC (default)
      - path: data/meta.json
        storage: git  # committed to Git
      - path: results/summary.txt
        storage: git
        delete_before_run: false  # don't delete before re-running

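
Because an output may be written either as a bare path or as a mapping, a tool reading the list has to bring both forms to a common shape. The sketch below shows one way to do that; `normalize_outputs` is a hypothetical helper, and the `delete_before_run` default of `True` is an assumption inferred from the comment in the example, not a documented value.

```python
def normalize_outputs(outputs):
    """Expand each output entry into a dict with explicit settings."""
    normalized = []
    for entry in outputs:
        if isinstance(entry, str):
            # Bare path: every setting takes its default.
            entry = {"path": entry}
        normalized.append(
            {
                "path": entry["path"],
                "storage": entry.get("storage", "dvc"),  # DVC is the stated default
                # Assumed default; the source only shows the `false` override.
                "delete_before_run": entry.get("delete_before_run", True),
            }
        )
    return normalized


outputs = [
    "data/processed.csv",
    {"path": "data/meta.json", "storage": "git"},
    {"path": "results/summary.txt", "storage": "git", "delete_before_run": False},
]

for item in normalize_outputs(outputs):
    print(item["path"], item["storage"], item["delete_before_run"])
```
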
Use from_stage_outputs to declare that a stage depends on another stage's
outputs rather than listing individual files:

    stages:
      collect-data:
        kind: python-script
        script_path: scripts/collect.py
        environment: main
        outputs:
          - data/raw.csv
      process-data:
        kind: python-script
        script_path: scripts/process.py
        environment: main
        inputs:
          - from_stage_outputs: collect-data
        outputs:
          - data/processed.csv

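
A from_stage_outputs entry implies two things: the referenced stage's output files become inputs, and that stage must run first. The following sketch illustrates both under stated assumptions: `output_paths` and `resolve_inputs` are hypothetical helpers, the stages are given as a parsed dict, and the ordering uses Python's standard `graphlib` (3.9+) rather than whatever Calkit and DVC do internally.

```python
from graphlib import TopologicalSorter


def output_paths(stage):
    """List a stage's output paths, accepting both bare strings and mappings."""
    return [o if isinstance(o, str) else o["path"] for o in stage.get("outputs", [])]


def resolve_inputs(stages):
    """Expand from_stage_outputs entries and derive a valid execution order."""
    resolved = {}
    upstream = {}
    for name, stage in stages.items():
        paths, deps = [], set()
        for entry in stage.get("inputs", []):
            if isinstance(entry, dict) and "from_stage_outputs" in entry:
                source = entry["from_stage_outputs"]
                paths.extend(output_paths(stages[source]))
                deps.add(source)
            else:
                paths.append(entry)
        resolved[name] = paths
        upstream[name] = deps
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(upstream).static_order())
    return resolved, order


stages = {
    "process-data": {
        "inputs": [{"from_stage_outputs": "collect-data"}],
        "outputs": ["data/processed.csv"],
    },
    "collect-data": {"outputs": ["data/raw.csv"]},
}

resolved, order = resolve_inputs(stages)
print(resolved["process-data"])  # ['data/raw.csv']
print(order)                     # ['collect-data', 'process-data']
```
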
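To parameterize a stage over a list of values, use iterate_over. In this
example the {model} placeholder appears in both args and outputs:

    stages:
      train-model:
        kind: python-script
        script_path: scripts/train.py
        environment: main
        args:
          - "--model={model}"
        iterate_over:
          - arg_name: model
            values:
              - linear-regression
              - random-forest
        inputs:
          - data/processed.csv
        outputs:
          - models/{model}.pkl
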
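
One way to read an iterate_over stage is as a template that is filled in once per value. The sketch below expands such a stage into concrete runs; `fill` and `expand_stage` are hypothetical helpers, and combining several iterate_over entries as a Cartesian product is an assumption made for the sketch, not something the example above confirms.

```python
from itertools import product


def fill(text, mapping):
    """Replace each {arg_name} placeholder in text with its value."""
    for arg_name, value in mapping.items():
        text = text.replace("{" + arg_name + "}", str(value))
    return text


def expand_stage(stage):
    """Return one concrete run (args, inputs, outputs) per parameter combination."""
    iterations = stage.get("iterate_over", [])
    names = [it["arg_name"] for it in iterations]
    runs = []
    # Assumption: multiple iterate_over entries combine as a Cartesian product.
    for combo in product(*(it["values"] for it in iterations)):
        mapping = dict(zip(names, combo))
        runs.append(
            {
                "args": [fill(a, mapping) for a in stage.get("args", [])],
                "inputs": [fill(p, mapping) for p in stage.get("inputs", [])],
                "outputs": [fill(p, mapping) for p in stage.get("outputs", [])],
            }
        )
    return runs


train_model = {
    "args": ["--model={model}"],
    "iterate_over": [
        {"arg_name": "model", "values": ["linear-regression", "random-forest"]}
    ],
    "inputs": ["data/processed.csv"],
    "outputs": ["models/{model}.pkl"],
}

for run in expand_stage(train_model):
    print(run["args"], run["outputs"])
```

For the train-model stage this yields two runs, writing models/linear-regression.pkl and models/random-forest.pkl respectively.
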
Calkit compiles calkit.yaml into dvc.yaml when calkit run is called.
Do not edit dvc.yaml directly—it is a generated file. The authoritative
pipeline definition is always calkit.yaml.
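
To give a feel for what compiling a stage might involve, here is a deliberately simplified sketch that turns a python-script stage into a DVC-style stage with cmd, deps, and outs keys. `to_dvc_stage` is a hypothetical function; wrapping the command in calkit xenv and listing the script as a dependency are assumptions, and the dvc.yaml that Calkit actually generates may look different.

```python
def to_dvc_stage(stage):
    """Build a DVC-style stage dict from a python-script stage (illustrative only)."""
    if stage["kind"] != "python-script":
        raise ValueError("this sketch only handles python-script stages")
    # Assumption: the script runs inside its named environment via `calkit xenv`.
    cmd = [
        "calkit", "xenv", "-n", stage["environment"], "--",
        "python", stage["script_path"], *stage.get("args", []),
    ]
    return {
        "cmd": " ".join(cmd),
        # Assumption: the script itself counts as a dependency.
        "deps": [stage["script_path"], *stage.get("inputs", [])],
        "outs": list(stage.get("outputs", [])),
    }


process_data = {
    "kind": "python-script",
    "script_path": "scripts/process.py",
    "environment": "main",
    "inputs": ["data/raw.csv"],
    "outputs": ["data/processed.csv"],
}

print(to_dvc_stage(process_data))
```
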
Git handles source control and DVC handles data versioning, including large
output files. Use calkit push / calkit pull to sync both with their remotes.

| Command | What it does |
|---|---|
| calkit run | Run the pipeline (skips unchanged stages) |
| calkit run --force | Force re-run all stages |
| calkit status | Show which stages are stale |
| calkit xr <file> | Auto-detect stage type, env, I/O, add to pipeline |
| calkit xenv -n <env> -- <cmd> | Run a command in a named environment |
| calkit push | Push Git commits and DVC-tracked files to remotes |
| calkit pull | Pull latest code and data |
| calkit save | Auto-add, commit, and push (Git + DVC) |
| calkit commit -m "msg" | Commit all tracked changes (Git + DVC) |
| calkit add <file> | Add a file to version control |
| calkit check env --name <env> | Verify an environment matches its spec |
| calkit new | Create new project objects (notebook, dataset, etc.) |

calkit xr: The fastest path to a reproducible stage

xr ("execute and record") is the recommended way to add scripts and
notebooks to the pipeline for the first time. It:

- Detects the stage type from the file extension (.py, .ipynb, .R, .jl, .m, .sh, .tex)
- Detects the environment, inputs, and outputs
- Executes the file and records the stage in calkit.yaml and dvc.yaml

Examples:

    calkit xr scripts/run.py              # Python script
    calkit xr notebooks/analysis.ipynb    # Jupyter notebook
    calkit xr paper/paper.tex             # LaTeX document
    calkit xr scripts/run.R               # R script
    calkit xr scripts/run.py --input data/raw.csv --output results/out.csv
    calkit xr scripts/run.py --environment main
    calkit xr scripts/run.py --dry-run    # see what would happen without running

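
Detecting the stage type from a file extension amounts to a lookup table. The sketch below pairs the extensions xr recognizes with the stage kinds described earlier; `KIND_BY_SUFFIX` and `guess_stage_kind` are hypothetical names, and the exact mapping Calkit uses internally is an assumption.

```python
from pathlib import PurePosixPath

# Assumed pairing of recognized extensions with documented stage kinds.
KIND_BY_SUFFIX = {
    ".py": "python-script",
    ".ipynb": "jupyter-notebook",
    ".r": "r-script",
    ".jl": "julia-script",
    ".m": "matlab-script",
    ".sh": "shell-script",
    ".tex": "latex",
}


def guess_stage_kind(path):
    """Guess a stage kind from the file extension, ignoring case."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in KIND_BY_SUFFIX:
        raise ValueError(f"no stage kind known for extension '{suffix}'")
    return KIND_BY_SUFFIX[suffix]


for path in ["scripts/run.py", "notebooks/analysis.ipynb", "paper/paper.tex", "scripts/run.R"]:
    print(path, "->", guess_stage_kind(path))
```
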
- Environment lock files (.calkit/env-locks/): committed to Git, act as DVC dependencies
- dvc.yaml: generated by Calkit; don't edit manually
- .calkit/: Calkit's internal directory; commit its contents unless they are large generated files