From calkit
Converts repos with ad hoc scripts/notebooks into reproducible Calkit pipelines using calkit init, xr for I/O detection, and YAML stages/environments for Python/R/Julia/etc.
Slash command: /calkit:create-pipeline
Convert an existing repo with ad hoc scripts and manual steps into a fully
reproducible Calkit pipeline. When complete, calkit run should reproduce all
important outputs from scratch.
Before writing any YAML, map out what's already there. Start with the README: it typically lists manual environment-setup steps and the scripts and commands to run, in order. This is effectively a manual pipeline. Next, scan scripts/, notebooks/, src/,
and the repo root for .py, .R, .jl, .m, and .ipynb files. Ask the user if the order or dependencies are unclear. Do not guess at data flow.
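To make the survey step repeatable, here is a minimal bash sketch. list_candidate_scripts is a hypothetical helper, not a Calkit command: it lists .py, .R, .jl, .m, and .ipynb files in the repo root and under scripts/, notebooks/, and src/, skipping Jupyter checkpoint copies. It only finds candidates; the run order and data flow still come from the README or the user.

```shell
# Hypothetical helper (not a Calkit command): list candidate scripts and
# notebooks in the repo root and in scripts/, notebooks/, and src/.
list_candidate_scripts() {
  local root="${1:-.}" dir
  {
    # Top-level files only (no recursion into other directories).
    find "$root" -maxdepth 1 -type f \
      \( -name '*.py' -o -name '*.R' -o -name '*.jl' -o -name '*.m' -o -name '*.ipynb' \)
    # Recurse into the conventional code directories, if they exist,
    # skipping Jupyter's autosaved checkpoint copies.
    for dir in scripts notebooks src; do
      if [ -d "$root/$dir" ]; then
        find "$root/$dir" -type f \
          \( -name '*.py' -o -name '*.R' -o -name '*.jl' -o -name '*.m' -o -name '*.ipynb' \) \
          ! -path '*/.ipynb_checkpoints/*'
      fi
    done
  } | sort
}

# Example: list_candidate_scripts .
```

Other directories (docs/, tests/, etc.) are deliberately left out to match the locations named above; widen the list if the repo keeps code elsewhere.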
If there is no calkit.yaml, run:
calkit init
This sets up Git (if needed) and DVC. If calkit.yaml already exists but has
no pipeline section, skip this—you will add one.
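The decision above can be sketched in bash. setup_action is a hypothetical helper; it assumes a pipeline section shows up as a top-level pipeline: line in calkit.yaml and simply greps for it rather than parsing YAML.

```shell
# Hypothetical helper: report which setup step applies, based only on
# whether calkit.yaml exists and has a top-level "pipeline:" line.
setup_action() {
  local root="${1:-.}"
  if [ ! -f "$root/calkit.yaml" ]; then
    echo "init"          # no calkit.yaml yet: run calkit init
  elif grep -q '^pipeline:' "$root/calkit.yaml"; then
    echo "has-pipeline"  # a pipeline section exists: extend it
  else
    echo "add-pipeline"  # calkit.yaml exists but has no pipeline section
  fi
}
```

For example, [ "$(setup_action)" = init ] && calkit init runs calkit init only when calkit.yaml is missing.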
Every stage must reference a named environment. Identify what's needed:
requirements.txt or pyproject.toml → uv-venv
environment.yml → conda
renv.lock → renv
Project.toml → julia
LaTeX → docker with texlive/texlive:latest-full
MATLAB → matlab
Add environments to calkit.yaml:
environments:
  main:
    kind: uv-venv
    path: requirements.txt
    python: "3.13"
If there is a single Python environment, name it main. With multiple
environments, use descriptive names (e.g., analysis, paper).
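A rough bash sketch of the file-to-environment mapping above. suggest_env_kinds is hypothetical and covers only the file-based rows (uv-venv, conda, renv, julia); LaTeX and MATLAB environments still have to be chosen by hand, and the output is a suggestion to confirm, not a finished calkit.yaml entry.

```shell
# Hypothetical helper: suggest Calkit environment kinds from the dependency
# files present in a directory (file-based rows of the mapping only).
suggest_env_kinds() {
  local root="${1:-.}"
  if [ -f "$root/requirements.txt" ] || [ -f "$root/pyproject.toml" ]; then
    echo "uv-venv"
  fi
  if [ -f "$root/environment.yml" ]; then
    echo "conda"
  fi
  if [ -f "$root/renv.lock" ]; then
    echo "renv"
  fi
  if [ -f "$root/Project.toml" ]; then
    echo "julia"
  fi
}
```

More than one line of output means the project likely needs more than one environment, which is where the descriptive naming rule applies.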
Use calkit xr first
For each script or notebook, try xr before writing YAML by hand. It
auto-detects stage kind, environment, and I/O:
calkit xr scripts/collect-data.py --dry-run # preview first
calkit xr scripts/collect-data.py # run for real
Work through scripts in dependency order. After each call, verify the new
stage in calkit.yaml. Override detected I/O if needed:
calkit xr scripts/train.py \
--input data/processed.csv \
--input config/params.yaml \
--output models/model.pkl
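Walking through scripts in dependency order can be scripted as a bash loop. The order passed to preview_xr (a hypothetical helper) is an assumption you supply from the README or the user; only the xr --dry-run invocation shown above is used. The CALKIT variable exists so the loop can be exercised with CALKIT=echo.

```shell
# Sketch: preview calkit xr for each script, in the order given.
# CALKIT can be overridden (e.g. CALKIT=echo) to print commands instead.
CALKIT="${CALKIT:-calkit}"
preview_xr() {
  local script
  for script in "$@"; do
    echo "== $script"
    # Stop at the first script xr cannot handle, so it can be fixed first.
    "$CALKIT" xr "$script" --dry-run || return 1
  done
}

# Hypothetical order:
# preview_xr scripts/collect-data.py notebooks/process.ipynb scripts/train.py
```

Stopping at the first failure keeps the dependency order intact: later scripts usually consume outputs of earlier ones.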
xr is not infallible and will sometimes get things wrong, but it handles
most of the boilerplate, so you can focus on verifying correctness and filling
in gaps. Scan each script for missing inputs, outputs, environment
dependencies, etc., and add them to calkit.yaml manually as needed.
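As an aid for that scan, a heuristic bash sketch. find_path_literals is hypothetical: it greps a script for quoted relative paths ending in a few common data and figure extensions (the extension list is an assumption). It misses paths built at runtime, e.g. with os.path.join or f-strings, so treat the output as a checklist to compare against the stage's declared inputs and outputs, not as the complete I/O.

```shell
# Heuristic sketch: print quoted string literals in a script that look like
# relative file paths with common data/figure extensions, deduplicated.
find_path_literals() {
  grep -oE "[\"'][A-Za-z0-9_./-]+\.(csv|json|parquet|pkl|png|pdf|txt|yaml|yml|h5|nc)[\"']" "$1" \
    | tr -d "\"'" \
    | sort -u
}

# Example: find_path_literals scripts/train.py
```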
When xr isn't suitable
Write stages directly in calkit.yaml when you need options such as always_run, iterate_over, or storage modes, or when xr does not support the stage kind. For example:
pipeline:
  stages:
    collect-data:
      kind: python-script
      script_path: scripts/collect-data.py
      environment: main
      outputs:
        - data/raw.csv
    process-data:
      kind: jupyter-notebook
      notebook_path: notebooks/process.ipynb
      environment: main
      inputs:
        - data/raw.csv
      outputs:
        - data/processed.csv
        - figures/fig1.png
    build-paper:
      kind: latex
      target_path: paper/paper.tex
      environment: texlive
      inputs:
        - figures/fig1.png
        - references.bib
      outputs: []
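Since every stage needs an environment, a quick check helps after hand-editing. This bash sketch (stages_missing_env is hypothetical) uses awk and assumes the indentation of the example above: stages: at 2 spaces under pipeline:, stage names at 4, stage keys at 6. It is a lint-style check, not a YAML parser.

```shell
# Sketch: print the names of stages that have no "environment:" key.
stages_missing_env() {
  awk '
    # Any top-level or pipeline-level key closes the stages block...
    /^[^ #]/ || /^  [^ #]/ { in_stages = 0 }
    # ...unless it is the stages key itself.
    /^  stages:/ { in_stages = 1; next }
    # A new stage name: report the previous stage if it had no environment.
    in_stages && /^    [A-Za-z0-9_-]+: *$/ {
      if (stage != "" && !has_env) print stage
      stage = $1; sub(/:$/, "", stage); has_env = 0; next
    }
    in_stages && /^      environment:/ { has_env = 1 }
    END { if (stage != "" && !has_env) print stage }
  ' "${1:-calkit.yaml}"
}
```

Empty output means every stage in the block declares an environment; run it as stages_missing_env calkit.yaml.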
Rules:
Use from_stage_outputs: stage-name in a stage's inputs when it consumes all
outputs of a prior stage.
For each output, choose a storage mode:
outputs:
  - data/raw.csv  # DVC (default)
  - path: data/meta.json
    storage: git
  - path: paper/paper.pdf
    storage: git
When in doubt, ask the user. Storage mode affects whether collaborators can
see the file without calkit pull.
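To confirm where outputs actually ended up, a bash sketch using git ls-files --error-unmatch, which exits successfully only for paths Git tracks. report_git_tracking is hypothetical; a path reported as not in git is either DVC-stored or untracked, and either way collaborators won't get it from a plain clone.

```shell
# Sketch: report, for each path given, whether Git tracks it.
report_git_tracking() {
  local path
  for path in "$@"; do
    if git ls-files --error-unmatch -- "$path" >/dev/null 2>&1; then
      echo "$path: git"
    else
      echo "$path: not in git"
    fi
  done
}

# Example: report_git_tracking data/raw.csv data/meta.json paper/paper.pdf
```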
Run the pipeline:
calkit run
Common errors:
If an environment fails, run calkit check env --name <env> to diagnose.
Run a single stage: calkit run <stage-name>
Force re-run everything: calkit run --force
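To check every environment rather than one at a time, a bash sketch that loops calkit check env --name over the environment names in calkit.yaml. list_env_names and check_all_envs are hypothetical helpers; the awk extraction assumes names sit at two-space indentation under a top-level environments: key, as in the example earlier. Set CALKIT=echo to see the commands without running them.

```shell
# Sketch: run "calkit check env --name <env>" for each environment.
CALKIT="${CALKIT:-calkit}"

# Print environment names found directly under a top-level "environments:".
list_env_names() {
  awk '
    /^[^ #]/ { in_envs = ($0 ~ /^environments:/); next }
    in_envs && /^  [A-Za-z0-9_-]+: *$/ { name = $1; sub(/:$/, "", name); print name }
  ' "${1:-calkit.yaml}"
}

# Check all environments; return non-zero if any check failed.
check_all_envs() {
  local env status=0
  for env in $(list_env_names "${1:-calkit.yaml}"); do
    "$CALKIT" check env --name "$env" || status=1
  done
  return "$status"
}
```

Unlike a stop-on-first-error loop, this keeps going so a single run lists every broken environment.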
When calkit run reproduces everything, commit the result:
calkit commit -m "Add reproducible pipeline"
If process.py reads data/raw.csv but that file isn't declared as an output
of collect-data, DVC won't track the dependency.
Paths are relative to the project root (or to wdir, if set).
Don't edit dvc.yaml directly: always edit calkit.yaml, because Calkit
regenerates dvc.yaml.
Every stage needs an environment. Use _system for stages that only use
system tools.