npx claudepluginhub andikarachman/data-science-plugin --plugin ds
This skill uses the workspace's default tool permissions.
Verify that an ML experiment can be reproduced by another person on another machine. Walk through each requirement and score the experiment.
Verify all sources of randomness are controlled:
Seeds should be fixed wherever randomness enters the experiment (for example via a random_state parameter or np.random.seed()).
Check: Search the experiment code for random_state, seed, random.seed, np.random.seed, torch.manual_seed. Every stochastic call should have a fixed seed.
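For illustration, a small helper along these lines (the name set_global_seeds is an assumption, not part of the skill) fixes the common seed sources in one place; the torch branch only applies when PyTorch is installed:

```python
import random

import numpy as np


def set_global_seeds(seed=42):
    """Fix the common sources of randomness in one place (illustrative helper)."""
    random.seed(seed)            # Python's built-in RNG
    np.random.seed(seed)         # NumPy's global RNG
    try:
        import torch
        torch.manual_seed(seed)            # PyTorch CPU RNG
        torch.cuda.manual_seed_all(seed)   # all GPU RNGs, if CUDA is present
    except ImportError:
        pass                     # PyTorch not installed; nothing to seed
    return seed
```

Passing random_state explicitly to each scikit-learn estimator and split is still more robust than relying on the global NumPy seed, because it does not depend on call order.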
Verify all library versions are captured:
Check: Look for an "Environment" section in the experiment result. Compare library versions against what was planned.
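As a minimal sketch of this comparison (the function name and the recorded-versions dict are assumptions), the planned versions can be diffed against what is currently installed:

```python
from importlib import metadata


def compare_versions(recorded):
    """Compare recorded package versions against the current environment (sketch)."""
    mismatches = {}
    for package, expected in recorded.items():
        try:
            installed = metadata.version(package)   # distribution name, e.g. "scikit-learn"
        except metadata.PackageNotFoundError:
            installed = 'not installed'
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Example with a hypothetical "Environment" section from the experiment log:
# compare_versions({'numpy': '1.26.4', 'pandas': '2.2.2', 'scikit-learn': '1.5.0'})
```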
Verify the exact dataset can be retrieved:
Check: Look for data_hash, SHA-256, or a data snapshot reference in the experiment artifacts. Verify the hash matches the actual file if available.
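A minimal sketch of the hash check (the experiment_log field name is an assumption; the verify_reproducibility utility further below computes the same digest as part of its full report):

```python
import hashlib


def sha256_of_file(path, chunk_size=8192):
    """Stream a file through SHA-256 so large datasets never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Example against a hypothetical recorded hash:
# assert sha256_of_file('data/train.csv') == experiment_log['data_hash']
```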
Verify the exact code state can be recovered:
Check: Look for git_commit or git SHA in the experiment result.
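To go one step beyond finding the SHA, a sketch like the following (the helper name is hypothetical) checks that the recorded commit actually resolves in the repository:

```python
import subprocess


def commit_exists(sha):
    """Return True if the recorded commit SHA resolves to a commit in this repository."""
    try:
        subprocess.run(
            ['git', 'rev-parse', '--verify', '--quiet', sha + '^{commit}'],
            check=True, capture_output=True,
        )
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        return False
```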
Verify the environment can be recreated:
Check: Look for an "Environment" section. If no requirements file exists, flag as a gap.
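A sketch of capturing the environment at experiment time (the helper name and output path are assumptions):

```python
import subprocess
import sys


def snapshot_environment(path='requirements.txt'):
    """Write the exact installed package versions next to the experiment artifacts."""
    frozen = subprocess.check_output(
        [sys.executable, '-m', 'pip', 'freeze'], text=True)
    with open(path, 'w') as f:
        f.write(frozen)
    return path
```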
Verify results are deterministic:
Note: Full re-run verification is optional. Flag if the experiment uses known non-deterministic operations (GPU training without torch.use_deterministic_algorithms(), multi-threaded data loading).
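Where PyTorch is used, a configuration along these lines (a sketch, not the skill's prescribed setup) removes the most common GPU sources of non-determinism at some cost in speed:

```python
import os

import torch


def make_torch_deterministic(seed=42):
    """Opt in to deterministic kernels; unsupported non-deterministic ops will raise instead of silently varying."""
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'   # required by some deterministic CUDA ops
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False               # disable autotuned, non-reproducible kernels
```

Multi-threaded data loading also needs seeding (for example via a DataLoader worker_init_fn and generator) to make batch order repeatable.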
Count the number of checked items across all 6 sections:
| Score | Rating | Recommendation |
|---|---|---|
| 16-17 / 17 | Excellent | Ready to ship |
| 12-15 / 17 | Good | Minor gaps -- document and proceed |
| 8-11 / 17 | Fair | Significant gaps -- fix before shipping |
| 0-7 / 17 | Poor | Not reproducible -- requires rework |
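For convenience, the score-to-rating mapping can be expressed directly in Python (a sketch; the function name is hypothetical and the thresholds simply mirror the table above):

```python
def rate_reproducibility(checked_items):
    """Map the number of checked items (out of 17) to the rating in the table above."""
    if checked_items >= 16:
        return 'Excellent', 'Ready to ship'
    if checked_items >= 12:
        return 'Good', 'Minor gaps -- document and proceed'
    if checked_items >= 8:
        return 'Fair', 'Significant gaps -- fix before shipping'
    return 'Poor', 'Not reproducible -- requires rework'
```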
```python
import hashlib
import importlib
import subprocess
import sys


def verify_reproducibility(data_path=None, expected_hash=None):
    """Quick reproducibility verification."""
    report = {}

    # Python version
    report['python'] = sys.version.split()[0]

    # Git commit and working-tree state
    try:
        sha = subprocess.check_output(
            ['git', 'rev-parse', 'HEAD'], text=True).strip()
        dirty = subprocess.check_output(
            ['git', 'status', '--porcelain'], text=True).strip()
        report['git_commit'] = sha
        report['git_clean'] = len(dirty) == 0
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a git repository, or git is not installed
        report['git_commit'] = 'unavailable'
        report['git_clean'] = False

    # Library versions (record only those that are importable)
    libs = ['pandas', 'numpy', 'sklearn', 'scipy', 'statsmodels',
            'aeon', 'xgboost', 'lightgbm', 'matplotlib']
    report['libraries'] = {}
    for lib in libs:
        try:
            mod = importlib.import_module(lib)
            report['libraries'][lib] = getattr(mod, '__version__', 'installed')
        except ImportError:
            pass

    # Data hash, streamed so large files never sit fully in memory
    if data_path:
        h = hashlib.sha256()
        with open(data_path, 'rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                h.update(chunk)
        report['data_hash'] = h.hexdigest()
        if expected_hash:
            report['data_hash_match'] = report['data_hash'] == expected_hash

    return report
```
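A possible way to call it (the path and expected hash are placeholders):

```python
import json

report = verify_reproducibility(
    data_path='data/train.csv',      # placeholder path to the training data
    expected_hash='0123abcd...',     # hash recorded at experiment time
)
print(json.dumps(report, indent=2))
```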
| Failure | Cause | Fix |
|---|---|---|
| Different metrics on re-run | Missing random seed in data split or model | Pass random_state to all stochastic calls |
| Can't install same libraries | No pinned versions | Use pip freeze > requirements.txt at experiment time |
| Data changed between runs | No data hash captured | Hash data files before training |
| Code changed since experiment | No git SHA recorded | Record git rev-parse HEAD in experiment log |
| GPU gives different results | Non-deterministic CUDA operations | Document GPU non-determinism or use torch.use_deterministic_algorithms(True) |
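For the first failure mode, the fix in practice looks like passing random_state at every stochastic step. A sketch with scikit-learn, using a stand-in dataset so the snippet runs on its own:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42   # one seed, threaded through every stochastic call

X, y = make_classification(n_samples=500, random_state=SEED)   # stand-in dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED)
model = RandomForestClassifier(n_estimators=200, random_state=SEED)
model.fit(X_train, y_train)
```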