Use this skill when designing ML/AI experiments, evaluation protocols, or research benchmarks. Guides hypothesis specification, baseline selection, metric choice, and experimental controls to ensure results are valid and reproducible.
Install:

```sh
npx claudepluginhub aviskaar/open-org --plugin experiment-design
```

# Experiment Design
Design rigorous machine learning experiments that produce credible, reproducible results.

## Design Checklist

Work through each section before writing any code.
### 1. Hypothesis

State the hypothesis as a falsifiable claim:

> "We claim that [method X] achieves [metric Y] on [dataset Z] because [mechanism]."

If the hypothesis is vague, help the user sharpen it before proceeding.

### 2. Independent and Dependent Variables

- **Independent variable:** What is being changed (e.g., architecture, loss function, data augmentation)?
- **Dependent variable:** What is being measured?
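One way to keep the claim falsifiable is to record it in structured form. A minimal sketch — the field names and the example claim below are illustrative, not part of the skill:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Structured form of the falsifiable-claim template above."""
    method: str     # [method X]
    metric: str     # [metric Y]
    dataset: str    # [dataset Z]
    mechanism: str  # [mechanism]

    def statement(self) -> str:
        # Renders the claim in the template's sentence form.
        return (f"We claim that {self.method} achieves {self.metric} "
                f"on {self.dataset} because {self.mechanism}.")

# Hypothetical example claim:
h = Hypothesis("LoRA fine-tuning", ">= 85% accuracy", "SST-2",
               "low-rank updates suffice for sentiment classification")
print(h.statement())
```

If any field is hard to fill in, the hypothesis is probably still too vague to test.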
### 3. Baselines

Select baselines at three levels — for example, a trivial baseline, a strong established method, and the current state of the art. Justify each choice. Avoid strawman baselines.
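A toy harness can make baseline comparisons mechanical. The functions and data below are placeholders, not a prescribed API:

```python
# Each baseline is a name -> predict function; a trivial baseline
# guards against overclaiming. All names and data are illustrative.
def majority_class(_x):
    return 1                      # trivial baseline: always predict the majority label

def predict_method(x):
    return 1 if x > 0 else 0      # stand-in for the method under test

baselines = {"majority": majority_class, "method": predict_method}

data = [(-2, 0), (-1, 0), (1, 1), (3, 1)]   # (input, label) pairs
for name, fn in baselines.items():
    acc = sum(fn(x) == y for x, y in data) / len(data)
    print(f"{name}: {acc:.2f}")
```

If the method barely beats the trivial baseline, the comparison against stronger baselines matters all the more.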
### 4. Compute Budget and Reproducibility

State the hardware, estimated runtime, and number of seeds. This enables reproducibility and contextualizes cost.
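Reporting a metric across seeds can be as simple as the following sketch; `run_experiment` is a stand-in for the real training pipeline:

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    """Stand-in for one full training run; returns a simulated metric."""
    rng = random.Random(seed)
    return 0.85 + rng.gauss(0, 0.01)

seeds = [0, 1, 2, 3, 4]
scores = [run_experiment(s) for s in seeds]
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"accuracy: {mean:.3f} +/- {std:.3f} over {len(seeds)} seeds")
```

A single-seed result gives no sense of variance; the seed list and the mean ± std line belong in the experiment plan alongside the hardware description.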
### 5. Ablations

Design ablations that isolate each component's contribution. Each ablation should remove or replace exactly one thing.
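The one-change-at-a-time rule can be enforced by deriving each ablation config from the full method programmatically. A sketch, with made-up component names:

```python
# Full method: every component enabled. Keys are illustrative.
base = {"augmentation": True, "aux_loss": True, "warmup": True}

# Each ablation disables exactly one component relative to `base`.
ablations = {f"no_{key}": {**base, key: False} for key in base}

for name, cfg in ablations.items():
    changed = [k for k in base if cfg[k] != base[k]]
    assert len(changed) == 1, f"{name} changes more than one thing"
    print(name, "-> disabled:", changed[0])
```

Generating the grid from the base config makes it hard to accidentally write an ablation that changes two things at once.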
### 6. Failure Modes

Identify at least two ways the experiment could give misleading results, and how to detect or mitigate them.
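One common source of misleading results is train/test contamination. A cheap detection sketch — the identifiers are made up, and a real pipeline would hash the raw inputs rather than compare example IDs:

```python
# Hypothetical example identifiers for the two splits.
train_ids = {"ex1", "ex2", "ex3"}
test_ids = {"ex3", "ex4"}

overlap = train_ids & test_ids
if overlap:
    print(f"leakage detected: {sorted(overlap)}")
else:
    print("no overlap between train and test")
```

Checks like this belong in the plan's failure-modes section, next to the misleading outcome they guard against.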
### 7. Output

Produce a structured experiment plan as a markdown document with all sections above filled in. Highlight any section where the user needs to make a decision before proceeding.
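The plan skeleton itself can be scaffolded programmatically. The section names below mirror the checklist, and the function is an illustration rather than part of the skill:

```python
# Section names follow the checklist above.
SECTIONS = ["Hypothesis", "Variables", "Baselines", "Compute Budget",
            "Ablations", "Failure Modes"]

def render_plan(decisions_needed=()):
    """Render a markdown plan skeleton, flagging sections awaiting a user decision."""
    lines = ["# Experiment Plan", ""]
    for section in SECTIONS:
        flag = " **(decision needed)**" if section in decisions_needed else ""
        lines += [f"## {section}{flag}", "", "TODO", ""]
    return "\n".join(lines)

print(render_plan(decisions_needed=["Baselines"]))
```

Flagging undecided sections in the rendered plan keeps the open questions visible instead of buried in conversation.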