Use this skill when designing ML/AI experiments, evaluation protocols, or research benchmarks. Guides hypothesis specification, baseline selection, metric choice, and experimental controls to ensure results are valid and reproducible.
Install:

```sh
npx claudepluginhub aviskaar/open-org --plugin experiment-design
```

# Experiment Design
Design rigorous machine learning experiments that produce credible, reproducible results.

## Design Checklist

Work through each section before writing any code.
### 1. Hypothesis

State the hypothesis as a falsifiable claim:

> "We claim that [method X] achieves [metric Y] on [dataset Z] because [mechanism]."

If the hypothesis is vague, help the user sharpen it before proceeding.

### 2. Independent and Dependent Variables

- **Independent variable:** What is being changed (e.g., architecture, loss function, data augmentation)?
- **Dependent variable:** What is being measured?
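One way to keep the claim falsifiable is to record it in structured form. A minimal sketch — the field names and the example claim below are illustrative, not part of the skill:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """Structured form of the falsifiable-claim template above."""
    method: str     # [method X]
    metric: str     # [metric Y]
    dataset: str    # [dataset Z]
    mechanism: str  # [mechanism]

    def statement(self) -> str:
        # Renders the claim in the template's sentence form.
        return (f"We claim that {self.method} achieves {self.metric} "
                f"on {self.dataset} because {self.mechanism}.")

# Hypothetical example claim:
h = Hypothesis("LoRA fine-tuning", ">= 85% accuracy", "SST-2",
               "low-rank updates suffice for sentiment classification")
print(h.statement())
```

If any field is hard to fill in, the hypothesis is probably still too vague to test.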
### 3. Baselines

Select baselines at three levels — for example, a trivial baseline, a strong established method, and the current state of the art. Justify each choice. Avoid strawman baselines.
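A toy harness can make baseline comparisons mechanical. The functions and data below are placeholders, not a prescribed API:

```python
# Each baseline is a name -> predict function; a trivial baseline
# guards against overclaiming. All names and data are illustrative.
def majority_class(_x):
    return 1                      # trivial baseline: always predict the majority label

def predict_method(x):
    return 1 if x > 0 else 0      # stand-in for the method under test

baselines = {"majority": majority_class, "method": predict_method}

data = [(-2, 0), (-1, 0), (1, 1), (3, 1)]   # (input, label) pairs
for name, fn in baselines.items():
    acc = sum(fn(x) == y for x, y in data) / len(data)
    print(f"{name}: {acc:.2f}")
```

If the method barely beats the trivial baseline, the comparison against stronger baselines matters all the more.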
### 4. Compute Budget and Reproducibility

State the hardware, estimated runtime, and number of seeds. This enables reproducibility and contextualizes cost.
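Reporting a metric across seeds can be as simple as the following sketch; `run_experiment` is a stand-in for the real training pipeline:

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    """Stand-in for one full training run; returns a simulated metric."""
    rng = random.Random(seed)
    return 0.85 + rng.gauss(0, 0.01)

seeds = [0, 1, 2, 3, 4]
scores = [run_experiment(s) for s in seeds]
mean = statistics.mean(scores)
std = statistics.stdev(scores)
print(f"accuracy: {mean:.3f} +/- {std:.3f} over {len(seeds)} seeds")
```

A single-seed result gives no sense of variance; the seed list and the mean ± std line belong in the experiment plan alongside the hardware description.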
### 5. Ablations

Design ablations that isolate each component's contribution. Each ablation should remove or replace exactly one thing.
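The one-change-at-a-time rule can be enforced by deriving each ablation config from the full method programmatically. A sketch, with made-up component names:

```python
# Full method: every component enabled. Keys are illustrative.
base = {"augmentation": True, "aux_loss": True, "warmup": True}

# Each ablation disables exactly one component relative to `base`.
ablations = {f"no_{key}": {**base, key: False} for key in base}

for name, cfg in ablations.items():
    changed = [k for k in base if cfg[k] != base[k]]
    assert len(changed) == 1, f"{name} changes more than one thing"
    print(name, "-> disabled:", changed[0])
```

Generating the grid from the base config makes it hard to accidentally write an ablation that changes two things at once.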
### 6. Failure Modes

Identify at least two ways the experiment could give misleading results, and how to detect or mitigate them.
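One common source of misleading results is train/test contamination. A cheap detection sketch — the identifiers are made up, and a real pipeline would hash the raw inputs rather than compare example IDs:

```python
# Hypothetical example identifiers for the two splits.
train_ids = {"ex1", "ex2", "ex3"}
test_ids = {"ex3", "ex4"}

overlap = train_ids & test_ids
if overlap:
    print(f"leakage detected: {sorted(overlap)}")
else:
    print("no overlap between train and test")
```

Checks like this belong in the plan's failure-modes section, next to the misleading outcome they guard against.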
### 7. Output

Produce a structured experiment plan as a markdown document with all sections above filled in. Highlight any section where the user needs to make a decision before proceeding.
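The plan skeleton itself can be scaffolded programmatically. The section names below mirror the checklist, and the function is an illustration rather than part of the skill:

```python
# Section names follow the checklist above.
SECTIONS = ["Hypothesis", "Variables", "Baselines", "Compute Budget",
            "Ablations", "Failure Modes"]

def render_plan(decisions_needed=()):
    """Render a markdown plan skeleton, flagging sections awaiting a user decision."""
    lines = ["# Experiment Plan", ""]
    for section in SECTIONS:
        flag = " **(decision needed)**" if section in decisions_needed else ""
        lines += [f"## {section}{flag}", "", "TODO", ""]
    return "\n".join(lines)

print(render_plan(decisions_needed=["Baselines"]))
```

Flagging undecided sections in the rendered plan keeps the open questions visible instead of buried in conversation.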