From science-superpowers
Guides the creation, editing, and validation of SKILL.md files using empirical methods: baseline observation, hypothesis formation, and testing.
npx claudepluginhub k-dense-ai/science-superpowers --plugin science-superpowers
Slash command: /science-superpowers:writing-science-skills
**Writing a skill is an experiment, not an essay.** You form a hypothesis (this guidance will change agent behavior), run a baseline (watch agents fail without it), write the skill, and test that behavior changes.
Core principle: If you didn't watch an agent fail without the skill, you don't know whether the skill teaches the right thing — the same reason you pre-register before analyzing rather than rationalizing a result after.
REQUIRED BACKGROUND: Understand science-superpowers:preregistering-analysis first. The discipline of fixing a prediction before observing the outcome is exactly the discipline here: observe baseline behavior before writing the skill, so the skill is shaped by evidence, not by what you imagine agents do.
A skill is a reference guide for a proven technique, pattern, or discipline that future agents can find and apply.
Skills are: reusable techniques, patterns, disciplines, reference guides.
Skills are NOT: narratives about how you did something once.
| Experiment concept | Skill creation |
|---|---|
| Hypothesis | "This guidance will fix behavior X" |
| Baseline / control | Agent behavior WITHOUT the skill |
| Observed failure (the signal) | The exact rationalizations agents use |
| Intervention | The skill document |
| Effect | Agent now complies |
| Replication | Re-test under different pressures until robust |
Create when:
Don't create for:
skills/
  skill-name/
    SKILL.md             # Main reference (required)
    supporting-file.*    # Only for heavy reference or reusable tools
Flat namespace. Keep principles and short patterns inline; split out only heavy reference (100+ lines) or reusable tools/templates.
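The layout above can be checked mechanically. The following is a minimal sketch, not part of the original skill: `check_layout` is a hypothetical helper, and it reads "flat namespace" as "no skill directory nested inside another skill", which is an assumption about what the rule means.

```python
from pathlib import Path


def check_layout(skills_root: Path) -> list[str]:
    """Report skill directories that break the assumed flat layout."""
    problems = []
    for skill_dir in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        # Every skill directory needs its main reference file.
        if not (skill_dir / "SKILL.md").is_file():
            problems.append(f"{skill_dir.name}: missing SKILL.md")
        # Assumption: a SKILL.md below the top level means a nested skill.
        for nested in sorted(skill_dir.rglob("SKILL.md")):
            if nested.parent != skill_dir:
                rel = nested.relative_to(skill_dir).as_posix()
                problems.append(f"{skill_dir.name}: nested skill at {rel}")
    return problems
```

Calling `check_layout(Path("skills"))` returns an empty list when every skill directory has its own `SKILL.md` and none contains another skill.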
Frontmatter (YAML): two required fields, name and description, max 1024 characters total.
- name: letters, numbers, hyphens only
- description: third person, describes ONLY WHEN to use (not what it does); start with "Use when..."
This is the single most important authoring rule. If the description summarizes the workflow, the agent follows the summary and skips the skill body.
# BAD: summarizes workflow — agent follows this instead of reading the skill
description: Use when analyzing data - pre-register, run the test, check assumptions, then report
# GOOD: triggering conditions only
description: Use before running any confirmatory analysis or looking at outcome data
Use concrete triggers and symptoms. Describe the problem (a convenient result, a surprising number), not just the topic. Keep it technology-agnostic unless the skill is technology-specific.
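The frontmatter rules above lend themselves to a quick automated check. This is a minimal sketch under stated assumptions, not the plugin's own tooling: `check_frontmatter` and `WORKFLOW_HINTS` are hypothetical, the keyword heuristic for spotting workflow summaries is crude, and "max 1024 characters total" is read as name plus description combined. The prefix test accepts any description starting with "Use " so that the "Use before ..." example also passes.

```python
import re

# Assumption: "max 1024 characters total" is read as len(name) + len(description).
MAX_FRONTMATTER_CHARS = 1024
NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")
# Hypothetical heuristic: sequencing words suggest the description narrates a workflow.
WORKFLOW_HINTS = (" then ", " first ", " next ", " finally ")


def check_frontmatter(name: str, description: str) -> list[str]:
    """Return rule violations; an empty list means none were detected."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name: letters, numbers, and hyphens only")
    # Looser than a literal "Use when" so that "Use before ..." also passes.
    if not description.startswith("Use "):
        problems.append("description: should open with a trigger such as 'Use when'")
    if len(name) + len(description) > MAX_FRONTMATTER_CHARS:
        problems.append("frontmatter: exceeds 1024 characters")
    padded = f" {description.lower()} "
    if any(hint in padded for hint in WORKFLOW_HINTS):
        problems.append("description: looks like a workflow summary")
    return problems


bad = "Use when analyzing data - pre-register, run the test, check assumptions, then report"
good = "Use before running any confirmatory analysis or looking at outcome data"
print(check_frontmatter("preregistering-analysis", bad))
print(check_frontmatter("preregistering-analysis", good))
```

A check like this only catches mechanical violations. Whether a description actually triggers the skill at the right moment still has to be observed with real agents.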
Future agents must FIND your skill.
- Prefer investigating-anomalous-results, not anomaly-utils; gerunds work well for processes.
Use the skill name with an explicit requirement marker:
- Good: **REQUIRED SUB-SKILL:** Use science-superpowers:preregistering-analysis
- Bad: See skills/.../SKILL.md (unclear if required)
- Bad: @skills/.../SKILL.md (force-loads, burns context)
Never use @ links to other skills. They load immediately and consume context before it is needed.
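A draft can be scanned for @ links before it ships. The sketch below is illustrative only: `find_at_links` is a hypothetical helper, and the pattern assumes an @ link is an `@` at the start of a token followed by a path ending in `.md`.

```python
import re

# Assumption: an @ link starts a token and points at a markdown file.
AT_LINK = re.compile(r"(?<!\S)@[\w./-]+\.md\b")


def find_at_links(skill_text: str) -> list[tuple[int, str]]:
    """Return (line number, link) pairs for every @ link found."""
    hits = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        for match in AT_LINK.finditer(line):
            hits.append((lineno, match.group()))
    return hits
```

The lookbehind keeps email-style strings such as `someone@example.md` from being flagged, since their `@` does not start a token.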
Use a small graphviz dot flowchart ONLY for a non-obvious decision point or a loop where the agent might stop too early. Never for reference material (use tables/lists), code (use code blocks), or linear steps (use numbered lists). Keep node labels semantic.
One excellent, complete, runnable example beats five mediocre ones. For a computational-science framework, prefer Python for data work and shell for environment/pipeline work. Comment WHY, not what. Don't write fill-in-the-blank templates.
NO SKILL WITHOUT AN OBSERVED BASELINE FAILURE FIRST
Applies to NEW skills AND EDITS. Wrote the skill before observing the baseline? You're guessing at what agents need. Delete it, run the baseline, start over.
No exceptions — not for "simple additions", not for "just a section", not for "documentation updates". Don't keep untested changes as "reference".
Different types need different tests. Dispatch fresh subagents (no shared context) and observe.
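The baseline-versus-skill comparison can be framed as a small harness. This is a sketch under assumptions, not a real dispatch API: `AgentRunner`, `compliance_rate`, `compare`, and `fake_agent` are all hypothetical names, and reducing each run to a pass/fail boolean is a simplification, since the rationalizations an agent gives still need to be read and recorded by hand.

```python
from typing import Callable, Optional

# Hypothetical interface: dispatch one fresh subagent on a scenario, optionally
# with the skill text loaded, and report whether it complied.
AgentRunner = Callable[[str, Optional[str]], bool]


def compliance_rate(run_agent: AgentRunner, scenarios: list[str],
                    skill_text: Optional[str], trials: int = 3) -> float:
    """Fraction of runs in which the agent complied."""
    if not scenarios or trials < 1:
        raise ValueError("need at least one scenario and one trial")
    outcomes = [run_agent(s, skill_text) for s in scenarios for _ in range(trials)]
    return sum(outcomes) / len(outcomes)


def compare(run_agent: AgentRunner, scenarios: list[str],
            skill_text: str, trials: int = 3) -> dict[str, float]:
    """Run the control (no skill) first, then the intervention."""
    baseline = compliance_rate(run_agent, scenarios, None, trials)
    treated = compliance_rate(run_agent, scenarios, skill_text, trials)
    return {"baseline": baseline, "with_skill": treated, "effect": treated - baseline}


def fake_agent(scenario: str, skill_text: Optional[str]) -> bool:
    # Stand-in for a real dispatch; complies only when a skill is supplied.
    return skill_text is not None
```

In practice the baseline run happens before the skill exists at all, and `fake_agent` would be replaced by whatever mechanism launches a subagent with no shared context.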
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to another agent. Test it. |
| "It's just a reference" | References have gaps. Test retrieval. |
| "No time to test" | Deploying an untested skill costs more time later. |
| "Academic review is enough" | Reading ≠ using. Test application. |
RED:
GREEN:
- name uses only letters/numbers/hyphens
- description starts with "Use when", third person, triggers only, no workflow summary
REFACTOR:
Quality:
- science-superpowers:<name>, no @ links
- scripts/bump-version.sh --check if you touched a manifest
After writing ANY skill, STOP and complete its testing before starting another. Don't batch. Deploying an untested skill is deploying an untested intervention.
Creating a skill IS running an experiment on agent behavior. Same discipline as pre-registration: observe the baseline before you intervene, and let evidence — not your imagination — shape the result.