From qult

TDD methodology for creating and testing qult skills. Use when creating new skills, editing existing skills, or verifying skills work correctly. Applies test-driven development to process documentation. NOT for editing non-skill files.

To install:

```shell
npx claudepluginhub hir4ta/qult --plugin qult
```

This skill uses the workspace's default tool permissions.
Test-Driven Development applied to skill creation.

Other skills in the qult plugin:

- Guides strict Test-Driven Development (TDD): write failing tests first for features, bugfixes, and refactors before any production code. Enforces the red-green-refactor cycle.
- Guides systematic root-cause investigation for bugs, test failures, unexpected behavior, performance issues, and build failures before proposing fixes.
- Guides A/B test setup with mandatory gates for hypothesis validation, metrics definition, sample size calculation, and execution readiness checks.
> **Quality by Structure, Not by Promise.**

A skill that hasn't been tested is a skill that doesn't work. Writing skills IS Test-Driven Development applied to process documentation.
Before writing ANY skill content, observe what happens WITHOUT the skill:
This is your RED test. The skill must fix these failure patterns.
```markdown
## Baseline Test: [skill-name]

### Task: [specific task description]

### Without skill:
- [ ] Agent skipped [step]
- [ ] Agent assumed [thing] without asking
- [ ] Agent produced [bad output]

### Expected with skill:
- [ ] Agent performs [step]
- [ ] Agent asks architect about [thing]
- [ ] Agent produces [good output]
```
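As a quick sketch (a hypothetical helper, not part of qult), the checked and unchecked items in a test record like the one above can be tallied with a few lines of Python, so you can see at a glance whether a baseline is properly RED:

```python
import re

def tally(md: str) -> dict:
    """Count passed (- [x]) and failed (- [ ]) checklist items in a test record."""
    passed = len(re.findall(r"^- \[x\]", md, flags=re.MULTILINE))
    failed = len(re.findall(r"^- \[ \]", md, flags=re.MULTILINE))
    return {"passed": passed, "failed": failed}

record = """\
- [ ] Agent skipped the baseline run
- [ ] Agent assumed the output format without asking
- [x] Agent performs the baseline run
"""
print(tally(record))  # → {'passed': 1, 'failed': 2}
```

A baseline with zero failures means the skill has nothing to fix, so there is nothing to write yet.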
Now write the skill to fix the observed failures:
```markdown
---
name: [kebab-case, 1-64 chars]
description: "[WHAT it does]. [WHEN to use]. NOT for [exclusions]."
user-invocable: true # if architect can invoke directly
---

# /qult:[name]

One-line purpose statement.

> **[qult philosophy quote]**

## The Wall (if enforcement needed)

<HARD-GATE>...</HARD-GATE>

## Process

[Numbered steps, clear and specific]

## Rationalization Prevention

[Table of excuses and rebuttals]

## Red Flags

[Self-check list]
```
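The frontmatter constraints in the template are mechanically checkable. The following is a minimal sketch (the function name and error messages are illustrative, not part of qult) of validating the `name` and `description` fields against the rules stated above:

```python
import re

def validate_frontmatter(name: str, description: str) -> list[str]:
    """Check skill frontmatter against the template's constraints (sketch)."""
    errors = []
    # name: kebab-case, 1-64 chars
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) or not (1 <= len(name) <= 64):
        errors.append("name must be kebab-case, 1-64 chars")
    # description: WHAT + WHEN + NOT — at minimum, require the NOT-for exclusion
    if "NOT" not in description:
        errors.append("description should state what the skill is NOT for")
    return errors

print(validate_frontmatter(
    "writing-skills",
    "TDD for skills. Use when creating skills. NOT for non-skill files.",
))  # → []
```

A check like this catches structural mistakes, but only a baseline test catches a skill that agents ignore.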
- **Description = WHAT + WHEN + NOT.** The description is what agents use to decide whether to load the skill. Be specific.
- **Do NOT summarize the workflow in the description.** Agents will read the description and skip the body. Keep the description focused on WHEN to use, not HOW it works.
- **Process = specific, testable steps.** Each step should produce an observable outcome.
- **Rationalization Prevention = preemptive defense.** List the excuses agents WILL make, and counter each one.
- **Use qult terminology:** "architect" (not "user"), "The Wall" (for enforcement), "Proof or Block" (for verification).
- **Token efficiency:** Keep the getting-started section under 150 words. Details go in later sections.
- **No @ imports:** They consume 200k+ context tokens immediately.
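The 150-word budget can also be checked mechanically. This is a rough sketch under one assumption (that the getting-started section is everything before the first `## ` heading); the helper name is hypothetical:

```python
def getting_started_words(skill_md: str) -> int:
    """Count words before the first '## ' heading, a rough proxy for the getting-started section."""
    intro = skill_md.split("\n## ", 1)[0]
    return len(intro.split())

skill = "# /qult:example\nOne-line purpose statement.\n\n## Process\n1. Do the thing."
count = getting_started_words(skill)
print(count, "OK" if count < 150 else "over the 150-word budget")
```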
```markdown
## Iteration Test: [skill-name] v[N]

### Same task as baseline

### With skill v[N]:
- [x] Agent performs [step] ← FIXED
- [ ] Agent found loophole: [description] ← NEW FAILURE

### Fix: Added rationalization prevention for [loophole]
```
Skill file location: `plugin/skills/[name]/SKILL.md`

| Type | Example | Test Strategy |
|---|---|---|
| Discipline-Enforcing | TDD, debugging | Subagent without skill should skip discipline; with skill should follow it |
| Workflow | explore, finish | Subagent should follow exact step sequence |
| Pattern | code review | Subagent output should match expected format |
| Reference | API docs | Subagent should use correct APIs/patterns |
| You might think... | Reality |
|---|---|
| "This skill is too simple to test" | Simple skills hide simple failure modes. Test takes 30 seconds. |
| "I know skills well enough to write one correctly" | You know the CONCEPT. The agent will find loopholes you didn't imagine. |
| "Testing skills is overhead" | An untested skill that agents ignore is pure waste. |
| "I'll test after writing" | Tests-after prove "what does this do?" Tests-first prove "what SHOULD this do?" |
| "The agent will just follow instructions" | Research shows LLM instruction-following is context-dependent. Verify. |
| "I can iterate in production" | Production iteration = real tasks with real mistakes. Test first. |