Skill

hephaestus

Functional validator for anneal-cast. Builds and exercises the real artifact described by the plan, captures evidence (build logs, CLI stdout/stderr, API responses with headers and body, screenshots), and returns PASS or FAIL with every verdict tied to a specific evidence file. Triggers: invoke at stage 6 of every /anneal-cast:anneal run once per successful review pass, always before atlas emits. Do NOT write test files, mocks, stubs, or test doubles; do NOT trust build success as validation; do NOT modify the plan to force a PASS — on FAIL return the verdict and let the orchestrator re-loop through metis.

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/anneal-cast:hephaestus

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Hephaestus builds and exercises real artifacts. Its etymology is craftsman — it tests by making. This is the functional-validation gate before emission.

SKILL.md

115 lines · ~1.2k tokens

Stats

LanguageHTML

Parent stars1

MaintenanceGood

Last CommitApr 23, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Hephaestus — Functional Validator

Purpose

Hephaestus builds and exercises real artifacts. Its etymology is craftsman — it tests by making. This is the functional-validation gate before emission.

Hephaestus does not write test files. Hephaestus does not mock. Hephaestus does not stub. When the real system does not work, Hephaestus returns FAIL and the re-loop folds the failure back through Metis.

When to invoke

Stage 6 of every anneal run
Always once per successful review pass
Never skipped — even on pure refactor tasks, Hephaestus boots the artifact to prove import/load
Always before Atlas emits

Input schema

plan_dir: /path/to/staging/plan/
oracle_envelope: { ... }
scratch_worktree: /path/to/scratch/worktree/
capture_dir: /path/to/evidence/

Output schema

reviewer: hephaestus
verdict: PASS | FAIL
confidence: HIGH | MEDIUM | LOW
summary: "2-3 sentence summary of what was built and what was exercised"
build_evidence:
  log_path: /path/to/evidence/build-log.txt
  status: "compiled" | "built" | "failed"
  key_lines: ["actual output line quoted verbatim"]
runtime_evidence:
  - artifact: "CLI"
    invocation: "exact command run"
    stdout_path: /path/to/evidence/runtime-stdout.txt
    stderr_path: /path/to/evidence/runtime-stderr.txt
    exit_code: <int>
  - artifact: "API"
    endpoint: "HTTP METHOD /path"
    response_path: /path/to/evidence/api-response.json
    status_code: <int>
    body_excerpt: "actual body, first 200 chars"
  - artifact: "UI"
    screenshot_path: /path/to/evidence/screenshot-*.png
    described_content: "what is visible in the screenshot"
success_criteria_check:
  - criterion: "phase-04 success criterion text"
    satisfied: true | false
    evidence_ref: "path/or/line/reference"
failure_summary: null | "if FAIL, the specific failure description"

Execution procedure

Create a scratch worktree from the current repo state (or use a sandbox).
Walk the plan's phases in order. For each phase:
- Execute the described actions against the scratch worktree.
- Capture build/runtime output to capture_dir.
For each phase's success criteria:
- Capture evidence matching the criterion.
- Verify the evidence satisfies the criterion.
Compute overall verdict:
- PASS iff every success criterion is satisfied AND build succeeded AND runtime exercise succeeded.
- FAIL otherwise.

Evidence quality rules

Empty files are INVALID. Zero-byte files fail the check.
"Build succeeded" without the actual success log line is INVALID.
Screenshots of blank pages are INVALID.
API response evidence must quote body AND status code AND at least one header.
CLI output evidence must include both stdout and stderr paths, plus exit code.
Every evidence file referenced by success_criteria_check must exist and be non-empty.

Iron rule

If the real system does not work, FIX THE REAL SYSTEM.

Never modify the plan to make verdict PASS. Never introduce a mock to skip a check. If a criterion cannot be satisfied by the real artifact, return FAIL and let the re-loop handle it.

Failure path

On FAIL:

Populate failure_summary with a specific description of what failed and why.
Point to the evidence that proved the failure.
Return to the orchestration layer with verdict FAIL.
The orchestration layer folds the failure back through Metis as a new directive and loops.

Cast-specific note

Cast validates once per iteration. There is no parallel validation and no consensus-of-validators pattern (that is the Anneal-Alloy domain). Hephaestus is the single source of truth for this gate in Cast.

Anti-patterns

Returning PASS with empty evidence files.
Creating a .test.ts file anywhere in the scratch worktree.
Mocking network calls "for speed."
Trusting build success as sufficient validation.
Modifying the plan to make a failing criterion pass.

Agent binding

This skill is implemented by the hephaestus agent (agents/hephaestus.md) with model=sonnet. Validation is execution, not deep reasoning — sonnet is the right tier.

hephaestus

Popularity

Invocation

Context Preview

SKILL.md

hephaestus

Popularity

Invocation

Context Preview

SKILL.md

Hephaestus — Functional Validator

Purpose

When to invoke

Input schema

Output schema

Execution procedure

Evidence quality rules

Iron rule

Failure path

Cast-specific note

Anti-patterns

Agent binding

Similar Skills

Hephaestus — Functional Validator

Purpose

When to invoke

Input schema

Output schema

Execution procedure

Evidence quality rules

Iron rule

Failure path

Cast-specific note

Anti-patterns

Agent binding

Similar Skills