Executes HOTL workflow Markdown files autonomously: loops steps until success criteria met, auto-approves low-risk gates, manages git branches/worktrees, resumes interrupted runs.
npx claudepluginhub yimwoo/hotl-plugin --plugin hotl

This skill uses the workspace's default tool permissions.
Execute a `hotl-workflow-<slug>.md` file autonomously. Loop on steps with success criteria. Auto-approve low-risk gates. Always pause for high-risk gates.
Announce: "Starting HOTL loop execution. Looking for workflow file..."
Resolve which workflow file to execute:
1. If the user named a file explicitly (e.g., "Use $hotl:loop-execution to run hotl-workflow-add-auth.md" or /hotl:loop hotl-workflow-add-auth.md) → use that file
2. Otherwise, search for hotl-workflow*.md in project root:
   - Exactly one match → use it
   - Multiple matches → list them and ask the user to choose
After resolving the workflow file, check .hotl/state/*.json for interrupted runs matching that workflow; if one exists, offer to resume it (see skills/resuming/SKILL.md for stale run detection and the verify-first resume flow).
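A minimal sketch of the interrupted-run check. The `workflow` and `status` field names here are assumptions for illustration; the authoritative sidecar schema is defined in skills/resuming/SKILL.md:

```python
import glob
import json
import os

def find_interrupted_runs(state_dir=".hotl/state", workflow="hotl-workflow-add-auth.md"):
    """Return state dicts for runs of this workflow that look interrupted.

    Assumes each state file records the workflow filename and a status
    field; "running" and "paused" are treated as resumable states.
    """
    interrupted = []
    for path in glob.glob(os.path.join(state_dir, "*.json")):
        with open(path) as f:
            state = json.load(f)
        if state.get("workflow") == workflow and state.get("status") in ("running", "paused"):
            interrupted.append(state)
    return interrupted
```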
After resolving the workflow file, run this preflight before executing any steps:
1. Is this a git repo with at least one commit?
- No → log "Skipping branch setup (no git history)" → proceed to step execution
- Yes → continue
2. Check for uncommitted changes
- First, exclude HOTL-owned transient artifacts from the dirty check:
• hotl-workflow-*.md (workflow plan files)
• docs/plans/*-design.md, docs/plans/*-plan.md (design/plan docs from brainstorming)
• .hotl/ (runtime state, reports, cache)
- If only HOTL artifacts are dirty → treat as clean, continue
- If non-HOTL dirty files exist:
• If dirty_worktree: allow in workflow frontmatter → proceed without prompting
• Otherwise → HARD-FAIL. Tell the user which non-HOTL files are dirty. Offer choices:
a. Clean up manually, then re-run
b. Stash manually, then re-run
c. Explicitly approve HOTL to stash and continue
- Clean → continue
3. Determine branch name
- If branch: field exists in workflow frontmatter → use it
- Otherwise → derive hotl/<slug> from hotl-workflow-<slug>.md
4. Check if branch already exists locally
- Exists, same HEAD → ask: reuse, delete+recreate, or abort
- Exists, different HEAD → ask: delete+recreate, or abort
- Does not exist → create (no prompt)
5. Create branch/worktree
- If worktree: true in frontmatter → create git worktree with the branch
- Otherwise → create branch and checkout in current directory
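The HOTL-artifact exclusion in preflight step 2 can be sketched as a filter over `git status --porcelain` output; the patterns below mirror the exclusion list above, and the function name is illustrative:

```python
from fnmatch import fnmatch

# Paths HOTL owns; changes here never count toward a dirty worktree.
HOTL_PATTERNS = [
    "hotl-workflow-*.md",
    "docs/plans/*-design.md",
    "docs/plans/*-plan.md",
    ".hotl/*",
]

def non_hotl_dirty(porcelain_output: str) -> list[str]:
    """Return dirty paths that are NOT HOTL-owned transient artifacts."""
    dirty = []
    for line in porcelain_output.splitlines():
        if not line.strip():
            continue
        path = line[3:].strip()  # porcelain v1: two status chars + a space
        if not any(fnmatch(path, pat) for pat in HOTL_PATTERNS):
            dirty.append(path)
    return dirty
```

If `non_hotl_dirty` returns an empty list, the worktree is treated as clean and preflight continues; otherwise the run hard-fails unless `dirty_worktree: allow` is set.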
Rules:
Run the document linter (scripts/document-lint.sh) automatically on the workflow file before any git mutation or step execution. If lint fails, STOP and show all errors. If lint passes, continue silently.

This is the canonical HOTL execution state machine. Other execution modes (e.g., subagent-execution) reference this spec and define only their differences.
1. Resolve workflow file (see above)
2. Parse frontmatter: intent, risk_level, auto_approve, branch, worktree
3. Run Branch/Worktree Preflight (see above)
4. Initialize run via runtime:
- Run: `hotl-rt init <workflow-file>`
- This parses the workflow, creates .hotl/state/<run-id>.json with all steps, and initializes .hotl/reports/<run-id>.md
- Capture the run_id from stdout
- Only after init succeeds should chat output or native plan/progress UI show anything
5. For each step in order:
a. Start step via runtime:
- Run: `hotl-rt step N start`
- This persists step start (status, timestamp, attempts) to state and report
- Only after the runtime call succeeds should chat show "→ Step N"
b. Announce: "→ Step N: [name]"
c. Execute the action (agent implements the work)
d. Verify via runtime:
- Run: `hotl-rt step N verify`
- The runtime runs the verify command, captures stdout/stderr, and atomically transitions the step to done or failed
- If the verify type is unsupported, the runtime blocks the step with a clear reason
- For type: browser — if browser tooling unavailable, downgrade to type: human-review
- For type: human-review — the runtime returns a `human review required: ...` block reason and sets the run status to `paused`; ALWAYS pause for human (never auto-approve)
- For type: artifact — runtime checks path exists and evaluates assert; for `matches-glob`, `path` must be the directory and `value` must be a filename glob only, so `src/*` is invalid and should be authored as `path: src`
e. If verify fails (runtime returns non-zero):
e0. If the runtime output says `blocked: human review required: ...`
→ PAUSE immediately. Do not start later steps or finalize the run.
→ Show the review prompt to the human and ask: "Continue? (yes/no/show-details)"
→ If the human says yes/approve/continue:
Run: `hotl-rt gate N approved --mode human`
Then continue to the next step
→ If the human says no/reject:
Run: `hotl-rt gate N rejected --mode human`
STOP and surface the report path
→ If the human asks for details:
Show the relevant test/report context, then wait again
→ Never treat the chat reply alone as persisted approval; the approval is only real after the `hotl-rt gate ...` call succeeds
f. If loop: false
→ STOP, report to human
→ Run: `hotl-rt step N block --reason "verify failed"` if not already marked failed by verify
→ Show last verify output. Wait for human guidance.
g. If loop: until [condition]
→ if iterations < max_iterations:
Run: `hotl-rt step N retry` then `hotl-rt step N start`
log "↻ Retrying ([n]/[max])...", retry the action
→ if iterations = max_iterations: STOP
Report: "Step N reached max iterations ([max]). [condition] not met."
Show last verify output. Wait for human guidance.
h. On step completion (verify passed):
- The runtime has already persisted the done status
- Update the workflow checkbox to [x]
- Only after the runtime confirms success should chat show "✓ Step N"
i. If gate: human
→ if auto_approve: true AND risk_level != high:
Run: `hotl-rt gate N approved --mode auto`
log "⚡ Auto-approved: Step N gate (risk: [risk_level])"
continue
→ else:
PAUSE. Show summary of what was done in this step.
Ask: "Gate reached at Step N. Continue? (yes/no/show-details)"
Wait for human response.
Run: `hotl-rt gate N approved --mode human` or `hotl-rt gate N rejected --mode human`
j. If gate: auto
→ Run: `hotl-rt gate N approved --mode auto`
→ always continue, log "⚡ Auto-approved: Step N gate"
6. All steps complete:
→ Run review checkpoint (see Review Checkpoints below)
→ Invoke hotl:verification-before-completion skill
→ For Codex final summaries, run: `scripts/finalize-codex-summary.sh`
→ For Claude Code/Cline, run: `hotl-rt finalize --json`, write the payload to a temp file, then render it with: `scripts/render-execution-summary.sh --platform <claude|cline> <summary-json-file>`
→ Do not freehand the final summary when the renderer is available
→ The rendered summary must be shown as visible chat output in the final response; do not paraphrase it away
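The per-step retry semantics in steps 5e–5g can be sketched as follows. `run_action` and `run_verify` are stand-ins for the real agent work and the `hotl-rt step N verify` call; this is a behavioral sketch, not the runtime's implementation:

```python
def execute_step(run_action, run_verify, loop=True, max_iterations=3):
    """Sketch of the step retry loop: attempt, verify, then retry
    until verify passes or max_iterations is reached.
    Returns (status, attempts)."""
    attempts = 0
    while attempts < max_iterations:
        attempts += 1
        run_action()
        if run_verify():          # hotl-rt step N verify → done on success
            return ("done", attempts)
        if not loop:              # loop: false → stop after first failure
            return ("blocked", attempts)
        # loop: until [condition] → hotl-rt step N retry, then start again
    return ("blocked", attempts)  # max iterations reached; wait for human
```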
All state persistence is handled by the hotl-rt shared runtime (runtime/hotl-rt). Agents do not manage state files directly.
The runtime owns:
- .hotl/state/<run-id>.json — authoritative machine state (created by hotl-rt init, updated by hotl-rt step/gate/finalize)
- .hotl/reports/<run-id>.md — durable Markdown report (initialized at init, updated incrementally, finalized at finalize)

Run ID format: <slug>-<YYYYMMDDTHHMMSSZ> (e.g., add-auth-20260320T212315Z).
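The run ID format can be illustrated with a short sketch; the runtime's own implementation may differ, but the shape matches the `<slug>-<YYYYMMDDTHHMMSSZ>` convention above:

```python
from datetime import datetime, timezone

def make_run_id(slug: str) -> str:
    """Build a run ID of the form <slug>-<YYYYMMDDTHHMMSSZ>,
    e.g. add-auth-20260320T212315Z (UTC timestamp)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{slug}-{stamp}"
```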
Workflow checkboxes (- [x]) are a human-visible mirror updated by the agent on step completion. The sidecar is the source of truth.
Operational rule: hotl-rt calls happen before the corresponding chat log or Codex native plan/progress update. Native progress UI is never a substitute for the runtime-managed artifacts.
See skills/resuming/SKILL.md for the full sidecar schema, stale run detection, and verify-first resume flow.
To find hotl-rt and HOTL scripts (document-lint.sh, render-execution-summary.sh, etc.), resolve in this order:
1. <plugin-path>/runtime/hotl-rt — invoke as: bash <plugin-path>/runtime/hotl-rt init <workflow-file>
2. ~/.codex/hotl/runtime/hotl-rt and ~/.codex/hotl/scripts/
3. ~/.codex/plugins/hotl-source/runtime/hotl-rt and ~/.codex/plugins/hotl-source/scripts/
4. ~/.codex/plugins/cache/codex-plugins/hotl/*/runtime/hotl-rt and ~/.codex/plugins/cache/codex-plugins/hotl/*/scripts/
5. ~/Documents/Cline/Scripts/hotl-rt and ~/Documents/Cline/Scripts/
6. ./runtime/hotl-rt and ./scripts/

The same resolution applies to all HOTL scripts under scripts/.
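The resolution order can be sketched as a first-match search over candidate paths, with glob expansion for the cache entry. The `<plugin-path>` candidate is resolved by the host app and is omitted here:

```python
import glob
import os

# Candidate hotl-rt locations in resolution order (tilde- and
# glob-expanded). The same order applies to the scripts/ directory.
CANDIDATES = [
    "~/.codex/hotl/runtime/hotl-rt",
    "~/.codex/plugins/hotl-source/runtime/hotl-rt",
    "~/.codex/plugins/cache/codex-plugins/hotl/*/runtime/hotl-rt",
    "~/Documents/Cline/Scripts/hotl-rt",
    "./runtime/hotl-rt",
]

def resolve_hotl_rt(candidates=CANDIDATES):
    """Return the first existing hotl-rt path, or None if not found."""
    for pattern in candidates:
        for path in sorted(glob.glob(os.path.expanduser(pattern))):
            if os.path.exists(path):
                return path
    return None
```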
Dependency: jq. hotl-rt requires jq for JSON state management. If hotl-rt fails with a "jq not found" error, tell the user to install jq and stop — without jq there is no state persistence (.hotl/state/), no durable reports (.hotl/reports/), and no deterministic summary rendering.

Execution report output must conform to docs/contracts/execution-report-output.md. The contract defines the durable report format (metadata, summary table, event log), execution status vocabulary, final summary semantics, platform rendering tables, and the deterministic renderer reference.
The hotl-rt runtime writes the durable report to .hotl/reports/<run-id>.md incrementally. The report survives app rendering quirks and provides a reliable post-run artifact for debugging, trust, and resume.
Reference report_path in user-facing pause, blocked, resume, and completion responses so the durable report is always discoverable.
When a verify: human-review step pauses, the response must include the report_path and make it clear that the run is paused pending approval, not failed.
If the workflow sets report_detail: full, successful verify output must also be included in the durable report, not only failures.
Record git rev-parse HEAD as the review base at run start and after each review.
After all steps have passed verification, before hotl-rt finalize:
- Invoke requesting-code-review with review type: final
- Invoke receiving-code-review
At intermediate gate: human steps, request review only when:
When triggered, scope the review to steps completed since the last review checkpoint, not the entire run. Use review type: checkpoint.
Do not request review at every gate by default.
Review happens after verification-before-completion and before hotl-rt finalize / any "done" claim.

Safety rules:
- risk_level: high in frontmatter always forces human approval at gate: human steps, even if auto_approve: true
- gate: human steps whose text contains security-sensitive keywords (auth, encrypt, secret, key, password, token, permission, role, billing) also always require human approval

Report format, status vocabulary, final summary semantics, and platform rendering tables for final artifacts are defined in docs/contracts/execution-report-output.md. This section covers runtime behavior that is executor-owned: live step visibility, progress updates, and verbose mode.
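The security-keyword rule above can be sketched as a coarse substring scan over the step text; the function name is illustrative, and a real implementation would want word-boundary matching to avoid false positives (e.g., "auth" inside "author"):

```python
SECURITY_KEYWORDS = (
    "auth", "encrypt", "secret", "key", "password",
    "token", "permission", "role", "billing",
)

def requires_human_gate(step_text: str, risk_level: str = "low") -> bool:
    """True when a gate: human step must always pause for a human:
    either risk_level is high, or the step text mentions a
    security-sensitive keyword. Substring match is deliberately coarse."""
    text = step_text.lower()
    return risk_level == "high" or any(k in text for k in SECURITY_KEYWORDS)
```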
Every execution run MUST end with a visible final summary in chat. A prose recap alone is not compliant.
For Codex final summaries:
- Prefer scripts/finalize-codex-summary.sh when available
- If the Codex helper is unavailable, fall back to hotl-rt finalize --json plus scripts/render-execution-summary.sh --platform codex ..., then emit that renderer output directly
Final artifacts must follow docs/contracts/execution-report-output.md:
| Platform | Final summary rendering |
|---|---|
| Codex | Compact list in chat. Wide markdown tables are not acceptable here. |
| Claude Code | Markdown table in chat. |
| Cline | Markdown table in chat. |
Status vocabulary for final summaries includes ✓ Done, ⚡ Auto-approved, ✓ Approved, ✗ Failed, and ✗ Blocked.
Iterations means attempt count only. For tables, the Iterations column is a number only or - for gates. Never put test counts in Iterations; test counts belong in Status.
In the Codex compact list, keep the step name first and inline status detail after -. Include the status word on every line and always include iteration count details such as Done (1 attempt), Done (3 attempts), or Approved (1 attempt).
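A sketch of the Codex compact-list line format described above, assuming a simple numbered-list rendering (the deterministic renderer in scripts/ is authoritative):

```python
def codex_summary_line(step_num: int, name: str, status: str, attempts: int) -> str:
    """Render one compact-list line: step name first, then inline
    status after ' - ', always including the attempt count."""
    unit = "attempt" if attempts == 1 else "attempts"
    return f"{step_num}. {name} - {status} ({attempts} {unit})"
```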
| Platform | Live step visibility |
|---|---|
| Codex | Native progress card (primary). Per-step chat logs as fallback. |
| Claude Code | Per-step one-line chat logs |
| Cline | Per-step one-line chat logs |
Every execution run MUST provide live step visibility — the user must see which step is currently executing and which are done. This is not optional on any platform.
When running in the Codex app, the executor MUST use the native plan/progress UI as the primary live step visibility surface:
- Keep exactly one plan item in_progress at a time
- Use scripts/show-codex-current-step.sh to print the current step number, name, status, and attempts from the active run without mutating state

On platforms without native progress (Claude Code, Cline), the executor MUST use per-step chat logs for live visibility.
After each step, log one line:
✓ Step 1: Write failing tests
✓ Step 2: Implement auth logic (3 attempts)
⚡ Step 3: Security review gate (auto-approved)
✓ Step 4: Update docs
When verbose mode is enabled, print a compact step list at each step transition (before starting a step, after a step completes/fails/auto-approves):
✓ Step 1: Write failing tests
✓ Step 2: Implement feature
→ Step 3: Run full test suite (attempt 1/3)
· Step 4: Update docs
· Step 5: Human review
Symbols:
- ✓ — completed
- → — current step (include attempt info if looping)
- · — pending
- ⚡ — auto-approved gate
- ✗ — blocked/failed

Include short result details only when useful (test counts on completed steps, attempt progress on current step, failure reason on blocked steps).
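The verbose step list can be sketched as a renderer over per-step status, using the symbols above; the tuple shape is an assumption for illustration:

```python
SYMBOLS = {
    "done": "✓",
    "current": "→",
    "pending": "·",
    "auto_approved": "⚡",
    "blocked": "✗",
}

def render_step_list(steps):
    """steps: list of (name, status, detail) tuples; detail may be ''.
    Produces one line per step in the verbose format shown above."""
    lines = []
    for i, (name, status, detail) in enumerate(steps, start=1):
        suffix = f" ({detail})" if detail else ""
        lines.append(f"{SYMBOLS[status]} Step {i}: {name}{suffix}")
    return "\n".join(lines)
```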
Verbose mode is enabled with progress: verbose in the workflow frontmatter.

If no hotl-workflow*.md found in project root:
"No workflow file found. Would you like to:
a. Create one now (use $hotl:writing-plans in Codex, /hotl:write-plan in Claude Code)
b. Start from a template (workflows/feature.md, workflows/bugfix.md, workflows/refactor.md)"