Skill

cautilus

Use when intentful behavior evaluation itself is the task and the repo should run Cautilus's checked-in workflow instead of reconstructing compare, held-out, and review commands by hand.

npx claudepluginhub corca-ai/cautilus --plugin cautilus

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/cautilus:cautilus

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this bundled skill when intentful behavior evaluation itself is the task

Supporting Files

agents/openai.yaml

SKILL.md

219 lines · ~2k tokens


Stats

Language: Go

Parent stars: 0

Maintenance: Excellent

Last Commit: Apr 11, 2026


Cautilus

Use this bundled skill when intentful behavior evaluation itself is the task and the repo wants to run the checked-in Cautilus workflow instead of rebuilding commands by hand.

Cautilus should stay usable as a standalone product:

resolve or scaffold repo-local adapters
run bounded review variants through the bundled CLI
evaluate operator-facing behavior, including CLI surfaces, with explicit intent packets
keep held-out evaluation and review prompts explicit
keep host-repo fixtures, prompts, and policy outside the product boundary

Bootstrap

Resolve the adapter from the target repo:

node ./bin/cautilus adapter resolve --repo-root .

For a named adapter:

node ./bin/cautilus adapter resolve --repo-root . --adapter-name code-quality

If the repo does not have an adapter yet, scaffold one:

node ./bin/cautilus adapter init --repo-root .

Check whether the repo is already ready for standalone Cautilus use:

node ./bin/cautilus doctor --repo-root .
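
The bootstrap commands above can be chained into one guarded step. What follows is a minimal sketch, not part of Cautilus: it assumes `adapter resolve` exits non-zero when no adapter is found (the source does not state this), and both `bootstrap_adapter` and the `CAUTILUS` override variable are hypothetical names.

```shell
# Sketch only. Assumption: `adapter resolve` exits non-zero when no adapter exists.
# CAUTILUS is a hypothetical override for the CLI entry point.
CAUTILUS="${CAUTILUS:-node ./bin/cautilus}"

bootstrap_adapter() {
  repo_root="${1:-.}"
  # Try to resolve an existing adapter first.
  if ! $CAUTILUS adapter resolve --repo-root "$repo_root"; then
    # Nothing resolved: scaffold a new adapter.
    $CAUTILUS adapter init --repo-root "$repo_root" || return 1
  fi
  # Check that the repo is ready for standalone use.
  $CAUTILUS doctor --repo-root "$repo_root"
}
```

Calling `bootstrap_adapter .` from the repo root would run resolve, fall back to init only on failure, and always finish with doctor.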

Read the canonical workflow and contracts before widening the surface.

Workflow

Resolve the adapter and restate the candidate, baseline, and intended decision.
When the run needs clean git-ref A/B workspaces, prepare them with the product-owned helper:

node ./bin/cautilus workspace prepare-compare \
  --repo-root . \
  --baseline-ref origin/main \
  --output-dir /tmp/cautilus-compare

If the repo keeps one artifact root with one subdirectory per run, prune older Cautilus bundles instead of letting logs and compare workspaces grow forever:

node ./bin/cautilus workspace prune-artifacts \
  --root /tmp/cautilus-runs \
  --keep-last 20
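
To preview which run directories a keep-last policy would leave out, a plain-shell listing can help before pruning. This is an illustration of the general idea only, not Cautilus's implementation: it assumes "newest" means most recently modified, which the source does not confirm, and `list_prunable` is a hypothetical name.

```shell
# Sketch: list run directories under ROOT that fall outside the newest KEEP.
# Assumption: runs are ordered by modification time; Cautilus may order them differently.
# Directory names containing newlines are not handled.
list_prunable() {
  root="$1"
  keep="$2"
  # ls -t sorts newest first; tail skips the first KEEP entries.
  ls -1dt "$root"/*/ 2>/dev/null | tail -n +"$((keep + 1))"
}
```

For example, `list_prunable /tmp/cautilus-runs 20` prints the candidates without deleting anything.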

Run adapter-defined preflight commands before long evaluations.
Use iterate mode for tuning, held-out mode for validation, and full gate for ship decisions.
When the adapter defines executor_variants, run the checked-in review runner instead of retyping ad-hoc shell commands:

node ./bin/cautilus review variants \
  --repo-root . \
  --workspace . \
  --output-dir /tmp/cautilus-review

When the target repo is Cautilus itself, prefer the checked-in explicit self-dogfood command over rebuilding the same mode/report/review chain by hand:

npm run dogfood:self

When the job is tuning the self-dogfood review budget or comparing review surfaces, use the checked-in experiment runner instead of inventing ad hoc A/B loops:

npm run dogfood:self:experiments

When the job only needs to refresh the static HTML view of the current checked-in self-dogfood bundle (for example after hand-editing the markdown narrative or regenerating JSON offline), use:

npm run dogfood:self:html

Treat dogfood:self as the canonical operator-facing record of the current self-dogfood result. Treat dogfood:self:experiments as the place for stronger claims such as binary-surface, skill-surface, and gate-honesty probes. Treat dogfood:self:html as a read-only view of the checked-in JSON bundle, not as a separate source of truth.
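
The division of labor between the three scripts can be summarized as a small lookup. This is a sketch for illustration: the job keywords `record`, `experiments`, and `html` and the function name `dogfood_script` are hypothetical; only the npm script names come from the text above.

```shell
# Sketch: map a job keyword (hypothetical names) to the npm script described above.
dogfood_script() {
  case "$1" in
    record)      echo "dogfood:self" ;;              # canonical operator-facing record
    experiments) echo "dogfood:self:experiments" ;;  # stronger claims and probes
    html)        echo "dogfood:self:html" ;;         # read-only view of the JSON bundle
    *)           echo "unknown job: $1" >&2; return 1 ;;
  esac
}
```

A caller could then run `npm run "$(dogfood_script html)"`.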

Report exact commands, exact placeholder values, and the final recommendation.
When the repo already has normalized scenario proposal candidates, generate a checked-in proposal packet instead of hand-drafting scenario JSON:

node ./bin/cautilus scenario normalize chatbot \
  --input ./fixtures/scenario-proposals/chatbot-input.json

node ./bin/cautilus scenario normalize cli \
  --input ./fixtures/scenario-proposals/cli-input.json

node ./bin/cautilus scenario normalize skill \
  --input ./fixtures/scenario-proposals/skill-input.json

node ./bin/cautilus scenario prepare-input \
  --candidates ./fixtures/scenario-proposals/candidates.json \
  --registry ./fixtures/scenario-proposals/registry.json \
  --coverage ./fixtures/scenario-proposals/coverage.json \
  --family fast_regression

node ./bin/cautilus scenario propose \
  --input ./fixtures/scenario-proposals/standalone-input.json

node ./bin/cautilus scenario summarize-telemetry \
  --results ./fixtures/scenario-proposals/results.json

node ./bin/cautilus report build \
  --input ./fixtures/reports/report-input.json

node ./bin/cautilus mode evaluate \
  --repo-root . \
  --mode held_out \
  --intent "CLI behavior should remain legible." \
  --baseline-ref origin/main \
  --output-dir /tmp/cautilus-mode

node ./bin/cautilus review prepare-input \
  --repo-root . \
  --report-file /tmp/cautilus-mode/report.json

node ./bin/cautilus review build-prompt-input \
  --review-packet /tmp/cautilus-mode/review.json

node ./bin/cautilus review render-prompt \
  --input /tmp/cautilus-mode/review-prompt-input.json

node ./bin/cautilus evidence prepare-input \
  --report-file /tmp/cautilus-mode/report.json \
  --scenario-results-file /tmp/cautilus-mode/scenario-results.json \
  --run-audit-file /tmp/cautilus-run-audit/summary.json \
  --history-file /tmp/cautilus-history/history.json

node ./bin/cautilus evidence bundle \
  --input /tmp/cautilus-evidence/input.json

node ./bin/cautilus optimize prepare-input \
  --report-file /tmp/cautilus-mode/report.json \
  --review-summary /tmp/cautilus-review/summary.json \
  --history-file /tmp/cautilus-history/history.json \
  --target prompt \
  --optimizer reflection \
  --budget medium

node ./bin/cautilus optimize propose \
  --input /tmp/cautilus-optimize/input.json

node ./bin/cautilus optimize build-artifact \
  --proposal-file /tmp/cautilus-optimize/proposal.json

node ./bin/cautilus review variants \
  --repo-root . \
  --workspace . \
  --report-file /tmp/cautilus-mode/report.json \
  --output-dir /tmp/cautilus-review

node ./bin/cautilus cli evaluate \
  --input ./fixtures/cli-evaluation/doctor-missing-adapter.json
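
The file names in the held-out evaluation and review commands above suggest a chain: `mode evaluate` yields `report.json`, which feeds `review prepare-input`, and so on. How each step writes its output is not shown in the source, so the sketch below only prints the commands in order rather than running them. The function name `review_chain` and its two parameters are hypothetical.

```shell
# Sketch: print the held-out review chain in order without executing anything.
# The flags and file names are copied from the examples above; only the
# output directory and the intent string are parameterized.
review_chain() {
  out="${1:-/tmp/cautilus-mode}"
  intent="${2:-CLI behavior should remain legible.}"
  cat <<EOF
node ./bin/cautilus mode evaluate --repo-root . --mode held_out --intent "$intent" --baseline-ref origin/main --output-dir $out
node ./bin/cautilus review prepare-input --repo-root . --report-file $out/report.json
node ./bin/cautilus review build-prompt-input --review-packet $out/review.json
node ./bin/cautilus review render-prompt --input $out/review-prompt-input.json
EOF
}
```

Printing the chain first makes it easy to report the exact commands and placeholder values before anything runs.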

Guardrails

Do not treat Ceal-local prompts, adapters, or report paths as product-owned defaults.
Do not turn review loops into open-ended retries.
Do not turn optimizer output into an open-ended retry loop.
Keep held-out evaluation held out unless the benchmark itself is being changed deliberately.
Prefer checked-in wrapper scripts and schemas over inline shell quoting.
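
One way to keep a review or optimizer loop bounded is to cap attempts explicitly. The helper below is a generic shell sketch, not something Cautilus ships; `run_bounded` is a hypothetical name.

```shell
# Sketch: run a command at most MAX times, stopping at the first success.
run_bounded() {
  max="$1"; shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max failed" >&2
    attempt=$((attempt + 1))
  done
  # Budget exhausted: report failure instead of retrying further.
  return 1
}
```

For example, `run_bounded 2 node ./bin/cautilus review variants --repo-root . --workspace . --output-dir /tmp/cautilus-review` would give up after two failed runs.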
