Teaches Cavekit methodology for AI agent software development: Hunt lifecycle, Specify Before Building principle, scientific method analogy, decision matrix, and routes to sub-skills. Use for structuring projects with AI agents.
Install with `npx claudepluginhub juliusbrussee/cavekit`. This skill uses the workspace's default tool permissions.
**Always define what you want before telling agents how to build it. Go through a cavekit stage — never jump straight from raw requirements to implementation.**
Cavekit is a methodology for building software with AI coding agents that puts kits at the center of the development process — code is derived from them, not the other way around. Whether starting from scratch or modernizing an existing system, the principle is the same:

- Greenfield: write kits from your requirements first, then derive the implementation from them.
- Brownfield: extract kits from the existing system, then evolve the code through them.
In both cases, the kits become a living contract that agents consume to continuously build, validate, and refine the application.
| Property | Benefit |
|---|---|
| Structured | Organized as a navigable tree, enabling agents to load only what they need |
| Human-legible | Engineers can audit requirements at a higher level than code |
| Stack-independent | Decoupled from any single framework or language |
| Independently evolvable | Kits can be refined without touching implementation |
| Verifiable | Every requirement includes acceptance criteria agents can check |
Key Insight: Well-written kits with strong validation make your application reproducible — any agent can rebuild it from the kits alone. Think of it as continuous regeneration.
LLMs are inherently non-deterministic — like running an experiment, each individual call may yield different results. But through the right methodology — clear hypotheses, controlled conditions, and repeated trials — we extract reliable, reproducible outcomes from a stochastic process.
Cavekit applies the scientific method to software construction — hypothesize, test, observe, refine.
| Layer | Analogy | What It Does |
|---|---|---|
| LLM calls | Individual experiments | Each run may produce different results; no single output is authoritative |
| Kits | Hypotheses | Define what you expect to observe — the predicted behavior |
| Validation gates | Controlled conditions | Ensure reproducibility by constraining what counts as a valid outcome |
| Convergence loops | Repeated trials | Build statistical confidence through successive passes |
| Implementation tracking | Lab notebook | Record what was tried, what worked, and what failed |
| Revision | Revising the hypothesis | When results contradict expectations, update the theory upstream |
The outcome: a disciplined, repeatable engineering process layered on top of probabilistic generation.
The Hunt is the five-phase lifecycle: Draft, Architect, Build, Inspect, Monitor. Each phase has dedicated prompts that drive it.
| Phase | Input | Output | AI Role | Human Role |
|---|---|---|---|---|
| Draft | Source materials, domain knowledge, existing systems | Implementation-agnostic kits | Extract requirements, structure knowledge | Verify kits capture intent accurately |
| Architect | Kits + framework research | Framework-specific implementation plans | Design architecture, break down work, order steps | Approve architectural choices |
| Build | Plans + kits | Working code + tests + tracking docs | Write code, run tests, check against kits | Watch for drift and blockers |
| Inspect | Failed validations, gaps, manual fixes | Updated kits/plans via revision | Identify root causes, propagate fixes upstream | Evaluate outcomes, set priorities |
| Monitor | Running application, git history | Issues, anomalies, progress reports | Scan for regressions, surface metrics | Interpret reports, guide next steps |
Each phase has gate conditions that must be met before moving to the next.
The Inspect phase is where the human serves as reviewer and decision-maker, not hands-on coder. You monitor the process, request changes as needed, and make systemic improvements to kits and prompts.
For the full Hunt phase reference, see `references/hunt-phases.md`.
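Gate conditions lend themselves to executable checks. A minimal sketch in Python — not part of Cavekit itself, and the gate names are hypothetical — where each gate is a zero-argument callable returning `True` when its condition holds:

```python
def check_gates(phase, gates):
    """Return the names of gate conditions that fail for a phase.

    `gates` maps a gate name to a zero-argument callable (hypothetical
    checks such as "all kits have acceptance criteria" or "tests pass").
    An empty result means the phase may advance to the next one.
    """
    return [name for name, check in gates.items() if not check()]
```

An agent (or a CI job) would run a check like this between phases and refuse to advance while the returned list is non-empty.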
**Full Cavekit.** Use when the project has significant scope, evolving requirements, or needs autonomous agent execution.
| Indicator | Threshold |
|---|---|
| Codebase size | 50+ source files |
| Requirements | Evolving, multi-domain |
| Agent coordination | Multi-agent or multi-prompt pipelines |
| Environment | Production, security-sensitive, brownfield |
| Team structure | Multi-team or cross-team |
| Execution mode | Long-running autonomous work (overnight, unattended) |
What you get: Full Hunt lifecycle, context directory with kits/plans/impl tracking, prompt pipeline, convergence loops, revision, validation gates.
**Lightweight Cavekit.** Use when scope is moderate — too complex for ad-hoc work but not worth a full pipeline.
| Indicator | Threshold |
|---|---|
| Codebase size | 5-50 files |
| Requirements | Mostly clear, focused |
| Agent coordination | Single agent, possibly with sub-agents |
| Execution mode | Interactive with occasional iteration loops |
What you do:
- `context/kits/cavekit-task.md` capturing requirements
- `context/plans/plan-task.md` sequencing the implementation

This is the "Cavekit floor" — most of the benefit without the overhead of a full multi-phase pipeline.
**Skip Cavekit.** Use when the task is trivially small.
| Indicator | Threshold |
|---|---|
| Codebase size | Fewer than 5 files |
| Task type | One-off tools, simple bug fixes, exploratory prototypes |
| Implementation | Fits comfortably in one agent session without needing external references |
Heuristic: If the whole task fits in one context window with room to spare, full Cavekit adds more overhead than value.
Start with lightweight Cavekit even if the project is small. If the scope expands, you already have the structure in place to scale up. It is much harder to retrofit kits onto a large codebase than to grow a cavekit directory from the beginning.
Cavekit mirrors a build pipeline — each stage transforms input into validated output, with feedback loops that propagate corrections upstream:
Traditional CI/CD:

```
Code → Build → Test → Deploy
```

Cavekit AI Pipeline:

```
Cavekit Change
  → Generate Plans (iteration loop)
  → Generate Implementation (iteration loop)
  → Validate (Tests + Review)
  → Human Audit (Monitor & Steer)
  → [Gap Found]
    → Revise
    → Cavekit Change (cycle repeats)
```
Every stage can run as an iteration loop — the same prompt executed repeatedly until output stabilizes. The iteration loop is what transforms nondeterministic LLM output into predictable, validated software.
The iteration loop is the fundamental execution unit in Cavekit. Execute the same prompt against the same codebase multiple times until the delta between runs approaches zero.
Mechanics:

1. Run the prompt against the current state of the codebase.
2. Diff the output against the previous pass.
3. Repeat until the diff stabilizes.
Convergence signal: A shrinking volume of modifications across successive passes — the diff gets smaller each time until only cosmetic changes remain. You are looking for diminishing returns, not absolute zero.
When the loop isn't stabilizing — the diff is not shrinking between runs — the problem is upstream: fix the inputs (specs, validation, coordination), not the iteration count.
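The loop above can be sketched as code. This is an illustration only, not Cavekit's implementation; `run_pass` and `measure_delta` are hypothetical hooks for executing one agent pass and for sizing its diff:

```python
def iterate_until_stable(run_pass, measure_delta, max_passes=5, floor=3):
    """Repeat an agent pass until the diff shrinks to cosmetic noise.

    run_pass: executes one pass and returns its diff (hypothetical hook).
    measure_delta: returns the size of a diff in changed lines.
    floor: deltas at or below this count as "only cosmetic changes".
    Returns the list of observed deltas, one per pass.
    """
    deltas = []
    for _ in range(max_passes):
        delta = measure_delta(run_pass())
        deltas.append(delta)
        if delta <= floor:
            # Converged: diminishing returns reached, stop here.
            break
        if len(deltas) >= 2 and deltas[-1] >= deltas[-2]:
            # Not shrinking: stop looping and fix the inputs upstream.
            break
    return deltas
```

Note that the stopping rule looks for a shrinking trend, not absolute zero, matching the convergence signal described above.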
Cavekit is composed of techniques that work together. This methodology skill is the index — each sub-skill below is self-contained but cross-references others.
| Skill | Purpose | When to Use |
|---|---|---|
| `ck:cavekit-writing` | Write implementation-agnostic kits with testable acceptance criteria | Draft phase — always the first step |
| `ck:context-architecture` | Organize context for progressive disclosure | Project setup and ongoing maintenance |
| `ck:impl-tracking` | Track implementation progress, dead ends, test health | Build and Inspect phases |
| `ck:validation-first` | Design validation gates agents can execute | All phases — validation is continuous |
| Skill | Purpose | When to Use |
|---|---|---|
| `ck:prompt-pipeline` | Design numbered prompt pipelines for the Hunt | Setting up automation |
| `ck:revision` | Trace bugs back to kits and fix at the source | Inspect phase — after finding gaps |
| `cavekit:brownfield-adoption` | Adopt Cavekit on existing codebases | Starting Cavekit on legacy projects |
| Skill | Purpose | When to Use |
|---|---|---|
| `ck:peer-review` | Use a second agent to challenge the first | Quality gates, architecture review |
| `cavekit:speculative-pipeline` | Stagger pipeline stages for parallelism | Optimizing long pipelines |
| `ck:convergence-monitoring` | Detect convergence vs ceiling | Monitoring iteration loops |
| `cavekit:documentation-inversion` | Turn documentation into agent-consumable skills | Library/module documentation |
Cavekit works with existing skills, not as a replacement:
| Existing Skill | Cavekit Integration |
|---|---|
| `superpowers:brainstorming` | Use during cavekit generation to explore requirements |
| `superpowers:writing-plans` | Use during plan generation for structured planning |
| `superpowers:test-driven-development` | TDD-within-Cavekit: cavekit acceptance criteria become failing tests |
| `superpowers:verification-before-completion` | Use for gate validation in every phase |
| `superpowers:executing-plans` | Use during implementation phase |
| `superpowers:dispatching-parallel-agents` | Use for agent team coordination |
Set up the context directory:

```
context/
├── refs/     # Source materials (PRDs, language specs, research)
├── kits/     # Implementation-agnostic kits
├── plans/    # Framework-specific implementation plans
├── impl/     # Living implementation tracking
└── prompts/  # Hunt pipeline prompts
```
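The layout above can be scaffolded in a few lines of Python. This is a convenience sketch; the per-directory `README.md` files recording each directory's purpose are an assumption, not part of Cavekit:

```python
from pathlib import Path

# Purpose of each context/ subdirectory, as described above.
CONTEXT_DIRS = {
    "refs": "Source materials (PRDs, language specs, research)",
    "kits": "Implementation-agnostic kits",
    "plans": "Framework-specific implementation plans",
    "impl": "Living implementation tracking",
    "prompts": "Hunt pipeline prompts",
}

def scaffold(root="context"):
    """Create the context tree; return the created directory names."""
    base = Path(root)
    for name, purpose in CONTEXT_DIRS.items():
        subdir = base / name
        subdir.mkdir(parents=True, exist_ok=True)
        # A README in each directory states its role for agents to read.
        (subdir / "README.md").write_text(f"# {name}\n\n{purpose}\n")
    return sorted(p.name for p in base.iterdir())
```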
1. Write kits from your reference materials (see `ck:cavekit-writing`)
2. Generate plans from kits (see `ck:prompt-pipeline`)
3. Implement with validation gates (see `ck:validation-first`)
4. Track progress in implementation documents (see `ck:impl-tracking`)
5. Iterate — when gaps are found, revise kits (see `ck:revision`)
For existing codebases, start with `cavekit:brownfield-adoption`.

Cavekit is not a tool — it is a methodology. The core loop is simple: specify intent in kits, generate plans and code from them, validate against acceptance criteria, and revise the kits when results disagree.
Agents become more capable the more precisely you constrain them — clear kits, automated validation, and structured iteration loops let them operate with increasing autonomy. None of this eliminates the need for software engineers. Your judgment on architecture, your ability to write precise kits, and your instinct for what "done" looks like are the inputs that make the whole system function. Cavekit is a force multiplier: one engineer's clarity of thought, scaled across an entire implementation pipeline.