# designdoc
Harness-engineered codebase documentation pipeline.
Walks any repo bottom-up and emits a validated `docs/design/` tree: per-class docs, package rollups, Mermaid diagrams (syntax + semantics validated), a system-design rollup, a tech-debt ledger, and a YAML file of unresolved human-in-the-loop disputes.
## Install

### As a CLI

Install the `designdoc` command into a uv-managed tool environment:

```sh
uv tool install git+https://github.com/SpillwaveSolutions/docgen
designdoc generate --repo /path/to/your/repo --budget 5.00
```

(Alternatively, `pipx install git+https://github.com/SpillwaveSolutions/docgen` if you prefer pipx. Once published to PyPI, `pip install designdoc` will also work.)
### As a Claude Code plugin

```sh
claude plugin marketplace add SpillwaveSolutions/docgen
claude plugin install designdoc
```

Adds a `/designdoc` slash command (generate | resume | status | resolve).
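For example, assuming the slash command mirrors the CLI flags (an assumption; the plugin may expose different options), a generate run inside a Claude Code session might look like:

```
/designdoc generate --repo . --budget 5.00
```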
From a clone (development)
git clone https://github.com/SpillwaveSolutions/docgen
cd docgen
uv sync
uv run designdoc generate --repo /path/to/your/repo --budget 5.00
Output lands in <repo>/docs/design/.
## Status

v1 complete; v1.1 incremental regeneration landed. All nine pipeline stages are implemented, tested, and measured end-to-end. See `plans/2026_04_16_designdoc_gen_v1.md` for the task plan.
## Measured performance

Against the `tests/fixtures/tiny_repo` fixture (5 Python files, 3 classes, 1 dep) on claude-sonnet-4-6 via a Claude Max subscription:
| Run | Wall clock | Cost (SDK-reported) | LLM invocations |
|---|---|---|---|
| Cold (first run, parallelism=3) | ~16 min | ~$3.98 | 58 |
| Cold (parallelism=1 baseline) | ~26 min | ~$4.57 | 60 |
| Warm (no source changes) | < 1 sec | $0.00 | 0 |
Two v1.1 optimizations combine here:
- Parallelism: `config.parallelism` (default 3) caps concurrent doer/checker invocations in Stages 2/3/4/6 via `asyncio.Semaphore`. Cold-run wall clock drops ~37% on tiny_repo; bigger repos with more files will see larger gains. Tune via `--parallelism N` or `[pipeline].parallelism` in `.designdoc.toml`.
- Incremental: the warm run skips every stage via content-hash comparison against `prev_hashes` / `rollup_hashes` in the pipeline state. Any single-file edit regenerates only that file's class doc, its package rollup, and the system rollup, not the whole tree.
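The bounded-parallelism pattern can be sketched with `asyncio.Semaphore` (an illustrative sketch, not the pipeline's code; `generate_doc` is a hypothetical stand-in for one doer/checker invocation):

```python
import asyncio

async def generate_doc(path: str) -> str:
    # Hypothetical stand-in for one doer/checker LLM invocation.
    await asyncio.sleep(0.01)
    return f"doc for {path}"

async def run_stage(paths: list[str], parallelism: int = 3) -> list[str]:
    # At most `parallelism` invocations are in flight at any moment.
    sem = asyncio.Semaphore(parallelism)

    async def bounded(path: str) -> str:
        async with sem:
            return await generate_doc(path)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(p) for p in paths))

docs = asyncio.run(run_stage(["a.py", "b.py", "c.py", "d.py"]))
```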
Run `designdoc status` to see which caches are primed. Reproduce with `task test-e2e` (requires the `claude` CLI logged in and `npx` on PATH).
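The incremental skip can be sketched as a plain content-hash comparison (a minimal illustration; the actual shape of `prev_hashes` in the pipeline state may differ):

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def stale_files(sources: dict[str, str], prev_hashes: dict[str, str]) -> list[str]:
    # Only files whose content hash changed need their docs regenerated;
    # an unchanged repo yields an empty list, so every stage is skipped.
    return [
        path for path, text in sources.items()
        if prev_hashes.get(path) != content_hash(text)
    ]

prev = {"a.py": content_hash("x = 1"), "b.py": content_hash("y = 2")}
changed = stale_files({"a.py": "x = 1", "b.py": "y = 3"}, prev)
# Only b.py changed, so only its class doc and the rollups above it regenerate.
```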
## Design principles (Gen 3 harness engineering)
- Control flow lives in Python, not prompts.
- Checkers run in their own context window (no self-grading).
- Scopes are small and bounded (file → class → package → system).
- Failures are loud (schema-validated verdicts, HIL YAML on dispute).
- Reliability over speed (`max_attempts=3`, bounded parallelism).
- Mermaid is syntax + semantics validated before shipping.
See CLAUDE.md / AGENT.md for the full invariants.
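Taken together, the doer/checker loop behind these principles can be sketched as follows (illustrative only; the pipeline's real verdict schema and HIL YAML escalation are richer than this):

```python
def run_with_checker(doer, checker, max_attempts: int = 3):
    # The doer produces an artifact; a separate checker returns a verdict.
    # After max_attempts failed verdicts the dispute is raised loudly
    # (the real pipeline records it in a HIL YAML file instead).
    last_reason = None
    for attempt in range(1, max_attempts + 1):
        artifact = doer(attempt)
        ok, reason = checker(artifact)
        if ok:
            return artifact
        last_reason = reason
    raise RuntimeError(f"unresolved after {max_attempts} attempts: {last_reason}")

# Toy doer/checker pair: the checker accepts the second draft.
result = run_with_checker(
    doer=lambda n: f"draft-{n}",
    checker=lambda a: (a == "draft-2", "not draft-2"),
)
```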
## Development

### Prerequisites
- Python 3.12+ (dev machine runs 3.13)
- uv for env management
- Task for running commands
- `@mermaid-js/mermaid-cli` via `npx` (auto-fetched at Stage 5 preflight)
- `claude` CLI (Claude Code) logged in to a Pro/Max subscription, used by the e2e / dogfood runs. No `ANTHROPIC_API_KEY` required.
### Commands

```sh
task install    # uv sync — install deps
task test       # unit + integration, no real API
task test-unit  # unit tests only
task test-e2e   # e2e tests (requires claude CLI login + mmdc)
task lint       # ruff check
task format     # ruff format
task ci         # exactly what CI runs — must be green before push
task dogfood    # real pipeline run against tests/fixtures/tiny_repo
```
Run a single test:
```sh
uv run pytest tests/unit/test_loop.py::test_ships_with_hil_after_3_fails -v
```
### Test-and-commit discipline
Every change follows TWRC: write the test, write the code, run `task ci`, commit.

CI parity: `task ci` must run the exact same commands as `.github/workflows/test.yml`. If you change one, change the other in the same commit. Every commit is a green checkpoint.
## Layout