From project-toolkit
Verifies documentation accuracy against code via multi-phase workflow: assessment, compilability checks, behavioral verification, consistency, and structure audits. Use for pre-release doc audits or code-doc sync.
Install: `npx claudepluginhub rjmurillo/ai-agents --plugin project-toolkit`

This skill uses the workspace's default tool permissions.
Verify documentation claims against actual code behavior. Code is truth; docs are the subject under test.
| Trigger Phrase | Operation |
|---|---|
| check documentation accuracy | Full audit (Phases 1-6) |
| verify code examples | Compilability check (Phases 1-3) |
| audit docs vs code | Behavioral verification (Phases 1-4) |
| check doc consistency | Cross-document consistency (Phases 1-2, 5) |
| run doc-accuracy | Full audit (Phases 1-6) |
Use this skill when:
Use direct code review instead when:
| Skill | Reason |
|---|---|
| incoherence | 15.8% recall on critical issues; Haiku agents too shallow |
| doc-coverage | 0% recall on actionable issues; checks presence, not correctness |
| doc-sync | No scripts, purely manual LLM workflow |
| comment-analyzer | Advisory only, single-file scope |
Code compiles and runs. Documentation describes what code does. When they disagree, the code is right. This skill reads code first, builds a verified model, then checks documentation claims against that model.
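The read-code-first model can be sketched concretely. The function and parameter names below (`fetch`, `retry_count`) are invented for illustration; the point is that the default extracted from the code, not the documented one, decides the verdict.

```python
import ast

# A documented default is checked against what the code actually declares.
source = """
def fetch(url, retry_count=3):
    ...
"""

def default_of(src, func, param):
    """Return the default value the code actually uses for `param`."""
    tree = ast.parse(src)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func:
            args = node.args
            # Defaults align with the trailing positional parameters.
            offset = len(args.args) - len(args.defaults)
            for i, arg in enumerate(args.args):
                if arg.arg == param and i >= offset:
                    return ast.literal_eval(args.defaults[i - offset])
    return None

documented_default = 5                                        # what the docs claim
actual_default = default_of(source, "fetch", "retry_count")   # what the code does
assert actual_default == 3   # code wins: the doc claim becomes a finding
```

When the two values disagree, the documented value is reported as a finding, never the other way around.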
Phase 1: Assessment (script-only, <30s) -> assessment.json
Phase 2: Claim Extraction (script-only, <15s) -> claims.json
Phase 3: Compilability (script-only, <60s) -> compilability-findings.json
Phase 4: Behavioral (Sonnet agents, 3-7m) -> behavioral-findings.json
Phase 5: Cross-Document (script + Sonnet, 1-2m) -> consistency-findings.json
Phase 6: Structure (Sonnet agent, 1-2m) -> structure-findings.json
| Script | Purpose |
|---|---|
| scripts/doc_accuracy.py | Phases 1-3: Assessment, claim extraction, compilability check |
```shell
# Full deterministic scan (Phases 1-3)
python3 scripts/doc_accuracy.py --target /path/to/repo

# Compilability only
python3 scripts/doc_accuracy.py --target /path/to/repo --phases 3

# Incremental (changed files only)
python3 scripts/doc_accuracy.py --target /path/to/repo --diff-base main

# JSON output to specific directory
python3 scripts/doc_accuracy.py --target /path/to/repo --output-dir .doc-accuracy

# Set severity threshold for exit code
python3 scripts/doc_accuracy.py --target /path/to/repo --severity-threshold critical

# Markdown report output
python3 scripts/doc_accuracy.py --target /path/to/repo --format markdown

# Text summary to stdout
python3 scripts/doc_accuracy.py --target /path/to/repo --format summary
```
| File | Description |
|---|---|
| assessment.json | Phase 1: doc/source inventory with symbol index |
| claims.json | Phase 2: verifiable claims extracted from docs |
| compilability-findings.json | Phase 3: symbol resolution findings |
| gate-result.json | Gate verdict with severity counts |
| report.md | Markdown summary (when --format markdown) |
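These artifacts are plain JSON and can be consumed downstream, for example in a CI step. The schema below (a `severity_counts` object keyed by level) is assumed from the output table above, not taken from the tool's actual spec; treat the field names as illustrative.

```python
import json
from pathlib import Path

def severity_summary(gate: dict) -> dict:
    """Normalize a gate-result payload into counts for every severity level."""
    counts = gate.get("severity_counts", {})
    return {level: counts.get(level, 0)
            for level in ("critical", "high", "medium", "low")}

def load_gate(output_dir=".doc-accuracy") -> dict:
    """Read gate-result.json from the directory passed via --output-dir."""
    return json.loads(Path(output_dir, "gate-result.json").read_text())

# Example with a hand-written gate payload:
example = {"verdict": "fail", "severity_counts": {"critical": 1, "low": 4}}
assert severity_summary(example) == {"critical": 1, "high": 0, "medium": 0, "low": 4}
```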
Run doc_accuracy.py to produce JSON artifacts. No LLM calls.
Dispatch one Sonnet agent per file group. Each agent receives the mapped source files, the documentation file, and the relevant claims from claims.json.

Agent prompt:
You are verifying documentation accuracy. Code is the source of truth.
SOURCE FILES (read these first):
[full content of mapped source files]
DOCUMENTATION FILE:
[full content of the documentation file]
CLAIMS TO VERIFY:
[filtered claims from claims.json]
For each claim:
1. Find the relevant code in the source files
2. Determine if the documentation claim accurately describes the code behavior
3. If inaccurate: severity, description, evidence (with line numbers), suggested fix
4. If accurate: mark as PASS
Additionally check for:
- Public API members in source absent from documentation
- Behavioral nuances the documentation omits or misrepresents
- Default values that differ between docs and code
Launch agents in parallel (one per file group).
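The fan-out can be sketched as follows. `run_agent` stands in for the actual Sonnet call, which this skill performs through its own tooling; the grouping key (`doc_file`) is an assumed claim field, used here only for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def group_claims(claims):
    """Group extracted claims by the documentation file they came from."""
    groups = defaultdict(list)
    for claim in claims:
        groups[claim["doc_file"]].append(claim)
    return groups

def verify_all(claims, run_agent):
    """Run one verification agent per file group, in parallel."""
    groups = group_claims(claims)
    with ThreadPoolExecutor() as pool:
        futures = {f: pool.submit(run_agent, f, cs) for f, cs in groups.items()}
        return {f: fut.result() for f, fut in futures.items()}
```

One agent per file group (rather than per claim) keeps each agent's context focused on a coherent slice of the codebase.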
From claims.json, filter to quantitative and behavioral claims. Group by topic. For groups with conflicting values across files, dispatch a Sonnet agent with benchmark data to determine the correct value.
Validate documentation structure (indexes, navigation, completeness). Apply comment quality framework (accuracy, completeness, long-term value, misleading elements, improvements) to a 20% sample of source comments.
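The 20% sample from the phase description above could be drawn in many ways; one hedged option, shown below, hashes each comment so the sample stays deterministic across runs and re-audits review the same comments. This is a design sketch, not the skill's actual sampling method.

```python
import hashlib

def in_sample(comment: str, rate: float = 0.20) -> bool:
    """Bucket a comment by content hash; ~rate of comments land in the sample."""
    digest = hashlib.sha256(comment.encode()).hexdigest()
    return int(digest, 16) % 100 < rate * 100

comments = [f"# comment {i}" for i in range(1000)]
sampled = [c for c in comments if in_sample(c)]
# Roughly 200 of 1000 land in the sample; the exact count varies with content.
```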
Present proposed fixes for user approval before modifying any file. Categories:
| Class | Description | Detection Phase |
|---|---|---|
| 1: Spec vs Behavior | Docs say X, code does Y | Phase 4 |
| 2: Non-Compilable Code | Code examples reference nonexistent symbols | Phase 3 |
| 3: Cross-Doc Inconsistency | Same fact, different values across files | Phase 5 |
| 4: Domain Violations | Technology convention violations (OTel, Prometheus) | Phase 4 + Plugins |
| 5: API Surface Gaps | Public API exists but is undocumented | Phase 3 + Phase 4 |
| Level | Definition |
|---|---|
| Critical | Code will not compile, or behavior is silently wrong |
| High | Materially misleading but no immediate failure |
| Medium | Inconsistent or confusing but correct in at least one location |
| Low | Cosmetic, improvement opportunity, or minor omission |
| Code | Meaning |
|---|---|
| 0 | No findings at or above severity threshold |
| 1 | Error (file not found, parse error) |
| 10 | Findings at or above severity threshold |
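The exit codes above map directly onto a CI gate. The code-to-meaning table below is taken from this document; the subprocess invocation mirrors the CLI examples and is illustrative only.

```python
import subprocess

EXIT_MEANINGS = {
    0: "clean: no findings at or above severity threshold",
    1: "error: file not found or parse error",
    10: "findings at or above severity threshold",
}

def interpret(code: int) -> str:
    """Translate a doc_accuracy.py exit code into a human-readable verdict."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")

def run_gate(target: str) -> str:
    """Run the deterministic scan and report the gate verdict."""
    proc = subprocess.run(
        ["python3", "scripts/doc_accuracy.py",
         "--target", target, "--severity-threshold", "critical"],
        capture_output=True,
    )
    return interpret(proc.returncode)
```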
| Avoid | Why | Instead |
|---|---|---|
| Trusting documentation as peer of code | Docs and code are not equal; code is truth | Always read implementation before checking docs |
| Using Haiku for behavioral verification | 15.8% recall vs 100% with Sonnet | Use Sonnet agents for Phase 4 |
| One agent per dimension | Loses cross-cutting context | One agent per file group |
| Skipping Phase 1 for Phase 4 | Agents need symbol index for precise verification | Always run Phases 1-2 first |
| Running all phases on unchanged files | Wastes time and tokens | Use --diff-base for incremental checks |
After running:
| Skill | Relationship |
|---|---|
| incoherence | Replaced: detection logic superseded by Phases 3-5 |
| doc-coverage | Replaced: symbol extraction logic preserved in Phase 1 |
| doc-sync | Replaced: structural audit absorbed into Phase 6 |
| analyze | Complementary: broader codebase analysis |
| style-enforcement | Complementary: code style checks |