Verifies epistemic quality of documents before RAG ingestion, generating Clarity-Gated Documents (CGD) and validating Sources of Truth (SOT) to prevent LLM hallucinations.
Purpose: Pre-ingestion verification system that enforces epistemic quality before documents enter RAG knowledge bases. Produces Clarity-Gated Documents (CGD) compliant with the Clarity Gate Format Specification v2.1.
Core Question: "If another LLM reads this document, will it mistake assumptions for facts?"
Core Principle: "Detection finds what is; enforcement ensures what should be. In practice: find the missing uncertainty markers before they become confident hallucinations."
| Feature | Description |
|---|---|
| Claim Completion Status | PENDING/VERIFIED determined by field presence (no explicit status field) |
| Source Field Semantics | Actionable source (PENDING) vs. what-was-found (VERIFIED) |
| Claim ID Format Guidance | Hash-based IDs preferred, collision analysis for scale |
| Body Structure Requirements | HITL Verification Record section mandatory when claims exist |
| New Validation Codes | E-ST10, W-ST11, W-HC01, W-HC02, E-SC06 (FORMAT_SPEC); E-TB01-07 (SOT validation) |
| Bundled Scripts | claim_id.py and document_hash.py for deterministic computations |
This skill implements and references:
| Specification | Version | Location |
|---|---|---|
| Clarity Gate Format (Unified) | v2.1 | docs/CLARITY_GATE_FORMAT_SPEC.md |
Note: v2.0 unifies CGD and SOT into a single .cgd.md format. SOT is now a CGD with an optional tier: block.
Clarity Gate defines validation codes for structural and semantic checks per FORMAT_SPEC v2.1:
| Code | Check | Severity |
|---|---|---|
| W-HC01 | Partial confirmed-by/confirmed-date fields | WARNING |
| W-HC02 | Vague source (e.g., "industry reports", "TBD") | WARNING |
| E-SC06 | Schema error in hitl-claims structure | ERROR |
| Code | Check | Severity |
|---|---|---|
| E-ST10 | Missing ## HITL Verification Record when claims exist | ERROR |
| W-ST11 | Table rows don't match hitl-claims count | WARNING |
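The two structural checks above could be approximated as follows. This is a minimal sketch, not the validator's code: `check_hitl_record` is a hypothetical helper, `body` is assumed to be the Markdown after the frontmatter, `hitl_claims` is assumed to be the already-parsed list, and counting rows by a leading numbered cell is an assumption based on the Round B table in the CGD template later in this document.

```python
import re

# Heading that E-ST10 looks for (taken from the table above).
SECTION = re.compile(r"^## HITL Verification Record\s*$", re.MULTILINE)
# Assumption: data rows start with a numeric "#" cell, e.g. "| 1 | ...".
NUMBERED_ROW = re.compile(r"^\|\s*\d+\s*\|", re.MULTILINE)


def check_hitl_record(body, hitl_claims):
    """Return the structural codes (E-ST10 / W-ST11) that apply."""
    if not hitl_claims:
        return []  # no claims, so the section is not required
    match = SECTION.search(body)
    if match is None:
        return ["E-ST10"]
    rows = NUMBERED_ROW.findall(body[match.end():])
    if len(rows) != len(hitl_claims):
        return ["W-ST11"]
    return []
```

Returning codes rather than raising keeps errors and warnings in one list, which a caller can then sort by severity.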
| Code | Check | Severity |
|---|---|---|
| E-TB01 | No ## Verified Claims section | ERROR |
| E-TB02 | Table has no data rows | ERROR |
| E-TB03 | Required columns missing | ERROR |
| E-TB04 | Column order wrong | ERROR |
| E-TB05 | Empty cell in required column | ERROR |
| E-TB06 | Invalid date format in Verified column | ERROR |
| E-TB07 | Verified date in future (beyond 24h grace) | ERROR |
Note: Additional validation codes may be defined in RFC-001 (clarification document) but are not part of the normative FORMAT_SPEC.
This skill includes Python scripts for deterministic computations per FORMAT_SPEC.
scripts/claim_id.py computes stable, hash-based claim IDs for HITL tracking (per §1.3.4).
# Generate claim ID
python scripts/claim_id.py "Base price is $99/mo" "api-pricing/1"
# Output: claim-75fb137a
# Run test vectors
python scripts/claim_id.py --test
Algorithm:
Test vectors:
claim_id("Base price is $99/mo", "api-pricing/1") → claim-75fb137a
claim_id("The API supports GraphQL", "features/1") → claim-eb357742

scripts/document_hash.py computes the document SHA-256 hash per FORMAT_SPEC §2.2-2.4 with full canonicalization.
# Compute hash
python scripts/document_hash.py my-doc.cgd.md
# Output: 7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
# Verify existing hash
python scripts/document_hash.py --verify my-doc.cgd.md
# Output: PASS: Hash verified: 7d865e...
# Run normalization tests
python scripts/document_hash.py --test
Algorithm (per §2.2-2.4):
- `---\n` and `<!-- CLARITY_GATE_END -->`
- Remove the `document-sha256` line from YAML frontmatter ONLY (with multiline continuation support)

Cross-platform normalization:
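As a rough illustration of why normalization matters across platforms, the sketch below applies the three steps the changelog names for `canonicalize()` (trailing whitespace stripping, newline collapsing, NFC normalization) before hashing. The step order, the CRLF handling, and the single trailing newline are assumptions, and `sketch_hash` is a hypothetical name. This is not the bundled `document_hash.py` and is not expected to reproduce its hashes.

```python
import hashlib
import re
import unicodedata


def canonicalize(text: str) -> str:
    """Illustrative normalization; order and details are assumptions."""
    text = unicodedata.normalize("NFC", text)                      # composed Unicode form
    text = text.replace("\r\n", "\n").replace("\r", "\n")          # CRLF / CR -> LF
    text = "\n".join(line.rstrip() for line in text.split("\n"))   # strip trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)                         # collapse blank-line runs
    return text.strip("\n") + "\n"                                 # exactly one final newline


def sketch_hash(text: str) -> str:
    """SHA-256 over the canonicalized UTF-8 bytes."""
    return hashlib.sha256(canonicalize(text).encode("utf-8")).hexdigest()


windows = "Title  \r\n\r\n\r\n\r\nCafe\u0301\r\n"   # CRLF, trailing spaces, decomposed accent
unix = "Title\n\nCaf\u00e9\n"                       # LF, composed accent
print(sketch_hash(windows) == sketch_hash(unix))    # True
```

The same logical document saved on two platforms hashes identically only because both inputs are reduced to one canonical byte sequence first.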
Existing tools like UnScientify and HedgeHunter (CoNLL-2010) detect uncertainty markers already present in text ("Is uncertainty expressed?").
Clarity Gate enforces their presence where epistemically required ("Should uncertainty be expressed but isn't?").
| Tool Type | Question | Example |
|---|---|---|
| Detection | "Does this text contain hedges?" | UnScientify/HedgeHunter find "may", "possibly" |
| Enforcement | "Should this claim be hedged but isn't?" | Clarity Gate flags "Revenue will be $50M" |
Clarity Gate verifies FORM, not TRUTH.
This skill checks whether claims are properly marked as uncertain—it cannot verify if claims are actually true.
Risk: An LLM can hallucinate facts INTO a document, then "pass" Clarity Gate by adding source markers to false claims.
Solution: HITL (Human-In-The-Loop) verification is MANDATORY before declaring PASS.
The 9 Verification Points guide semantic review — content quality checks that require judgment (human or AI). They answer questions like "Should this claim be hedged?" and "Are these numbers consistent?"
When review completes, output a CGD file conforming to CLARITY_GATE_FORMAT_SPEC.md. The C/S rules in CLARITY_GATE_FORMAT_SPEC.md validate file structure, not semantic content.
The connection:
clarity-status, hitl-status, hitl-pending-count)

Example: If Point 5 (Data Consistency) finds conflicting numbers, you'd mark clarity-status: UNCLEAR until resolved. Rule C7 then ensures you can't claim REVIEWED while still UNCLEAR.
1. HYPOTHESIS vs FACT LABELING Every claim must be clearly marked as validated or hypothetical.
| Fails | Passes |
|---|---|
| "Our architecture outperforms competitors" | "Our architecture outperforms competitors [benchmark data in Table 3]" |
| "The model achieves 40% improvement" | "The model achieves 40% improvement [measured on dataset X]" |
Fix: Add markers: "PROJECTED:", "HYPOTHESIS:", "UNTESTED:", "(estimated)", "~", "?"
2. UNCERTAINTY MARKER ENFORCEMENT Forward-looking statements require qualifiers.
| Fails | Passes |
|---|---|
| "Revenue will be $50M by Q4" | "Revenue is projected to be $50M by Q4" |
| "The feature will reduce churn" | "The feature is expected to reduce churn" |
Fix: Add "projected", "estimated", "expected", "designed to", "intended to"
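A crude first pass at this point can be automated. The sketch below flags sentences that contain "will" but none of the qualifiers listed in the fix. It is only a heuristic under stated assumptions: the sentence splitter is naive, `unqualified_future_claims` is a hypothetical name, and a flagged sentence still needs the semantic review described above.

```python
import re

# Qualifiers taken from the "Fix" line above.
QUALIFIERS = ("projected", "estimated", "expected", "designed to", "intended to")
FUTURE = re.compile(r"\bwill\b", re.IGNORECASE)


def unqualified_future_claims(text):
    """Return sentences that use 'will' without any listed qualifier."""
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        if FUTURE.search(sentence) and not any(q in lowered for q in QUALIFIERS):
            flagged.append(sentence)
    return flagged


doc = "Revenue will be $50M by Q4. The feature is expected to reduce churn."
print(unqualified_future_claims(doc))  # ['Revenue will be $50M by Q4.']
```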
3. ASSUMPTION VISIBILITY Implicit assumptions that affect interpretation must be explicit.
| Fails | Passes |
|---|---|
| "The system scales linearly" | "The system scales linearly [assuming <1000 concurrent users]" |
| "Response time is 50ms" | "Response time is 50ms [under standard load conditions]" |
Fix: Add bracketed conditions: "[assuming X]", "[under conditions Y]", "[when Z]"
4. AUTHORITATIVE-LOOKING UNVALIDATED DATA Tables with specific percentages and checkmarks look like measured data.
Red flag: Tables with specific numbers (89%, 95%, 100%) without sources
Fix: Add "(guess)", "(est.)", "?" to numbers. Add explicit warning: "PROJECTED VALUES - NOT MEASURED"
5. DATA CONSISTENCY Scan for conflicting numbers, dates, or facts within the document.
Red flag: "500 users" in one section, "750 users" in another
Fix: Reconcile conflicts or explicitly note the discrepancy with explanation.
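The "500 users" versus "750 users" red flag lends itself to a simple scan. The following sketch groups every "number + word" pair by the word and reports words that appear with more than one value. The regular expression and the name `conflicting_counts` are assumptions made for illustration. Real documents will produce false positives (for example, counts that legitimately differ between sections), so the output is a list of candidates for review, not verdicts.

```python
import re
from collections import defaultdict

# Matches "<integer> <word>", e.g. "500 users" or "1,200 requests".
COUNT = re.compile(r"\b(\d[\d,]*)\s+([a-z]+)\b", re.IGNORECASE)


def conflicting_counts(text):
    """Map each counted noun to its distinct values when there is more than one."""
    seen = defaultdict(set)
    for number, noun in COUNT.findall(text):
        seen[noun.lower()].add(int(number.replace(",", "")))
    return {noun: sorted(values) for noun, values in seen.items() if len(values) > 1}


doc = "The pilot had 500 users. Later sections cite 750 users and 3 regions."
print(conflicting_counts(doc))  # {'users': [500, 750]}
```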
6. IMPLICIT CAUSATION Claims that imply causation without evidence.
Red flag: "Shorter prompts improve response quality" (plausible but unproven)
Fix: Reframe as hypothesis: "Shorter prompts MAY improve response quality (hypothesis, not validated)"
7. FUTURE STATE AS PRESENT Describing planned/hoped outcomes as if already achieved.
Red flag: "The system processes 10,000 requests per second" (when it hasn't been built)
Fix: Use future/conditional: "The system is DESIGNED TO process..." or "TARGET: 10,000 rps"
8. TEMPORAL COHERENCE Document dates and timestamps must be internally consistent and plausible.
| Fails | Passes |
|---|---|
| "Last Updated: December 2024" (when the current year is 2026) | "Last Updated: January 2026" |
| v1.0.0 dated 2024-12-23, v1.1.0 dated 2024-12-20 | Versions in chronological order |
Sub-checks:
Fix: Update dates, add "as of [date]" qualifiers, flag stale claims
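The version-ordering example in the table above can be checked mechanically. This is a minimal sketch that assumes the releases have already been extracted as (version, ISO date) pairs listed in version order. `out_of_order_versions` is a hypothetical helper, not part of the bundled scripts.

```python
from datetime import date


def out_of_order_versions(releases):
    """Return (earlier_version, later_version) pairs whose dates run backwards.

    `releases` is a list of (version, "YYYY-MM-DD") tuples in version order.
    """
    problems = []
    for (prev_version, prev_date), (next_version, next_date) in zip(releases, releases[1:]):
        if date.fromisoformat(next_date) < date.fromisoformat(prev_date):
            problems.append((prev_version, next_version))
    return problems


history = [("v1.0.0", "2024-12-23"), ("v1.1.0", "2024-12-20")]
print(out_of_order_versions(history))  # [('v1.0.0', 'v1.1.0')]
```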
9. EXTERNALLY VERIFIABLE CLAIMS Specific numbers that could be fact-checked should be flagged for verification.
| Type | Example | Risk |
|---|---|---|
| Pricing | "Costs ~$0.005 per call" | API pricing changes |
| Statistics | "Papers average 15-30 equations" | May be wildly off |
| Rates/ratios | "40% of researchers use X" | Needs citation |
| Competitor claims | "No competitor offers Y" | May be outdated |
Fix options:
Claim Extracted --> Does Source of Truth Exist?
                              |
              +---------------+---------------+
             YES                              NO
              |                               |
      Tier 1: Automated               Tier 2: HITL
      Consistency & Verification      Two-Round Verification
              |                               |
         PASS / BLOCK           Round A → Round B → APPROVE / REJECT
A. Internal Consistency
B. External Verification (Extension Interface)
Round A: Derived Data Confirmation
Round B: True HITL Verification
When producing a Clarity-Gated Document, use this format per CLARITY_GATE_FORMAT_SPEC.md v2.1:
---
clarity-gate-version: 2.1
processed-date: 2026-01-12
processed-by: Claude + Human Review
clarity-status: CLEAR
hitl-status: REVIEWED
hitl-pending-count: 0
points-passed: 1-9
rag-ingestable: true # computed by validator - do not set manually
document-sha256: 7d865e959b2466918c9863afca942d0fb89d7c9ac0c99bafc3749504ded97730
hitl-claims:
- id: claim-75fb137a
text: "Revenue projection is $50M"
value: "$50M"
source: "Q3 planning doc"
location: "revenue-projections/1"
round: B
confirmed-by: Francesco
confirmed-date: 2026-01-12
---
# Document Title
[Document body with epistemic markers applied]
Claims like "Revenue will be $50M" become "Revenue is **projected** to be $50M *(unverified projection)*"
---
## HITL Verification Record
### Round A: Derived Data Confirmation
- Claim 1 (source) ✓
- Claim 2 (source) ✓
### Round B: True HITL Verification
| # | Claim | Status | Verified By | Date |
|---|-------|--------|-------------|------|
| 1 | [claim] | ✓ Confirmed | [name] | [date] |
<!-- CLARITY_GATE_END -->
Clarity Gate: CLEAR | REVIEWED
Required CGD Elements (per spec):
- clarity-gate-version: Tool version (no "v" prefix)
- processed-date: YYYY-MM-DD format
- processed-by: Processor name
- clarity-status: CLEAR or UNCLEAR
- hitl-status: PENDING, REVIEWED, or REVIEWED_WITH_EXCEPTIONS
- hitl-pending-count: Integer ≥ 0
- points-passed: e.g., 1-9 or 1-4,7,9
- hitl-claims: List of verified claims (may be empty [])

<!-- CLARITY_GATE_END -->
Clarity Gate: <clarity-status> | <hitl-status>
Optional/Computed Fields:
- rag-ingestable: Computed by validators, not manually set. Shows true only when CLEAR | REVIEWED with no exclusion blocks.
- document-sha256: Required. 64-char lowercase hex hash for integrity verification. See spec §2 for computation rules.
- exclusions-coverage: Optional. Fraction of body inside exclusion blocks (0.0–1.0).

Escape Mechanism: To write about markers like *(estimated)* without triggering parsing, wrap in backticks: `*(estimated)*`
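The field rules above translate into two small checks. The sketch assumes the frontmatter has already been parsed into a dictionary (for example by a YAML parser). `missing_fields` and `rag_ingestable` are hypothetical helper names, and the real validator may apply further conditions that this text does not describe.

```python
# Keys from the "Required CGD Elements" list. document-sha256 is also
# described as required above; it is left out here to mirror that list.
REQUIRED = (
    "clarity-gate-version", "processed-date", "processed-by", "clarity-status",
    "hitl-status", "hitl-pending-count", "points-passed", "hitl-claims",
)


def missing_fields(frontmatter: dict) -> list:
    """Required keys that are absent from the parsed frontmatter."""
    return [key for key in REQUIRED if key not in frontmatter]


def rag_ingestable(frontmatter: dict, has_exclusion_blocks: bool) -> bool:
    """True only for CLEAR | REVIEWED documents with no exclusion blocks."""
    return (
        frontmatter.get("clarity-status") == "CLEAR"
        and frontmatter.get("hitl-status") == "REVIEWED"
        and not has_exclusion_blocks
    )
```

Deriving `rag-ingestable` from the other fields, rather than reading it from the file, matches the note that the value is computed and not set manually.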
Claim verification status is determined by field presence, not an explicit status field:
| State | confirmed-by | confirmed-date | Meaning |
|---|---|---|---|
| PENDING | absent | absent | Awaiting human verification |
| VERIFIED | present | present | Human has confirmed |
| (invalid) | present | absent | W-HC01: partial fields |
| (invalid) | absent | present | W-HC01: partial fields |
Why no explicit status field? Field presence is self-enforcing—you can't accidentally set status without providing who/when.
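The state table can be expressed as a short function. This is a sketch under one assumption: an empty value is treated the same as an absent field, which the text above does not specify. `claim_state` is a hypothetical name.

```python
def claim_state(claim: dict) -> str:
    """Derive PENDING / VERIFIED / W-HC01 from field presence alone."""
    has_by = bool(claim.get("confirmed-by"))      # assumption: empty counts as absent
    has_date = bool(claim.get("confirmed-date"))
    if has_by and has_date:
        return "VERIFIED"
    if not has_by and not has_date:
        return "PENDING"
    return "W-HC01"  # exactly one of the two fields is present


print(claim_state({"id": "claim-75fb137a"}))                        # PENDING
print(claim_state({"confirmed-by": "Francesco",
                   "confirmed-date": "2026-01-12"}))                # VERIFIED
print(claim_state({"confirmed-by": "Francesco"}))                   # W-HC01
```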
The source field meaning changes based on claim state:
| State | source Contains | Example |
|---|---|---|
| PENDING | Where to verify (actionable) | "Check Q3 planning doc" |
| VERIFIED | What was found (evidence) | "Q3 planning doc, page 12" |
Vague source detection (W-HC02): Sources like "industry reports", "research", "TBD" trigger warnings.
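A simple version of the W-HC02 check is an exact match against a list of known vague phrases. The three phrases below are the examples given above. The actual list used by the validator is not shown in this document, so treat `VAGUE_SOURCES` and `is_vague_source` as illustrative assumptions.

```python
# Examples from the text above; a real list is probably longer.
VAGUE_SOURCES = {"industry reports", "research", "tbd"}


def is_vague_source(source: str) -> bool:
    """True when the whole source string is a known vague phrase."""
    return source.strip().lower() in VAGUE_SOURCES


print(is_vague_source("TBD"))                       # True
print(is_vague_source("Q3 planning doc, page 12"))  # False
```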
General pattern: claim-[a-z0-9._-]{1,64} (alphanumeric, dots, underscores, hyphens)
| Approach | Pattern | Example | Use Case |
|---|---|---|---|
| Hash-based (preferred) | claim-[a-f0-9]{8,} | claim-75fb137a | Deterministic, collision-resistant |
| Sequential | claim-[0-9]+ | claim-1, claim-2 | Simple documents |
| Semantic | claim-[a-z0-9-]+ | claim-revenue-q3 | Human-friendly |
Collision probability: At 1,000 claims with 8-char hex IDs: ~0.012%. For >1,000 claims, use 12+ hex characters.
Recommendation: Use hash-based IDs generated by scripts/claim_id.py for consistency and collision resistance.
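The collision figure quoted above follows from the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2N), where N = 16^k is the number of possible k-character hex IDs. The sketch below reproduces it and also checks an ID against the general pattern. The function and constant names are illustrative.

```python
import math
import re

# General pattern from the text above.
CLAIM_ID = re.compile(r"claim-[a-z0-9._-]{1,64}")


def collision_probability(n_claims: int, hex_chars: int) -> float:
    """Birthday approximation for at least one collision among n random IDs."""
    space = 16 ** hex_chars
    exponent = n_claims * (n_claims - 1) / (2 * space)
    return -math.expm1(-exponent)  # 1 - exp(-x), computed accurately for small x


print(f"{collision_probability(1_000, 8):.3%}")        # 0.012%
print(bool(CLAIM_ID.fullmatch("claim-75fb137a")))      # True
```

Each additional hex character multiplies the ID space by 16, so moving from 8 to 12 characters lowers the probability at a fixed claim count by a factor of roughly 65,000.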
When content cannot be resolved (no SME available, legacy prose, etc.), mark it as excluded rather than leaving it ambiguous:
<!-- CG-EXCLUSION:BEGIN id=auth-legacy-1 -->
Legacy authentication details that require SME review...
<!-- CG-EXCLUSION:END id=auth-legacy-1 -->
Rules:
- ID pattern: `[A-Za-z0-9][A-Za-z0-9._-]{0,63}`
- `hitl-status: REVIEWED_WITH_EXCEPTIONS`
- `exceptions-reason` and `exceptions-ids` in frontmatter

Important: Documents with exclusion blocks are not RAG-ingestable. They're rejected entirely (no partial ingestion).
See CLARITY_GATE_FORMAT_SPEC.md §4 for complete rules.
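Exclusion blocks can be located with a regular expression that requires the BEGIN and END markers to carry the same id. The sketch below also estimates `exclusions-coverage` as a character fraction, markers included. That measure is an assumption, since the exact definition lives in spec §4. `exclusion_ids` and `exclusions_coverage` are hypothetical names.

```python
import re

# The id pattern is the one given in the rules above; (?P=id) forces the
# END marker to repeat the BEGIN marker's id.
BLOCK = re.compile(
    r"<!-- CG-EXCLUSION:BEGIN id=(?P<id>[A-Za-z0-9][A-Za-z0-9._-]{0,63}) -->"
    r".*?"
    r"<!-- CG-EXCLUSION:END id=(?P=id) -->",
    re.DOTALL,
)


def exclusion_ids(body: str) -> list:
    """IDs of all well-formed exclusion blocks, in document order."""
    return [match.group("id") for match in BLOCK.finditer(body)]


def exclusions_coverage(body: str) -> float:
    """Assumed measure: characters inside blocks (markers included) / total."""
    if not body:
        return 0.0
    covered = sum(len(match.group(0)) for match in BLOCK.finditer(body))
    return covered / len(body)


doc = (
    "Intro.\n"
    "<!-- CG-EXCLUSION:BEGIN id=auth-legacy-1 -->\n"
    "Legacy authentication details that require SME review...\n"
    "<!-- CG-EXCLUSION:END id=auth-legacy-1 -->\n"
    "Outro.\n"
)
print(exclusion_ids(doc))  # ['auth-legacy-1']
```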
When validating a Source of Truth file, the skill checks both format compliance (per CLARITY_GATE_FORMAT_SPEC.md) and content quality (the 9 points).
SOT documents are CGDs with a tier: block. They require a ## Verified Claims section with a valid table.
| Code | Check | Severity |
|---|---|---|
| E-TB01 | No ## Verified Claims section | ERROR |
| E-TB02 | Table has no data rows | ERROR |
| E-TB03 | Required columns missing (Claim, Value, Source, Verified) | ERROR |
| E-TB04 | Column order wrong (Claim not first or Verified not last) | ERROR |
| E-TB05 | Empty cell in required column | ERROR |
| E-TB06 | Invalid date format in Verified column | ERROR |
| E-TB07 | Verified date in future (beyond 24h grace) | ERROR |
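Several of these checks are mechanical enough to sketch. The example below covers the header checks (E-TB03, E-TB04) and the date checks (E-TB06, E-TB07). It assumes the `Verified` column uses YYYY-MM-DD, which this section does not state explicitly, and it takes the current time as a parameter so the 24h grace period can be tested. `check_header` and `check_verified_date` are hypothetical names.

```python
from datetime import datetime, timedelta

REQUIRED_COLUMNS = ("Claim", "Value", "Source", "Verified")


def check_header(header_row: str) -> list:
    """Validate a Markdown table header such as '| Claim | Value | ... |'."""
    columns = [cell.strip() for cell in header_row.strip().strip("|").split("|")]
    if any(name not in columns for name in REQUIRED_COLUMNS):
        return ["E-TB03"]
    if columns[0] != "Claim" or columns[-1] != "Verified":
        return ["E-TB04"]
    return []


def check_verified_date(cell: str, now: datetime) -> list:
    """Validate one Verified cell (assumed format: YYYY-MM-DD)."""
    try:
        verified = datetime.strptime(cell.strip(), "%Y-%m-%d")
    except ValueError:
        return ["E-TB06"]
    if verified > now + timedelta(hours=24):  # 24h grace period
        return ["E-TB07"]
    return []


print(check_header("| Value | Claim | Source | Verified |"))               # ['E-TB04']
print(check_verified_date("2026-02-01", datetime(2026, 1, 12, 9, 0)))      # ['E-TB07']
```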
The 9 Verification Points apply to SOT content:
| Point | SOT Application |
|---|---|
| 1-4 | Check claims in ## Verified Claims are actually verified |
| 5 | Check for conflicting values across tables |
| 6 | Check claims don't imply unsupported causation |
| 7 | Check table doesn't state futures as present |
| 8 | Check dates are chronologically consistent |
| 9 | Flag specific numbers for external check |
- `tier:` block containing level, owner, version, promoted-date, promoted-by
- `## Verified Claims` section with columns: Claim, Value, Source, Verified
- `[STABLE]`, `[CHECK]`, `[VOLATILE]`, `[SNAPSHOT]` in content

- `[STABLE]`: Safe to cite without rechecking
- `[CHECK]`: Verify before citing
- `[VOLATILE]`: Changes frequently; always verify
- `[SNAPSHOT]`: Point-in-time data; include date when citing

After running Clarity Gate, report:
## Clarity Gate Results
**Document:** [filename]
**Issues Found:** [number]
### Critical (will cause hallucination)
- [issue + location + fix]
### Warning (could cause equivocation)
- [issue + location + fix]
### Temporal (date/time issues)
- [issue + location + fix]
### Externally Verifiable Claims
| # | Claim | Type | Suggested Verification |
|---|-------|------|------------------------|
| 1 | [claim] | Pricing | [where to verify] |
---
## Round A: Derived Data Confirmation
- [claim] ([source])
Reply "confirmed" or flag any I misread.
---
## Round B: HITL Verification Required
| # | Claim | Why HITL Needed | Human Confirms |
|---|-------|-----------------|----------------|
| 1 | [claim] | [reason] | [ ] True / [ ] False |
---
**Would you like me to produce an annotated CGD version?**
---
**Verdict:** PENDING CONFIRMATION
| Level | Definition | Action |
|---|---|---|
| CRITICAL | LLM will likely treat hypothesis as fact | Must fix before use |
| WARNING | LLM might misinterpret | Should fix |
| TEMPORAL | Date/time inconsistency detected | Verify and update |
| VERIFIABLE | Specific claim that could be fact-checked | Route to HITL or external search |
| ROUND A | Derived from witnessed source | Quick confirmation |
| ROUND B | Requires true verification | Cannot pass without confirmation |
| PASS | Clearly marked, no ambiguity, verified | No action needed |
| Pattern | Action |
|---|---|
| Specific percentages (89%, 73%) | Add source or mark as estimate |
| Comparison tables | Add "PROJECTED" header |
| "Achieves", "delivers", "provides" | Use "designed to", "intended to" if not validated |
| Checkmarks | Verify these are confirmed |
| "100%" anything | Almost always needs qualification |
| "Last Updated: [date]" | Check against current date |
| Version numbers with dates | Verify chronological order |
| "$X.XX" or "~$X" (pricing) | Flag for external verification |
| "averages", "typically" | Flag for source/citation |
| Competitor capability claims | Flag for external verification |
| Project | Purpose | URL |
|---|---|---|
| Source of Truth Creator | Create epistemically calibrated docs | github.com/frmoretto/source-of-truth-creator |
| Stream Coding | Documentation-first methodology | github.com/frmoretto/stream-coding |
| ArXiParse | Scientific paper verification | arxiparse.org |
- document_hash.py now implements full FORMAT_SPEC §2.1-2.4 compliance
- canonicalize() function: trailing whitespace stripping, newline collapsing, NFC normalization
- document-sha256 removal with multiline continuation support (§2.2)
- claim_id.py, document_hash.py
- `<!-- CLARITY_GATE_END -->` markers
- hitl-claims format to v2.0 schema (id, text, value, source, location, round)
- .cgd.md extension)

Version: 2.1.3 | Spec Version: 2.1 | Author: Francesco Marinoni Moretto | License: CC-BY-4.0