From cappy-toolkit
Phase gate thresholds and recovery logic for all CAPPY investigation phases — defines confidence, completeness, coherence, and quality thresholds and specifies what to do when a gate fails.
npx claudepluginhub thelightarchitect/cappy-toolkit --plugin cappy-toolkit

This skill uses the workspace's default tool permissions.
<!-- Copyright (C) 2025-2026 Kevin Francis Tan (github.com/theLightArchitect) | SPDX-License-Identifier: AGPL-3.0-or-later -->
Version: 1.0.0
Purpose: Define quality gates for phase boundaries
Agent: CAPPY (singleton agent)
Created: 2026-02-05
How to check a gate: Extract the score field from the phase result JSON, compare against the threshold in the table below, return PASS or FAIL. No Rust hooks or Python methods required.
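The check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the toolkit: the function name `check_gate` is hypothetical, and treating a missing or non-numeric score as FAIL is an assumption the source does not state.

```python
from typing import Any, Mapping


def check_gate(phase_result: Mapping[str, Any], extract_field: str, threshold: float) -> str:
    """Return "PASS" when the extracted score meets the threshold, else "FAIL"."""
    score = phase_result.get(extract_field)
    # Assumption: a missing or non-numeric score is treated as a failed gate.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return "FAIL"
    # Every gate in this document uses the ">=" operation.
    return "PASS" if score >= threshold else "FAIL"


# Phase 2 example: extract_field "overall_confidence", threshold 0.70.
print(check_gate({"overall_confidence": 0.72}, "overall_confidence", 0.70))
```

A score exactly equal to the threshold passes, because the comparison is `>=`.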
Purpose: Ensure triage identified sufficient patterns with high confidence
Specification:
phase: 2
name: "confidence_gate"
operation: ">="
threshold: 0.70
extract_field: "overall_confidence"
pass_reason: "Confidence meets or exceeds 70% threshold"
fail_reason: "Confidence below 70% threshold"
fail_details: "Triage confidence insufficient. Need more/better evidence or different pattern."
next_action: "Ready for Phase 3 evidence extraction."
recovery_options:
  - priority: 1
    action: "Request HAR file from customer"
    impact: "HAR analysis typically increases confidence 10-15%"
    effort: "Customer communication + re-analyze"
  - priority: 2
    action: "Get manual guidance from Kevin"
    impact: "Human review may identify patterns Claude missed"
    effort: "Escalation"
  - priority: 3
    action: "Explore different pattern"
    impact: "May identify alternative root cause with higher confidence"
    effort: "Restart triage with different hypothesis"
confidence_ranges:
  0.0 - 0.3: "Very low - restart with different approach"
  0.3 - 0.5: "Low - request additional evidence"
  0.5 - 0.7: "Medium - borderline, needs recovery"
  0.7 - 0.85: "High - passes gate"
  0.85 - 1.0: "Very high - strong pattern identified"
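The confidence ranges share their boundary values (0.7 appears in two ranges), so a reader has to decide which band a boundary score falls in. The sketch below assumes half-open bands, where each band includes its lower bound and excludes its upper bound, which keeps 0.70 in the passing band and so agrees with the `>=` threshold. The names `CONFIDENCE_BANDS` and `describe_confidence` are hypothetical.

```python
# Upper bound (exclusive) and label for each band, taken from confidence_ranges.
CONFIDENCE_BANDS = [
    (0.3, "Very low - restart with different approach"),
    (0.5, "Low - request additional evidence"),
    (0.7, "Medium - borderline, needs recovery"),
    (0.85, "High - passes gate"),
]
TOP_BAND = "Very high - strong pattern identified"


def describe_confidence(score: float) -> str:
    """Map an overall_confidence score in [0.0, 1.0] to its range label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for upper, label in CONFIDENCE_BANDS:
        # Assumption: bands are half-open, so a boundary score goes to the higher band.
        if score < upper:
            return label
    return TOP_BAND


print(describe_confidence(0.7))
```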
Purpose: Ensure evidence extraction is comprehensive enough to support hypothesis
Specification:
phase: 3
name: "completeness_gate"
operation: ">="
threshold: 0.80
extract_field: "evidence_sufficiency"
pass_reason: "Evidence sufficiency meets or exceeds 80% threshold"
fail_reason: "Evidence sufficiency below 80% threshold"
fail_details: "Evidence extraction incomplete. Key evidence missing or unverified."
next_action: "Ready for Phase 4 hypothesis design."
recovery_options:
  - priority: 1
    action: "Request specific missing evidence"
    impact: "Targeted evidence request fills gaps"
    effort: "Identify gaps, customer communication"
  - priority: 2
    action: "Deep-dive additional log analysis"
    impact: "Manual parsing may find evidence Claude missed"
    effort: "Additional forensics work"
  - priority: 3
    action: "Accept partial evidence, flag assumptions"
    impact: "Proceed with caveats, mark unverified"
    effort: "Mark Phase 5 focus for verification"
sufficiency_ranges:
  0.0 - 0.4: "Very incomplete - major evidence gaps"
  0.4 - 0.6: "Incomplete - significant gaps"
  0.6 - 0.8: "Borderline - some gaps remain"
  0.8 - 0.9: "Sufficient - passes gate"
  0.9 - 1.0: "Complete - comprehensive evidence"
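When a gate fails, the specification lists recovery options with an explicit priority. One way to surface them in order is sketched below for the Phase 3 gate. The `RecoveryOption` class, the `completeness_gate` function, and the shape of the returned dictionary are illustrative assumptions; the strings are copied from the specification above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    priority: int
    action: str


# Deliberately listed out of order to show that priority, not position, decides.
PHASE_3_RECOVERY = [
    RecoveryOption(3, "Accept partial evidence, flag assumptions"),
    RecoveryOption(1, "Request specific missing evidence"),
    RecoveryOption(2, "Deep-dive additional log analysis"),
]


def completeness_gate(evidence_sufficiency: float, threshold: float = 0.80) -> dict:
    """Evaluate the Phase 3 gate and list what to do next."""
    if evidence_sufficiency >= threshold:
        return {
            "passed": True,
            "reason": "Evidence sufficiency meets or exceeds 80% threshold",
            "next": ["Ready for Phase 4 hypothesis design."],
        }
    ordered = sorted(PHASE_3_RECOVERY, key=lambda option: option.priority)
    return {
        "passed": False,
        "reason": "Evidence sufficiency below 80% threshold",
        "next": [option.action for option in ordered],
    }


print(completeness_gate(0.75)["next"])
```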
Purpose: Ensure hypothesis aligns with extracted evidence and has no contradictions
Specification:
phase: 4
name: "hypothesis_coherence_gate"
operation: ">="
threshold: 0.85
extract_field: "hypothesis_coherence_score"
pass_reason: "Hypothesis alignment meets or exceeds 85% threshold"
fail_reason: "Hypothesis alignment below 85% threshold"
fail_details: "Hypothesis doesn't align with evidence or has contradictions."
next_action: "Ready for Phase 5 validation research."
recovery_options:
  - priority: 1
    action: "Revise hypothesis based on evidence gaps"
    impact: "Adjust hypothesis to match actual evidence"
    effort: "Re-analyze Phase 4 work"
  - priority: 2
    action: "Identify and resolve contradictions"
    impact: "Remove conflicting assumptions"
    effort: "Deep evidence review"
  - priority: 3
    action: "Request additional evidence for verification"
    impact: "New evidence may reconcile gaps"
    effort: "Customer communication"
coherence_ranges:
  0.0 - 0.5: "Poor - major gaps, restart hypothesis"
  0.5 - 0.7: "Weak - significant misalignment"
  0.7 - 0.85: "Borderline - needs revision"
  0.85 - 0.95: "Strong - passes gate"
  0.95 - 1.0: "Excellent - highly coherent"
contradiction_types:
  - "Assumption contradicts Phase 3 evidence"
  - "Hypothesis claims unverified in evidence"
  - "Timeline doesn't match evidence events"
  - "Architecture doesn't match customer setup"
Purpose: Ensure solution is validated against multiple sources and customer-specific
Specification:
phase: 5
name: "solution_validation_gate"
operation: ">="
threshold: 0.85
extract_field: "solution_quality_score"
pass_reason: "Solution quality meets or exceeds 85% threshold"
fail_reason: "Solution quality below 85% threshold"
fail_details: "Solution not sufficiently validated or has gaps."
next_action: "Ready for Phase 6 solution design and Phase 7 deliverables."
recovery_options:
  - priority: 1
    action: "Validate against additional sources"
    impact: "Multi-source validation increases confidence"
    effort: "Additional research"
  - priority: 2
    action: "Test solution with customer data"
    impact: "Proof-of-concept validation"
    effort: "Hands-on testing"
  - priority: 3
    action: "Get peer review from TAC"
    impact: "Expert validation"
    effort: "Escalation"
quality_ranges:
  0.0 - 0.5: "Poor solution - restart"
  0.5 - 0.7: "Weak - needs more validation"
  0.7 - 0.85: "Borderline - close to passing"
  0.85 - 0.95: "Strong - passes gate"
  0.95 - 1.0: "Excellent - highly validated"
validation_sources:
  - "Customer configuration files"
  - "JIRA tickets for this product/version"
  - "Cortex official documentation"
  - "TAC playbooks"
  - "Knowledge base articles"
  - "Community forums/discussions"
guardian_skill = load_skill("/gate")
gate_spec = guardian_skill.get_gate_for_phase(phase)
gate_value = extract_from_phase_output(phase_output)
if gate_value >= gate_spec.threshold: PASS else: BLOCK
{status: PASSED/BLOCKED, gate_value, threshold, options}

Gate enforcement is working when:
Skill Version: 1.0.0
Last Updated: 2026-02-05
Status: Ready for CAPPY (singleton agent) integration
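The enforcement flow above (look up the gate for a phase, extract the gate value, compare, report a status) can be sketched as runnable Python. The `GATES` table collects the `extract_field` and `threshold` values from the five specifications in this document; the function name `enforce_gate` and the choice to return an empty `options` list on a pass are assumptions made for illustration.

```python
from typing import Any, Dict, List, Mapping, Optional

# phase -> (extract_field, threshold), as listed in each gate specification.
GATES = {
    2: ("overall_confidence", 0.70),
    3: ("evidence_sufficiency", 0.80),
    4: ("hypothesis_coherence_score", 0.85),
    5: ("solution_quality_score", 0.85),
    7: ("verification_rate", 0.90),
}


def enforce_gate(
    phase: int,
    phase_output: Mapping[str, Any],
    options: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Return the {status, gate_value, threshold, options} record for a phase."""
    field, threshold = GATES[phase]
    gate_value = phase_output[field]
    passed = gate_value >= threshold
    return {
        "status": "PASSED" if passed else "BLOCKED",
        "gate_value": gate_value,
        "threshold": threshold,
        # Assumption: recovery options are only reported when the gate blocks.
        "options": [] if passed else list(options or []),
    }


print(enforce_gate(4, {"hypothesis_coherence_score": 0.80}, ["Revise hypothesis based on evidence gaps"]))
```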
Purpose: Ensure all deliverables meet verification standard before customer delivery - P-007 final quality gate
Specification:
phase: 7
name: "verification_checkpoint_gate"
operation: ">="
threshold: 0.90
extract_field: "verification_rate"
pass_reason: "Verification rate meets or exceeds 90% threshold - ready for customer delivery"
fail_reason: "Verification rate below 90% threshold - some claims unverified"
fail_details: "Not all claims have been verified against Phase 3 evidence. Recovery options available: additional research, delivery with caveats, or Kevin override."
next_action: "Approved for Phase 7 deliverable generation and customer delivery."
recovery_options:
  - priority: 1
    action: "Return to Phase 5 for additional verification research"
    impact: "Research and verify remaining unverified claims"
    effort: "Additional research time (varies by claim complexity)"
    expected_improvement: "10-30% increase in verification rate"
  - priority: 2
    action: "Deliver with verification caveats"
    impact: "Mark unverified claims explicitly in deliverables as 'unverified assumption'"
    effort: "Minimal - update deliverable templates"
    expected_improvement: "N/A - same verification rate, but documented"
  - priority: 3
    action: "Get Kevin approval to override gate"
    impact: "Proceed to delivery regardless of verification rate"
    effort: "Escalation"
    expected_improvement: "N/A - bypasses gate"
verification_ranges:
  0.0 - 0.70: "Poor - major verification gaps, recommend Phase 5 return"
  0.70 - 0.85: "Fair - significant unverified claims, recommend Phase 5 return"
  0.85 - 0.90: "Good - borderline, close to threshold, consider Phase 5 or caveats"
  0.90 - 0.95: "Very good - passes gate, delivery ready"
  0.95 - 1.0: "Excellent - all or nearly all claims verified"
verification_sources:
  - "Phase 3 evidence extraction (primary - file:line citations)"
  - "Phase 5 research validation (secondary - documentation/specs)"
  - "Contradiction detection (if no contradictions, strengthens verification)"
verification_metrics:
  claims_total: "Total claims extracted during investigation"
  claims_verified: "Claims with Phase 3 citations to evidence files"
  claims_unverified: "Claims without citations or unresolved contradictions"
  verification_rate: "claims_verified / claims_total"
  unverified_list: "Explicit list of unverified claims for recovery planning"
  contradictions: "Any conflicting claims that require resolution"
gate_blocking_conditions:
  - "verification_rate < 0.90 AND claims_unverified > 0"
  - "unresolved contradictions exist"
  - "manual verification found discrepancies not yet fixed"
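The verification metrics and blocking conditions above combine into a single check. The sketch below is one possible reading: it assumes `claims_unverified` is simply `claims_total - claims_verified`, and that contradictions and manual discrepancies arrive as counts. The function name and parameter names are hypothetical.

```python
from typing import Any, Dict, List


def verification_checkpoint(
    claims_total: int,
    claims_verified: int,
    unresolved_contradictions: int = 0,
    open_discrepancies: int = 0,
) -> Dict[str, Any]:
    """Compute verification_rate and collect every blocking condition that applies."""
    if claims_total <= 0:
        raise ValueError("claims_total must be positive")
    # Assumption: every claim is either verified or unverified.
    claims_unverified = claims_total - claims_verified
    verification_rate = claims_verified / claims_total

    blocking: List[str] = []
    if verification_rate < 0.90 and claims_unverified > 0:
        blocking.append("verification_rate < 0.90 AND claims_unverified > 0")
    if unresolved_contradictions > 0:
        blocking.append("unresolved contradictions exist")
    if open_discrepancies > 0:
        blocking.append("manual verification found discrepancies not yet fixed")

    return {
        "verification_rate": verification_rate,
        "claims_unverified": claims_unverified,
        "blocked": bool(blocking),
        "blocking_conditions": blocking,
    }


print(verification_checkpoint(claims_total=20, claims_verified=17))
```

Note that under this reading a perfect verification rate still blocks if a contradiction remains unresolved, since the blocking conditions are independent of one another.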
guardian_skill = load_skill("/gate")
gate_spec = guardian_skill.get_gate_for_phase(7)
verification_checkpoint = extract_from_inv_context("verification_checkpoint")
verification_rate = verification_checkpoint.verification_rate
if verification_rate >= 0.90: PASS else: BLOCK

If gate BLOCKS (verification_rate < 0.90):
BLOCKED State
│
├─ Option 1: Return to Phase 5 (Additional Research)
│   ├─ Target unverified claims from unverified_list
│   ├─ Search documentation/JIRA/KB for citations
│   ├─ Run Phase 5 validation on targeted claims
│   └─ Re-run Phase 7 gate check
│
├─ Option 2: Deliver with Caveats (Document Unverified)
│   ├─ Mark each unverified claim in deliverables
│   ├─ Label as "Based on analysis but unverified in evidence"
│   ├─ Customer knows which parts are verified vs assumption
│   └─ Still proceed to Phase 7 deliverable generation
│
└─ Option 3: Get Kevin Approval (Override Gate)
    ├─ Escalate to Kevin with verification_rate and unverified_list
    ├─ Kevin decides: deliver as-is or request changes
    ├─ If approved: proceed to Phase 7
    └─ If rejected: return to Phase 5
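The three branches of the BLOCKED state can be read as a small routing function plus a labelling step for Option 2. The sketch below is illustrative only: the `Recovery` enum, the return values such as `"awaiting_kevin"`, and the bracketed label format are assumptions, while the caveat wording is taken from the tree above.

```python
from enum import Enum
from typing import Iterable, List, Optional

CAVEAT_LABEL = "Based on analysis but unverified in evidence"


class Recovery(Enum):
    RETURN_TO_PHASE_5 = 1
    DELIVER_WITH_CAVEATS = 2
    KEVIN_OVERRIDE = 3


def next_step(choice: Recovery, kevin_approved: Optional[bool] = None) -> str:
    """Route a BLOCKED Phase 7 gate to its next phase."""
    if choice is Recovery.RETURN_TO_PHASE_5:
        return "phase_5"
    if choice is Recovery.DELIVER_WITH_CAVEATS:
        return "phase_7"
    # Option 3: the outcome depends on Kevin's decision.
    if kevin_approved is None:
        return "awaiting_kevin"  # hypothetical holding state
    return "phase_7" if kevin_approved else "phase_5"


def label_claims(claims: Iterable[str], unverified_list: Iterable[str]) -> List[str]:
    """Append the caveat label to each claim that appears in unverified_list."""
    unverified = set(unverified_list)
    return [f"{claim} [{CAVEAT_LABEL}]" if claim in unverified else claim for claim in claims]


print(next_step(Recovery.KEVIN_OVERRIDE, kevin_approved=False))
```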
Phase 7 Gate Version: 2.0.0
Added: 2026-02-06
Praetorian Alignment: ✅ Hard gate enforcement, explicit recovery paths, no ambiguity
When Phase 4 completes and handoff calls loop-decision, the MCP tool evaluates these
6 conditions in priority order. The first condition that is true exits the loop.
| Priority | Condition | inv_context field | Threshold | Exit reason |
|---|---|---|---|---|
| 1 | HITL force proceed | params.force_proceed | true | hitl_force_proceed |
| 2 | Max iterations reached | investigation.loop_iteration | >= 3 | max_iterations_reached |
| 3 | Evidence ceiling hit | investigation.evidence_ceiling | true | evidence_ceiling_hit |
| 4 | Confidence threshold met | phases.phase_4.coherence_score OR phases.phase_2.confidence_score | >= 85 | confidence_threshold_met |
| 5 | All hypotheses resolved | hypotheses.list[*].status | all VERIFIED or ELIMINATED | all_hypotheses_resolved |
| 6 | All hypotheses eliminated | hypotheses.list[*].status | all ELIMINATED (no VERIFIED) | all_hypotheses_eliminated |
If NO condition is met → loop continues. Next phase:
- confidence_score < 50 → re-triage (Phase 2, SP-LOOP-2)
- confidence_score >= 50 → re-evidence (Phase 3, SP-LOOP-3)

These conditions are implemented in loop_decision_util.rs::evaluate_loop_decision().
Field names and thresholds here are authoritative for CAPPY reasoning; the Rust code is
authoritative for actual execution.
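For reasoning purposes, the priority-ordered exit conditions and the follow-up routing can be sketched in Python. This is not a port of the Rust function and may differ from it. Three assumptions are made: the caller passes a single `confidence_score` on the table's 0 to 100 scale (the table does not say how the Phase 4 and Phase 2 fields are combined); an empty hypothesis list never counts as resolved; and, because condition 5 read literally would also match the all-eliminated case, condition 5 is taken to require at least one VERIFIED hypothesis. The `OPEN` status in the usage line is a hypothetical placeholder for any unresolved status.

```python
from typing import Optional, Sequence


def evaluate_loop_decision(
    force_proceed: bool,
    loop_iteration: int,
    evidence_ceiling: bool,
    confidence_score: float,
    statuses: Sequence[str],
) -> Optional[str]:
    """Return the first matching exit reason, or None when the loop continues."""
    if force_proceed:
        return "hitl_force_proceed"
    if loop_iteration >= 3:
        return "max_iterations_reached"
    if evidence_ceiling:
        return "evidence_ceiling_hit"
    if confidence_score >= 85:
        return "confidence_threshold_met"
    if statuses and all(s in ("VERIFIED", "ELIMINATED") for s in statuses):
        # Assumption: at least one VERIFIED separates condition 5 from condition 6.
        if any(s == "VERIFIED" for s in statuses):
            return "all_hypotheses_resolved"
        return "all_hypotheses_eliminated"
    return None


def next_loop_phase(confidence_score: float) -> str:
    """Pick the re-entry phase when no exit condition is met."""
    return "phase_2" if confidence_score < 50 else "phase_3"


reason = evaluate_loop_decision(False, 1, False, 60, ["OPEN", "ELIMINATED"])
print(reason if reason is not None else next_loop_phase(60))
```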