Validate whether AI agent followed wrangler's workflows and skill guidelines during session - comprehensive compliance audit of skill invocation, workflow steps, TDD, verification, and subagent usage
Validates AI agent compliance with wrangler workflows and skill guidelines during session.
/plugin marketplace add samjhecht/wrangler/plugin install wrangler@samjhecht-pluginsYou are conducting a compliance audit of the AI agent's adherence to wrangler workflows and skill guidelines during this session.
Purpose: Trust but verify - validate that wrangler's systematic processes were actually followed.
This is NOT gap analysis (that's /wrangler:analyze-session-gaps). This is compliance auditing: Did we follow the processes we said we would?
User suspicion (optional): "{user's specific concern or focus area}"
If user provided suspicion, focus analysis there while still covering all areas.
Analyze last 30-50 messages to understand what work was done:
Identify Tasks Performed:
Extract Key Events:
Build Timeline:
Message #N: User requested feature X
Message #N+2: Agent modified file.ts
Message #N+5: Agent claimed "tests pass"
Message #N+7: Agent committed changes
Note User Feedback:
Reference: docs/skill-invocation-patterns.md
For each task identified, determine:
Based on task pattern, which skills SHOULD have been invoked?
Use skill-invocation-patterns.md mappings:
check-constitutional-alignment, test-driven-development, requesting-code-review, verification-before-completionsystematic-debugging, test-driven-development, requesting-code-review, verification-before-completiontest-driven-development (ALWAYS), requesting-code-review (ALWAYS except 3 exceptions)dispatching-parallel-agents (if 3+ independent)writing-plansSearch conversation for skill announcements:
š§ Using Skill: [skill-name] | [purpose]
List all skills announced and when.
Compare expected vs actual:
### Task: [task name]
**Expected Skills**:
- test-driven-development
- requesting-code-review
- verification-before-completion
**Actually Used**:
- test-driven-development ā
- verification-before-completion ā
**Missing**:
- requesting-code-review ā
**Violation**: Code review skipped
**Severity**: HIGH
**Evidence**: Message #N shows code committed without review
Reference: docs/workflows.md
For skills that WERE invoked, verify they followed documented workflow steps:
Required steps (from workflows.md):
Compliance checks:
### TDD Compliance for [function/feature]
**Step 1: RED Phase**
- [ ] Test written first? [YES/NO/UNCLEAR]
- Evidence: [message number where test written]
- [ ] Test shows what should happen? [YES/NO]
**Step 2: Verify RED**
- [ ] Test execution output shown? [YES/NO]
- Evidence: [message number with output]
- [ ] Test failed correctly? [YES/NO]
- Failure reason: "[actual failure message]"
- Expected reason: "[what should fail]"
- Match: [YES/NO]
- [ ] Output was COMPLETE? (not paraphrased) [YES/NO]
**Step 3: GREEN Phase**
- [ ] Minimal implementation written? [YES/NO]
- [ ] Implemented AFTER test failed? [YES/NO]
**Step 4: Verify GREEN**
- [ ] Test execution output shown? [YES/NO]
- Evidence: [message number]
- [ ] All tests passed? (0 failures) [YES/NO]
- [ ] Exit code: 0? [YES/NO]
- [ ] Output was COMPLETE? [YES/NO]
**Step 5: REFACTOR**
- [ ] Refactoring performed? [YES/NO/N/A]
- [ ] Tests stayed green? [YES/NO/N/A]
**TDD Compliance Certification**
- [ ] Provided? [YES/NO]
- [ ] Complete? (all functions listed) [YES/NO]
- [ ] All "Watched fail" = YES? [YES/NO]
- Exceptions noted: [if any]
**Overall TDD Compliance**: [COMPLIANT/PARTIAL/VIOLATED]
**Violations**: [list specific issues]
Reference: verification-before-completion skill and verification-requirements.md
Required before ANY completion claim:
### Verification Compliance for [claim]
**Claim Made**: "[exact quote from message #N]"
**Gate Function Compliance**:
0. TDD COMPLIANCE
- [ ] TDD followed? [YES/NO]
- [ ] Certification provided? [YES/NO]
1. IDENTIFY
- [ ] Identified verification command? [YES/NO]
- Command: [what should prove the claim]
2. RUN
- [ ] Command executed FRESH? [YES/NO]
- [ ] Complete command shown? [YES/NO]
3. READ
- [ ] Full output provided? [YES/NO]
- [ ] Exit code checked? [YES/NO]
- [ ] Failure count verified? [YES/NO]
4. CAPTURE
- [ ] Output copied to message? [YES/NO]
- [ ] Output was COMPLETE? (not truncated) [YES/NO]
5. VERIFY
- [ ] Output confirms claim? [YES/NO]
- Claim: "[what was claimed]"
- Reality: "[what output shows]"
- Match: [YES/NO]
6. REQUIREMENTS
- [ ] All requirements verified? [YES/NO]
- Checklist provided? [YES/NO]
7. TDD CERTIFIED
- [ ] Certification provided? [YES/NO]
8. CODE REVIEW
- [ ] Review obtained? [YES/NO/EXCEPTION]
- If exception: [which exception 1/2/3?]
- If exception: [evidence provided?]
**Violations Found**:
- [List each step violated with evidence]
**Severity**: [CRITICAL/HIGH/MEDIUM/LOW]
**Overall Verification Compliance**: [COMPLIANT/PARTIAL/VIOLATED]
Reference: requesting-code-review skill
Required for ALL code changes (except 3 exceptions):
### Code Review Compliance
**Code Changes Made**:
- [List files modified]
**Exception Check**:
Is this one of the 3 valid exceptions?
1. Pure documentation (*.md in docs/, ZERO code)? [YES/NO]
2. Configuration-only (dependencies, NO logic)? [YES/NO]
3. Emergency hotfix (production down)? [YES/NO]
**If NO to all**: Code review is MANDATORY
**Review Status**:
- [ ] Code review requested? [YES/NO]
- Evidence: [skill announcement or Task() call]
- [ ] Review completed? [YES/NO]
- Evidence: [review report in messages]
- [ ] Critical issues: [count]
- MUST be 0 to proceed
- [ ] Important issues: [count]
- MUST be 0 OR converted to tracked issues
- [ ] Review status: [APPROVED/APPROVED_WITH_ITEMS/CHANGES_REQUIRED]
**Violations**:
- [List if review skipped without valid exception]
- [List if proceeded with unfixed Critical/Important issues]
**Severity**: [HIGH if skipped, MEDIUM if incomplete]
**Overall Code Review Compliance**: [COMPLIANT/VIOLATED]
Reference: dispatching-parallel-agents skill, writing-plans skill
Check if subagents should have been used:
### Subagent Usage Compliance
**Scenario**: [describe situation]
**Subagent Decision Tree**:
For multiple failures:
- 3+ failures present? [YES/NO]
- Failures independent? [YES/NO]
- Parallel-safe? [YES/NO]
ā Expected: [dispatching-parallel-agents / systematic-debugging]
For complex implementation:
- Plan created? [YES/NO]
- 10+ tasks? [YES/NO]
ā Expected: [/wrangler:implement with subagents]
For code review:
- Code changes made? [YES/NO]
- Review required? [YES/NO]
ā Expected: [code-review subagent dispatch]
**Actual Subagent Usage**:
- [List any Task() calls or subagent dispatches]
**Compliance**:
- [ ] Subagents used when required? [YES/NO]
- [ ] Subagent results verified? [YES/NO]
- Did agent verify subagent claims independently?
- Did agent check VCS diff?
- Did agent run tests after subagent?
**Violations**:
- [List if subagents should have been used but weren't]
- [List if subagent results weren't verified]
**Overall Subagent Compliance**: [COMPLIANT/PARTIAL/VIOLATED]
Reference: verification-requirements.md
For each verification claim, check evidence quality:
For "tests pass" claims:
### Test Evidence for Message #N
**Claim**: "[exact quote]"
**Required Evidence** (from verification-requirements.md):
- [ ] Exact command shown? [YES/NO]
- [ ] Complete output? [YES/NO]
- Test count shown? [YES/NO]
- Failure count shown? [YES/NO]
- Duration shown? [YES/NO]
- Exit code shown? [YES/NO]
- [ ] Coverage report? [YES/NO/N/A]
**Actual Evidence Provided**:
[Quote the evidence or note "NONE"]
**Evidence Quality**: [COMPLETE/PARTIAL/MISSING]
**Violation**: [YES/NO]
For TDD claims:
### TDD Evidence for [function]
**RED Phase Evidence**:
- [ ] Test failure output shown? [YES/NO]
- [ ] Failure reason specific? [YES/NO]
- [ ] Failure matches expectation? [YES/NO]
**GREEN Phase Evidence**:
- [ ] Test pass output shown? [YES/NO]
- [ ] All tests pass? [YES/NO]
- [ ] Exit code 0? [YES/NO]
**Certification Evidence**:
- [ ] TDD Compliance Certification provided? [YES/NO]
- [ ] All functions listed? [YES/NO]
- [ ] All "Watched fail" = YES? [YES/NO]
**Evidence Quality**: [COMPLETE/PARTIAL/MISSING]
For UI work:
### Frontend Evidence for [component]
**Required** (from verification-requirements.md):
- [ ] Screenshot provided? [YES/NO]
- [ ] Responsive breakpoints tested? [YES/NO]
- [ ] Console verification (0 errors)? [YES/NO]
- [ ] Network verification? [YES/NO]
- [ ] Accessibility verification (axe-core)? [YES/NO]
- [ ] Keyboard navigation tested? [YES/NO]
- [ ] UI states tested (loading, error, empty)? [YES/NO]
**Actual Evidence**:
[List what was provided]
**Evidence Quality**: [COMPLETE/PARTIAL/MISSING]
**Violations**: [List missing items]
Identify systemic issues:
### Systemic Patterns
**TDD Patterns**:
- Tests written first: [X/Y times]
- Tests written after: [X/Y times]
- RED verification shown: [X/Y times]
- GREEN verification shown: [X/Y times]
- Pattern: [CONSISTENT TDD / INCONSISTENT / VIOLATED]
**Verification Patterns**:
- Claims with evidence: [X/Y times]
- Claims without evidence: [X/Y times]
- Complete output shown: [X/Y times]
- Paraphrased results: [X/Y times]
- Pattern: [COMPLIANT / INCONSISTENT / VIOLATED]
**Code Review Patterns**:
- Reviews requested: [X/Y times]
- Reviews skipped: [X/Y times]
- Valid exceptions: [X/Y times]
- Pattern: [COMPLIANT / INCONSISTENT / VIOLATED]
**Skill Announcement Patterns**:
- Skills properly announced: [X/Y times]
- Skills used without announcement: [X/Y times]
- Pattern: [COMPLIANT / INCONSISTENT]
# SESSION ADHERENCE VALIDATION REPORT
Generated: [timestamp]
Session scope: Messages [N]-[M] (last 30-50 messages)
## Executive Summary
**Overall Compliance**: [PASS/PARTIAL/FAIL]
- **PASS**: All required workflows followed, evidence complete
- **PARTIAL**: Most workflows followed, some violations found
- **FAIL**: Major workflows violated, evidence missing
**Compliance Scores**:
- TDD Compliance: [X/Y tasks] ([percentage]%)
- Verification Compliance: [X/Y claims] ([percentage]%)
- Code Review Compliance: [X/Y changes] ([percentage]%)
- Subagent Compliance: [X/Y opportunities] ([percentage]%)
- Evidence Quality: [X/Y items] ([percentage]%)
**Critical Violations**: [count]
**High Violations**: [count]
**Medium Violations**: [count]
**Low Violations**: [count]
**Risk Level**: [CRITICAL/HIGH/MEDIUM/LOW]
- CRITICAL: Multiple critical violations, code unverified
- HIGH: Code review skipped or major TDD violations
- MEDIUM: Evidence incomplete but work likely correct
- LOW: Minor compliance issues, work appears solid
---
## Conversation Analysis
**Messages Analyzed**: #[N] to #[M] ([count] messages)
**Tasks Identified**:
1. [Task type]: [description] (Messages #[range])
2. [Task type]: [description] (Messages #[range])
...
**Key Events Timeline**:
- Message #N: [event]
- Message #N+5: [event]
- Message #N+10: [event]
...
**Files Modified**:
- [file path] - [what changed]
- [file path] - [what changed]
...
---
## Skill Invocation Compliance
### Task 1: [task name] (Messages #[range])
**Expected Skills** (from skill-invocation-patterns.md):
- [skill-name]: [when/why required]
- [skill-name]: [when/why required]
**Actually Invoked**:
- ā [skill-name] (Message #N: š§ Using Skill: ...)
- ā [skill-name] (MISSING - not announced)
**Compliance**: [COMPLIANT/PARTIAL/VIOLATED]
**Violations**:
- [SEVERITY] [skill-name] not invoked when required
- **Evidence**: [message references]
- **Impact**: [what could go wrong]
- **Should have**: [what should have happened]
---
### Task 2: [task name]
... (repeat for each task)
---
## Workflow Step Compliance
### TDD Workflow (test-driven-development skill)
**Functions/Features Implemented**: [count]
#### Function: [function_name] (Messages #[range])
**RED Phase**:
- Test written first? [ā/ā]
- Test failure shown? [ā/ā]
- Failure reason correct? [ā/ā]
- **Evidence**: [message #N or "MISSING"]
**GREEN Phase**:
- Minimal implementation? [ā/ā]
- Test pass shown? [ā/ā]
- All tests passed? [ā/ā]
- **Evidence**: [message #N or "MISSING"]
**Certification**:
- Provided? [ā/ā]
- Complete? [ā/ā]
- **Evidence**: [message #N or "MISSING"]
**Compliance**: [COMPLIANT/PARTIAL/VIOLATED]
**Violations**:
- [List specific issues with severity]
---
### Verification Workflow (verification-before-completion skill)
**Completion Claims**: [count]
#### Claim: "[exact quote]" (Message #N)
**Gate Function Steps**:
0. TDD COMPLIANCE: [ā/ā]
1. IDENTIFY: [ā/ā]
2. RUN: [ā/ā]
3. READ: [ā/ā]
4. CAPTURE: [ā/ā]
5. VERIFY: [ā/ā]
6. REQUIREMENTS: [ā/ā]
7. TDD CERTIFIED: [ā/ā]
8. CODE REVIEW: [ā/ā]
**Compliance**: [COMPLIANT/PARTIAL/VIOLATED]
**Violations**:
- [SEVERITY] [Step name] not followed
- **Evidence**: [what's missing]
- **Impact**: [cannot trust claim]
- **Should have**: [what evidence needed]
---
### Code Review Workflow (requesting-code-review skill)
**Code Changes**: [count]
#### Change: [files modified] (Messages #[range])
**Exception Check**:
- Pure docs? [YES/NO]
- Config-only? [YES/NO]
- Emergency? [YES/NO]
ā Review required: [YES/NO]
**Review Status**:
- Requested? [ā/ā] (Evidence: [message #N])
- Completed? [ā/ā] (Evidence: [message #N])
- Critical issues: [count] (must be 0)
- Important issues: [count] (must be 0 or tracked)
- Status: [APPROVED/etc.]
**Compliance**: [COMPLIANT/VIOLATED]
**Violations**:
- [SEVERITY] Code review [skipped/incomplete]
- **Evidence**: [what's missing]
- **Impact**: [unreviewed code in production]
---
### Subagent Usage (dispatching-parallel-agents, writing-plans skills)
**Subagent Opportunities**: [count]
#### Opportunity: [scenario] (Messages #[range])
**Analysis**:
- Should subagents be used? [YES/NO]
- Reason: [explanation]
**Actual Usage**:
- Subagents dispatched? [ā/ā]
- Results verified? [ā/ā]
**Compliance**: [COMPLIANT/VIOLATED]
**Violations**:
- [SEVERITY] Subagents [not used/results not verified]
- **Evidence**: [what happened]
- **Impact**: [missed parallelization / unverified work]
---
## Evidence Compliance
### Test Evidence Quality
**Test Claims**: [count]
**Evidence Quality Analysis**:
| Claim (Message) | Command | Output | Coverage | Quality | Violations |
|-----------------|---------|--------|----------|---------|------------|
| "tests pass" (#N) | ā | ā | ā | PARTIAL | Missing coverage |
| "all green" (#M) | ā | ā | ā | MISSING | No evidence |
**Overall Test Evidence**: [COMPLETE/PARTIAL/MISSING]
---
### TDD Evidence Quality
**TDD Claims**: [count]
**Evidence Analysis**:
| Function | RED Output | GREEN Output | Certification | Quality | Violations |
|----------|------------|--------------|---------------|---------|------------|
| func1 | ā | ā | ā | COMPLETE | None |
| func2 | ā | ā | ā | MISSING | All evidence missing |
**Overall TDD Evidence**: [COMPLETE/PARTIAL/MISSING]
---
### Frontend Evidence Quality (if applicable)
**UI Changes**: [count]
**Evidence Analysis**:
| Component | Screenshot | Console | Network | A11y | Quality | Violations |
|-----------|------------|---------|---------|------|---------|------------|
| Button | ā | ā | N/A | ā | PARTIAL | Missing a11y |
| Form | ā | ā | ā | ā | MISSING | All missing |
**Overall Frontend Evidence**: [COMPLETE/PARTIAL/MISSING]
---
## Systemic Patterns
**TDD Pattern Analysis**:
- Tests written first: [X/Y] ([percentage]%)
- RED verification shown: [X/Y] ([percentage]%)
- GREEN verification shown: [X/Y] ([percentage]%)
- Certification provided: [X/Y] ([percentage]%)
**Pattern**: [CONSISTENT TDD / INCONSISTENT / VIOLATED]
**Verification Pattern Analysis**:
- Claims with complete evidence: [X/Y] ([percentage]%)
- Claims with partial evidence: [X/Y] ([percentage]%)
- Claims without evidence: [X/Y] ([percentage]%)
**Pattern**: [COMPLIANT / INCONSISTENT / VIOLATED]
**Code Review Pattern Analysis**:
- Reviews requested: [X/Y] ([percentage]%)
- Valid exceptions: [X/Y] ([percentage]%)
- Skipped without exception: [X/Y] ([percentage]%)
**Pattern**: [COMPLIANT / INCONSISTENT / VIOLATED]
**Skill Announcement Pattern**:
- Skills announced: [X/Y] ([percentage]%)
- Skills used silently: [X/Y] ([percentage]%)
**Pattern**: [COMPLIANT / INCONSISTENT]
---
## Violations Summary
### Critical Violations (MUST fix immediately)
1. **[Violation category]**: [description]
- **Evidence**: Message #N - [what happened]
- **Impact**: [serious consequences]
- **Fix**: [what to do now]
- **Prevent**: [how to avoid in future]
... (list all critical)
---
### High Violations (MUST fix before claiming complete)
1. **[Violation category]**: [description]
- **Evidence**: Message #N - [what happened]
- **Impact**: [significant consequences]
- **Fix**: [what to do]
- **Prevent**: [how to avoid]
... (list all high)
---
### Medium Violations (Should fix)
1. **[Violation category]**: [description]
- **Evidence**: Message #N
- **Impact**: [moderate consequences]
- **Fix**: [recommended action]
... (list all medium)
---
### Low Violations (Nice to fix)
1. **[Violation category]**: [description]
- **Evidence**: Message #N
- **Impact**: [minor consequences]
- **Fix**: [optional improvement]
... (list all low)
---
## Recommendations
### Immediate Actions (Required)
1. **[Action]**: [what to do right now]
- Addresses: [which violations]
- Steps: [specific steps]
- Verification: [how to verify done correctly]
... (list all immediate actions)
---
### Process Improvements (Recommended)
1. **[Improvement]**: [what to change going forward]
- Problem: [what pattern caused issues]
- Solution: [how to fix the pattern]
- Benefit: [why this helps]
... (list all improvements)
---
### Skills to Review
Based on violations, review these skills:
1. **[skill-name]**: `skills/[path]/SKILL.md`
- **Why**: [what was misunderstood or violated]
- **Focus on**: [specific sections]
... (list all skills needing review)
---
## Overall Assessment
**Compliance Level**: [PASS/PARTIAL/FAIL]
**Trust Level**: [HIGH/MEDIUM/LOW]
- HIGH: Work is verified, processes followed, can trust claims
- MEDIUM: Most work verified, some gaps, verify critical parts
- LOW: Significant gaps, must verify all claims independently
**Risk Assessment**:
**Code Quality Risk**: [LOW/MEDIUM/HIGH/CRITICAL]
- Risk that code has bugs: [assessment]
- Risk that tests are inadequate: [assessment]
- Risk that requirements not met: [assessment]
**Process Risk**: [LOW/MEDIUM/HIGH/CRITICAL]
- Risk of future violations: [assessment]
- Risk of systemic issues: [assessment]
**Summary**:
[2-3 sentences summarizing overall session compliance, main issues found, and overall trustworthiness of the work]
---
## Next Steps
**If PASS**: Work appears compliant, can proceed with confidence.
**If PARTIAL**: Review violations, address Critical/High items before proceeding.
**If FAIL**: Significant compliance issues found. Recommend:
1. Review all violations listed above
2. Fix Critical violations immediately
3. Re-verify all completion claims
4. Consider re-implementing with strict TDD adherence
5. Request fresh code review
6. Run this validation again after fixes
**Questions for User**:
1. Do you want me to fix the violations found?
2. Should I re-verify specific claims that lacked evidence?
3. Are there specific areas you want deeper analysis?
Critical:
High:
Medium:
Low:
If user provided suspicion:
Common suspicions:
Trust nothing without evidence:
Paraphrasing = No Evidence:
/wrangler:analyze-session-gaps asks:
/wrangler:validate-session-adherence asks:
This command is about ADHERENCE, not GAPS.
Your validation is successful when:
Remember: The user is asking "Did my agent actually follow the rules?" Be thorough, be honest, and provide clear evidence for every claim you make.