Use after hyperpowers:executing-plans completes all tasks. Verifies the implementation against the bd spec: all success criteria met, anti-patterns avoided.
/plugin marketplace add withzombies/hyperpowers
/plugin install withzombies-hyper@withzombies-hyper

This skill inherits all available tools. When active, it can use any tool Claude has access to.
<skill_overview> Review the completed implementation against the bd epic to catch gaps before claiming completion; the spec is the contract, and the implementation must fulfill that contract completely. </skill_overview>
<rigidity_level> LOW FREEDOM - Follow the 4-step review process exactly. Review with Google Fellow-level scrutiny. Never skip automated checks, quality gates, or code reading. No approval without evidence for every criterion. </rigidity_level>
<quick_reference>
| Step | Action | Deliverable |
|---|---|---|
| 1 | Load bd epic + all tasks | TodoWrite with tasks to review |
| 2 | Review each task (automated checks, quality gates, read code, audit tests, verify criteria) | Findings per task |
| 3 | Report findings (approved / gaps found) | Review decision |
| 4 | Gate: If approved → finishing-a-development-branch, If gaps → STOP | Next action |
Review Perspective: Google Fellow-level SRE with 20+ years experience reviewing junior engineer code.
Test Quality Gate: Every new test must catch a real bug. Tautological tests (pass by definition, test mocks, verify compiler-checked facts) = GAPS FOUND. </quick_reference>
<when_to_use>
Use after hyperpowers:executing-plans has completed all tasks in a bd epic.
Don't use for: work that is still in progress (run this only once all tasks are reported complete).
</when_to_use>
<the_process>
Announce: "I'm using hyperpowers:review-implementation to verify implementation matches spec. Reviewing with Google Fellow-level scrutiny."
Get epic and tasks:
bd show bd-1 # Epic specification
bd dep tree bd-1 # Task tree
bd list --parent bd-1 # All tasks
Create TodoWrite tracker:
TodoWrite todos:
- Review bd-2: Task Name
- Review bd-3: Task Name
- Review bd-4: Task Name
- Compile findings and make decision
For each task:
bd show bd-3
Extract: success criteria, prohibited anti-patterns, and key considerations (edge cases the task requires).
# TODOs/FIXMEs without issue numbers
rg -i "todo|fixme" src/ tests/ || echo "✅ None"
# Stub implementations
rg "unimplemented!|todo!|unreachable!|panic!\(\"not implemented" src/ || echo "✅ None"
# Unsafe patterns in production
rg "\.unwrap\(\)|\.expect\(" src/ | grep -v "/tests/" || echo "✅ None"
# Ignored/skipped tests
rg "#\[ignore\]|#\[skip\]|\.skip\(\)" tests/ src/ || echo "✅ None"
IMPORTANT: Use hyperpowers:test-runner agent to avoid context pollution.
Dispatch hyperpowers:test-runner: "Run: cargo test"
Dispatch hyperpowers:test-runner: "Run: cargo fmt --check"
Dispatch hyperpowers:test-runner: "Run: cargo clippy -- -D warnings"
Dispatch hyperpowers:test-runner: "Run: .git/hooks/pre-commit"
CRITICAL: READ actual files, not just git diff.
# See changes
git diff main...HEAD -- src/auth/jwt.ts
# THEN READ FULL FILE
Read tool: src/auth/jwt.ts
While reading, check the areas below. Assume the code was written by a junior engineer; apply production-grade scrutiny.
Error Handling:
Safety:
Clarity:
Testing (CRITICAL - Apply strict scrutiny):
Production Readiness:
CRITICAL: Review every new/modified test for meaningfulness. Tautological tests are WORSE than no tests - they give false confidence.
For each test, ask:
- `expect(result != nil)` is weaker than `expect(result == expectedValue)`

Red flags (REJECT implementation until fixed):
- `expect(builder.build() != nil)` when `build()` can't return nil

Examples of meaningless tests to reject:
// ❌ REJECT: Tautological - compiler ensures enum has cases
func testEnumHasCases() {
_ = MyEnum.caseOne // This proves nothing
_ = MyEnum.caseTwo
}
// ❌ REJECT: Tautological - build() returns non-optional, can't be nil
func testBuilderReturnsValue() {
let result = Builder().build()
#expect(result != nil) // Always passes by type system
}
// ❌ REJECT: Tests mock, not production code
func testServiceCallsAPI() {
let mock = MockAPI()
let service = Service(api: mock)
service.fetchData()
#expect(mock.fetchCalled) // Tests mock behavior, not real logic
}
// ❌ REJECT: Happy path only, no edge cases
func testCodable() {
let original = User(name: "John", age: 30)
let data = try! encoder.encode(original)
let decoded = try! decoder.decode(User.self, from: data)
#expect(decoded == original) // What about empty name? Max age? Unicode?
}
Examples of meaningful tests to approve:
// ✅ APPROVE: Catches missing validation bug
func testEmptyPayloadReturnsValidationError() {
let result = validator.validate(payload: "")
#expect(result == .error(.emptyPayload))
}
// ✅ APPROVE: Catches race condition bug
func testConcurrentWritesDontCorruptData() {
let store = ThreadSafeStore()
DispatchQueue.concurrentPerform(iterations: 1000) { i in
store.write(key: "k\(i)", value: i)
}
#expect(store.count == 1000) // Would fail if race condition exists
}
// ✅ APPROVE: Catches error handling bug
func testMalformedJSONReturns400Not500() {
let response = api.parse(json: "{invalid")
#expect(response.status == 400) // Not 500 which would indicate unhandled exception
}
// ✅ APPROVE: Catches encoding bug with edge case
func testUnicodeNamePreservedAfterRoundtrip() {
let original = User(name: "日本語テスト 🎉")
let decoded = roundtrip(original)
#expect(decoded.name == original.name)
}
Audit process:
# Find all new/modified test files
git diff main...HEAD --name-only | grep -E "(test|spec)"
# Read each test file
Read tool: tests/new_feature_test.swift
# For EACH test function, document:
# - Test name
# - What bug it catches (or "TAUTOLOGICAL" if none)
# - Verdict: ✅ Keep / ⚠️ Strengthen / ❌ Remove/Replace
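One quick way to fill the audit table is to enumerate every test function name first. A sketch (the `fn test_` pattern is Rust-specific; adjust the regex to the project's test framework, and point it at the real `tests/` directory rather than the throwaway demo used here):

```shell
# Demo on a throwaway directory; in a real review, point grep/rg at tests/.
mkdir -p /tmp/test_audit/tests
cat > /tmp/test_audit/tests/jwt_test.rs <<'EOF'
#[test]
fn test_expired_token_rejected() {}
#[test]
fn test_jwt_struct_exists() {}
EOF
# One test name per line, ready to paste into the audit table.
grep -rhoE "fn test_[a-z_]+" /tmp/test_audit/tests/ | sed 's/fn //'
```

Each name then gets a row in the table with its "bug it catches" answer and a verdict.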
If tautological tests found:
## Test Quality Audit: GAPS FOUND ❌
### Tautological/Meaningless Tests
| Test | Problem | Action |
|------|---------|--------|
| testEnumHasCases | Compiler already ensures this | ❌ Remove |
| testBuilderReturns | Non-optional return, can't be nil | ❌ Remove |
| testCodable | Happy path only, no edge cases | ⚠️ Add: empty, unicode, max values |
| testServiceCalls | Tests mock, not production | ❌ Replace with integration test |
**Cannot approve until tests are meaningful.**
For EACH criterion in the bd task, run the verifying command, capture its output as evidence, and record whether the criterion is met:
Example:
Criterion: "All tests passing"
Command: cargo test
Evidence: "127 tests passed, 0 failures"
Result: ✅ Met
Criterion: "No unwrap in production"
Command: rg "\.unwrap\(\)" src/
Evidence: "No matches"
Result: ✅ Met
Search for each prohibited pattern from bd task:
# Example anti-patterns from task
rg "\.unwrap\(\)" src/ # If task prohibits unwrap
rg "TODO" src/ # If task prohibits untracked TODOs
rg "\.skip\(\)" tests/ # If task prohibits skipped tests
Read code to confirm edge cases handled:
Example: Task says "Must handle empty payload" → Find validation code for empty payload.
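The shape the reviewer should find is roughly the following (a minimal Rust sketch; `TokenError` and `generate_token` are illustrative names, not from any actual spec, and the signing logic is stubbed):

```rust
#[derive(Debug, PartialEq)]
enum TokenError {
    EmptyPayload,
}

// Illustrative: the empty-payload check is the edge case the spec requires.
fn generate_token(payload: &str) -> Result<String, TokenError> {
    if payload.is_empty() {
        return Err(TokenError::EmptyPayload);
    }
    // Real signing logic stubbed out for the sketch.
    Ok(format!("signed:{payload}"))
}

fn main() {
    assert_eq!(generate_token(""), Err(TokenError::EmptyPayload));
    assert!(generate_token("user:42").is_ok());
}
```

If no branch like the `is_empty()` check exists anywhere on the code path, the consideration is unaddressed regardless of what the diff shows.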
### Task: bd-3 - Implement JWT authentication
#### Automated Checks
- TODOs: ✅ None
- Stubs: ✅ None
- Unsafe patterns: ❌ Found `.unwrap()` at src/auth/jwt.ts:45
- Ignored tests: ✅ None
#### Quality Gates
- Tests: ✅ Pass (127 tests)
- Formatting: ✅ Pass
- Linting: ❌ 3 warnings
- Pre-commit: ❌ Fails due to linting
#### Files Reviewed
- src/auth/jwt.ts: ⚠️ Contains `.unwrap()` at line 45
- tests/auth/jwt_test.rs: ✅ Complete
#### Code Quality
- Error Handling: ⚠️ Uses unwrap instead of proper error propagation
- Safety: ✅ Good
- Clarity: ✅ Good
- Testing: See Test Quality Audit below
#### Test Quality Audit (New/Modified Tests)
| Test | Bug It Catches | Verdict |
|------|----------------|---------|
| test_valid_token_accepted | Missing validation | ✅ Keep |
| test_expired_token_rejected | Expiration bypass | ✅ Keep |
| test_jwt_struct_exists | Nothing (tautological) | ❌ Remove |
| test_encode_decode | Encoding bug (but happy path only) | ⚠️ Add edge cases |
**Tautological tests found:** 1 (test_jwt_struct_exists)
**Weak tests found:** 1 (test_encode_decode needs edge cases)
#### Success Criteria
1. "All tests pass": ✅ Met - Evidence: 127 tests passed
2. "Pre-commit passes": ❌ Not met - Evidence: clippy warnings
3. "No unwrap in production": ❌ Not met - Evidence: Found at jwt.ts:45
#### Anti-Patterns
- "NO unwrap in production": ❌ Violated at src/auth/jwt.ts:45
#### Issues
**Critical:**
1. unwrap() at jwt.ts:45 - violates anti-pattern, must use proper error handling
2. Tautological test: test_jwt_struct_exists must be removed
**Important:**
3. 3 clippy warnings block pre-commit hook
4. test_encode_decode needs edge cases (empty, unicode, max length)
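The fix for critical issue 1 replaces the panic path with error propagation. A sketch under stated assumptions (`AuthError`, `decode`, and `claims_for` are illustrative names; the real decode call would come from the project's JWT library):

```rust
#[derive(Debug)]
enum AuthError {
    InvalidToken(String),
}

// Stub standing in for a real JWT decode (illustrative only).
fn decode(token: &str) -> Result<String, String> {
    if token == "valid" {
        Ok("claims".to_string())
    } else {
        Err("bad token".to_string())
    }
}

// Before (violates the anti-pattern): let claims = decode(token).unwrap();
// After: propagate the failure to the caller instead of panicking.
fn claims_for(token: &str) -> Result<String, AuthError> {
    decode(token).map_err(AuthError::InvalidToken)
}

fn main() {
    assert!(claims_for("valid").is_ok());
    assert!(matches!(claims_for("junk"), Err(AuthError::InvalidToken(_))));
}
```

The caller then decides how to surface the error, which is exactly what `unwrap()` takes away.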
After reviewing ALL tasks:
If NO gaps:
## Implementation Review: APPROVED ✅
Reviewed bd-1 (OAuth Authentication) against implementation.
### Tasks Reviewed
- bd-2: Configure OAuth provider ✅
- bd-3: Implement token exchange ✅
- bd-4: Add refresh logic ✅
### Verification Summary
- All success criteria verified
- No anti-patterns detected
- All key considerations addressed
- All files implemented per spec
### Evidence
- Tests: 127 passed, 0 failures (2.3s)
- Linting: No warnings
- Pre-commit: Pass
- Code review: Production-ready
Ready to proceed to hyperpowers:finishing-a-development-branch.
If gaps found:
## Implementation Review: GAPS FOUND ❌
Reviewed bd-1 (OAuth Authentication) against implementation.
### Tasks with Gaps
#### bd-3: Implement token exchange
**Gaps:**
- ❌ Success criterion not met: "Pre-commit hooks pass"
- Evidence: cargo clippy shows 3 warnings
- ❌ Anti-pattern violation: Found `.unwrap()` at src/auth/jwt.ts:45
- ⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
#### bd-4: Add refresh logic
**Gaps:**
- ❌ Success criterion not met: "All tests passing"
- Evidence: test_verify_expired_token failing
### Cannot Proceed
Implementation does not match spec. Fix gaps before completing.
If APPROVED:
Announce: "I'm using hyperpowers:finishing-a-development-branch to complete this work."
Use Skill tool: hyperpowers:finishing-a-development-branch
If GAPS FOUND:
STOP. Do not proceed to finishing-a-development-branch.
Fix gaps or discuss with partner.
Re-run review after fixes.
</the_process>
<examples> <example> <scenario>Developer only checks git diff, doesn't read actual files</scenario> <code>
# Review process
git diff main...HEAD  # Shows changes
"Looks good, token generation implemented ✅"

function generateToken(payload) {
  // No validation of payload!
  // No check for empty payload (key consideration)
  // No error handling if jwt.sign fails
  return jwt.sign(payload, secret);
}
</code>
<why_it_fails> The diff shows only the lines that were added; it cannot show the validation and error handling that are missing from the file. </why_it_fails>
# See changes
git diff main...HEAD -- src/auth/jwt.ts
# THEN READ FULL FILE
Read tool: src/auth/jwt.ts
Reading full file reveals:
function generateToken(payload) {
// Missing: empty payload check (key consideration from bd task)
// Missing: error handling for jwt.sign failure
return jwt.sign(payload, secret);
}
Record in findings:
⚠️ Key consideration not addressed: "Empty payload validation"
- No check for empty payload in generateToken()
- Code at src/auth/jwt.ts:15-17
⚠️ Error handling: jwt.sign can throw, not handled
What you gain:
"Tests pass, implementation complete ✅"
<why_it_fails>
bd task has 5 success criteria:
1. "All tests pass" ✅ - Evidence: 127 passed
2. "Pre-commit passes" ❌ - Evidence: clippy warns (3 warnings)
3. "No unwrap in production" ❌ - Evidence: Found at jwt.ts:45
4. "Unicode handling tested" ⚠️ - Need to verify test exists
5. "Rate limiting implemented" ⚠️ - Need to check code
Result: 1/5 criteria verified met. GAPS EXIST.
</why_it_fails>
Run additional checks:
# Check criterion 2
cargo clippy
# 3 warnings found ❌
# Check criterion 3
rg "\.unwrap\(\)" src/
# src/auth/jwt.ts:45 ❌
# Check criterion 4
rg "unicode" tests/
# No matches ⚠️ Need to verify
Decision: GAPS FOUND, cannot proceed
What you gain:
"Logging added ✅"
<why_it_fails> "Logging added" was claimed without verifying each criterion; running the checks reveals incomplete coverage of error paths, sensitive data in logs, and a missing log-output test. </why_it_fails>
# Automated checks
rg "console\.log" src/
# Found at error-handler.ts:12, 15 ⚠️
# Read bd task
bd show bd-5
# Success criteria:
# 1. "All error paths logged"
# 2. "No sensitive data in logs"
# 3. "Test verifies log output"
# Check criterion 1
grep -n "throw new Error" src/
# 5 locations found
# Only 2 have logging ❌ Incomplete
# Check criterion 2
Read tool: src/error-handler.ts
# Logs contain password field ❌ Security issue
# Check criterion 3
rg "test.*log" tests/
# No matches ❌ Test missing
Decision: GAPS FOUND
What you gain:
"Tests pass with 92% coverage, implementation complete ✅"
<why_it_fails> Coverage measures which code executes, not what the assertions prove; 92% coverage can be built entirely from tautological tests: </why_it_fails>
- `expect(validator != nil)` - always passes, doesn't test validation logic
- `expect(lock.acquire())` - tests mock, not thread safety
- `expect(encoded.count > 0)` - tests non-empty, not correctness

# Find new tests
git diff main...HEAD --name-only | grep test
# Read and audit each test
Read tool: tests/validator_test.swift
For each test, document:
#### Test Quality Audit
| Test | Assertion | Bug Caught? | Verdict |
|------|-----------|-------------|---------|
| testValidatorExists | `!= nil` | ❌ None (compiler checks) | ❌ Remove |
| testValidInput | `isValid == true` | ⚠️ Happy path only | ⚠️ Add edge cases |
| testEmptyInputFails | `isValid == false` | ✅ Missing validation | ✅ Keep |
| testLockAcquired | mock.acquireCalled | ❌ Tests mock | ❌ Replace |
| testConcurrentAccess | count == expected | ✅ Race condition | ✅ Keep |
| testEncodeNotNil | `!= nil` | ❌ Type guarantees this | ❌ Remove |
| testUnicodeRoundtrip | decoded == original | ✅ Encoding corruption | ✅ Keep |
**Tautological tests:** 3 (must remove)
**Weak tests:** 1 (must strengthen)
**Meaningful tests:** 3 (keep)
Decision: GAPS FOUND ❌
## Test Quality Audit: GAPS FOUND
### Tautological Tests (Must Remove)
- testValidatorExists: Compiler ensures non-nil, test proves nothing
- testLockAcquired: Tests mock behavior, not actual thread safety
- testEncodeNotNil: Return type is non-optional, can never be nil
### Weak Tests (Must Strengthen)
- testValidInput: Only happy path, add:
- testEmptyStringRejected
- testMaxLengthRejected
- testUnicodeNormalized
### Action Required
Remove 3 tautological tests, add 3 edge case tests, then re-review.
What you gain:
</example> </examples>
<critical_rules>
All of these mean: STOP. Follow full review process.
</critical_rules>
<verification_checklist> Before approving implementation:
Per task:
Overall:
Can't check all boxes? Return to Step 2 and complete review. </verification_checklist>
<integration>
**This skill is called by:**
- hyperpowers:executing-plans (Step 5, after all tasks executed)

**This skill calls:**
- hyperpowers:finishing-a-development-branch (only if the review is approved)

**This skill uses:**
- hyperpowers:test-runner agent (to run tests and quality gates without context pollution)

**Call chain:**
hyperpowers:executing-plans → hyperpowers:review-implementation → hyperpowers:finishing-a-development-branch
↓
(if gaps: STOP)
CRITICAL: Use bd commands (bd show, bd list, bd dep tree), never read .beads/issues.jsonl directly.
</integration>
When stuck: