Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.
/plugin marketplace add withzombies/hyperpowers
/plugin install withzombies-hyper@withzombies-hyper

This skill inherits all available tools. When active, it can use any tool Claude has access to.
<skill_overview> Review bd task plans with Google Fellow SRE perspective to ensure junior engineer can execute without questions; catch edge cases, verify granularity, strengthen criteria, prevent production issues before implementation. </skill_overview>
<rigidity_level> LOW FREEDOM - Follow the 8-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps. </rigidity_level>
<quick_reference>
| Category | Key Questions | Auto-Reject If |
|---|---|---|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |
| 8. Test Meaningfulness | Tests catch real bugs? Not tautological? | Tests only verify syntax/existence |
Perspective: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.
Time: Don't rush - catching one gap pre-implementation saves hours of rework. </quick_reference>
<when_to_use> Use when:
Don't use when:
<the_process>
Announce: "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."
Check:
If task >16 hours:
bd create
bd dep add child parent --type parent-child

Check:
Red flags:
Check:
Good criteria examples:
Bad criteria examples:
Check:
Verify with:
bd dep tree bd-1 # Show full dependency tree
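To make the circular-dependency auto-reject concrete, here is a minimal sketch of the check on a toy graph. Issue IDs and edges are illustrative; edges mean "depends on", mirroring `bd dep add LATER EARLIER`:

```python
# Toy dependency graph: bd-11 depends on bd-10, bd-10 depends on bd-3.
deps = {"bd-11": ["bd-10"], "bd-10": ["bd-3"], "bd-3": []}

def has_cycle(graph):
    """Depth-first search; a back edge to a node still on the stack is a cycle."""
    seen, stack = set(), set()
    def visit(node):
        if node in stack:
            return True          # back edge: circular dependency
        if node in seen:
            return False
        seen.add(node)
        stack.add(node)
        if any(visit(n) for n in graph.get(node, [])):
            return True
        stack.discard(node)
        return False
    return any(visit(n) for n in graph)

assert not has_cycle(deps)               # linear chain: fine
deps["bd-3"] = ["bd-11"]                 # bd-3 → bd-11 → bd-10 → bd-3
assert has_cycle(deps)                   # now auto-reject
```

`bd dep tree` surfaces the same structure visually; this sketch just shows what a reviewer is checking for.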
Check:
Minimum anti-patterns:
Ask for each task:
Add to Key Considerations section:
Check for these - if found, REJECT plan:
Tests must catch real bugs, not inflate coverage. For every test specification:
Ask these questions:
- Does the test assert specific values (`result == expected` vs `result != nil`)?

Red flags (AUTO-REJECT):
- Tautological assertions (e.g., `expect(builder.build() != nil)` when `build()` can't return nil)

Good test specifications:
Bad test specifications (reject or strengthen):
When reviewing test specifications:
For each test in success criteria, verify:
Test: "test_vin_validation"
- What bug does it catch? ⚠️ Unclear - need specific scenarios
- Could code break while test passes? ⚠️ Unknown without specifics
STRENGTHEN TO:
- test_valid_vin_checksum_accepted
- test_invalid_vin_checksum_rejected (catches missing checksum validation)
- test_lowercase_vin_normalized (catches case handling bug)
- test_vin_with_invalid_chars_rejected (catches input validation bug)
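The strengthened names above map directly onto assertions. A hypothetical validator makes the contrast concrete (checksum per ISO 3779: transliterate letters, weighted sum mod 11, check digit at position 9; the function name and sample tests are illustrative, not from any task):

```python
import re

# Transliteration table: A-H → 1-8, J-N → 1-5, P → 7, R → 9, S-Z → 2-9, digits as-is.
TRANSLIT = {c: v for c, v in zip("ABCDEFGH", range(1, 9))}
TRANSLIT.update({c: v for c, v in zip("JKLMN", range(1, 6))})
TRANSLIT.update({"P": 7, "R": 9})
TRANSLIT.update({c: v for c, v in zip("STUVWXYZ", range(2, 10))})
TRANSLIT.update({str(d): d for d in range(10)})
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def valid_vin(vin: str) -> bool:
    vin = vin.upper()                                   # case normalization
    if not re.fullmatch(r"[A-HJ-NPR-Z0-9]{17}", vin):
        return False                                    # rejects I/O/Q and bad length
    total = sum(TRANSLIT[c] * w for c, w in zip(vin, WEIGHTS))
    check = total % 11
    expected = "X" if check == 10 else str(check)
    return vin[8] == expected                           # checksum, not just pattern

# Tautological test: passes no matter what, catches nothing.
assert valid_vin("1HGBH41JXMN109186") is not None
# Meaningful tests: each one fails if a specific bug is introduced.
assert valid_vin("1HGBH41JXMN109186")        # valid checksum accepted
assert not valid_vin("1HGBH41JXMN109187")    # bad checksum rejected
assert valid_vin("1hgbh41jxmn109186")        # lowercase normalized
```

The tautological assertion passes whether the function returns `True` or `False`; the strengthened assertions pin down the behavior the task actually requires.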
For each task in the plan:
Step 1: Read the task
bd show bd-3
Step 2: Apply all 8 checklist categories
Step 3: Document findings Take notes:
Step 4: Update the task
Use bd update to add missing information:
bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]
## Effort Estimate
[Updated estimate if needed]
## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria
## Implementation Checklist
[Complete checklist with file paths]
## Key Considerations (ADDED BY SRE REVIEW)
**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing
**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods
**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition
**Reference Implementation**
- Study src/similar/module.rs for pattern to follow
## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"
IMPORTANT: Use --design for full detailed description, NOT --description (title only).
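The regex-backtracking concern flagged in the Key Considerations above can be demonstrated in a few lines. The patterns here are illustrative, not from any task: `(a+)+$` is ambiguous and backtracks exponentially on a near-matching input, while the equivalent unambiguous `a+$` does not (Python's `re` has no possessive quantifiers, so the fix is removing the nested quantifier):

```python
import re
import time

ambiguous = re.compile(r"(a+)+$")     # nested quantifier: catastrophic backtracking
unambiguous = re.compile(r"a+$")      # same language, no ambiguity
attack = "a" * 18 + "!"               # pathological input: long almost-match

t0 = time.perf_counter()
assert unambiguous.search(attack) is None
t1 = time.perf_counter()
assert ambiguous.search(attack) is None    # same answer, exponentially more work
t2 = time.perf_counter()
print(f"a+$: {t1 - t0:.6f}s  (a+)+$: {t2 - t1:.6f}s")
```

Lengthening `attack` by a few characters roughly doubles the ambiguous pattern's time, which is why the task must test with pathological inputs.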
Step 5: Verify no placeholder text (MANDATORY)
After updating, read back with bd show bd-N and verify:
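The read-back check can be sketched mechanically. Here `design` stands in for the text printed by `bd show bd-N`, and the marker list mirrors the Category 7 red flags:

```python
import re

design = """\
## Implementation Checklist
[Complete implementation steps detailed above]
## Success Criteria
[As specified in the implementation checklist]
"""

# Placeholder markers that mean the task is not ready for implementation.
PLACEHOLDER = re.compile(r"detailed above|as specified|will be added|TODO",
                         re.IGNORECASE)
hits = [line for line in design.splitlines() if PLACEHOLDER.search(line)]
for line in hits:
    print("REJECT:", line)
```

An empty `hits` list means the design passed this check; any hit means stop and rewrite with actual content.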
If task >16 hours, create subtasks:
# Create first subtask
bd create "Subtask 1: [Specific Component]" \
--type task \
--priority 1 \
--design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10
# Create second subtask
bd create "Subtask 2: [Another Component]" \
--type task \
--priority 1 \
--design "[Complete subtask design]"
# Returns bd-11
# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child # bd-11 is child of bd-3
# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10 # bd-11 depends on bd-10 (do bd-10 first)
# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.
## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"
After reviewing all tasks:
## Plan Review Results
### Epic: [Name] ([epic-id])
### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]
### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]
**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]
### Task-by-Task Review
#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])
**Strengths**:
- [What's done well]
**Critical Issues** (must fix):
- [Blocking problems]
**Improvements Needed**:
- [What to add/clarify]
**Edge Cases Missing**:
- [Failure modes not addressed]
**Changes Made**:
- [Specific improvements added via `bd update`]
---
[Repeat for each task/phase/subtask]
### Summary of Changes
**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)
### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]
### Recommendations
[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed
[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement
[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]
</the_process>
<examples> <example> <scenario>Developer reviews task but skips edge case analysis (Category 6)</scenario> <code> # Review of bd-3: Implement VIN scanner
Conclusion: "Task looks good, approve ✅"
<why_it_fails>
## Edge Case Analysis for bd-3: VIN Scanner
Ask for EVERY task:
- Malformed input? VIN has checksum - must validate, not just pattern match
- Empty/nil? What if empty string passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking
Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)
Update task:
bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]
## Key Considerations (ADDED BY SRE REVIEW)
**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate checksum, not just pattern - prevents false positives
**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"
**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html
**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous chars I/O not valid in VIN (should reject)
- Very long input (should not DoS)
EOF
)"
What you gain:
bd show bd-5:
"Looks comprehensive, has implementation checklist and success criteria ✅"
Junior engineer: "What are the 'implementation steps detailed above'?"
Junior engineer: "What specific success criteria should I verify?"
Junior engineer: "What key considerations exist?"
<why_it_fails>
## Red Flag Check (Category 7)
Read through bd-5 line by line:
Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content
Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria
Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to future, not actual considerations
DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation
Update task with actual content:
bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
- CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
- NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
- TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
- Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs
## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes
## Key Considerations
**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context
**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats supported in module docstring
- Consider international plates in future iteration
**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk
**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow same pattern: regex + context check + tests
EOF
)"
Verify no placeholder text:
bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation
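The state patterns written into the checklist above can themselves be sanity-checked in a few lines (anchored with `fullmatch` for illustration; the real scanner additionally requires `has_healthcare_context()`):

```python
import re

# Patterns copied from the implementation checklist.
CA = re.compile(r"[0-9][A-Z]{3}[0-9]{3}")
NY = re.compile(r"[A-Z]{3}[0-9]{4}")
TX = re.compile(r"[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}")

assert CA.fullmatch("1ABC123")       # CA example from the checklist
assert NY.fullmatch("ABC1234")       # NY example
assert TX.fullmatch("123ABC")        # TX alternate form
assert not CA.fullmatch("123")       # too short: must not match
```

Running the patterns against the checklist's own examples before implementation catches transcription errors in the plan itself.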
What you gain:
bd show bd-7:
"Has 3 success criteria ✅ Meets minimum requirement"
Junior engineer: "How do I know if encryption is 'correct'?"
Junior engineer: "What makes code 'good quality'?"
Junior engineer: "What does 'tests work properly' mean?"
<why_it_fails>
## Success Criteria Analysis for bd-7
Current criteria:
- [ ] Encryption is implemented correctly
❌ NOT TESTABLE - "correctly" is subjective, no standard specified
- [ ] Code is good quality
❌ NOT TESTABLE - "good quality" is opinion, not measurable
- [ ] Tests work properly
❌ NOT TESTABLE - "properly" is vague, no definition
Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria
DECISION: REJECT ❌
Update with measurable criteria:
bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]
## Success Criteria
**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption
**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0
**Test Coverage**:
- [ ] 12+ unit tests pass covering:
- test_encrypt_decrypt_roundtrip (happy path)
- test_wrong_key_fails_auth (security)
- test_modified_ciphertext_fails_auth (security)
- test_empty_plaintext (edge case)
- test_large_plaintext_10mb (performance)
- test_unicode_plaintext (data handling)
- test_concurrent_encryption (thread safety)
- test_iv_uniqueness (security)
- [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`
**Documentation**:
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)
**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Key zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"
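The key-derivation criterion above is directly checkable. A minimal stdlib sketch, using the plan's parameters (PBKDF2 with 100,000 iterations, 256-bit key); password and salt here are illustrative, and this is not a full encryption implementation:

```python
import hashlib
import os

password = b"correct horse battery staple"   # illustrative only
salt = os.urandom(16)                         # unique random salt per derivation

# PBKDF2-HMAC-SHA256, 100,000 iterations, 32-byte (256-bit) output for AES-256.
key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)
assert len(key) == 32

# Deterministic for the same inputs; a fresh salt derives a different key.
assert key == hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=32)
assert key != hashlib.pbkdf2_hmac("sha256", password, os.urandom(16), 100_000, dklen=32)
```

A reviewer can paste the iteration count and key length straight from the success criteria into assertions like these.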
What you gain:
<critical_rules>
- After every `bd update`, read back with `bd show` and confirm actual content

All of these mean: STOP. Apply the full process.
<verification_checklist> Before completing SRE review:
Per task reviewed:
- [ ] Updated via `bd update --design`
- [ ] Read back with `bd show` (no placeholders remain)

Overall plan:
- [ ] Dependency structure verified with `bd dep tree`

Can't check all boxes? Return to review process and complete missing steps. </verification_checklist>
<integration> **This skill is used after:**
- hyperpowers:writing-plans (creates initial plan)
- hyperpowers:brainstorming (establishes requirements)

This skill is used before:
This skill is also called by:
Call chains:
Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
↓
(if gaps: revise and re-review)
During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint
This skill uses:
Time expectations:
Don't rush: Catching one critical gap pre-implementation saves hours of rework. </integration>
<resources> **Review patterns:**
- Task too large (>16h) → Break into 4-8h subtasks
- Vague criteria ("works correctly") → Measurable commands/checks
- Missing edge cases → Add to Key Considerations with mitigations
- Placeholder text → Rewrite with actual content
- Tautological tests → Strengthen to catch specific bugs

Test meaningfulness questions:
- `!= nil` is weaker than `== expectedValue`

When stuck:
Key principle: A junior engineer should be able to execute the task without asking questions. If they would need to ask, the specification is incomplete. Tests must catch bugs, not inflate metrics. </resources>