Analyze the blast radius of a potential change to help understand system dependencies and impact
Analyzes the blast radius of proposed changes to teach impact analysis and systems thinking through comprehensive dependency mapping and risk assessment.
`/plugin marketplace add dgriffith/bad-daves-robot-army`

`/plugin install dgriffith-bad-daves-robot-army@dgriffith/bad-daves-robot-army`

Using @agent-mentor, investigate what would be affected by a proposed change, teaching impact analysis and systems thinking through comprehensive dependency mapping and risk assessment.
The user invoked: /what-would-break {proposed_change}
Examples:
- `/what-would-break if we change the User model schema` - Data model changes
- `/what-would-break removing the cache layer` - Architectural changes
- `/what-would-break changing this function signature` - API changes
- `/what-would-break upgrading React to v19` - Dependency upgrades
- `/what-would-break` - Interactive mode to discuss potential changes

This is an educational investigation designed to teach impact analysis, not to discourage changes. The goal is to help developers build the mental models that senior engineers use to reason about system dependencies, ripple effects, and change management.
We are teaching, not gatekeeping. Every finding should help the developer understand:
This tool should empower developers to make changes confidently, not frighten them away from necessary work.
Clarify what's being considered:
If unclear, ask:
Map the complete dependency graph:
Code Dependencies:
# Find all imports/requires of this module
grep -r "import.*from.*{module}" .
grep -r "require.*{module}" .
# Find all references to this function/class
grep -r "{identifier}" . --include="*.ts" --include="*.js"
# Use language-specific tools
# TypeScript: Check with tsc --noEmit and language server
# Python: Use AST analysis
# Go: Use go list -json
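The searches above find direct importers; indirect dependents come from following those edges transitively. A minimal TypeScript sketch, assuming the import relationships have already been collected into a plain map. The `ImportMap` type, `buildReverseGraph`, `blastRadius`, and the sample file paths are illustrative names, not part of any tool mentioned here:

```typescript
// Hypothetical input: each file mapped to the modules it imports.
// In practice this map would come from a parser or language server.
type ImportMap = Record<string, string[]>;

// Invert "file -> imports" into "module -> files that import it".
function buildReverseGraph(imports: ImportMap): Record<string, string[]> {
  const reverse: Record<string, string[]> = {};
  for (const file of Object.keys(imports)) {
    for (const dep of imports[file] ?? []) {
      const dependents = reverse[dep] ?? [];
      dependents.push(file);
      reverse[dep] = dependents;
    }
  }
  return reverse;
}

// Breadth-first walk from the changed module, recording how many hops
// away each dependent is (1 = direct, 2 or more = indirect).
function blastRadius(imports: ImportMap, changed: string): Record<string, number> {
  const reverse = buildReverseGraph(imports);
  const depth: Record<string, number> = {};
  depth[changed] = 0;
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    const currentDepth = depth[current] ?? 0;
    for (const dependent of reverse[current] ?? []) {
      if (depth[dependent] !== undefined) continue; // already visited
      depth[dependent] = currentDepth + 1;
      queue.push(dependent);
    }
  }
  delete depth[changed]; // report only the dependents, not the change itself
  return depth;
}

// Illustrative data only.
const exampleImports: ImportMap = {
  "src/services/auth.ts": ["src/models/user.ts"],
  "src/controllers/user.ts": ["src/models/user.ts"],
  "src/services/notifications.ts": ["src/services/auth.ts"],
};
const exampleRadius = blastRadius(exampleImports, "src/models/user.ts");
console.log(exampleRadius);
```

A depth of 1 corresponds to a direct dependency and a depth of 2 or more to an indirect one. Because visited files are skipped, circular imports do not cause an endless loop.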
Data Dependencies:
API Dependencies:
Cascading Changes:
Transitive Dependencies:
Timing Dependencies:
State Dependencies:
Environmental Dependencies:
For each affected area, analyze:
Critical (System Breaking):
High (Feature Breaking):
Medium (Partial Impact):
Low (Minimal Impact):
Certain (Will Break):
Likely (Probably Breaks):
Possible (Might Break):
Unlikely (Edge Cases):
Map what tests protect against breakage:
Unit Tests:
Integration Tests:
End-to-End Tests:
Test Gaps:
Teach safe change patterns:
Incremental Rollout:
Backwards Compatibility:
Validation Strategies:
Communication Plans:
Create a markdown file at /reports/what-would-break-{change}-{timestamp}.md with the blast radius analysis.
# Blast Radius Analysis: [Change Description]
## The Proposed Change
**What**: [Specific change being considered]
**Why**: [Motivation for the change]
**Scope**: [How much would change]
**Status**: [Exploration / Planning / Ready to implement]
## Executive Summary
[2-3 sentence overview of impact and risk level]
**Overall Risk Level**: 🔴 High / 🟡 Medium / 🟢 Low
**Affected Systems**: [Count and list]
**Required Test Updates**: [Count]
**Estimated Effort**: [T-shirt size or time estimate]
---
## Dependency Map
### Direct Dependencies (First-Order Effects)
[Things that immediately depend on what's changing]
#### Code References
**Files that import/use this code:** [Count]
- `src/services/auth.ts` - Uses `validateUser()` function
- **Impact**: Would need signature update
- **Lines**: 45, 67, 123
- **Severity**: 🔴 Critical - Auth would break
- `src/controllers/user.ts` - Calls this API
- **Impact**: Response shape change needed
- **Lines**: 89-92
- **Severity**: 🟡 Medium - Needs adapter
[Continue for all direct dependencies...]
#### Data Consumers
**Components reading/writing this data:** [Count]
- Database migrations
- **Impact**: Schema migration required
- **Risk**: 🔴 Data consistency issues
- Cache layer
- **Impact**: Cache invalidation needed
- **Risk**: 🟡 Stale data during transition
#### API Consumers
**External/internal API consumers:** [Count]
- Mobile app (iOS/Android)
- **Impact**: App update required
- **Risk**: 🔴 Version compatibility issues
- **Mitigation**: Support both formats during transition
- Third-party integrations
- **Impact**: Partner notification required
- **Risk**: 🟡 Partner systems may break
### Indirect Dependencies (Second-Order Effects)
[Things that depend on the things that depend on this]
#### Cascading Code Changes
- `src/services/notifications.ts` depends on `auth.ts`
- **Why it matters**: Auth changes propagate to notifications
- **What breaks**: User notification context
- **Learning**: Shared abstractions amplify changes
#### Transitive System Effects
- Monitoring dashboards
- **Why it matters**: Metrics rely on current data shape
- **What breaks**: Dashboard queries, alerts
- **Learning**: Observability is a dependency too
- Logging pipelines
- **Why it matters**: Log parsing expects current format
- **What breaks**: Log aggregation, debugging
- **Learning**: Developer tools depend on stability
### Hidden Dependencies (Subtle Effects)
[Non-obvious things that could break]
#### Performance Assumptions
- `src/cache/strategy.ts` assumes query takes <100ms
- **Why it matters**: Cache TTL tuned to current performance
- **What might break**: Cache hit rate drops
- **Learning**: Performance characteristics are implicit contracts
#### Timing and Race Conditions
- Event handlers expect current ordering
- **Why it matters**: State synchronization relies on order
- **What might break**: Race conditions emerge
- **Learning**: Temporal dependencies are often undocumented
#### Environmental Variations
- Production uses different DB version than dev
- **Why it matters**: SQL compatibility varies
- **What might break**: Works in dev, fails in prod
- **Learning**: Environment parity gaps create hidden risks
---
## Impact Assessment
### By Severity
#### 🔴 Critical Impacts (System Breaking)
[Changes that would cause outages or data loss]
1. **Authentication System Failure**
- **What breaks**: All authenticated endpoints
- **User impact**: Cannot log in
- **Revenue impact**: Complete service outage
- **Probability**: Certain (will break without updates)
- **Affected users**: All users
**Why this is critical:**
The auth system has no fallback. This is a single point of failure.
**What to do first:**
- Add comprehensive integration tests
- Plan staged rollout
- Prepare immediate rollback procedure
2. **Data Corruption Risk**
- **What breaks**: User profile data structure
- **User impact**: Data loss or corruption
- **Probability**: Likely (without migration)
**Why this is critical:**
Schema mismatch between old and new format could corrupt records.
**What to do first:**
- Write migration with rollback
- Test on production data snapshot
- Add data validation checks
#### 🟡 Medium Impacts (Feature Breaking)
[Changes that break features but not the whole system]
1. **User Profile Page Errors**
- **What breaks**: Profile rendering
- **User impact**: 500 errors on profile page
- **Probability**: Certain (field names change)
- **Workaround**: Falls back to default profile
**Why this matters:**
Degrades UX but doesn't block core workflows.
**What to do:**
- Update React components
- Add PropTypes validation
- Test profile edge cases
#### 🟢 Low Impacts (Minor Issues)
[Changes that need attention but aren't urgent]
1. **Internal Dashboard Metrics**
- **What breaks**: Admin dashboard charts
- **User impact**: Internal users only
- **Probability**: Certain (metric names change)
**Why this matters:**
Internal tooling, not customer-facing.
**What to do:**
- Update dashboard queries
- Document metric changes
### By Subsystem
#### Frontend Impact
**Affected Components**: [Count]
**Severity**: 🟡 Medium
- User profile components need updates
- Form validation logic changes
- API client needs new types
**Teaching moment:** Frontend changes ripple through component trees.
#### Backend Impact
**Affected Services**: [Count]
**Severity**: 🔴 High
- Auth service core change
- Database schema migration
- API versioning required
**Teaching moment:** Backend changes affect contracts with all clients.
#### Database Impact
**Tables Affected**: [Count]
**Severity**: 🔴 Critical
- Migration required
- Downtime possible
- Rollback complexity high
**Teaching moment:** Data migrations are high-risk and need thorough planning.
#### Infrastructure Impact
**Systems Affected**: [Count]
**Severity**: 🟢 Low
- Cache invalidation needed
- CDN purge required
- Monitoring updates
**Teaching moment:** Don't forget infrastructure dependencies.
---
## Test Coverage Analysis
### Existing Test Protection
#### ✅ Well-Covered Areas
[Tests that would catch breakage]
- **Unit Tests**: `auth.test.ts`
- Covers: Function signatures, basic behavior
- **Would catch**: Interface changes
- **Confidence**: High
- **Stats**: 234 tests, 95% coverage
- **Integration Tests**: `auth-integration.test.ts`
- Covers: Auth flow end-to-end
- **Would catch**: Workflow breakage
- **Confidence**: High
#### ⚠️ Partially Covered Areas
[Tests that might catch breakage]
- **E2E Tests**: Login flow tested
- Covers: Happy path only
- **Might catch**: Critical path breakage
- **Might miss**: Edge cases, error states
- **Gap**: No tests for failed auth scenarios
#### ❌ Uncovered Areas (Test Gaps)
[Areas with no test protection]
- **Missing Tests**: Token refresh flow
- **Why it matters**: Could break silently
- **Risk**: 🟡 Sessions expire unexpectedly
- **Recommendation**: Add before making changes
- **Missing Tests**: Third-party OAuth
- **Why it matters**: External integration
- **Risk**: 🔴 Partner integrations break
- **Recommendation**: Critical to test first
### What We're Relying on Luck For
**Untested assumptions:**
1. Database connection pooling handles new query patterns
- **Why risky**: Could cause connection exhaustion
- **How to test**: Load testing with new queries
2. Mobile apps handle API version correctly
- **Why risky**: No automated tests for version negotiation
- **How to test**: Manual testing on multiple app versions
**Teaching moment:** If there's no test, assume it will break. Add tests before changing.
---
## Safe Change Strategies
### Recommended Approach
[Best way to make this change safely]
**Strategy**: Expand-Contract Pattern (3-phase deployment)
**Phase 1: Expand (Add new without removing old)**
- Add new API endpoint alongside old
- Support both data formats
- Add feature flag for gradual rollout
- Deploy and monitor
**Phase 2: Migrate (Move clients to new)**
- Update frontend to new endpoint
- Migrate mobile apps (with backwards compat)
- Move internal services
- Monitor error rates closely
**Phase 3: Contract (Remove old)**
- Deprecate old endpoint (with warnings)
- Wait for client adoption
- Remove old code
- Clean up feature flags
**Teaching moment:** The safest way to change APIs is to run both versions simultaneously.
### Alternative Approaches
#### Option A: Big Bang Deployment
**Pros**: Faster, simpler code
**Cons**: High risk, all-or-nothing
**When to use**: Small changes with excellent test coverage
**Risk level**: 🔴 High
#### Option B: Shadow Mode
**Pros**: Exercises the change against production traffic without exposing users to its results
**Cons**: More complex, requires duplicate processing
**When to use**: High-risk changes, performance-critical code
**Risk level**: 🟢 Low
**Teaching moment:** Different situations call for different strategies.
### Rollout Plan
**Week 1: Preparation**
- [ ] Add missing tests (focus on critical paths)
- [ ] Set up feature flag
- [ ] Create monitoring dashboard
- [ ] Write rollback runbook
- [ ] Review with team
**Week 2: Deploy Phase 1 (Expand)**
- [ ] Deploy new code (behind feature flag)
- [ ] Enable for internal users only
- [ ] Monitor error rates, performance
- [ ] Fix any issues discovered
**Week 3: Deploy Phase 2 (Migrate)**
- [ ] Gradually increase feature flag % (10%, 25%, 50%, 100%)
- [ ] Monitor at each stage
- [ ] Update mobile apps
- [ ] Notify third-party partners
**Week 4: Deploy Phase 3 (Contract)**
- [ ] Mark old API as deprecated
- [ ] Wait 2 weeks for stragglers
- [ ] Remove old code
- [ ] Clean up feature flags
**Teaching moment:** Patient, incremental rollouts catch issues before they affect everyone.
### Monitoring and Alerts
**Critical Metrics to Watch:**
- **Error Rate**: Auth failures > 0.1%
- **Alert threshold**: 0.5%
- **Action**: Immediate rollback
- **Latency**: Login time > 500ms
- **Alert threshold**: 1000ms
- **Action**: Investigate before proceeding
- **Success Rate**: Login success < 99%
- **Alert threshold**: < 98%
- **Action**: Pause rollout
**Teaching moment:** You can't manage what you don't measure. Set up monitoring first.
### Rollback Plan
**Immediate Rollback Triggers:**
- Error rate > 1%
- Data corruption detected
- Security vulnerability discovered
- Critical partner integration breaks
**Rollback Procedure:**
```bash
# 1. Disable feature flag
curl -X POST api/flags/new-auth/disable
# 2. Revert deployment (if needed)
kubectl rollout undo deployment/auth-service
# 3. Verify rollback successful
curl api/health/auth
# 4. Notify team and stakeholders
```
Rollback Time: < 5 minutes
Data Rollback: Migration has a tested rollback script
Teaching moment: Always have a rollback plan before you need it.
Teaching moment: Communication is part of the change, not an afterthought.
About Dependencies:
About Risk Management:
About Change Management:
High-Risk Change Patterns:
Lower-Risk Change Patterns:
Teaching moment: With practice, you'll recognize high-risk patterns instantly.
Before making any change, ask yourself:
Who depends on this?
What might break?
How will I know if it breaks?
How can I make it safer?
What's my rollback plan?
Teaching moment: Senior engineers ask these questions reflexively. Now you have the checklist.
High Priority (Must Do):
Medium Priority (Should Do):
Low Priority (Nice to Have):
- `/why-this-way [code]`
- `/explain [concept]`
- [Link to other similar changes in the codebase]
Teaching moment: Every change is a learning opportunity. Study both successes and failures.
Analysis Completeness: High / Medium / Low
Test Coverage Confidence: High / Medium / Low
Risk Assessment Confidence: High / Medium / Low
Areas of Uncertainty: [Things we're not sure about]
How to Improve Confidence:
The Change: [One sentence]
Risk Level: 🔴/🟡/🟢
Critical Dependencies: [Top 3]
Must-Have Tests: [Top 3 test gaps to fill]
Recommended Strategy: [Chosen approach]
Timeline: [Realistic estimate]
First Step: [Specific next action]
Teaching moment: You now understand the full impact. This is how senior engineers think about changes. You're ready to proceed safely.
Analysis completed: [timestamp]
Files analyzed: [count]
Dependencies mapped: [count]
Tests reviewed: [count]
Risk areas identified: [count]
Remember: This analysis is meant to empower you to make changes confidently, not to discourage necessary improvements. Every large system has complexity. With careful planning and incremental rollout, you can safely evolve even critical code.
## Investigation Techniques
### Dependency Discovery
**Static Analysis:**
```bash
# Find all imports
grep -r "import.*{identifier}" . --include="*.ts"
# Find all string references (for dynamic imports)
grep -r "{identifier}" . --include="*.ts" --include="*.js"
# Use AST tools for accurate analysis
npx ts-node -e "import * as ts from 'typescript'; /* analyze AST */"
# Language Server Protocol for IDE-quality analysis
# Better than grep for finding actual usages
```
Runtime Analysis:
# Find all calls in logs
grep "{function_name}" logs/production.log | wc -l
# Check monitoring for usage patterns
# Grafana, DataDog, etc. for API call volumes
# Review APM traces for call graphs
# See what actually calls what in production
Test Analysis:
# Find all tests that mention this code
grep -r "describe.*{module}" . --include="*.test.*"
grep -r "it.*{function}" . --include="*.test.*"
grep -r "expect.*{identifier}" . --include="*.test.*"
# Check test coverage reports
grep "{file}" coverage/lcov-report/index.html
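Grepping the HTML report shows whether a file appears in it, but not how well that file is covered. As a sketch, assuming the coverage tool also emitted an LCOV tracefile (commonly `coverage/lcov.info`, which is an assumption about the project setup), per-file line coverage can be read from its `SF`, `LF`, and `LH` records. `parseLcov` and `linePercent` are hypothetical helper names:

```typescript
interface FileCoverage {
  linesFound: number; // LF: number of instrumented lines
  linesHit: number;   // LH: number of instrumented lines executed at least once
}

// Parse LCOV tracefile text into per-file line totals.
// Only SF, LF, LH, and end_of_record are read; other records are ignored.
function parseLcov(text: string): Record<string, FileCoverage> {
  const result: Record<string, FileCoverage> = {};
  let entry: FileCoverage | null = null;
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line.indexOf("SF:") === 0) {
      entry = { linesFound: 0, linesHit: 0 };
      result[line.slice(3)] = entry;
    } else if (entry !== null && line.indexOf("LF:") === 0) {
      entry.linesFound = Number(line.slice(3));
    } else if (entry !== null && line.indexOf("LH:") === 0) {
      entry.linesHit = Number(line.slice(3));
    } else if (line === "end_of_record") {
      entry = null;
    }
  }
  return result;
}

// Line coverage as a percentage rounded to one decimal place.
function linePercent(coverage: FileCoverage): number {
  if (coverage.linesFound === 0) return 0;
  return Math.round((coverage.linesHit / coverage.linesFound) * 1000) / 10;
}
```

A file that is missing from the parsed result, or present with zero lines hit, is a candidate for the test gaps list.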
Documentation Search:
# Find docs that reference this
grep -r "{identifier}" docs/ README.md
# Check API docs
grep -r "{endpoint}" docs/api/
# Search wiki/confluence if available
Code Complexity:
Usage Metrics:
Historical Data:
# How often does this code change?
git log --oneline -- {file} | wc -l
# How many bugs were found here?
gh issue list --label bug --search "{file}"
# Who knows this code?
git log --format="%an" -- {file} | sort | uniq -c | sort -rn
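The author list becomes easier to act on once it is turned into a concentration measure. A small sketch, assuming the output of `git log --format="%an" -- {file}` has been captured as a string with one author name per line; `rankAuthors` and `topAuthorShare` are hypothetical helpers:

```typescript
interface AuthorCount {
  author: string;
  commits: number;
}

// Count commits per author and sort by count, most active first.
// This mirrors `sort | uniq -c | sort -rn` on the raw author lines.
function rankAuthors(gitLogOutput: string): AuthorCount[] {
  const counts: Record<string, number> = {};
  for (const rawLine of gitLogOutput.split("\n")) {
    const author = rawLine.trim();
    if (author === "") continue; // skip blank trailing lines
    counts[author] = (counts[author] ?? 0) + 1;
  }
  return Object.keys(counts)
    .map((author) => ({ author: author, commits: counts[author] ?? 0 }))
    .sort((a, b) => b.commits - a.commits || a.author.localeCompare(b.author));
}

// Share of commits made by the most active author (0 to 1).
// Values near 1 suggest knowledge of the file is concentrated in one person.
function topAuthorShare(ranked: AuthorCount[]): number {
  const total = ranked.reduce((sum, entry) => sum + entry.commits, 0);
  const top = ranked[0];
  if (top === undefined || total === 0) return 0;
  return top.commits / total;
}
```

A high share is not a problem by itself, but it is a signal to involve that person in reviewing the change.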
Combine factors for overall risk score:
Risk = Severity × Probability × Blast Radius
Where:
Examples:
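One way the formula could be applied, as a sketch only: the severity and probability level names come from the categories used earlier in this document, but the numeric weights, the file-count buckets for blast radius, and the function names are all assumptions chosen for illustration.

```typescript
type Severity = "critical" | "high" | "medium" | "low";
type Probability = "certain" | "likely" | "possible" | "unlikely";

// Assumed ordinal weights: the level names are from the document,
// the numbers are illustrative.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  critical: 4,
  high: 3,
  medium: 2,
  low: 1,
};
const PROBABILITY_WEIGHT: Record<Probability, number> = {
  certain: 4,
  likely: 3,
  possible: 2,
  unlikely: 1,
};

// Assumed bucketing of blast radius by number of affected files.
function blastRadiusWeight(affectedFiles: number): number {
  if (affectedFiles > 50) return 4;
  if (affectedFiles > 10) return 3;
  if (affectedFiles > 3) return 2;
  return 1;
}

// Risk = Severity x Probability x Blast Radius, using the weights above.
// With these assumed scales the score ranges from 1 to 64.
function riskScore(
  severity: Severity,
  probability: Probability,
  affectedFiles: number
): number {
  return (
    SEVERITY_WEIGHT[severity] *
    PROBABILITY_WEIGHT[probability] *
    blastRadiusWeight(affectedFiles)
  );
}
```

Under these assumed weights, `riskScore("critical", "certain", 60)` evaluates to 64 and `riskScore("low", "unlikely", 3)` to 1. The absolute numbers matter less than the ordering they produce when comparing candidate changes.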
What Senior Engineers Know (That We're Teaching):
Dependency Awareness
Risk Calibration
Incremental Thinking
Systems Thinking
DO:
DON'T:
Include real examples from the codebase:
## Historical Example: Similar Change
In PR #234, we made a similar change to the Payment API:
**What we did right:**
- Ran both versions simultaneously for 2 weeks
- Added comprehensive monitoring
- Had clear rollback plan
**What we learned:**
- Edge case appeared only at 50% rollout
- Monitoring caught it before major impact
- Rollback took 2 minutes
**Apply to your change:**
- Use the same expand-contract pattern
- Watch for similar edge cases in [area]
- Set up similar monitoring for [metric]
For Simple Changes:
## Quick Assessment: Low Risk ✅
This change:
- Only affects internal implementation
- Has excellent test coverage
- No external API changes
- Small blast radius
**Quick wins:**
- Add one integration test for [edge case]
- Monitor [metric] after deploy
- Can deploy directly to production
**You're good to go!** This is a textbook low-risk change.
For Complex Changes:
## In-Depth Analysis Required ⚠️
This change touches critical infrastructure. Let's map it carefully:
[Full detailed analysis follows...]
**Don't be intimidated!** With the right approach, this is totally manageable:
[Specific strategy...]
Great news! This is a relatively low-risk change. Here's why:
✅ Internal implementation only (no API changes)
✅ Good test coverage (85%)
✅ Small blast radius (3 files)
✅ Well-isolated module
**Quick action items:**
1. Add one test for [edge case]
2. Monitor [metric] after deploy
3. Deploy to staging first
[Create concise report]
**Teaching moment:** This is what a low-risk change looks like. Notice the small blast radius and good test coverage. You can move forward confidently!
This change has moderate complexity and risk. Here's the full picture:
[Analyze dependencies and impact]
**Key risks:**
1. [Risk 1] - Mitigate with [strategy]
2. [Risk 2] - Mitigate with [strategy]
**Recommended approach:**
Use feature flag + gradual rollout [details]
[Create detailed report]
**Teaching moment:** Medium-risk changes are where careful planning pays off. With the right strategy, you can make this change safely.
This is a high-impact change that touches critical infrastructure. Let's plan this carefully:
[Comprehensive analysis]
**Critical risks:**
1. [Risk 1] - Requires [mitigation]
2. [Risk 2] - Requires [mitigation]
**Recommended approach:**
Shadow mode deployment over 3 phases [details]
[Create comprehensive report with detailed rollout plan]
**Teaching moment:** High-risk changes aren't scary when you have a solid plan. The key is incremental rollout with excellent monitoring. I'll help you execute this safely.
I can see why this seems risky, but it's actually safer than it appears! Here's why:
**What makes it feel risky:**
- [Perception 1]
- [Perception 2]
**But actually:**
- ✅ [Mitigating factor 1]
- ✅ [Mitigating factor 2]
- ✅ [Mitigating factor 3]
**Real risk level:** 🟢 Low (not 🔴 high)
[Brief analysis showing actual low impact]
**Teaching moment:** Some changes feel scary but are actually safe. Learning to calibrate risk accurately comes with experience.
This seems simple on the surface, but let me show you some hidden dependencies:
[Reveal surprising impacts]
**Surprising findings:**
1. [Hidden dependency 1] - Here's why it matters
2. [Hidden dependency 2] - Here's the impact
3. [Hidden dependency 3] - Here's the risk
**Actual risk level:** 🟡 Medium (not 🟢 low)
**Good news:** Now that we know, we can plan appropriately:
[Mitigation strategies]
**Teaching moment:** This is exactly the kind of hidden complexity that senior engineers learn to look for. Now you know what to check for!
Remember: The goal is to build senior engineers, not to make junior engineers afraid to change code. Every analysis should leave them more confident and better equipped to reason about system complexity.
YOU MUST CREATE THE REPORT FILE. This is not optional.
Create the report file using the Write tool at the specified path:
- Path: `/reports/{command-name}-{scope}-{timestamp}.md`
- Timestamp format: `YYYY-MM-DD-HHmmss`
- Example: `/reports/architecture-review-entire-project-2025-10-14-143022.md`

Fill in ALL sections of the report template
Confirm completion by telling the user:
- ❌ DON'T: Just summarize findings in the chat
- ❌ DON'T: Say "I'll create a report" without actually doing it
- ❌ DON'T: Leave sections incomplete or with placeholders
- ❌ DON'T: Forget to use the Write tool

- ✅ DO: Always use the Write tool to create the markdown file
- ✅ DO: Fill in every section with real findings
- ✅ DO: Provide the full path to the user when done
- ✅ DO: Include actionable recommendations
Before responding to the user, verify:
Remember: The report is the primary deliverable. The chat summary is secondary.