From claude-swe-workflows
Performs white-box security audits: blue-team evaluates defensive posture, red-team attacks gaps, iterates on exploit chains. For thorough codebase vulnerability assessments.
npx claudepluginhub chrisallenlane/claude-swe-workflows --plugin claude-swe-workflows
This skill uses the workspace's default tool permissions.
Orchestrates a comprehensive security assessment of the project's source code using both defensive and offensive analysis. A blue-teamer evaluates the defensive posture first, then a lead red-teamer performs reconnaissance informed by the defensive gaps. Dedicated red-teamers investigate each attack vector in depth. Findings are synthesized, exploit chains are explored, and the process iterates until no new chains emerge.
This is deliberately heavy. Thoroughness is the priority, not speed. A complete audit may spawn many agents and take significant time. That's the point — shallow security reviews miss the vulnerabilities that matter.
┌──────────────────────────────────────────────────────┐
│ AUDIT WORKFLOW │
├──────────────────────────────────────────────────────┤
│ 1. Determine scope │
│ 2. Spawn blue-teamer (defense evaluation) │
│ └─ Output: control inventory + gaps + depth │
│ 3. Spawn lead red-teamer (reconnaissance) │
│ └─ Input: blue-teamer's defense evaluation │
│ └─ Output: attack surface + ranked vector list │
│ 4. For each target (all priorities): │
│ └─ Spawn focused red-teamer (deep investigation) │
│ 5. Synthesize findings │
│ ├─ If exploit chains found → goto 4 (new vector) │
│ └─ If no new chains → proceed │
│ 6. Present consolidated findings to user │
│ 7. Optionally route findings to fixers │
└──────────────────────────────────────────────────────┘
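The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. `spawn` and `find_chains` are hypothetical callables: the first stands in for launching a fresh agent, the second for the orchestrator's own synthesis pass. The payload and report shapes are also invented for the example. The sketch shows three things:

- the data handoffs, from the blue report into recon and from the accumulated findings into each focused agent;
- that every target is investigated, whatever its priority;
- the convergence rule, including the three-iteration cap described later in this workflow.

```python
from typing import Callable, Hashable, Iterable

# Hypothetical launcher: spawn(role, payload) starts a fresh agent with a
# clean context and returns its report as a dict.
Spawn = Callable[[str, dict], dict]
# Hypothetical synthesis pass: returns every chain visible in the findings.
FindChains = Callable[[list], Iterable[Hashable]]

MAX_CHAIN_ITERATIONS = 3


def run_audit(scope: str, concerns: str, spawn: Spawn, find_chains: FindChains) -> dict:
    # Step 2: defenses are evaluated before any attack planning.
    blue = spawn("sec-blue-teamer", {"scope": scope, "concerns": concerns})
    # Step 3: recon receives the full blue-team report.
    lead = spawn("sec-red-teamer", {"mode": "recon", "scope": scope,
                                    "concerns": concerns, "blue_report": blue})
    findings: list = []
    seen_chains: set = set()
    pending = list(lead["targets"])  # every priority, not only the top ones
    chain_iterations = 0
    while True:
        # Step 4: one focused agent per vector, strictly sequential, so each
        # agent is handed everything found so far.
        for vector in pending:
            report = spawn("sec-red-teamer", {"mode": "focused", "target": vector,
                                              "scope": scope,
                                              "prior_findings": list(findings)})
            findings.extend(report.get("findings", []))  # dead ends add nothing
        # Step 5: synthesis; only chains not seen before count as new.
        new_chains = [c for c in find_chains(findings) if c not in seen_chains]
        if not new_chains:
            return {"findings": findings, "chains": seen_chains, "converged": True}
        if chain_iterations == MAX_CHAIN_ITERATIONS:
            # Still producing chains: stop and let the user decide.
            return {"findings": findings, "chains": seen_chains,
                    "converged": False, "unexplored": new_chains}
        seen_chains.update(new_chains)
        chain_iterations += 1
        pending = new_chains  # each new chain is investigated as a new vector
```

`find_chains` is kept separate from `spawn` because synthesis is the orchestrator's own step. A chain it discovers is then handed to a fresh focused agent like any other vector.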
Default: Production code only. The following are excluded by default:
Inform the user of these exclusions when presenting the scope. If the user wants to include any of them, respect that.
If user specifies scope: Respect it (directory, files, module, feature area). Pass scope to all spawned agents.
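Here is a small sketch of how an orchestrator might apply one scope consistently before passing it to every agent. The include and exclude lists are supplied by the caller, for example the user's `vendor/` and `testdata/` in the session below. The helper names are hypothetical. Paths are matched on whole components, so excluding `vendor/` does not accidentally exclude a sibling such as `vendored/`.

```python
from pathlib import PurePosixPath


def _under(path_parts: tuple, root: str) -> bool:
    # Compare leading path components, not raw string prefixes.
    root_parts = PurePosixPath(root).parts
    return path_parts[: len(root_parts)] == root_parts


def in_scope(path: str, include: list, exclude: list) -> bool:
    """True if path is under some include root (an empty include list means
    the whole codebase) and under no exclude root."""
    parts = PurePosixPath(path).parts
    if include and not any(_under(parts, r) for r in include):
        return False
    return not any(_under(parts, r) for r in exclude)


def describe_scope(include: list, exclude: list) -> str:
    # The same string is passed verbatim to every spawned agent.
    base = ", ".join(include) if include else "entire codebase"
    return f"{base} (excluding {', '.join(exclude)})" if exclude else base
```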
Ask the user whether there are any areas they are particularly concerned about.
User concerns inform the prioritization of vectors in later steps, but the blue-teamer and lead red-teamer still perform full analysis — user intuition supplements, not replaces, systematic analysis.
Spawn a sec-blue-teamer agent for full defense evaluation:
You are the blue-teamer for a white-box security audit. Your defense evaluation
will be passed to the red team to inform their attack planning.
Scope: [entire codebase | user-specified scope]
User concerns: [any areas of concern mentioned by user, or "none specified"]
Perform your full methodology:
1. Inventory security controls — map every defense that exists (auth, authz,
input validation, CSRF, headers, rate limiting, crypto, secrets, logging)
2. Evaluate each control — correctness, consistency, failure mode
3. Identify missing controls — what should exist but doesn't, given the
application type?
4. Assess defense-in-depth — where does security rely on a single control?
5. Review configuration — are security features properly configured?
6. Dependency hygiene — run available tooling, check for CVEs and supply chain
concerns
7. Secrets and credentials — check for secrets in the wrong places
Pay special attention to CONSISTENCY. The red team will exploit every gap where
a control exists but isn't applied universally.
Output your full report in your standard format. Your findings will be passed
directly to the lead red-teamer to inform reconnaissance.
When the blue-teamer reports back: Review the defense evaluation. The control inventory, gap analysis, and defense-in-depth assessment become critical input for the red team.
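Here is a hypothetical helper that makes the CONSISTENCY emphasis concrete. It assumes the middleware applied to each route has already been mapped. Routes are grouped by path prefix, which is an assumed heuristic. The helper then flags any control that guards some routes in a group but not others, the same pattern as the admin-middleware gap in the example session. It cannot detect a control that is absent from an entire group; that belongs to the separate "missing controls" step.

```python
from collections import defaultdict


def consistency_gaps(routes: dict) -> list:
    """Flag (group, control, route) where a control guards some routes in a
    path-prefix group but not this one."""
    groups = defaultdict(dict)
    for route, controls in routes.items():
        path = route.split(" ", 1)[-1]           # drop the HTTP method
        prefix = "/".join(path.split("/")[:3])  # '/api/settings/theme' -> '/api/settings'
        groups[prefix][route] = controls
    gaps = []
    for prefix, members in sorted(groups.items()):
        used = set().union(*members.values())   # controls seen anywhere in the group
        for control in sorted(used):
            for route in sorted(members):
                if control not in members[route]:
                    gaps.append((prefix, control, route))
    return gaps
```

With a hypothetical mapping in which `PUT /api/settings/theme` has `{"auth", "admin"}` and `PUT /api/settings/notifications` has only `{"auth"}`, the helper reports `("/api/settings", "admin", "PUT /api/settings/notifications")`.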
Spawn a sec-red-teamer agent in broad recon mode, informed by the blue-team evaluation:
You are the lead red-teamer for a white-box security audit. The blue team has
already evaluated the defensive posture. Use their findings to focus your
reconnaissance on the weakest defenses.
Scope: [entire codebase | user-specified scope]
User concerns: [any areas of concern mentioned by user, or "none specified"]
## BLUE TEAM DEFENSE EVALUATION
[Full blue-teamer report — control inventory, gaps, defense-in-depth assessment]
Perform phases 1–3 of your methodology:
1. Reconnaissance — map the full attack surface (every entry point, what it
accepts, who can reach it). Cross-reference with the blue team's control
inventory to identify which entry points lack defenses.
2. Data flow tracing — for each entry point, trace input to its final
destination. The blue team identified consistency gaps — verify whether
those gaps are exploitable.
3. Trust boundary mapping — identify where trust transitions occur. The blue
team flagged single points of security failure — these are your priority
boundaries.
Do NOT perform deep exploitation yet. Your job is to survey the landscape and
produce a prioritized target list. The blue team's findings should make your
recon significantly more targeted.
Output a structured report:
## ATTACK SURFACE
[Entry points discovered, ranked by exposure]
[Note which entry points the blue team identified as unprotected or
inconsistently protected]
## TRUST BOUNDARIES
[Trust boundaries identified, noting implicit/unguarded ones]
[Cross-reference with blue team's defense-in-depth assessment]
## TARGET LIST
For each promising attack vector, provide:
- Target: [entry point or code path]
- Files: [specific files and line ranges to focus on]
- Hypothesis: [what you think might be exploitable and why]
- Blue team context: [relevant defensive gaps from blue team report]
- Context: [relevant framework protections, validation observed, transformations]
- Priority: [CRITICAL / HIGH / MEDIUM / LOW]
- Investigation approach: [what the focused red-teamer should try]
Rank targets by a combination of exposure (how easy to reach) and potential
impact (how bad if exploited). Include up to 25 targets — but this is a
MAXIMUM, not a quota. Report only targets that genuinely warrant
investigation. A short list is fine. An empty list means the codebase is
well-defended — that is a positive outcome, not a failure. Do not
manufacture or inflate targets to fill slots.
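One way to picture the ranking rule is to score each candidate on exposure and impact and sort by their product. The 1 to 4 scales and the multiplication are assumptions made for this sketch; the workflow only asks for "a combination". The behaviors that matter are the following:

- the 25-target limit truncates a long list;
- a short or empty list passes through unchanged, never padded.

```python
from dataclasses import dataclass

MAX_TARGETS = 25  # a ceiling, never a quota


@dataclass(frozen=True)
class Target:
    name: str
    exposure: int  # assumed scale: 1 = hard to reach ... 4 = unauthenticated and public
    impact: int    # assumed scale: 1 = minor info leak ... 4 = full compromise

    @property
    def score(self) -> int:
        return self.exposure * self.impact


def rank_targets(candidates: list) -> list:
    # Highest combined score first; ties broken by impact, then by name.
    ranked = sorted(candidates, key=lambda t: (-t.score, -t.impact, t.name))
    return ranked[:MAX_TARGETS]  # truncate, but never pad a short list
```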
When the lead reports back: Review the target list. This is the basis for the deep-dive phase.
For each target in the lead's list (ALL priorities), spawn a dedicated sec-red-teamer agent:
You are a focused red-teamer investigating a single attack vector.
## YOUR TARGET
Target: [from lead's report]
Files: [from lead's report]
Hypothesis: [from lead's report]
Blue team context: [defensive gaps relevant to this target]
Context: [from lead's report]
Investigation approach: [from lead's report]
## PRIOR FINDINGS (if any)
[Findings from other focused red-teamers that might be relevant — especially for chain analysis]
## YOUR MISSION
Go deep on this one target. You have the full methodology available, but your scope is narrow: this single attack vector. Dedicate your full attention to it.
Perform phases 4–7 of your methodology on this target:
4. Break assumptions — systematically challenge what the developer assumed about input to this entry point
5. Exploit error paths — trigger errors in this code path and see what breaks
6. Attack state and timing — look for race conditions, replay, sequence bypass specific to this target
7. Git archaeology — check the history of these specific files for security smells
For each finding:
- Describe the concrete attack (specific enough to reproduce)
- Assess exploitability (how hard is this to actually pull off?)
- Assess impact (what does the attacker get?)
- Note any dependencies on other findings (for chain analysis)
If this vector is a dead end, say so. Don't manufacture findings. A clean report on a well-defended target is valuable.
Run focused agents sequentially, not in parallel. Each agent's findings may inform the next (chain analysis depends on accumulating findings).
Pass prior findings to each new agent. As findings accumulate, each subsequent focused agent receives a summary of what prior agents found. This enables chain discovery — agent 3 might realize that agent 1's low-severity information disclosure combines with agent 2's SSRF to create a critical chain.
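Here is a sketch of the accumulation mechanism. It assumes findings are kept as simple dicts with hypothetical `severity`, `target` and `summary` keys, and `run_agent` is a stand-in for spawning a focused red-teamer. The template is abbreviated: the YOUR MISSION section is omitted. Each prompt is rendered using only the findings gathered so far, which is why the agents must run one after another rather than in parallel.

```python
FIELDS = ["Target", "Files", "Hypothesis", "Blue team context", "Context",
          "Investigation approach"]


def render_prior_findings(prior: list) -> str:
    if not prior:
        return "(none yet)"
    # One compact line per finding, so the summary stays small as it grows.
    return "\n".join(f"- [{f['severity']}] {f['target']}: {f['summary']}" for f in prior)


def render_focused_prompt(target: dict, prior: list) -> str:
    lines = ["You are a focused red-teamer investigating a single attack vector.",
             "## YOUR TARGET"]
    lines += [f"{name}: {target.get(name, 'n/a')}" for name in FIELDS]
    lines += ["## PRIOR FINDINGS (if any)", render_prior_findings(prior)]
    return "\n".join(lines)


def investigate_sequentially(targets: list, run_agent) -> list:
    findings: list = []
    for target in targets:
        # Rendered inside the loop so it includes every earlier agent's results.
        prompt = render_focused_prompt(target, findings)
        findings.extend(run_agent(prompt))  # a dead end returns []
    return findings
```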
After all focused agents have reported, synthesize their findings.
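Chain analysis: review all findings together and look for combinations where one finding supplies what another needs, so that together they achieve more than either alone (for example, a low-severity information disclosure that feeds an SSRF).
If chains are discovered: treat each new chain as a new vector and return to step 4, spawning a fresh focused red-teamer to investigate it with all prior findings passed as context.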
Convergence: The loop terminates when a synthesis pass produces no new chains. Typically this takes 1–2 chain iterations. If chain analysis keeps producing new chains after 3 iterations, present current findings and let the user decide whether to continue.
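Chain analysis can be modeled, as a simplification, by having each finding declare two things:

- what it gives an attacker;
- what the attacker must already have.

These correspond to the "dependencies on other findings" each focused agent is asked to note. The `provides`/`requires` vocabulary is an assumption of this sketch, and real chains still need judgment. Only pairs are found directly; a longer chain shows up as overlapping pairs. The sketch mainly illustrates the convergence test: a synthesis pass has converged when it yields no pair that was not already known.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Finding:
    name: str
    provides: frozenset = frozenset()  # what exploiting it hands the attacker
    requires: frozenset = frozenset()  # what the attacker must already have


def find_chains(findings: list) -> set:
    """Ordered pairs (a, b) where a's output satisfies one of b's preconditions."""
    return {(a.name, b.name) for a, b in permutations(findings, 2)
            if a.provides & b.requires}


def synthesis_pass(findings: list, known: set) -> set:
    """Return only chains not seen in earlier passes; empty means converged."""
    return find_chains(findings) - known
```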
Compile all findings from all agents into a single report:
## Security Audit Summary
Scope: [what was audited]
Defense evaluation: [summary — N controls inventoried, M gaps found]
Attack surface: [N entry points identified]
Vectors investigated: [N of M targets from recon]
Findings: N (X critical, Y high, Z medium, W low)
Exploit chains: N
## DEFENSE POSTURE (from blue-teamer)
[Summary of control inventory and key gaps]
[Defense-in-depth assessment — where security relies on a single control]
## ATTACK SURFACE (from lead red-teamer)
[Entry points discovered, ranked by exposure]
## FINDINGS
### CRITICAL
- **[file:line — target]** — [vulnerability description]
- Attack: [concrete exploitation path]
- Impact: [what the attacker gets]
- Data flow: [entry] → [transformations] → [sink]
- Defensive gap: [what the blue team identified that enabled this]
- Fix: [remediation guidance]
- Discovered by: [blue team | lead recon | focused agent for <target> | chain analysis]
### HIGH
[same format]
### MEDIUM
[same format]
### LOW
[same format]
## EXPLOIT CHAINS
- **[chain name]** — [description of the combined attack]
- Components: [finding A] + [finding B] + ...
- Combined impact: [what the chain achieves that individual findings don't]
- Fix: [which component to fix to break the chain — usually the cheapest link]
## TOOLING RECOMMENDATIONS
[Security tools the project should adopt]
## AREAS NOT COVERED
[Entry points that were deprioritized, limitations of static analysis, things that need runtime testing]
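Deriving the summary's "Findings:" line from the findings list, rather than typing it by hand, keeps the total consistent with the per-severity counts. The following is a minimal sketch. It assumes each finding carries one of the four severity labels used in the template. Dead-end vectors contribute no entry, and exploit chains are counted on their own line.

```python
from collections import Counter

SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]


def findings_line(severities: list) -> str:
    counts = Counter(s.upper() for s in severities)
    unknown = set(counts) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unrecognized severity: {sorted(unknown)}")
    # Counter returns 0 for severities with no findings.
    parts = ", ".join(f"{counts[s]} {s.lower()}" for s in SEVERITIES)
    return f"Findings: {sum(counts.values())} ({parts})"
```

For example, three CRITICAL, two HIGH, one MEDIUM and one LOW finding produce `Findings: 7 (3 critical, 2 high, 1 medium, 1 low)`.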
Present to user interactively. Walk through CRITICAL findings first. For each, explain the attack, the impact, and the recommended fix. Let the user ask questions and discuss before moving to the next finding.
After presenting findings, ask the user: "Would you like to route these findings to agents for remediation?"
If yes:
For each finding, determine the appropriate fixer:
- swe-sme-html, swe-sme-javascript, or swe-sme-css, depending on the fix
- sec-blue-teamer for defensive remediation guidance, then the language SME for implementation
Spawn the appropriate agent with the finding details and remediation guidance
After each fix, spawn qa-engineer to verify the fix doesn't break functionality
Commit each fix atomically
If no: The audit report stands on its own. The user can act on findings at their discretion.
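The routing loop can be sketched as below. All four callables are hypothetical stand-ins:

- `choose_fixer` picks the fixer;
- `apply_fix` spawns it;
- `qa_passes` runs qa-engineer;
- `commit` commits the change.

Processing one finding at a time is what makes each commit atomic. What to do when QA fails is not specified by the workflow. This sketch assumes such a fix stays uncommitted and is surfaced to the user.

```python
from typing import Callable


def remediate(findings: list, choose_fixer: Callable, apply_fix: Callable,
              qa_passes: Callable, commit: Callable) -> dict:
    results = {"committed": [], "needs_attention": []}
    for finding in findings:  # one finding at a time -> one commit per fix
        fixer = choose_fixer(finding)
        change = apply_fix(fixer, finding)
        if qa_passes(change):
            commit(change, f"Fix {finding['id']}: {finding['title']}")
            results["committed"].append(finding["id"])
        else:
            # Assumption: a fix that breaks functionality is not committed.
            results["needs_attention"].append(finding["id"])
    return results
```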
Sequential execution within each phase. The blue-teamer runs first, then the lead red-teamer (with blue-team input), then focused red-teamers run sequentially so findings accumulate for chain analysis.
Fresh instances for every agent. Each agent gets a clean context window dedicated entirely to its task. This is the core design principle — full context dedicated to a single concern.
State to maintain (as orchestrator):
Abort focused investigation:
Abort entire workflow:
Do NOT abort for:
Relationship to /bug-fix:
- /bug-fix invokes sec-blue-teamer for scoped security review of changed code
- /review-security is a dedicated, full-depth security audit
- Use /review-security proactively; /bug-fix handles security reactively
Relationship to /implement:
- /implement may invoke sec-blue-teamer as part of its review phase
- /review-security is independent and deeper; run it when security assurance matters, not as part of routine development
Relationship to /review-release:
- /review-release includes basic security checks (secrets, debug artifacts)
- /review-security is a comprehensive pre-release security audit; run it before major releases or after significant feature additions
Relationship to /review-deep:
- /review-deep runs /review-security as one phase of a full pre-release sweep across every /review-* dimension
- Use /review-deep when you want the full sweep; use /review-security alone when security assurance is the specific goal
> /review-security
What is the scope of the audit?
> Entire codebase
Anything you're particularly concerned about?
> We just added OAuth support and I'm worried about the token handling
Any areas to skip?
> vendor/ and testdata/
Starting white-box security audit...
[Phase 1 — Defense Evaluation]
Spawning blue-teamer...
Blue-teamer report:
Controls inventoried: 8
Key gaps:
- Auth middleware missing on 3 of 14 routes (/internal/*, /ws/*, /api/export)
- Inconsistent query parameterization: ORM used for 11 of 14 queries, 3 use raw SQL
- CSRF protection on POST only, not PUT/DELETE
- No rate limiting on /api/auth/* endpoints
- OAuth state parameter generated but never validated on callback
Defense-in-depth: Single-layer defense on 4 critical paths
[Phase 2 — Reconnaissance]
Spawning lead red-teamer (with blue-team findings)...
Lead red-teamer report:
Attack surface: 14 entry points (8 API, 3 WebSocket, 2 CLI, 1 file upload)
Trust boundaries: 5 identified (2 implicit — database trust, env var trust)
Targets identified: 8 (3 critical, 3 high, 1 medium, 1 low)
Note: Blue team's finding about missing auth on /internal/* routes
and unvalidated OAuth state confirmed as high-priority targets.
Target list:
CRITICAL-1: POST /api/auth/callback — OAuth state not validated (blue team flagged)
CRITICAL-2: POST /api/upload — file upload with path construction from user input
CRITICAL-3: WebSocket /ws/chat — auth middleware gap (blue team flagged)
HIGH-1: GET /api/users/:id — IDOR candidate, auth present but no ownership check
HIGH-2: POST /api/search — raw SQL query (blue team flagged as consistency gap)
HIGH-3: PUT /api/settings — admin endpoint, middleware inconsistently applied
MEDIUM-1: GET /api/export — CSV generation with user-controlled column names
LOW-1: GET /api/health — verbose error messages expose internal paths
[Phase 3 — Deep Investigation]
Spawning focused red-teamer for CRITICAL-1 (OAuth callback)...
Finding: OAuth state parameter not validated — CSRF on auth callback
allows attacker to link victim's account to attacker's OAuth identity.
Severity: CRITICAL
Defensive gap: Blue team identified state generation without validation.
Spawning focused red-teamer for CRITICAL-2 (file upload)...
Finding: Path traversal in upload destination. Filename from multipart
form used directly in path.join() — ../../etc/cron.d/backdoor writes
to arbitrary location.
Severity: CRITICAL
Spawning focused red-teamer for CRITICAL-3 (WebSocket)...
Finding: Dead end. WebSocket handler does check auth via upgrade
headers. Blue team's middleware gap finding was about a different
middleware layer. No finding.
Spawning focused red-teamer for HIGH-1 (IDOR)...
Finding: Confirmed. GET /api/users/:id returns full user record
including email, hashed password, and API keys for any valid user ID.
Severity: HIGH (requires authentication)
Spawning focused red-teamer for HIGH-2 (SQL injection)...
Finding: Confirmed. Search parameter reaches raw SQL via template
string. POST /api/search with body {"q": "' UNION SELECT * FROM
users--"} dumps user table.
Severity: CRITICAL (upgraded from HIGH — unauthenticated endpoint)
Defensive gap: Blue team identified 3 raw SQL queries bypassing ORM.
Spawning focused red-teamer for HIGH-3 (admin settings)...
Finding: PUT /api/settings/theme has admin middleware. PUT
/api/settings/notifications does not. Regular user can modify
notification settings for all users.
Severity: HIGH
Spawning focused red-teamer for MEDIUM-1 (CSV export)...
Finding: Column name parameter reflected in CSV output without
escaping. Formula injection possible: a column name starting with =
is evaluated as a formula when the CSV is opened in Excel.
Severity: MEDIUM
Spawning focused red-teamer for LOW-1 (health endpoint)...
Finding: Confirmed. Stack traces in error responses expose internal
file paths and dependency versions. Information disclosure only.
Severity: LOW
[Phase 4 — Chain Analysis]
Analyzing 7 findings for chains...
Chain found: IDOR (HIGH-1) + OAuth CSRF (CRITICAL-1)
→ Attacker reads victim's email via IDOR, initiates OAuth link for
that email, sends CSRF callback to victim. Result: attacker gains
OAuth access to victim's account without knowing their password.
Spawning chain investigator...
Chain confirmed. Full exploitation path validated.
Combined severity: CRITICAL
No further chains discovered. Audit converging.
## Security Audit Summary
Scope: entire codebase (excluding vendor/, testdata/)
Defense evaluation: 8 controls inventoried, 5 gaps found
Attack surface: 14 entry points
Vectors investigated: 8 of 8 targets
Findings: 7 (3 critical, 2 high, 1 medium, 1 low)
Exploit chains: 1
[Detailed findings presented to user...]
Would you like to route these findings to agents for remediation?
> Yes, let's fix the criticals
[Routing CRITICAL findings to appropriate SMEs...]
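This sketch shows what remediations for three of the session's findings might look like in code:

- the upload path traversal;
- the raw-SQL search;
- the CSV formula injection.

The example project appears to be JavaScript (it uses `path.join()`), so treat this Python version as a language-neutral illustration. `sqlite3` stands in for the real database. The function names, table and directory are hypothetical.

```python
import sqlite3
from pathlib import Path


def safe_upload_path(upload_root: str, filename: str) -> Path:
    """Resolve filename under upload_root, rejecting anything that escapes it."""
    root = Path(upload_root).resolve()
    candidate = (root / filename).resolve()  # collapses '..' and absolute names
    if not candidate.is_relative_to(root):   # Python 3.9+
        raise ValueError(f"path escapes upload root: {filename!r}")
    return candidate


def search_users(conn: sqlite3.Connection, q: str) -> list:
    # The search term is bound as a parameter, never spliced into the SQL text.
    return conn.execute("SELECT name FROM users WHERE name LIKE ?",
                        (f"%{q}%",)).fetchall()


FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def csv_safe(cell: str) -> str:
    # A leading apostrophe makes spreadsheet applications treat the cell as text.
    return "'" + cell if cell.startswith(FORMULA_PREFIXES) else cell
```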