From prodsec-skills
Finds similar vulnerabilities and bugs across codebases using pattern-based analysis. Guides iterative generalization from known bugs to broader patterns with CodeQL/Semgrep.
Slash command: `/prodsec-skills:variant-analysis`
You are a variant analysis expert. Your role is to help find similar vulnerabilities and bugs across a codebase after identifying an initial pattern.
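Use this skill when hunting bug variants, building CodeQL/Semgrep queries, or performing systematic code audits after finding an initial issue.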
Do NOT use this skill for:
Before searching, deeply understand the known bug:
Start with a pattern that matches ONLY the known instance:
```bash
rg -n "exact_vulnerable_code_here"
```
Verify: Does it match exactly ONE location (the original)?
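The same baseline check can be scripted when ripgrep is not at hand. The sketch below uses a hypothetical helper, `find_exact`, that is not part of the plugin. It walks a directory tree, reports every file and line containing the exact snippet, and lets you assert that the baseline matches only the original location.

```python
import os
from pathlib import Path


def find_exact(root, snippet):
    """Return sorted (path, line_number) pairs for lines containing `snippet` verbatim.

    Hypothetical helper: a plain-Python stand-in for `rg -n -F snippet root`.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for lineno, line in enumerate(text.splitlines(), start=1):
                if snippet in line:
                    hits.append((str(path), lineno))
    return sorted(hits)


def baseline_is_exact(root, snippet):
    """True when the Level 0 pattern matches exactly one location."""
    return len(find_exact(root, snippet)) == 1
```

If `baseline_is_exact` returns False before any generalization, either the snippet is too short or the bug was already copy-pasted. In both cases, inspect the extra hits before going further.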
| Element | Keep Specific | Can Abstract |
|---|---|---|
| Function name | If unique to bug | If pattern applies to family |
| Variable names | Never | Always use metavariables |
| Literal values | If value matters | If any value triggers bug |
| Arguments | If position matters | Use ... wildcards |
Change ONE element at a time:
Stop when false positive rate exceeds ~50%
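The stopping rule can be made explicit. In this sketch, each generalization step is a hypothetical `Step` record holding the true and false positive counts you recorded after triaging its matches. `last_useful_step` walks the steps in order and returns the last one whose false positive rate stays at or below a threshold. The default of 0.5 comes from the ~50% figure above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One generalization step: the pattern tried and its triage results."""
    pattern: str
    true_pos: int
    false_pos: int

    @property
    def fp_rate(self):
        total = self.true_pos + self.false_pos
        return self.false_pos / total if total else 0.0


def last_useful_step(steps, max_fp_rate=0.5):
    """Walk steps in order (one element abstracted per step) and stop
    at the first one whose false positive rate exceeds max_fp_rate."""
    best = None
    for step in steps:
        if step.fp_rate > max_fp_rate:
            break
        best = step
    return best
```

Because the loop stops at the first step over the threshold, later and broader steps are never considered. This matches the advice to stop generalizing, rather than to keep searching for a lucky broad pattern.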
For each match, document:
For deeper strategic guidance, see the "Inlined: Variant Analysis Methodology" section at the end of this document.
| Scenario | Tool | Why |
|---|---|---|
| Quick surface search | ripgrep | Fast, zero setup |
| Simple pattern matching | Semgrep | Easy syntax, no build needed |
| Data flow tracking | Semgrep taint / CodeQL | Follows values across functions |
| Cross-function analysis | CodeQL | Best interprocedural analysis |
| Non-building code | Semgrep | Works on incomplete code |
These common mistakes cause analysts to miss real vulnerabilities:
Searching only the module where the original bug was found misses variants in other locations.
Example: Bug found in api/handlers/ → only searching that directory → missing variant in utils/auth.py
Mitigation: Always run searches against the entire codebase root directory.
Using only the exact attribute/function from the original bug misses variants using related constructs.
Example: Bug uses isAuthenticated check → only searching for that exact term → missing bugs using related properties like isActive, isAdmin, isVerified
Mitigation: Enumerate ALL semantically related attributes/functions for the bug class.
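One way to apply this mitigation is to build a single search over the whole family of related names instead of only the original one. The sketch below uses the attribute names from the example above. `related_pattern` and `scan_text` are hypothetical helpers: the first compiles a word-bounded alternation, and the second reports which related name each line uses.

```python
import re

# Semantically related authorization attributes (from the example above);
# extend this list for your own codebase.
RELATED_ATTRS = ["isAuthenticated", "isActive", "isAdmin", "isVerified"]


def related_pattern(names):
    """Compile a regex matching any of `names` as a whole word."""
    alternation = "|".join(re.escape(n) for n in names)
    return re.compile(rf"\b({alternation})\b")


def scan_text(text, names=RELATED_ATTRS):
    """Return (line_number, matched_name) for every related-attribute use."""
    pattern = related_pattern(names)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

The `\b` word boundaries keep `isActive` from also matching longer identifiers such as `isActiveFlag`.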
Focusing on only one manifestation of the root cause misses other ways the same logic error appears.
Example: Original bug is "return allow when condition is false" → only searching that pattern → missing other manifestations, such as:
- Null equality bypasses (`null == null` evaluates to true)

Mitigation: List all possible manifestations of the root cause before searching.
Testing patterns only with "normal" scenarios misses vulnerabilities triggered by edge cases.
Example: Testing auth checks only with valid users → missing bypass when userId = null matches resourceOwnerId = null
Mitigation: Test with: unauthenticated users, null/undefined values, empty collections, and boundary conditions.
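A small edge-case harness makes this concrete. The check below is a hypothetical, intentionally naive ownership test written the way the example describes; it does not come from any real codebase. Running it over the listed edge cases shows that an anonymous user (`None`) is granted access to an unowned resource (`None`).

```python
def can_access(user_id, resource_owner_id):
    """Naive ownership check (hypothetical, intentionally vulnerable)."""
    return user_id == resource_owner_id


# Edge cases from the mitigation list: unauthenticated users, null values,
# empty values, plus ordinary owner/non-owner cases as controls.
EDGE_CASES = [
    ("valid owner", 42, 42, True),
    ("other user", 7, 42, False),
    ("anonymous vs unowned", None, None, False),
    ("empty vs empty", "", "", False),
]


def failing_cases(check):
    """Return the names of edge cases where `check` disagrees with the expectation."""
    return [
        name
        for name, user_id, owner_id, expected in EDGE_CASES
        if check(user_id, owner_id) != expected
    ]


print(failing_cases(can_access))  # prints ['anonymous vs unowned', 'empty vs empty']
```

The same harness can then be pointed at every variant the search turns up, so each candidate is checked against identical edge cases.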
Ready-to-use CodeQL (.ql) and Semgrep (.yaml) rule templates and a variant report template ship with the upstream plugin.
(See upstream Trail of Bits prodsec-skills variant-analysis plugin under resources/codeql/ and resources/semgrep/ for those files.)
Use the "Inlined: Variant Report Template" section at the end of this document for report structure.
## Inlined: Variant Analysis Methodology

This document covers the strategic thinking behind effective variant analysis.
Vulnerabilities cluster because developers make consistent mistakes:
Understanding WHY variants exist helps predict WHERE to find them.
Before searching, extract the essential vulnerability pattern:
- The dangerous operation (e.g., `eval()`, `system()`, raw SQL)

Formulate a clear statement:
"This vulnerability exists because [UNTRUSTED DATA] reaches [DANGEROUS OPERATION] without [REQUIRED PROTECTION]."
Examples:
- "[UNTRUSTED DATA] reaches `eval()` without sanitization"
- "[UNTRUSTED DATA] reaches `malloc()` without an overflow check"
- "[UNTRUSTED DATA] reaches `open()` without canonicalization"

This statement IS your search pattern.
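Recording the statement as structured data keeps its three parts separate. That helps later, because each part maps onto a source, a sink, and a sanitizer in a Semgrep or CodeQL rule. `RootCause` is a hypothetical record type, not an API of either tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RootCause:
    untrusted_data: str       # where attacker-controlled input enters (source)
    dangerous_operation: str  # what it reaches (sink)
    required_protection: str  # what is missing on the path (sanitizer/check)

    def statement(self):
        return (
            f"This vulnerability exists because {self.untrusted_data} reaches "
            f"{self.dangerous_operation} without {self.required_protection}."
        )


sqli = RootCause("request.args.get('id')", "cursor.execute()", "parameterization")
print(sqli.statement())
# This vulnerability exists because request.args.get('id') reaches cursor.execute() without parameterization.
```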
Patterns exist at different abstraction levels. Start at Level 0 and climb.
Match the literal vulnerable code:
```python
# Original vulnerable code
query = "SELECT * FROM users WHERE id=" + request.args.get('id')
```

```bash
# Level 0 pattern
rg 'SELECT \* FROM users WHERE id=" \+ request\.args\.get'
```
Replace variable names with wildcards:
```yaml
# Level 1 pattern
pattern: $QUERY = "SELECT * FROM users WHERE id=" + $INPUT
```
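The practical difference between Level 0 and Level 1 can be shown with plain regular expressions. They serve here only as a rough stand-in for Semgrep metavariables; Semgrep matches syntax rather than text, so it is more robust than this. The Level 1 regex replaces the two variable names with named capture groups, so it also catches a copy-pasted variant whose variables were renamed. The `variant` line is made up for illustration.

```python
import re

# Level 0: literal text of the original bug.
LEVEL0 = re.compile(
    r'query = "SELECT \* FROM users WHERE id=" \+ request\.args\.get'
)

# Level 1: variable names abstracted into capture groups,
# roughly like $QUERY and $INPUT in the Semgrep pattern.
LEVEL1 = re.compile(
    r'(?P<query>\w+) = "SELECT \* FROM users WHERE id=" \+ (?P<input>.+)'
)

original = 'query = "SELECT * FROM users WHERE id=" + request.args.get(\'id\')'
variant = 'sql = "SELECT * FROM users WHERE id=" + form_value'

for line in (original, variant):
    m1 = LEVEL1.search(line)
    print(bool(LEVEL0.search(line)), m1.group("query"), m1.group("input"))
# True query request.args.get('id')
# False sql form_value
```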
Generalize the structure:
```yaml
# Level 2 pattern
patterns:
  - pattern: $Q = "..." + $INPUT
  - pattern-inside: |
      def $FUNC(...):
        ...
        cursor.execute($Q)
```
Abstract to the security property:
```yaml
# Level 3 pattern (taint mode)
mode: taint
pattern-sources:
  - pattern: request.args.get(...)
  - pattern: request.form.get(...)
pattern-sinks:
  - pattern: cursor.execute(...)
```
| Goal | Recommended Level |
|---|---|
| Verify a specific fix | Level 0 |
| Find copy-paste bugs | Level 1 |
| Audit a component | Level 2 |
| Full security assessment | Level 3 |
Never generalize multiple elements simultaneously:
```text
BAD: exact code -> fully abstract pattern
GOOD: exact code -> abstract var1 -> abstract var2 -> abstract operation
```
Each step:
At each generalization step, ask:
Should I abstract this variable name?
Should I abstract this literal value?
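- Yes, if any value triggers the bug
- No, if only specific values (for example, a particular amount in a shift operation) are dangerous

Should I use `...` wildcards?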
Should I add taint tracking?
| Context | Acceptable FP Rate |
|---|---|
| Automated CI blocking | <5% |
| Developer warning | <20% |
| Security audit triage | <50% |
| Research/exploration | <80% |
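To apply the table, compute the false positive rate from your triage counts and compare it with each context's ceiling. The thresholds below are copied from the table. `acceptable_contexts` is a hypothetical helper.

```python
# Maximum acceptable false positive rate per context (from the table above).
FP_CEILINGS = {
    "Automated CI blocking": 0.05,
    "Developer warning": 0.20,
    "Security audit triage": 0.50,
    "Research/exploration": 0.80,
}


def acceptable_contexts(true_pos, false_pos):
    """Return the contexts in which a rule with these triage counts is usable."""
    total = true_pos + false_pos
    if total == 0:
        return []  # no matches: nothing to judge
    fp_rate = false_pos / total
    # The table uses strict "<" bounds, so a rate equal to the ceiling fails.
    return [ctx for ctx, ceiling in FP_CEILINGS.items() if fp_rate < ceiling]


print(acceptable_contexts(true_pos=9, false_pos=1))
# ['Developer warning', 'Security audit triage', 'Research/exploration']
```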
Dead code: Add reachability constraints

```yaml
pattern-not-inside: |
  if False:
    ...
```

Test code: Exclude test directories

```bash
rg "pattern" --glob '!**/test*' --glob '!**/*_test.*'
```

Already sanitized: Add sanitizer patterns

```yaml
pattern-not: dangerous_func(sanitize($X))
```

Literal values: Exclude non-user-controlled data

```yaml
pattern-not: dangerous_func("...") # Literal string
```
For large-scale hunts: Recon (ripgrep to find hotspots) → Deep Analysis (Semgrep/CodeQL on hotspots) → Refinement (reduce FPs) → Automation (CI-ready rules).
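The recon stage can be as simple as counting hits per file and handing the densest files to Semgrep or CodeQL. `rank_hotspots` is a hypothetical helper. Its exclusion globs approximate the ripgrep test-directory filter shown earlier using `fnmatch`, and the glob list is an assumption you should adjust per project.

```python
import os
import re
from collections import Counter
from fnmatch import fnmatch
from pathlib import Path

# Roughly mirrors: rg "pattern" --glob '!**/test*' --glob '!**/*_test.*'
EXCLUDE_GLOBS = ["*/test*", "*_test.*"]


def rank_hotspots(root, pattern, top=10):
    """Return up to `top` (relative_path, hit_count) pairs, densest first."""
    regex = re.compile(pattern)
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            # Prefix "/" so "*/test*" also catches top-level test paths.
            if any(fnmatch("/" + rel, glob) for glob in EXCLUDE_GLOBS):
                continue
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            hits = sum(1 for _ in regex.finditer(text))
            if hits:
                counts[rel] = hits
    return counts.most_common(top)
```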
Maintain a tracking document:
```markdown
## Variant Analysis: [Original Bug ID]

### Root Cause
[Statement of the vulnerability pattern]

### Patterns Tried
| Pattern | Level | Matches | True Pos | False Pos | Notes |
|---------|-------|---------|----------|-----------|-------|
| exact | 0 | 1 | 1 | 0 | Baseline |
| ... | ... | ... | ... | ... | ... |

### Confirmed Variants
| Location | Severity | Status | Notes |
|----------|----------|--------|-------|
| file:line| High | Fixed | ... |

### False Positive Patterns
- Pattern X: Always FP because [reason]
- Pattern Y: FP in [context] but TP in [context]
```
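Keeping the "Patterns Tried" table current by hand is error-prone, and a few lines of code can regenerate it from your triage records. The `Attempt` record and `patterns_tried_table` function are hypothetical. The column layout follows the template above, and the sketch assumes every match is triaged as either a true or a false positive.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    pattern: str
    level: int
    true_pos: int
    false_pos: int
    notes: str = ""

    @property
    def matches(self):
        # Assumes every match was triaged as either a TP or an FP.
        return self.true_pos + self.false_pos


def patterns_tried_table(attempts):
    """Render attempts as the markdown table used in the tracking document."""
    lines = [
        "| Pattern | Level | Matches | True Pos | False Pos | Notes |",
        "|---------|-------|---------|----------|-----------|-------|",
    ]
    for a in attempts:
        lines.append(
            f"| {a.pattern} | {a.level} | {a.matches} | "
            f"{a.true_pos} | {a.false_pos} | {a.notes} |"
        )
    return "\n".join(lines)


print(patterns_tried_table([Attempt("exact", 0, 1, 0, "Baseline")]))
```

A pattern containing a literal `|` would break the markdown table; escape it before rendering if your patterns use alternation.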
- Wrong: Jump straight to semantic analysis. Right: Start with an exact match, then generalize incrementally.
- Wrong: Abstract all elements at once. Right: Abstract one element, verify, repeat.
- Wrong: "I'll triage later." Right: Analyze FPs immediately; they guide pattern refinement.
- Wrong: "I only use CodeQL." Right: Use ripgrep for recon, Semgrep for iteration, CodeQL for precision.
- Wrong: Keep all patterns regardless of FP rate. Right: Delete patterns that don't provide value.
A single root cause can manifest in multiple ways. Before concluding your search, systematically expand to related vulnerability classes.
For each root cause, ask:
What other attributes/functions have similar semantics?
- If the bug uses `isAuthenticated`, also check: `isActive`, `isAdmin`, `isVerified`, `isLoggedIn`
- If the bug uses `userId`, also check: `ownerId`, `creatorId`, `authorId`

What other boolean logic errors could occur?
- Inverted conditions (`if not x` vs `if x`)
- Inverted return values (`return true` vs `return false`)

What edge cases exist for the data types involved?
What documentation mismatches could exist?
Some bugs can only be found by comparing code behavior to documented intent:
Pattern: Function name or docstring suggests one behavior, code does another
```python
# Docstring says "Returns True if access should be DENIED"
# But code returns True when user HAS permission (should be allowed)
def check_restricted_permission(user, perm):
    """Returns True if access should be DENIED."""
    if user.has_perm(perm):
        return True  # BUG: per the docstring, this DENIES users who have the permission
    return False     # ...and lets users without the permission through
```
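One way to surface this kind of mismatch is a behavioral test written from the docstring rather than from the code. The sketch below reuses the function from the example. The `User` class and its `has_perm` method are hypothetical stand-ins for the codebase's permission model. Checking both directions of the documented contract exposes the inversion.

```python
class User:
    """Hypothetical user with a set of granted permissions."""

    def __init__(self, perms):
        self.perms = set(perms)

    def has_perm(self, perm):
        return perm in self.perms


def check_restricted_permission(user, perm):
    """Returns True if access should be DENIED."""
    if user.has_perm(perm):
        return True  # BUG: contradicts the docstring
    return False


def docstring_contract_violations(check):
    """Compare `check` against its documented contract on two users."""
    violations = []
    # Documented: a user WITH the permission must not be denied.
    if check(User({"admin"}), "admin") is not False:
        violations.append("user with permission is denied")
    # Documented: a user WITHOUT the permission must be denied.
    if check(User(set()), "admin") is not True:
        violations.append("user without permission is allowed")
    return violations


print(docstring_contract_violations(check_restricted_permission))
# ['user with permission is denied', 'user without permission is allowed']
```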
Detection strategy:
A common class of authorization bypass:
```python
# If current_user is anonymous (current_user.id is None) and order is a
# guest order (order.owner_id is None), then None == None evaluates to True,
# bypassing the check
if order.owner_id == current_user.id:
    return True  # Allows access
```
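One common fix is to refuse the comparison when either identifier is missing, so that two absent values can never count as a match. The sketch below assumes `None` represents an anonymous user or an unowned resource, as in the example. `is_owner` is a hypothetical name.

```python
from typing import Optional


def is_owner(owner_id: Optional[int], user_id: Optional[int]) -> bool:
    """Ownership check that treats a missing id as 'no access'."""
    if owner_id is None or user_id is None:
        return False  # None == None must not grant access
    return owner_id == user_id


# The vulnerable comparison from the example, for contrast.
print(None == None)          # True: the bypass
print(is_owner(None, None))  # False
print(is_owner(5, 5))        # True
```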
Detection strategy:
## Inlined: Variant Report Template

| Field | Value |
|---|---|
| Original Bug | [BUG_ID / CVE] |
| Analysis Date | [DATE] |
| Codebase | [REPO/PROJECT] |
| Variants Found | [COUNT] |
Root Cause: [e.g., "User input reaches SQL query without parameterization"]
Location: [path/to/file.py:LINE] in function_name()
```
# Vulnerable code
```
| Version | Pattern | Tool | Matches | TP | FP |
|---|---|---|---|---|---|
| v1 | [exact] | ripgrep | 1 | 1 | 0 |
| v2 | [abstract] | semgrep | N | N | N |
Final Pattern:
```
# Pattern used
```
| Severity | Confidence | Status |
|---|---|---|
| High | High | Confirmed |
Location: [path/to/file.py:LINE]
```
# Vulnerable code
```
Analysis: [Why this is a true/false positive]
Exploitability:
| Pattern | Count | Reason |
|---|---|---|
| [pattern] | N | [why safe] |
```
# CI-ready rule
```
`npx claudepluginhub redhatproductsecurity/prodsec-skills --plugin prodsec-skills`

Finds similar vulnerabilities and bugs across codebases using pattern-based analysis. Use when hunting bug variants, building CodeQL/Semgrep queries, or performing systematic code audits after finding an initial issue.