Create custom Semgrep rules for detecting bug patterns and security vulnerabilities. This skill should be used when the user explicitly asks to "create a Semgrep rule", "write a Semgrep rule", "make a Semgrep rule", "build a Semgrep rule", or requests detection of a specific bug pattern, vulnerability, or insecure code pattern using Semgrep.
Creates custom Semgrep rules to detect security vulnerabilities and bug patterns.
/plugin marketplace add trailofbits/skills
/plugin install semgrep-rule-creator@trailofbits
This skill is limited to using the following tools:
references/quick-reference.md
references/workflow.md
Create production-quality Semgrep rules with proper testing and validation.
Ideal scenarios:
Do NOT use this skill for:
semgrep skill instead)
static-analysis plugin)
When creating Semgrep rules, reject these common shortcuts:
semgrep --test --config rule.yaml test-file to verify. Untested rules have hidden false positives/negatives.
Too broad - matches everything, useless for detection:
# BAD: Matches any function call
pattern: $FUNC(...)
# GOOD: Specific dangerous function
pattern: eval(...)
Missing safe cases in tests - leads to undetected false positives:
# BAD: Only tests vulnerable case
# ruleid: my-rule
dangerous(user_input)
# GOOD: Include safe cases to verify no false positives
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
# ok: my-rule
dangerous("hardcoded_safe_value")
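A quick way to catch the missing-safe-case mistake is to count the annotations in a test file before running Semgrep. The sketch below is a hypothetical helper, not part of Semgrep: `annotation_counts` and its regular expression are assumptions that cover only `#` and `//` comment styles.

```python
import re

# Matches "# ruleid: <id>" or "// ok: <id>" style annotations (assumed comment styles).
ANNOTATION = re.compile(r"(?:#|//)\s*(ruleid|ok):\s*([\w.-]+)")


def annotation_counts(test_source: str, rule_id: str) -> dict:
    """Count ruleid/ok annotations for one rule ID in a test file's text."""
    counts = {"ruleid": 0, "ok": 0}
    for line in test_source.splitlines():
        match = ANNOTATION.search(line)
        if match and match.group(2) == rule_id:
            counts[match.group(1)] += 1
    return counts


BAD_TEST = """\
# ruleid: my-rule
dangerous(user_input)
"""

GOOD_TEST = """\
# ruleid: my-rule
dangerous(user_input)
# ok: my-rule
dangerous(sanitize(user_input))
# ok: my-rule
dangerous("hardcoded_safe_value")
"""

print(annotation_counts(BAD_TEST, "my-rule"))   # {'ruleid': 1, 'ok': 0}
print(annotation_counts(GOOD_TEST, "my-rule"))  # {'ruleid': 1, 'ok': 2}
```

A test file whose `ok` count is zero has no safe cases, so false positives would go unnoticed.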
Overly specific patterns - misses variations:
# BAD: Only matches exact format
pattern: os.system("rm " + $VAR)
# GOOD: Matches all os.system calls with taint tracking
mode: taint
pattern-sinks:
- pattern: os.system(...)
This workflow is strict - do not skip steps:
This skill guides creation of Semgrep rules that detect security vulnerabilities and bug patterns. Rules are created iteratively: write test cases first, analyze AST structure, write the rule, then iterate until all tests pass.
Approach selection:
Why prioritize taint mode? Pattern matching finds syntax but misses context. A pattern eval($X) matches both eval(user_input) (vulnerable) and eval("safe_literal") (safe). Taint mode tracks data flow, so it only alerts when untrusted data actually reaches the sink—dramatically reducing false positives for injection vulnerabilities.
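The difference can be illustrated with a toy model built on Python's standard `ast` module. This is only a sketch of the idea, not how Semgrep implements taint tracking: the source function name `get_user_input` is hypothetical, and the flow analysis handles only straight-line assignments.

```python
import ast

SOURCE_CALL = "get_user_input"  # hypothetical source of untrusted data
SINK_CALL = "eval"

CODE = '''
user_input = get_user_input()
eval(user_input)
eval("safe_literal")
'''


def call_name(node):
    """Return the plain function name of a call node, or None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def pattern_matches(tree):
    """Syntactic match: every call to the sink, whatever its argument."""
    return sorted(n.lineno for n in ast.walk(tree) if call_name(n) == SINK_CALL)


def taint_matches(tree):
    """Flow-aware match: sink calls whose first argument is a tainted name."""
    tainted = set()
    hits = []
    for stmt in tree.body:  # top-level statements only, in source order
        if isinstance(stmt, ast.Assign) and call_name(stmt.value) == SOURCE_CALL:
            tainted.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
        for node in ast.walk(stmt):
            if call_name(node) == SINK_CALL and node.args:
                arg = node.args[0]
                if isinstance(arg, ast.Name) and arg.id in tainted:
                    hits.append(node.lineno)
    return hits


tree = ast.parse(CODE)  # parsed only, never executed
print(pattern_matches(tree))  # [3, 4]
print(taint_matches(tree))    # [3]
```

The syntactic check flags both `eval` calls, while the flow-aware check flags only the call that receives data from the source.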
Iterating between approaches: It's okay to experiment. If you start with taint mode and it's not working well (e.g., taint doesn't propagate as expected, too many false positives/negatives), switch to pattern matching. Conversely, if pattern matching produces too many false positives on safe code, try taint mode instead. The goal is a working rule—not rigid adherence to one approach.
Output structure - exactly two files in a directory named after the rule ID:
<rule-id>/
├── <rule-id>.yaml # Semgrep rule
└── <rule-id>.<ext> # Test file with ruleid/ok annotations
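The layout above could be scaffolded with a short script. This is a minimal sketch: `scaffold_rule` is a hypothetical helper name, and the rule and test contents are simply the `insecure-eval` example used in this document.

```python
import tempfile
from pathlib import Path

RULE_YAML = """\
rules:
  - id: insecure-eval
    languages: [python]
    severity: ERROR
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
"""

TEST_SOURCE = """\
# ruleid: insecure-eval
eval(request.args.get('code'))
# ok: insecure-eval
eval("print('safe')")
"""


def scaffold_rule(base_dir, rule_id, ext, rule_yaml, test_source):
    """Create <rule-id>/<rule-id>.yaml and <rule-id>/<rule-id>.<ext>."""
    rule_dir = Path(base_dir) / rule_id
    rule_dir.mkdir(parents=True, exist_ok=True)
    rule_path = rule_dir / f"{rule_id}.yaml"
    test_path = rule_dir / f"{rule_id}.{ext}"
    rule_path.write_text(rule_yaml, encoding="utf-8")
    test_path.write_text(test_source, encoding="utf-8")
    return rule_path, test_path


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    rule_path, test_path = scaffold_rule(tmp, "insecure-eval", "py", RULE_YAML, TEST_SOURCE)
    print(sorted(p.name for p in rule_path.parent.iterdir()))
```

Naming both files after the rule ID keeps the rule and its test paired when many rules live side by side.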
rules:
  - id: insecure-eval
    languages: [python]
    severity: ERROR
    message: User input passed to eval() allows code execution
    mode: taint
    pattern-sources:
      - pattern: request.args.get(...)
    pattern-sinks:
      - pattern: eval(...)
Test file (insecure-eval.py):
# ruleid: insecure-eval
eval(request.args.get('code'))
# ok: insecure-eval
eval("print('safe')")
Run tests: semgrep --test --config rule.yaml test-file
| Task | Command |
|---|---|
| Run tests | semgrep --test --config rule.yaml test-file |
| Validate YAML | semgrep --validate --config rule.yaml |
| Dump AST | semgrep --dump-ast -l <lang> <file> |
| Debug taint flow | semgrep --dataflow-traces -f rule.yaml file |
| Run single rule | semgrep -f rule.yaml <file> |
| Pattern Operator | Purpose |
|---|---|
| pattern | Match single pattern |
| patterns | AND - all must match |
| pattern-either | OR - any can match |
| pattern-not | Exclude matches |
| pattern-inside | Must be inside scope |
| metavariable-regex | Filter by regex |
| focus-metavariable | Report on specific part |
| Taint Component | Purpose |
|---|---|
| pattern-sources | Where tainted data originates |
| pattern-sinks | Dangerous functions receiving taint |
| pattern-sanitizers | Functions that clean taint |
| pattern-propagators | Custom taint propagation |
Understand the bug pattern, identify target language, determine if taint mode applies.
Before writing complex rules, see Documentation for required reading.
Why test-first? Writing tests before the rule forces you to think about both vulnerable AND safe patterns. Rules written without tests often have hidden false positives (matching safe code) or false negatives (missing vulnerable variants). Tests make these visible immediately.
Create directory and test file with annotations:
// ruleid: <id> - Line BEFORE code that SHOULD match
// ok: <id> - Line BEFORE code that should NOT match
Why analyze AST? Semgrep matches against the Abstract Syntax Tree, not raw text. Code that looks similar may parse differently (e.g., foo.bar() vs foo().bar). The AST dump shows exactly what Semgrep sees, preventing patterns that fail due to unexpected tree structure.
semgrep --dump-ast -l <language> <test-file>
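To see why similar-looking code can parse differently, the `foo.bar()` versus `foo().bar` example can be inspected with Python's standard `ast` module. This is only a stand-in for illustration: Semgrep uses its own AST, and its dump output is formatted differently.

```python
import ast

# Parse each expression on its own; mode="eval" returns the expression node in .body.
method_call = ast.parse("foo.bar()", mode="eval").body
attr_of_call = ast.parse("foo().bar", mode="eval").body

# foo.bar() is a call whose callee is an attribute access.
print(type(method_call).__name__, "->", type(method_call.func).__name__)
# foo().bar is an attribute access whose object is a call.
print(type(attr_of_call).__name__, "->", type(attr_of_call.value).__name__)
```

The first expression has a call at its root and the second an attribute access, so a pattern written for one shape does not match the other.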
See workflow.md for detailed patterns and examples.
semgrep --test --config rule.yaml test-file
Verification checkpoint: Output MUST show ✓ All tests passed. Do not proceed to optimization until this is achieved.
For debugging taint rules:
semgrep --dataflow-traces -f rule.yaml test-file
After all tests pass, analyze the rule for redundant or unnecessary patterns:
Common optimizations:
" and ' as equivalent - remove duplicate patterns
func(...) already matches func() - remove the more specific one
func($X, ...) covers func($X) - keep only the general form
Example - Before optimization:
pattern-either:
- pattern: hashlib.md5(...)
- pattern: md5(...)
- pattern: hashlib.new("md5", ...)
- pattern: hashlib.new('md5', ...) # Redundant - quotes equivalent
- pattern: hashlib.new("md5") # Redundant - covered by ... variant
- pattern: hashlib.new('md5') # Redundant - quotes + covered
After optimization:
pattern-either:
- pattern: hashlib.md5(...)
- pattern: md5(...)
- pattern: hashlib.new("md5", ...) # Covers all quote/argument variants
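The reduction shown above can be approximated with a small text heuristic. This is a rough sketch, not a Semgrep feature: `simplify_patterns` is a hypothetical helper that compares pattern strings only, so it will be wrong for patterns where quotes or ellipses carry other meaning, and the rule's tests should still be re-run after any change.

```python
def simplify_patterns(patterns):
    """Drop patterns that are redundant under two simple assumptions:
    quote style does not matter, and an ellipsis variant covers the
    same call written without the ellipsis."""
    # Step 1: treat ' and " as equivalent and remove duplicates, keeping order.
    normalized = []
    for pattern in patterns:
        unified = pattern.replace("'", '"')
        if unified not in normalized:
            normalized.append(unified)

    # Step 2: collect the specific forms that an ellipsis variant already covers.
    covered = set()
    for pattern in normalized:
        if pattern.endswith(", ...)"):
            covered.add(pattern[: -len(", ...)")] + ")")  # func($X, ...) covers func($X)
        elif pattern.endswith("(...)"):
            covered.add(pattern[: -len("(...)")] + "()")  # func(...) covers func()

    return [pattern for pattern in normalized if pattern not in covered]


BEFORE = [
    "hashlib.md5(...)",
    "md5(...)",
    'hashlib.new("md5", ...)',
    "hashlib.new('md5', ...)",
    'hashlib.new("md5")',
    "hashlib.new('md5')",
]

for pattern in simplify_patterns(BEFORE):
    print(pattern)
```

Applied to the six patterns from the example, the heuristic keeps the same three patterns as the optimized rule.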
Optimization checklist:
" vs ')
...)
semgrep --test --config rule.yaml test-file
Final verification: Output MUST show ✓ All tests passed after optimization. If any test fails, revert the optimization that caused it.
Task complete ONLY when: All tests pass after optimization.
ruleid: placement: Comment goes on line IMMEDIATELY BEFORE the flagged code
REQUIRED: Before creating any rule, use WebFetch to read the official Semgrep documentation:
Additional resources - fetch as needed: