From prodsec-skills
Runs Semgrep static analysis scans in parallel subagents with two modes: full coverage and high-confidence security vulnerabilities. Detects Semgrep Pro for cross-file taint analysis. Use for security audits and finding bugs.
npx claudepluginhub redhatproductsecurity/prodsec-skills --plugin prodsec-skills
How this skill is triggered (by the user, by Claude, or both):
Slash command
/prodsec-skills:semgrep
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
Run a Semgrep scan with automatic language detection, parallel execution across language groups, and merged SARIF output.
--metrics=off: Semgrep sends telemetry by default; --config auto also phones home. Every semgrep command must include --metrics=off to prevent data leakage during security audits.
Related skills: semgrep-rule-creator skill, semgrep-rule-variant-creator skill.
All scan results, SARIF files, and temporary data are stored in a single output directory.
If the user specifies an output directory, use it as OUTPUT_DIR. Otherwise, default to ./static_analysis_semgrep_1. If that already exists, increment to _2, _3, etc.
In both cases, always create the directory with mkdir -p before writing any files.
# Resolve output directory
if [ -n "$USER_SPECIFIED_DIR" ]; then
  OUTPUT_DIR="$USER_SPECIFIED_DIR"
else
  BASE="static_analysis_semgrep"
  N=1
  while [ -e "${BASE}_${N}" ]; do
    N=$((N + 1))
  done
  OUTPUT_DIR="${BASE}_${N}"
fi
mkdir -p "$OUTPUT_DIR/raw" "$OUTPUT_DIR/results"
The output directory is resolved once at the start of Step 1 and used throughout all subsequent steps.
$OUTPUT_DIR/
├── rulesets.txt # Approved rulesets (logged after Step 3)
├── raw/ # Per-scan raw output (unfiltered)
│ ├── python-python.json
│ ├── python-python.sarif
│ ├── python-django.json
│ ├── python-django.sarif
│ └── ...
└── results/ # Final merged output
└── results.sarif
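Workers are later handed absolute paths, so it may help to normalize the resolved directory once it exists. The following is a minimal sketch, assuming a hypothetical helper named `abs_dir` that is not part of the upstream skill:

```bash
# Hypothetical helper: print the absolute path of an existing directory.
# The subshell keeps the caller's working directory unchanged.
abs_dir() {
  (cd "$1" && pwd)
}

# Usage, after the mkdir -p step has created the directory:
#   OUTPUT_DIR="$(abs_dir "$OUTPUT_DIR")"
```

Running this after `mkdir -p` matters because `cd` fails on a directory that does not exist yet.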
Required: Semgrep CLI (semgrep --version). If not installed, see Semgrep installation docs.
Optional: Semgrep Pro — enables cross-file taint tracking, inter-procedural analysis, and additional languages (Apex, C#, Elixir). Check with:
semgrep --pro --validate --config p/default 2>/dev/null && echo "Pro available" || echo "OSS only"
Limitations: OSS mode cannot track data flow across files. Pro mode uses -j 1 for cross-file analysis (slower per ruleset, but parallel rulesets compensate).
Select mode in Step 2 of the workflow. Mode affects both scanner flags and post-processing.
| Mode | Coverage | Findings Reported |
|---|---|---|
| Run all | All rulesets, all severity levels | Everything |
| Important only | All rulesets, pre- and post-filtered | Security vulns only, medium-high confidence/impact |
Important only applies two filter layers:
1. Pre-filter: --severity MEDIUM --severity HIGH --severity CRITICAL (CLI flag)
2. Post-filter: category=security, confidence∈{MEDIUM,HIGH}, impact∈{MEDIUM,HIGH}
Details and jq commands are in Inlined: scan modes below.
┌──────────────────────────────────────────────────────────────────┐
│ Coordinator (this skill) │
│ Step 1: Detect languages + check Pro availability │
│ Step 2: Select scan mode + rulesets │
│ Step 3: Present plan + rulesets, get approval [⛔ HARD GATE] │
│ Step 4: Run parallel scans (approved rulesets + mode) │
│ Step 5: Merge results and report │
└──────────────────────────────────────────────────────────────────┘
│ Step 4
▼
┌─────────────────┐
│ Scan workers │
│ (parallel) │
├─────────────────┤
│ Python scanner │
│ JS/TS scanner │
│ Go scanner │
│ Docker scanner │
└─────────────────┘
Follow the five steps below (expanded from upstream scan-workflow.md). Track them as a checklist; Step 3 is mandatory approval before any scan.
| Step | Action | Gate | Key reference |
|---|---|---|---|
| 1 | Resolve output dir, detect languages + Pro availability | — | Glob/file patterns below |
| 2 | Select scan mode + rulesets | — | Inlined: rulesets |
| 3 | Present plan, get explicit approval | ⛔ HARD | Ask the user |
| 4 | Run parallel scans | — | Inlined: scanner task prompt |
| 5 | Merge results and report | — | Merge SARIF (below) |
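Step 1's language detection can be approximated with file-name patterns. The sketch below is illustrative only: the function name `detect_languages`, the five language groups, and the excluded directories are assumptions, not the upstream detection logic.

```bash
# Hypothetical helper: print one language group per line when at least one
# matching file exists under the target directory.
detect_languages() {
  local target="$1" spec lang pattern
  for spec in "python:*.py" "javascript:*.js" "typescript:*.ts" "go:*.go" "docker:Dockerfile"; do
    lang="${spec%%:*}"      # text before the first colon
    pattern="${spec#*:}"    # text after the first colon
    # Skip vendored and VCS directories (an assumption; adjust as needed).
    if find "$target" -type f -name "$pattern" \
         ! -path '*/node_modules/*' ! -path '*/.git/*' | grep -q .; then
      echo "$lang"
    fi
  done
}
```

Calling `detect_languages /path/to/codebase` lists the groups that would each get a scan worker in Step 4.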
Merge SARIF (Step 5, no upstream script required): After optional important-only JSON post-filter (see inlined scan-modes), merge per-run SARIF files:
jq -s '{"version": "2.1.0", "$schema": "https://json.schemastore.org/sarif-2.1.0.json", "runs": [.[].runs[]]}' "$OUTPUT_DIR/raw"/*.sarif > "$OUTPUT_DIR/results/results.sarif"
Adjust the glob if you need a subset. The upstream repo also ships a Python merge helper (see upstream Trail of Bits prodsec-skills for companion files).
Verify merged output parses as JSON and report counts by severity/category.
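One way to perform that verification with jq is sketched below. The helper names are hypothetical, and the count assumes each SARIF result carries its own `level` field; results without one are grouped as "unspecified", since some producers record the level only on the rule definition.

```bash
# Hypothetical helper: succeed only if the file parses as JSON.
sarif_is_valid_json() {
  jq empty "$1" 2>/dev/null
}

# Hypothetical helper: print "<level>: <count>" for every result level.
sarif_counts_by_level() {
  jq -r '
    [.runs[].results[]? | .level // "unspecified"]
    | group_by(.)
    | map("\(.[0]): \(length)")
    | .[]
  ' "$1"
}

# Usage:
#   sarif_is_valid_json "$OUTPUT_DIR/results/results.sarif" &&
#     sarif_counts_by_level "$OUTPUT_DIR/results/results.sarif"
```

Counts by category would need the rule metadata, which is easier to read from the per-scan JSON files than from SARIF.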
| Shortcut | Why It's Wrong |
|---|---|
| "User asked for scan, that's approval" | Original request ≠ plan approval. Present plan, await explicit "yes" |
| "Step 3 task is blocking, just mark complete" | Do not skip the real approval gate |
| "I already know what they want" | Assumptions cause scanning wrong directories/rulesets. Present plan for verification |
| "Just use default rulesets" | User must see and approve exact rulesets before scan |
| "Add extra rulesets without asking" | Modifying approved list without consent breaks trust |
| "Third-party rulesets are optional" | Trail of Bits, 0xdea, Decurity catch vulnerabilities not in official registry — REQUIRED |
| "Use --config auto" | Sends metrics; less control over rulesets |
| "One scan at a time" | Defeats parallelism; run independent ruleset scans in parallel where safe |
| "Pro is too slow, skip --pro" | Cross-file analysis catches many more true positives; worth the time |
| "Semgrep handles GitHub URLs natively" | URL handling fails on repos with non-standard YAML; always clone first |
| "Cleanup is optional" | Cloned repos pollute the user's workspace and accumulate across runs |
| "Use . or relative path as target" | Prefer absolute paths for workers to avoid ambiguity |
| "Let the user pick an output dir later" | Output directory must be resolved at Step 1, before any files are created |
| Topic | Location |
|---|---|
| Ruleset catalog + selection | Inlined: rulesets below |
| Important-only filters | Inlined: scan modes below |
| Worker instructions | Inlined: scanner task prompt below |
| Full step-by-step narrative | (see upstream Trail of Bits prodsec-skills for companion files) — workflows/scan-workflow.md |
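Before reporting, confirm:
- Output directory resolved as $OUTPUT_DIR
- Every semgrep command used --metrics=off
- Approved rulesets logged to $OUTPUT_DIR/rulesets.txt
- Per-scan raw output saved under $OUTPUT_DIR/raw/
- results.sarif exists in $OUTPUT_DIR/results/ and is valid JSON
- Unfiltered scan files in raw/ left unmodified
- Cloned rule repos under $OUTPUT_DIR/repos/ removed
## Inlined: scan modes (upstream `references/scan-modes.md`)
Run all: Full scan with all rulesets and severity levels. Current default behavior. No filtering applied; all findings are reported and triaged.
Important only: Focused on high-confidence security vulnerabilities. Excludes code quality, best practices, and low-confidence audit findings.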
Add these flags to every semgrep command:
--severity MEDIUM --severity HIGH --severity CRITICAL
This excludes LOW/INFO severity findings at scan time, reducing output volume before post-filtering.
After scanning, filter each JSON result file to keep only findings matching ALL of:
| Metadata Field | Accepted Values | Rationale |
|---|---|---|
| extra.metadata.category | "security" | Excludes correctness, best-practice, maintainability, performance |
| extra.metadata.confidence | "MEDIUM", "HIGH" | Excludes low-precision rules (high false positive rate) |
| extra.metadata.impact | "MEDIUM", "HIGH" | Excludes low-impact informational findings |
Third-party rules (Trail of Bits, 0xdea, Decurity, etc.) may not have confidence/impact/category metadata. Findings without these metadata fields are kept — we cannot filter what is not annotated, and third-party rules are typically security-focused.
Semgrep security rules have these metadata fields (required for category: security in the official registry):
| Field | Purpose | Values |
|---|---|---|
| severity (top-level) | Overall rule severity, derived from likelihood × impact | LOW, MEDIUM, HIGH, CRITICAL |
| category | Rule category | security, correctness, best-practice, maintainability, performance |
| confidence | True positive rate of the rule (precision) | LOW, MEDIUM, HIGH |
| impact | Potential damage if vulnerability is exploited | LOW, MEDIUM, HIGH |
| likelihood | How likely the vulnerability is exploitable | LOW, MEDIUM, HIGH |
| subcategory | Finding type | vuln, audit, secure default |
Key relationship: severity = f(likelihood, impact) while confidence is independent (describes rule quality, not vulnerability severity).
Apply to each JSON result file after scanning:
# Filter a single result file ($f is the path to one raw JSON result)
jq '{
  results: [.results[] |
    ((.extra.metadata.category // "security") | ascii_downcase) as $cat |
    ((.extra.metadata.confidence // "HIGH") | ascii_upcase) as $conf |
    ((.extra.metadata.impact // "HIGH") | ascii_upcase) as $imp |
    select(
      ($cat == "security") and
      ($conf == "MEDIUM" or $conf == "HIGH") and
      ($imp == "MEDIUM" or $imp == "HIGH")
    )
  ],
  errors: .errors,
  paths: .paths
}' "$f" > "${f%.json}-important.json"
Default values (// "security", // "HIGH") handle third-party rules without metadata — they pass all filters by default.
Raw scan output lives in $OUTPUT_DIR/raw/. The filter creates *-important.json files alongside the originals — the raw files are preserved unmodified.
# Apply important-only filter to all scan result JSON files in raw/
for f in "$OUTPUT_DIR/raw"/*-*.json; do
  # Skip files that are themselves filter or triage output
  [[ "$f" == *-triage.json || "$f" == *-important.json ]] && continue
  jq '{
    results: [.results[] |
      ((.extra.metadata.category // "security") | ascii_downcase) as $cat |
      ((.extra.metadata.confidence // "HIGH") | ascii_upcase) as $conf |
      ((.extra.metadata.impact // "HIGH") | ascii_upcase) as $imp |
      select(
        ($cat == "security") and
        ($conf == "MEDIUM" or $conf == "HIGH") and
        ($imp == "MEDIUM" or $imp == "HIGH")
      )
    ],
    errors: .errors,
    paths: .paths
  }' "$f" > "${f%.json}-important.json"
  BEFORE=$(jq '.results | length' "$f")
  AFTER=$(jq '.results | length' "${f%.json}-important.json")
  echo "$f: $BEFORE → $AFTER findings (filtered $(( BEFORE - AFTER )))"
done
In important-only mode, add [SEVERITY_FLAGS] to each scanner command:
semgrep [--pro if available] --metrics=off [SEVERITY_FLAGS] --config [RULESET] --json -o [OUTPUT_DIR]/raw/[lang]-[ruleset].json --sarif-output=[OUTPUT_DIR]/raw/[lang]-[ruleset].sarif [TARGET] &
Where [SEVERITY_FLAGS] is:
--severity MEDIUM --severity HIGH --severity CRITICAL
## Inlined: scanner task prompt (upstream `references/scanner-task-prompt.md`)
Use when delegating per-language Semgrep runs (replace bracketed placeholders).
You are a Semgrep scanner for [LANGUAGE_CATEGORY].
## Task
Run Semgrep scans for [LANGUAGE] files and save results to [OUTPUT_DIR]/raw.
## Pro Engine Status: [PRO_AVAILABLE: true/false]
## Scan Mode: [SCAN_MODE: run-all/important-only]
## APPROVED RULESETS (from user-confirmed plan)
[LIST EXACT RULESETS USER APPROVED - DO NOT SUBSTITUTE]
Example:
- p/python
- p/django
- p/security-audit
- p/secrets
- https://github.com/trailofbits/semgrep-rules
## Commands to Run (in parallel)
### Clone GitHub URL rulesets first:
```bash
mkdir -p [OUTPUT_DIR]/repos
# For each GitHub URL ruleset, clone into [OUTPUT_DIR]/repos/[name]:
git clone --depth 1 https://github.com/org/repo [OUTPUT_DIR]/repos/repo-name
```
Run one scan per approved ruleset:
```bash
semgrep [--pro if available] --metrics=off [SEVERITY_FLAGS] [INCLUDE_FLAGS] --config [RULESET] --json -o [OUTPUT_DIR]/raw/[lang]-[ruleset].json --sarif-output=[OUTPUT_DIR]/raw/[lang]-[ruleset].sarif [TARGET] &
```
Wait for all to complete:
```bash
wait
```
Clean up cloned repos:
```bash
[ -n "[OUTPUT_DIR]" ] && rm -rf [OUTPUT_DIR]/repos
```
- In important-only mode, add --severity MEDIUM --severity HIGH --severity CRITICAL to every command
- Add --include flags for language-specific rulesets (e.g., --include="*.py" for p/python). Do NOT add --include to cross-language rulesets like p/security-audit, p/secrets, or third-party repos
Report:
## Variable Substitutions
| Variable | Description | Example |
|----------|-------------|---------|
| `[LANGUAGE_CATEGORY]` | Language group being scanned | Python, JavaScript, Docker |
| `[LANGUAGE]` | Specific language | Python, TypeScript, Go |
| `[OUTPUT_DIR]` | Output directory (absolute path, resolved in Step 1) | /path/to/static_analysis_semgrep_1 |
| `[PRO_AVAILABLE]` | Whether Pro engine is available | true, false |
| `[SEVERITY_FLAGS]` | Severity pre-filter flags | *(empty)* for run-all, `--severity MEDIUM --severity HIGH --severity CRITICAL` for important-only |
| `[INCLUDE_FLAGS]` | File extension filter for language-specific rulesets | `--include="*.py"` for Python rulesets, *(empty)* for cross-language rulesets like p/security-audit, p/secrets, or third-party repos |
| `[RULESET]` | Semgrep ruleset identifier or local clone path | p/python, [OUTPUT_DIR]/repos/semgrep-rules |
| `[TARGET]` | Absolute path to directory to scan | /path/to/codebase |
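To show how the substitutions fit together, here is a rough sketch of a launcher that starts one background scan per approved ruleset and waits for all of them. The function names `scan_output_base` and `run_scans` are hypothetical, and the sketch leaves out `--pro`, `[SEVERITY_FLAGS]`, and `[INCLUDE_FLAGS]` for brevity.

```bash
# Hypothetical helper: build the "[lang]-[ruleset]" basename used under raw/.
# p/python -> python-python; /abs/repos/semgrep-rules -> python-semgrep-rules
scan_output_base() {
  local lang="$1" name="${2%/}"   # drop a trailing slash from clone paths
  name="${name##*/}"              # keep only the last path component
  printf '%s-%s\n' "$lang" "$name"
}

# Hypothetical helper: run_scans LANG TARGET OUTPUT_DIR RULESET...
run_scans() {
  local lang="$1" target="$2" out_dir="$3"
  shift 3
  local ruleset base pid pids="" failed=0
  for ruleset in "$@"; do
    base="$(scan_output_base "$lang" "$ruleset")"
    semgrep --metrics=off --config "$ruleset" \
      --json -o "$out_dir/raw/$base.json" \
      --sarif-output="$out_dir/raw/$base.sarif" \
      "$target" &
    pids="$pids $!"
  done
  # Wait on each PID so a failed scan is counted instead of silently lost.
  for pid in $pids; do
    wait "$pid" || failed=$((failed + 1))
  done
  [ "$failed" -eq 0 ]
}
```

For example, `run_scans python /path/to/codebase "$OUTPUT_DIR" p/python p/django` would write `python-python.*` and `python-django.*` under `raw/`, matching the layout shown earlier.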
---
## Inlined: rulesets (upstream `references/rulesets.md`)
# Semgrep Rulesets Reference
## Complete Ruleset Catalog
### Security-Focused Rulesets
| Ruleset | Description | Use Case |
|---------|-------------|----------|
| `p/security-audit` | Comprehensive vulnerability detection, higher false positives | Manual audits, security reviews |
| `p/secrets` | Hardcoded credentials, API keys, tokens | Always include |
| `p/owasp-top-ten` | OWASP Top 10 web application vulnerabilities | Web app security |
| `p/cwe-top-25` | CWE Top 25 most dangerous software weaknesses | General security |
| `p/sql-injection` | SQL injection patterns and tainted data flows | Database security |
| `p/insecure-transport` | Ensures code uses encrypted channels | Network security |
| `p/gitleaks` | Hard-coded credentials detection (gitleaks port) | Secrets scanning |
| `p/findsecbugs` | FindSecBugs rule pack for Java | Java security |
| `p/phpcs-security-audit` | PHP security audit rules | PHP security |
### CI/CD Rulesets
| Ruleset | Description | Use Case |
|---------|-------------|----------|
| `p/default` | Default ruleset, balanced coverage | First-time users |
| `p/ci` | High-confidence security + logic bugs, low FP | CI pipelines |
| `p/r2c-ci` | Low false positives, CI-safe | CI/CD blocking |
| `p/r2c` | Community favorite, curated by Semgrep (618k+ downloads) | General scanning |
| `p/auto` | Auto-selects rules based on detected languages/frameworks | Quick scans |
| `p/comment` | Comment-related rules | Code review |
### Third-Party Rulesets
| Ruleset | Description | Maintainer |
|---------|-------------|------------|
| `p/gitlab` | GitLab-maintained security rules | GitLab |
---
## Ruleset Selection Algorithm
Follow this algorithm to select rulesets based on detected languages and frameworks.
### Step 1: Always Include Security Baseline
```json
{
  "baseline": ["p/security-audit", "p/secrets"]
}
```
- p/security-audit - Comprehensive vulnerability detection (always include)
- p/secrets - Hardcoded credentials, API keys, tokens (always include)

For each detected language, add the primary ruleset. If a framework is detected, add its ruleset too.
GA Languages (production-ready):
| Detection | Primary Ruleset | Framework Rulesets | Pro Rule Count |
|---|---|---|---|
| .py | p/python | p/django, p/flask, p/fastapi | 710+ |
| .js, .jsx | p/javascript | p/react, p/nodejs, p/express, p/nextjs, p/angular | 250+ (JS), 70+ (JSX) |
| .ts, .tsx | p/typescript | p/react, p/nodejs, p/express, p/nextjs, p/angular | 230+ |
| .go | p/golang | p/go (alias) | 80+ |
| .java | p/java | p/spring, p/findsecbugs | 190+ |
| .kt | p/kotlin | p/spring | 60+ |
| .rb | p/ruby | p/rails | 40+ |
| .php | p/php | p/symfony, p/laravel, p/phpcs-security-audit | 50+ |
| .c, .cpp, .h | p/c | - | 150+ |
| .rs | p/rust | - | 40+ |
| .cs | p/csharp | - | 170+ |
| .scala | p/scala | - | Community |
| .swift | p/swift | - | 60+ |
Beta Languages (Pro recommended):
| Detection | Primary Ruleset | Notes |
|---|---|---|
| .ex, .exs | p/elixir | Requires Pro for best coverage |
| .cls, .trigger | p/apex | Salesforce; requires Pro |
Experimental Languages:
| Detection | Primary Ruleset | Notes |
|---|---|---|
| .sol | No official ruleset | Use Decurity third-party rules |
| Dockerfile | p/dockerfile | Limited rules |
| .yaml, .yml | p/yaml | K8s, GitHub Actions, docker-compose patterns |
| .json | r/json.aws | AWS IAM policies; use r/json.* for specific rules |
| Bash scripts | - | Community support |
| Cairo, Circom | - | Experimental, smart contracts |
Framework detection hints:
| Framework | Detection Signals | Ruleset |
|---|---|---|
| Django | settings.py, urls.py, django in requirements | p/django |
| Flask | flask in requirements, @app.route | p/flask |
| FastAPI | fastapi in requirements, @app.get/post | p/fastapi |
| React | package.json with react dependency, .jsx/.tsx files | p/react |
| Next.js | next.config.js, pages/ or app/ directory | p/nextjs |
| Angular | angular.json, @angular/ dependencies | p/angular |
| Express | express in package.json, app.use() patterns | p/express |
| NestJS | @nestjs/ dependencies, @Controller decorators | p/nodejs |
| Spring | pom.xml with spring, @SpringBootApplication | p/spring |
| Rails | Gemfile with rails, config/routes.rb | p/rails |
| Laravel | composer.json with laravel, artisan | p/laravel |
| Symfony | composer.json with symfony, config/packages/ | p/symfony |
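The "in requirements" signals for the Python frameworks could be checked roughly as follows. This is a sketch under assumptions: the function name is hypothetical, it only reads a top-level `requirements.txt`, and it ignores the other signals in the table (such as `settings.py` or `@app.route`).

```bash
# Hypothetical helper: print the framework rulesets suggested by
# <project>/requirements.txt, one per line.
python_framework_rulesets() {
  local req="$1/requirements.txt" fw
  [ -f "$req" ] || return 0
  for fw in django flask fastapi; do
    # Match the package name at the start of a line, followed by a version
    # specifier, an extras bracket, whitespace, or end of line. Packages
    # such as flask-cors are deliberately not treated as the framework.
    if grep -qiE "^${fw}([^a-z0-9_-]|\$)" "$req"; then
      echo "p/$fw"
    fi
  done
}
```

A project that declares dependencies in `pyproject.toml` or a lock file would need a different check.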
Infrastructure-as-code and configuration files:
| Detection | Ruleset | Description |
|---|---|---|
| Dockerfile | p/dockerfile | Container security, best practices |
| .tf, .hcl | p/terraform | IaC misconfigurations, CIS benchmarks, AWS/Azure/GCP |
| k8s manifests | p/kubernetes | K8s security, RBAC issues |
| CloudFormation | p/cloudformation | AWS infrastructure security |
| GitHub Actions | p/github-actions | CI/CD security, secrets exposure |
| .yaml, .yml | p/yaml | Generic YAML patterns (K8s, docker-compose) |
| AWS IAM JSON | r/json.aws | IAM policy misconfigurations (use --config r/json.aws) |
Third-party rulesets are NOT optional. Include them automatically when the language matches:
| Languages | Source | Why Required |
|---|---|---|
| Python, Go, Ruby, JS/TS, Terraform, HCL | Trail of Bits | Security audit patterns from real engagements (AGPLv3) |
| C, C++ | 0xdea | Memory safety, low-level vulnerabilities |
| Solidity, Cairo, Rust | Decurity | Smart contract vulnerabilities, DeFi exploits |
| Go | dgryski | Additional Go-specific patterns |
| Android (Java/Kotlin) | MindedSecurity | OWASP MASTG-derived mobile security rules |
| Java, Go, JS/TS, C#, Python, PHP | elttam | Security consulting patterns |
| Dockerfile, PHP, Go, Java | kondukto | Container and web app security |
| PHP, Kotlin, Java | dotta | Pentest-derived web/mobile app rules |
| Terraform, HCL | HashiCorp | HashiCorp infrastructure patterns |
| Swift, Java, Cobol | akabe1 | iOS and legacy system patterns |
| Java | Atlassian Labs | Atlassian-maintained Java rules |
| Python, JS/TS, Java, Ruby, Go, PHP | Apiiro | Malicious code detection, supply chain |
Before finalizing, verify official rulesets load:
# Quick validation (exits 0 if valid)
semgrep --config p/python --validate --metrics=off 2>&1 | head -3
Or browse the Semgrep Registry.
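To check every selected registry ruleset instead of a single one, the same command can be wrapped in a loop. This is a minimal sketch; `validate_rulesets` is a hypothetical name, and it relies only on the `--validate` invocation already shown.

```bash
# Hypothetical helper: validate each ruleset and report which ones fail.
# Returns non-zero if any ruleset does not validate.
validate_rulesets() {
  local ruleset bad=0
  for ruleset in "$@"; do
    if semgrep --config "$ruleset" --validate --metrics=off >/dev/null 2>&1; then
      echo "ok: $ruleset"
    else
      echo "FAILED: $ruleset"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, `validate_rulesets p/security-audit p/secrets p/python` prints one status line per ruleset, which can be pasted into the plan presented for approval.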
{
"baseline": ["p/security-audit", "p/secrets"],
"python": ["p/python", "p/django"],
"javascript": ["p/javascript", "p/react", "p/nodejs"],
"docker": ["p/dockerfile"],
"third_party": ["https://github.com/trailofbits/semgrep-rules"]
}
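The baseline and primary-ruleset parts of the selection could be assembled as in the sketch below. The function names and the lowercase language keys are assumptions, only five languages from the tables are mapped, and framework and third-party rulesets are left out.

```bash
# Hypothetical helper: map a detected language key to its primary ruleset.
rulesets_for_language() {
  case "$1" in
    python)     echo "p/python" ;;
    javascript) echo "p/javascript" ;;
    typescript) echo "p/typescript" ;;
    go)         echo "p/golang" ;;
    docker)     echo "p/dockerfile" ;;
    *)          return 1 ;;
  esac
}

# Hypothetical helper: print the security baseline, then one primary
# ruleset per detected language; unknown languages produce a warning.
select_rulesets() {
  local lang
  printf '%s\n' "p/security-audit" "p/secrets"
  for lang in "$@"; do
    rulesets_for_language "$lang" ||
      echo "warning: no primary ruleset known for $lang" >&2
  done
}
```

The resulting list is still only a proposal; it has to be shown to the user and approved in Step 3 before any scan runs.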