Verifies PoC reproducibility before submitting: re-runs curl commands, checks that responses match the report's claims, and ensures a deterministic proof exists.
From greyhatcc: `npx claudepluginhub overtimepog/greyhatcc --plugin greyhatcc`

This skill uses the workspace's default tool permissions.
/greyhatcc:proof <finding_id or report_file>
{{ARGUMENTS}} is parsed automatically: it may be a finding ID or a report file path. No format specification is needed; detect the input type and proceed.
Verifies that a finding's Proof of Concept actually works RIGHT NOW. The #1 reason reports get marked N/A is "not reproducible." This skill prevents that.
Before executing this skill, read:

- `.greyhatcc/scope.json`: verify the target is in scope; note exclusions
- `.greyhatcc/hunt-state.json`: check the active phase; resume context
- `findings_log.md`, `tested.json`, `gadgets.json`: avoid duplicating work

Parse the finding/report for all curl commands, scripts, and reproduction steps. Look for:

- curl commands in code blocks

For each extracted curl command:
| Report Claims | Validation |
|---|---|
| "Returns 200" | Verify status code is 200 |
| "CORS header reflects origin" | Check Access-Control-Allow-Origin matches |
| "Leaks user data" | Verify PII/sensitive data in response body |
| "Actuator exposed" | Verify /actuator returns health/beans/env data |
| "GraphQL introspection enabled" | Verify __schema query returns schema |
| "JWT accepted without signature" | Verify modified token gets 200, not 401 |
| "IDOR returns other user's data" | Verify different user ID returns different user's data |
After re-running, save fresh evidence:
- `evidence/<finding_id>/proof_rerun_<date>.txt`: full request/response

## Proof Validation: <finding_id>
### Commands Tested: <N>
### Results:
| # | Command | Expected | Actual | Match |
|---|---------|----------|--------|-------|
| 1 | curl -sk https://... | 200 + CORS header | 200 + CORS header | PASS |
| 2 | curl -sk https://... | 200 + user data | 403 Forbidden | FAIL |
### Verdict: [CONFIRMED / STALE / PARTIAL / FAILED]
- CONFIRMED: All PoC commands reproduce as documented
- STALE: Finding no longer reproduces (target may have been patched)
- PARTIAL: Some steps work, others don't (update the report)
- FAILED: PoC doesn't work at all (remove the finding)
### Recommendations:
- [specific actions based on results]
Use `/greyhatcc:validate` as part of multi-gate validation.

Parse the finding/report file and extract all executable commands:
Extraction patterns:
1. Code blocks containing `curl` commands
2. Code blocks containing `python`, `python3`, `node` commands
3. Inline bash commands in numbered steps
4. Referenced PoC script files in evidence/<finding_id>/
5. HTML PoC files that need to be served and browsed
6. Multi-step sequences (numbered steps with dependencies)
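The first extraction pattern can be sketched roughly as follows for the simple case of one command per line. The report content, file names, and target URL are placeholders for illustration:

```shell
# Sketch: pull curl commands out of a report's fenced code blocks.
# report.md, poc_commands.txt, and the URL are stand-ins.
fence='```'
{
  echo "Steps:"
  echo "$fence"
  echo "curl -sk https://target.example/api/users/1"
  echo "$fence"
  echo "Then check the response."
} > report.md

# Keep only lines inside code fences that start with "curl"
awk -v f="$fence" '$0 == f { in_block = !in_block; next } in_block && /^curl/' \
  report.md > poc_commands.txt
cat poc_commands.txt
```

Real reports need more care (multi-line commands with `\` continuations, `python`/`node` blocks), but the fence-toggle idea carries over.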
Before re-running any command:

Load required headers from scope.json `rules.requiredHeaders`:

```
# Always add program-required headers
-H "X-HackerOne-Research: overtimedev"
```
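One way to turn `rules.requiredHeaders` into curl flags, assuming `jq` is available (the scope.json content below is a minimal stand-in):

```shell
# Sketch: build "-H" flags from scope.json rules.requiredHeaders (jq assumed).
printf '%s\n' '{"rules":{"requiredHeaders":{"X-HackerOne-Research":"overtimedev"}}}' > scope.json

header_flags=$(jq -r '.rules.requiredHeaders | to_entries[] | "-H \"\(.key): \(.value)\""' scope.json)
echo "$header_flags"
```

The resulting flags can be spliced into every re-run so no command silently drops a program-required header.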
Check authentication state: if commands require auth tokens or cookies, confirm they are still valid before re-running (see the session-handling steps below).

Set an execution timeout: default 30 seconds per command.
For each extracted command:
1. Parse the command to identify:
- HTTP method and URL
- Required headers (add program headers if missing)
- Authentication tokens (check validity)
- Expected response indicators (status code, headers, body patterns)
2. Execute with timeout:
- Wrap in timeout: `timeout 30 <command>`
- Capture full output: stdout + stderr
- Record execution timestamp (UTC)
3. Compare actual vs expected:
- Status code match? (200 vs 200 = PASS)
- Critical headers present? (CORS header, Set-Cookie, etc.)
- Response body contains expected data? (PII, tokens, error signatures)
- Response body does NOT contain block indicators? (WAF block, 403, captcha)
4. Record result:
- PASS: Actual matches expected
- FAIL: Actual differs from expected
- PARTIAL: Some assertions pass, others fail
- ERROR: Command failed to execute (timeout, DNS failure, connection refused)
- BLOCKED: WAF/rate limit prevented execution
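The compare-and-record steps above can be sketched as a small grading helper. `compare_result` is a hypothetical function; the CORS header and body file are illustrative:

```shell
# Sketch: grade one re-run against the report's claims.
# Args: expected status, actual status, expected body pattern, body file.
compare_result() {
  expected_status=$1; actual_status=$2; pattern=$3; body_file=$4
  status_ok=false; body_ok=false
  [ "$expected_status" = "$actual_status" ] && status_ok=true
  grep -q "$pattern" "$body_file" && body_ok=true
  if $status_ok && $body_ok; then echo PASS
  elif $status_ok || $body_ok; then echo PARTIAL
  else echo FAIL
  fi
}

printf 'Access-Control-Allow-Origin: https://evil.example\n' > body.txt
compare_result 200 200 'Access-Control-Allow-Origin' body.txt   # all assertions hold
compare_result 200 403 'Access-Control-Allow-Origin' body.txt   # status differs, body matches
```

A real grader would also check for negative indicators (WAF block pages, captchas) before declaring PASS.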
If PoC needs valid session:
1. Check if test credentials are in scope.json rules.testAccounts
2. If credentials provided: re-authenticate, get fresh token
3. If no credentials: prompt user for auth token
4. Substitute fresh token into all commands
5. Note: "Re-validated with fresh auth token at <timestamp>"
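Step 4 (substituting the fresh token) might look like this; `OLDTOKEN`/`NEWTOKEN` and the URL are placeholders:

```shell
# Sketch: swap a stale bearer token for a fresh one across extracted commands.
printf '%s\n' 'curl -sk -H "Authorization: Bearer OLDTOKEN" https://target.example/api/me' > poc_commands.txt

fresh_token=NEWTOKEN
sed -i.bak "s/Bearer [A-Za-z0-9._-]*/Bearer $fresh_token/" poc_commands.txt
cat poc_commands.txt
```

The character class covers typical JWT alphabets; cookie-based sessions would need an analogous substitution on the `Cookie:` header.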
Race condition PoCs may not reproduce consistently:
1. Run the race test 5 times
2. Record success/failure for each attempt
3. If 2+ out of 5 succeed: CONFIRMED (with note about intermittent nature)
4. If 0 out of 5 succeed: May be STALE or timing-dependent
5. Note: "Race condition validated X/5 attempts at <timestamp>"
6. For HTTP/2 single-packet attacks: ensure HTTP/2 is used
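The 5-attempt tally can be sketched as below. `run_race_poc` is a hypothetical stand-in for the real attack script; here it "succeeds" on even-numbered attempts so the tally logic is visible:

```shell
# Sketch: re-run a flaky race PoC 5 times and tally successes.
run_race_poc() { [ $(( $1 % 2 )) -eq 0 ]; }   # placeholder: succeeds on attempts 2 and 4

successes=0
for attempt in 1 2 3 4 5; do
  if run_race_poc "$attempt"; then successes=$((successes + 1)); fi
done

if [ "$successes" -ge 2 ]; then verdict="CONFIRMED (intermittent, $successes/5)"
else verdict="STALE or timing-dependent ($successes/5)"
fi
echo "$verdict"
```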
For chained vulnerability PoCs:
1. Execute each step sequentially
2. Capture output from step N to feed into step N+1
3. If any step fails, mark the chain as BROKEN at that step
4. Record which links in the chain still work independently
5. Note: "Chain validated through step X of Y"
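The chain walk can be sketched like this. `step1`/`step2`/`step3` are placeholders for the real chain links; step 2 fails deliberately so the BROKEN marker is visible:

```shell
# Sketch: run chain steps in order, feed step N output into step N+1,
# and record where the chain breaks.
step1() { echo "session=abc123"; }
step2() { false; }                 # e.g. the CSRF-token fetch now returns 403
step3() { echo "admin=true"; }

chain_status="OK"
out=$(step1) || chain_status="BROKEN at step 1"
[ "$chain_status" = "OK" ] && { step2 "$out" || chain_status="BROKEN at step 2"; }
[ "$chain_status" = "OK" ] && { step3 "$out" || chain_status="BROKEN at step 3"; }
echo "$chain_status"
```

After a break, each later step can still be retried in isolation to record which links work independently.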
For PoCs that require browser interaction:
1. Use Playwright MCP browser_navigate to load the PoC page
2. Use browser_evaluate to check for expected DOM state
3. Use browser_take_screenshot for visual evidence
4. Use browser_console_messages to capture JS errors/output
5. Note: "Browser-based PoC validated via Playwright at <timestamp>"
The PoC must work NOW, not just when it was originally found.
Freshness checks:
1. When was this finding first discovered? (from findings_log.md date)
2. How long ago was it?
- < 24 hours: likely still valid
- 1-7 days: should re-validate
- > 7 days: MUST re-validate before submission
- > 30 days: HIGH risk of being patched
3. Has the target deployed changes since discovery?
- Check Last-Modified, ETag, or version headers
- Compare response signatures with original evidence
4. If re-run fails: finding may be STALE — update status in findings_log.md
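The age buckets above reduce to simple date arithmetic; this sketch assumes GNU `date` and pins "now" to a fixed date so the output is deterministic:

```shell
# Sketch: classify a finding's age from its findings_log.md discovery date.
discovered="2024-01-01"                      # placeholder discovery date
now_ts=$(date -u -d "2024-01-20" +%s)        # fixed "now" for a deterministic example
found_ts=$(date -u -d "$discovered" +%s)
age_days=$(( (now_ts - found_ts) / 86400 ))

if   [ "$age_days" -lt 1 ];  then risk="likely still valid"
elif [ "$age_days" -le 7 ];  then risk="should re-validate"
elif [ "$age_days" -le 30 ]; then risk="MUST re-validate before submission"
else                              risk="HIGH risk of being patched"
fi
echo "$age_days days old: $risk"
```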
Based on validation result:
| Verdict | Action |
|---|---|
| CONFIRMED | Proceed with report. Save fresh evidence to evidence/<finding_id>/proof_rerun_<date>.txt |
| STALE | Update findings_log.md status to "Patched?". Do NOT submit report. Remove from pending findings. |
| PARTIAL | Update report to reflect current state. Some steps may need adjustment. Re-validate after fixes. |
| FAILED | Mark finding as invalid in findings_log.md. Remove from gadgets.json active chains. Do NOT submit. |
| BLOCKED | Try alternative technique (different IP, Playwright, encoding bypass). If still blocked, note in report. |
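The verdict-to-action table can be encoded as a dispatch helper so automation never silently submits a STALE or FAILED finding; the action strings here paraphrase the table:

```shell
# Sketch: map a validation verdict onto its follow-up action.
verdict_action() {
  case "$1" in
    CONFIRMED) echo "save fresh evidence and proceed" ;;
    STALE)     echo "mark Patched?, do not submit" ;;
    PARTIAL)   echo "update report, re-validate" ;;
    FAILED)    echo "mark invalid, do not submit" ;;
    BLOCKED)   echo "retry via alternate route" ;;
    *)         echo "unknown verdict"; return 1 ;;
  esac
}
verdict_action STALE
```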
Agent: webapp-tester-low (haiku)

When delegating to agents via Task(), ALWAYS:
After completing this skill:
- `tested.json`: record what was tested (asset + vuln class)
- `gadgets.json`: add any informational findings with provides/requires tags for chaining
- `findings_log.md`: log any confirmed findings with severity