From sd0x-dev-flow
Verifies deployed feature behavior read-only via API queries, log scans, metrics, and degradation-aware checks. For post-deploy smoke tests, API validation, production diagnosis.
npx claudepluginhub sd0xdev/sd0x-dev-flow --plugin sd0x-dev-flow
- Keywords: verify, investigate, diagnose, check if working, post-deploy, smoke test, validate
| Need | Use Instead |
|---|---|
| Modify data or state | /feature-dev |
| Code quality review | /codex-review-fast |
| Generate unit tests | /codex-test-gen |
| Security audit | /codex-security |
| Run local tests | /verify |
| Review test coverage | /codex-test-review |
⚠️ ALL OPERATIONS MUST BE READ-ONLY ⚠️
Claude independent analysis → Codex third-perspective confirmation → Integrated verdict
Tool safety note:
allowed-tools includes Bash for curl/log queries. Read-only enforcement is behavioral: all commands MUST be reviewed against references/safety-rules.md before execution. Codex independently verifies compliance at P5.
Auto-detect from references/environments.md configuration:
| Level | Available Resources | P3 API | P4 Observation | Confidence Cap |
|---|---|---|---|---|
| L4 | API + Log + Metrics | Full | Log + Metrics | High |
| L3 | API + Log | Full | Log only | High |
| L2-API | API only | Full | Response-only | Medium |
| L2-OBS | Log only (API unreachable) | Skip | Time-window scan | Medium |
| L1 | No runtime access | Skip P3/P4 | Code review only | Low |
Auto-detection logic (see references/environments.md § Degradation Detection):
| API Status | Log System | Metrics | Level |
|---|---|---|---|
| Reachable | Yes | Yes | L4 |
| Reachable | Yes | No | L3 |
| Reachable | No | — | L2-API |
| Unreachable | Yes | — | L2-OBS |
| Unreachable | No | — | L1 |
Fail-closed: If Endpoint Allowlist section is missing, skip P3 (cannot call unverified endpoints). At L1, skip P3 and P4. Provide code-review-based analysis only with Low confidence. At L2-OBS, skip P3 (API unreachable); execute P4 time-window scan and background service observation only.
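
The auto-detection table above can be read as a small decision function. The following is a minimal sketch, not the skill's actual implementation: the function name `detect_level` and the `reachable`/`yes`/`no` argument encoding are assumptions made for illustration.

```bash
# Hypothetical helper: maps the three probes from the detection table to a level.
# $1: API status (reachable|unreachable)
# $2: log system configured (yes|no)
# $3: metrics configured (yes|no)
detect_level() {
  case "$1:$2:$3" in
    reachable:yes:yes) echo "L4" ;;      # API + Log + Metrics
    reachable:yes:*)   echo "L3" ;;      # API + Log
    reachable:*)       echo "L2-API" ;;  # API only
    *:yes:*)           echo "L2-OBS" ;;  # Log only, API unreachable
    *)                 echo "L1" ;;      # no runtime access
  esac
}

detect_level reachable yes no
```

Any API status other than `reachable` falls through to the unreachable rows, which keeps the sketch on the conservative side.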
sequenceDiagram
participant C as Claude
participant U as User
participant API as Target API
participant Log as Log System
participant Cx as Codex
C->>C: P0: Scope & Safety
C->>C: P1: Diff-Lite Scoping
C->>U: P2: Test Charter (approve?)
U->>C: Approved
C->>API: P3: API Execute (read-only)
C->>Log: P4: Observation Correlate
C->>Cx: P5: Codex independent review
Cx-->>C: Codex verdict
C->>U: P5: Integrated Verdict Report
Read safety-rules.md and environments.md.
| Check | Method | Fail Action |
|---|---|---|
| Environment select | --env flag or ask user; load from references/environments.md | Default to test |
| API reachable | Deterministic health-check (3x, 2s timeout — see references/environments.md) | Unreachable + Log config → L2-OBS; Unreachable + no Log → L1 |
| Deployment aligned | Compare local HEAD with deployed version | Mismatch → warn, lower confidence |
| Read-only confirmed | Review references/safety-rules.md, confirm all planned operations are read-only | — |
| Degradation level | Check references/environments.md for log/metrics config | Set level (L1-L4) |
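
As an illustration of the "3x, 2s timeout" health check, here is a hedged sketch. The authoritative procedure lives in references/environments.md; this version assumes the API counts as reachable if any one of the attempts succeeds, and the names `probe`, `api_reachable` and the `PROBE` override are hypothetical.

```bash
# Hypothetical probe: one read-only GET with a 2s timeout.
# -f makes curl return non-zero on HTTP errors (>= 400).
probe() { curl -sf -o /dev/null --max-time 2 "$1"; }

# Returns 0 if any attempt succeeds (assumption), 1 if all attempts fail.
# PROBE may name an alternative probe command, e.g. for dry runs.
api_reachable() {
  local url="$1" attempts="${2:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    if "${PROBE:-probe}" "$url"; then return 0; fi
    i=$((i + 1))
  done
  return 1
}
```

A call such as `api_reachable "$HOST/health"` would then feed the API Status column of the detection table; the `/health` path is a placeholder, not an endpoint named by this skill.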
Read blackbox-testing.md § P1.
Scope only — no code quality judgment.
git diff main...HEAD --name-only (or user-provided scope)
Fallback: If no git diff available, ask user for feature description and build scope manually.
--level override: If user passes --level L2-API, skip log/metrics cases even if configured. --level L2-OBS forces observation-only mode. --level L2 defaults to L2-API for backward compatibility.
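
The backward-compatibility rule for --level can be sketched as a normalization step. This is an assumption-laden illustration: the function name `normalize_level` is hypothetical, and accepting L1, L3 and L4 as override values is a guess, since the text only mentions L2, L2-API and L2-OBS.

```bash
# Hypothetical helper: resolves a --level argument to a canonical level.
normalize_level() {
  case "$1" in
    L2) echo "L2-API" ;;                        # legacy alias
    L1|L2-API|L2-OBS|L3|L4) echo "$1" ;;        # already canonical (assumed accepted)
    *) echo "unknown level: $1" >&2; return 1 ;;
  esac
}

normalize_level L2
```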
Read blackbox-testing.md § P2.
Generate test cases dynamically from P1 results:
| Type | Goal | When |
|---|---|---|
| L1 Regression | Affected API returns expected results | L2-API+ (N/A for L2-OBS) |
| L2 Active Trigger | New code path exercised, verify response | L2-API+ (N/A for L2-OBS) |
| L3 Passive Observe | Background service running, check logs | L3+ and L2-OBS |
| M1 Metrics | Metrics correctly emitted with right labels | L4 only |
User approval gate: Present charter table to user for confirmation before proceeding to P3. User may add/remove/modify cases.
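
To make the "When" column concrete, the sketch below lists which case types a charter could contain at each degradation level. It is illustrative only: `charter_types` is a hypothetical name, and including L3 passive cases at L2-OBS follows the L2-OBS usage example in this document rather than an explicit rule.

```bash
# Hypothetical helper: prints the charter case types applicable at a level.
charter_types() {
  case "$1" in
    L4)     echo "L1 L2 L3 M1" ;;
    L3)     echo "L1 L2 L3" ;;
    L2-API) echo "L1 L2" ;;
    L2-OBS) echo "L3" ;;        # observation only, no active API cases
    L1)     echo "" ;;          # code review only, no runtime cases
    *)      return 1 ;;
  esac
}

charter_types L3
```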
Prerequisites: P2 approved, degradation level is L2-API or higher (L2-API/L3/L4). L2-OBS skips P3 entirely (API unreachable).
For each test case:
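1. Build headers per references/environments.md (generate unique request ID per call)
2. Check the endpoint against the allowlist (references/safety-rules.md)
3. Use test parameters from references/environments.md (no real user data)
# Example execution pattern
make_headers                     # populate the HEADERS array
REQ_ID=$(extract_request_id)     # request ID used for P4 log correlation
START=$(date +%s%3N)             # epoch milliseconds (GNU date)
RESP=$(curl -s -w "\n%{http_code}" -X {{ METHOD }} "$HOST/{{ ENDPOINT }}" \
  "${HEADERS[@]}" -d '{{ PAYLOAD }}')
END=$(date +%s%3N)               # stop the clock before parsing the response
HTTP_CODE=$(printf '%s\n' "$RESP" | tail -n 1)   # last line is the status code
BODY=$(printf '%s\n' "$RESP" | sed '$d')         # everything above it is the body
LATENCY=$((END - START))         # milliseconds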
Read blackbox-testing.md § P4.
Prerequisites: Degradation level L2-OBS or L3+.
L2-OBS mode: Skip subsection A (no P3 requests to correlate). Execute B (time-window scan) and C (background service observation). Observation window: deploy_time → now (fallback: user-specified or last 30min).
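
The observation-window fallback can be sketched as follows. This is a minimal illustration assuming timestamps are epoch seconds and that the fallback order is deploy time, then a user-specified start, then the last 30 minutes; `window_start` is a hypothetical name.

```bash
# Hypothetical helper: prints the start of the observation window (epoch seconds).
# $1: deploy time (may be empty); $2: user-specified start (may be empty)
window_start() {
  local deploy_time="${1:-}" user_start="${2:-}"
  if [ -n "$deploy_time" ]; then
    echo "$deploy_time"
  elif [ -n "$user_start" ]; then
    echo "$user_start"
  else
    echo $(( $(date +%s) - 30 * 60 ))   # last 30 minutes
  fi
}
```

The window end is always the current time, so only the start needs to be resolved.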
For each P3 request, query logs by request ID with fallback strategy:
Retry: 30s fast → 120s delayed → mark unreachable.
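
One way to read the retry line is as two delayed lookups followed by giving up. The sketch below assumes the 120s delay is waited in addition to the first 30s, which the text does not specify; `find_log_entry` and the `FAST_DELAY`/`SLOW_DELAY` overrides are hypothetical, and the actual log query command is passed in by the caller.

```bash
# Hypothetical helper: find_log_entry REQ_ID QUERY_CMD [ARGS...]
# Runs QUERY_CMD ARGS REQ_ID after each delay; returns 0 on the first hit.
find_log_entry() {
  local req_id="$1" delay
  shift
  for delay in "${FAST_DELAY:-30}" "${SLOW_DELAY:-120}"; do
    sleep "$delay"                      # give the log pipeline time to ingest
    if "$@" "$req_id"; then return 0; fi
  done
  echo "log entry for $req_id: unreachable" >&2
  return 1
}
```

Passing the query command as an argument keeps the sketch independent of any particular log system, which this document leaves to references/environments.md.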
Scan test period for anomalies (error + warn levels).
Query logs for schedule/cron tags with 120s delay.
Query metrics system for affected metrics, verify labels and values.
Record what cannot be observed through black-box testing. List in report for /codex-test-review follow-up.
| Verdict | Condition |
|---|---|
| Pass | L1 passed + L2 has expected signal + L3 normal + M1 correct (N/A items don't block) |
| Warn | L1 passed but L2 signal missing, or L3/M1 has non-blocking anomaly |
| Blocked | L1 failed, or regression detected, or M1 shows incorrect labels |
| Inconclusive | API/log/metrics unreachable, insufficient evidence |
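
The verdict table can be approximated by a small function. This is a sketch under stated assumptions, not the skill's own logic: the status encoding (`pass`, `warn`, `fail`, `na`, `unknown`) is invented for illustration, Blocked is assumed to take precedence over Inconclusive, and any leftover non-passing status is treated as a Warn.

```bash
# Hypothetical helper: verdict L1 L2 L3 M1
# Each argument is one of: pass | warn | fail | na | unknown
# (unknown = the evidence could not be collected).
verdict() {
  local l1="$1" l2="$2" l3="$3" m1="$4" s seen=0
  # Blocked: L1 failed (regression) or M1 shows incorrect labels.
  if [ "$l1" = fail ] || [ "$m1" = fail ]; then echo "Blocked"; return; fi
  # Inconclusive: something was unreachable, or nothing was evaluated at all.
  for s in "$l1" "$l2" "$l3" "$m1"; do
    case "$s" in
      unknown) echo "Inconclusive"; return ;;
      na) ;;
      *) seen=1 ;;
    esac
  done
  if [ "$seen" -eq 0 ]; then echo "Inconclusive"; return; fi
  # Warn: L2 signal missing or a non-blocking anomaly; N/A items do not block.
  for s in "$l1" "$l2" "$l3" "$m1"; do
    case "$s" in
      pass|na) ;;
      *) echo "Warn"; return ;;
    esac
  done
  echo "Pass"
}

verdict pass fail pass na
```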
| Level | Condition |
|---|---|
| High | L3/L4 + Claude and Codex agree |
| Medium | L2-API (API-only) or L2-OBS (observation-only) or partial agreement |
| Low | L1 (no runtime) or Claude and Codex diverge |
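
The confidence table reduces to two inputs: the degradation level and how far Claude and Codex agree. A minimal sketch follows; the function name `confidence` and the `agree`/`partial`/`diverge` encoding are assumptions.

```bash
# Hypothetical helper: confidence LEVEL AGREEMENT
# LEVEL: L1 | L2-API | L2-OBS | L3 | L4; AGREEMENT: agree | partial | diverge
confidence() {
  local level="$1" agreement="$2"
  if [ "$level" = L1 ] || [ "$agreement" = diverge ]; then
    echo "Low"
  elif [ "$agreement" = agree ] && { [ "$level" = L3 ] || [ "$level" = L4 ]; }; then
    echo "High"
  else
    echo "Medium"    # L2-API, L2-OBS, or partial agreement
  fi
}

confidence L3 partial
```

The Low row is checked first so that divergence caps confidence even at L4, which matches the Confidence Cap idea in the degradation table.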
/codex-brainstorm with P1 scope + P3 results + P4 observations (see references/blackbox-testing.md § P5)
Codex must independently verify (see references/blackbox-testing.md § P5 prompt):
references/environments.md)
Generate report using output-template.md.
Verdict is independent: Report may recommend follow-up skills (/codex-review-fast, /verify, /codex-test-review) but does NOT auto-invoke them.
| Rule | Description |
|---|---|
| Single request | One request at a time (no load testing) |
| Fixed parameters | Use test parameters from references/environments.md |
| Read-only only | Only allowlisted endpoints (references/safety-rules.md) |
| No PII | No real user credentials, keys, or sensitive data in payloads |
| Rate aware | Respect API rate limits |
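
The allowlist rule, together with the fail-closed behavior described earlier, could be enforced with a check like the one below before each request. It is a sketch only: the real allowlist is a section of references/safety-rules.md whose format is not shown here, so this example assumes a hypothetical plain-text file with one `METHOD /path` entry per line, and `endpoint_allowed` is an invented name.

```bash
# Hypothetical helper: endpoint_allowed ALLOWLIST_FILE METHOD PATH
# Returns 0 if "METHOD PATH" is listed, 1 if not, 2 if the allowlist is
# missing or empty (fail closed: the caller should skip P3).
endpoint_allowed() {
  local allowlist="$1" method="$2" path="$3"
  [ -s "$allowlist" ] || return 2
  grep -Fxq -- "$method $path" "$allowlist"   # exact, whole-line match
}
```

Whole-line fixed-string matching avoids accidentally allowing a longer path that merely contains an allowlisted one.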
Verdict report in references/output-template.md format
| File | Content | Read At |
|---|---|---|
| environments.md | API endpoints, auth headers, log/metrics config, test params | P0, P3 |
| safety-rules.md | Read-only rules, endpoint allowlist, forbidden ops | P0, P3 |
| blackbox-testing.md | Diff-lite scoping, test charter design, log verification, blind spots | P1, P2, P4, P5 |
| output-template.md | Verdict report format | P5 |
Input: /feature-verify "User Auth API" --env test
Action: P0(reachable? → L3) → P1(diff → /api/auth/*) → P2(L1+L2 charter, user approves)
→ P3(curl read-only endpoints) → P4(log correlation) → P5(verdict: Pass, High)
Input: /feature-verify "Payment query" --env prod --level L2
Action: P0(prod, forced L2 → L2-API) → P1(diff → /api/payment/query) → P2(L1+L2, no L3)
→ P3(curl) → P4(response-only) → P5(verdict: Pass, Medium)
Input: /feature-verify "Background sync job" --env staging
Action: P0(staging, L3) → P1(diff → cron changes) → P2(L3 passive only)
→ P3(skip — no API endpoint) → P4(log observation for schedule tag) → P5(verdict)
Input: /feature-verify "Cache optimization" (no env configured)
Action: P0(no config → L1) → P1(diff → cache service) → P2(code review only)
→ P3(skip) → P4(skip) → P5(verdict: Inconclusive, Low — recommend configuring references/environments.md)
Input: /feature-verify "Order processing" --env prod
Action: P0(prod, API unreachable 3/3, Log config present → L2-OBS)
→ P1(diff → /api/order/*) → P2(L3 passive + time-window only, no L1/L2 active)
→ P3(skip — API unreachable) → P4(time-window scan: deploy→now, background observation)
→ P5(verdict: Pass/Warn/Inconclusive, Medium)