Skill

feature-verify

Install
1
Install the plugin
$
npx claudepluginhub sd0xdev/sd0x-dev-flow --plugin sd0x-dev-flow

Want just this skill?

Add to a custom plugin, then install with one command.

Description

Feature verification (READ-ONLY, P0-P5). Use when: verifying feature behavior after deployment, validating API responses, diagnosing production issues, post-deploy smoke test. Not for: modifying data (use feature-dev), code review (use codex-review-fast), writing tests (use codex-test-gen), security audit (use codex-security).

Tool Access

This skill is limited to using the following tools:

ReadGrepGlobBashWebFetchTaskSkillmcp__codex__codexmcp__codex__codex-reply
Supporting Assets
View in Repository
references/blackbox-testing.md
references/environments.md
references/output-template.md
references/safety-rules.md
Skill Content

Feature Verify — Runtime-First API Verification

Trigger

  • Keywords: verify, investigate, diagnose, check if working, post-deploy, smoke test, validate
  • User wants to confirm deployed feature behavior
  • User provides environment access (API URL, log system, credentials)

When NOT to Use

NeedUse Instead
Modify data or state/feature-dev
Code quality review/codex-review-fast
Generate unit tests/codex-test-gen
Security audit/codex-security
Run local tests/verify
Review test coverage/codex-test-review

Core Principle

⚠️ ALL OPERATIONS MUST BE READ-ONLY ⚠️

Claude independent analysis → Codex third-perspective confirmation → Integrated verdict

Tool safety note: allowed-tools includes Bash for curl/log queries. Read-only enforcement is behavioral — all commands MUST be reviewed against references/safety-rules.md before execution. Codex independently verifies compliance at P5.

Degradation Matrix

Auto-detect from references/environments.md configuration:

LevelAvailable ResourcesP3 APIP4 ObservationConfidence Cap
L4API + Log + MetricsFullLog + MetricsHigh
L3API + LogFullLog onlyHigh
L2-APIAPI onlyFullResponse-onlyMedium
L2-OBSLog only (API unreachable)SkipTime-window scanMedium
L1No runtime accessSkip P3/P4Code review onlyLow

Auto-detection logic (see references/environments.md § Degradation Detection):

API StatusLog SystemMetricsLevel
ReachableYesYesL4
ReachableYesNoL3
ReachableNoL2-API
UnreachableYesL2-OBS
UnreachableNoL1

Fail-closed: If Endpoint Allowlist section is missing, skip P3 (cannot call unverified endpoints). At L1, skip P3 and P4. Provide code-review-based analysis only with Low confidence. At L2-OBS, skip P3 (API unreachable); execute P4 time-window scan and background service observation only.

Workflow

sequenceDiagram
    participant C as Claude
    participant U as User
    participant API as Target API
    participant Log as Log System
    participant Cx as Codex

    C->>C: P0: Scope & Safety
    C->>C: P1: Diff-Lite Scoping
    C->>U: P2: Test Charter (approve?)
    U->>C: Approved
    C->>API: P3: API Execute (read-only)
    C->>Log: P4: Observation Correlate
    C->>Cx: P5: Codex independent review
    Cx-->>C: Codex verdict
    C->>U: P5: Integrated Verdict Report

P0: Scope & Safety

Read safety-rules.md and environments.md.

CheckMethodFail Action
Environment select--env flag or ask user; load from references/environments.mdDefault to test
API reachableDeterministic health-check (3x, 2s timeout — see references/environments.md)Unreachable + Log config → L2-OBS; Unreachable + no Log → L1
Deployment alignedCompare local HEAD with deployed versionMismatch → warn, lower confidence
Read-only confirmedReview references/safety-rules.md, confirm all planned operations are read-only
Degradation levelCheck references/environments.md for log/metrics configSet level (L1-L4)

P1: Diff-Lite Scoping

Read blackbox-testing.md § P1.

Scope only — no code quality judgment.

  1. Get diff: git diff main...HEAD --name-only (or user-provided scope)
  2. Map changed files → affected endpoints → dependency chains
  3. Identify L1 regression endpoints, L2 trigger cases, L3 passive targets

Fallback: If no git diff available, ask user for feature description and build scope manually.

--level override: If user passes --level L2-API, skip log/metrics cases even if configured. --level L2-OBS forces observation-only mode. --level L2 defaults to L2-API for backward compatibility.

P2: Test Charter

Read blackbox-testing.md § P2.

Generate test cases dynamically from P1 results:

TypeGoalWhen
L1 RegressionAffected API returns expected resultsL2-API+ (N/A for L2-OBS)
L2 Active TriggerNew code path exercised, verify responseL2-API+ (N/A for L2-OBS)
L3 Passive ObserveBackground service running, check logsL3+ only
M1 MetricsMetrics correctly emitted with right labelsL4 only

User approval gate: Present charter table to user for confirmation before proceeding to P3. User may add/remove/modify cases.

P3: API Execute

Prerequisites: P2 approved, degradation level is L2-API or higher (L2-API/L3/L4). L2-OBS skips P3 entirely (API unreachable).

For each test case:

  1. Load headers from references/environments.md (generate unique request ID per call)
  2. Send request — only allowlisted endpoints (references/safety-rules.md)
  3. Record: HTTP status, response code, key response fields, request ID, latency
  4. Single request at a time (no concurrent/load testing)
  5. Use fixed test parameters from references/environments.md (no real user data)
# Example execution pattern
make_headers
REQ_ID=$(extract_request_id)
START=$(date +%s%3N)
RESP=$(curl -s -w "\n%{http_code}" -X {{ METHOD }} "$HOST/{{ ENDPOINT }}" \
  "${HEADERS[@]}" -d '{{ PAYLOAD }}')
HTTP_CODE=$(echo "$RESP" | tail -1)
BODY=$(echo "$RESP" | sed '$d')
END=$(date +%s%3N)
LATENCY=$((END - START))

P4: Observation Correlate

Read blackbox-testing.md § P4.

Prerequisites: Degradation level L2-OBS or L3+.

L2-OBS mode: Skip subsection A (no P3 requests to correlate). Execute B (time-window scan) and C (background service observation). Observation window: deploy_time → now (fallback: user-specified or last 30min).

A. Per-Request Log Correlation (L1/L2 test case types, requires L3+)

For each P3 request, query logs by request ID with fallback strategy:

  1. Primary: request ID exact match
  2. Fallback: alternate field names
  3. Fallback: endpoint + time window

Retry: 30s fast → 120s delayed → mark unreachable.

B. Time-Window Scan (all cases)

Scan test period for anomalies (error + warn levels).

C. L3 Background Service Observation (if applicable)

Query logs for schedule/cron tags with 120s delay.

D. Metrics Observation (L4 only, if applicable)

Query metrics system for affected metrics, verify labels and values.

E. Blind Spot Analysis

Record what cannot be observed through black-box testing. List in report for /codex-test-review follow-up.

P5: Verdict

Per-Endpoint Verdict

VerdictCondition
PassL1 passed + L2 has expected signal + L3 normal + M1 correct (N/A items don't block)
WarnL1 passed but L2 signal missing, or L3/M1 has non-blocking anomaly
BlockedL1 failed, or regression detected, or M1 shows incorrect labels
InconclusiveAPI/log/metrics unreachable, insufficient evidence

Confidence Level

LevelCondition
HighL3/L4 + Claude and Codex agree
MediumL2-API (API-only) or L2-OBS (observation-only) or partial agreement
LowL1 (no runtime) or Claude and Codex diverge

Dual Verification (Claude + Codex)

  1. Claude analysis: Form independent conclusion from P3 + P4 evidence
  2. Codex review: Use /codex-brainstorm with P1 scope + P3 results + P4 observations (see references/blackbox-testing.md § P5)
  3. Integrated verdict: Synthesize both perspectives

Codex must independently verify (see references/blackbox-testing.md § P5 prompt):

  • No write operations were performed during P3
  • Each endpoint called was on the Endpoint Allowlist (references/environments.md)
  • All HTTP methods match allowlist (GET or allowlisted POST)
  • Verdict is justified by evidence

Output

Generate report using output-template.md.

Verdict is independent: Report may recommend follow-up skills (/codex-review-fast, /verify, /codex-test-review) but does NOT auto-invoke them.

Production Guardrails

RuleDescription
Single requestOne request at a time (no load testing)
Fixed parametersUse test parameters from references/environments.md
Read-only onlyOnly allowlisted endpoints (references/safety-rules.md)
No PIINo real user credentials, keys, or sensitive data in payloads
Rate awareRespect API rate limits

Verification Checklist

  • P0: Environment selected, reachable, deployment aligned
  • P0: Degradation level determined
  • P1: Affected endpoints mapped from diff (or user input)
  • P2: Test charter approved by user
  • P3: All API calls are read-only and on allowlist (L2-API+)
  • P3: L2-OBS correctly skips API execution
  • P3: Each call recorded with HTTP status, request ID, latency
  • P4: Log correlation attempted for each request (L3+)
  • P4: Time-window scan completed (L2-OBS or L3+)
  • P4: L2-OBS time-window scan uses correct observation window
  • P4: Blind spots documented
  • P5: Claude analysis formed independently
  • P5: Codex review completed independently
  • P5: Integrated verdict with confidence level
  • Report follows references/output-template.md format

References

FileContentRead At
environments.mdAPI endpoints, auth headers, log/metrics config, test paramsP0, P3
safety-rules.mdRead-only rules, endpoint allowlist, forbidden opsP0, P3
blackbox-testing.mdDiff-lite scoping, test charter design, log verification, blind spotsP1, P2, P4, P5
output-template.mdVerdict report formatP5

Examples

Input: /feature-verify "User Auth API" --env test
Action: P0(reachable? → L3) → P1(diff → /api/auth/*) → P2(L1+L2 charter, user approves)
        → P3(curl read-only endpoints) → P4(log correlation) → P5(verdict: Pass, High)
Input: /feature-verify "Payment query" --env prod --level L2
Action: P0(prod, forced L2) → P1(diff → /api/payment/query) → P2(L1+L2, no L3)
        → P3(curl) → P4(response-only) → P5(verdict: Pass, Medium)
Input: /feature-verify "Background sync job" --env staging
Action: P0(staging, L3) → P1(diff → cron changes) → P2(L3 passive only)
        → P3(skip — no API endpoint) → P4(log observation for schedule tag) → P5(verdict)
Input: /feature-verify "Cache optimization" (no env configured)
Action: P0(no config → L1) → P1(diff → cache service) → P2(code review only)
        → P3(skip) → P4(skip) → P5(verdict: Inconclusive, Low — recommend configuring references/environments.md)
Input: /feature-verify "Order processing" --env prod
Action: P0(prod, API unreachable 3/3, Log config present → L2-OBS)
        → P1(diff → /api/order/*) → P2(L3 passive + time-window only, no L1/L2 active)
        → P3(skip — API unreachable) → P4(time-window scan: deploy→now, background observation)
        → P5(verdict: Pass/Warn/Inconclusive, Medium)
Stats
Stars90
Forks12
Last CommitMar 19, 2026
Actions

Similar Skills