From workflows
Verifies data science analysis results for reproducibility and completion, using guardrails to gate tool usage until approval.
Slash command: `/workflows:ds-verify`

Hooks (tool matcher → command):
- Agent: `uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-pre-subagent-clear.py`
- Read, Grep, Glob: `uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py`
- Write, Edit, Bash: `uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py`
- Agent: `GATE_ARTIFACT=.planning/REVIEW_STATE.md GATE_STATUS=APPROVED GATE_DESCRIPTION="Review verdict" GATE_REMEDY="Complete ds-review until REVIEW_STATE.md verdict is APPROVED; verification is gated until then." GATE_BLOCKED_TOOLS=Agent uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/phase-gate-guard.py`
- Agent: `uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-post-subagent-guard.py`
Announce: "Using ds-verify (Phase 5) to confirm reproducibility and completion."
| Level | Remaining Context | Action |
|---|---|---|
| Normal | >35% | Proceed normally |
| Warning | 25-35% | Complete current review cycle, then trigger ds-handoff |
| Critical | ≤25% | Immediately trigger ds-handoff — do not start new review cycles |
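A minimal sketch of how the thresholds resolve at their boundaries, assuming remaining context is available as a percentage (the names `context_level` and `ACTIONS` are hypothetical). Read literally, the table puts exactly 35% in Warning and exactly 25% in Critical:

```python
ACTIONS = {
    "Normal": "Proceed normally",
    "Warning": "Complete current review cycle, then trigger ds-handoff",
    "Critical": "Immediately trigger ds-handoff; do not start new review cycles",
}


def context_level(remaining_pct: float) -> str:
    """Classify remaining context (0-100%) into the levels of the table."""
    if not 0 <= remaining_pct <= 100:
        raise ValueError(f"remaining_pct out of range: {remaining_pct}")
    if remaining_pct > 35:
        return "Normal"
    if remaining_pct > 25:  # 25% < remaining <= 35%
        return "Warning"
    return "Critical"  # remaining <= 25%


if __name__ == "__main__":
    for pct in (60, 35, 30, 25):
        level = context_level(pct)
        print(f"{pct}% -> {level}: {ACTIONS[level]}")
```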
Final verification with reproducibility checks and user acceptance interview.
## The Iron Law of DS Verification

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION. This is not negotiable.
Load shared enforcement first.
Auto-load all constraints matching applies-to: ds-verify:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py ds-verify
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
Before claiming analysis is complete, you MUST:
This applies even when:
If you catch yourself thinking "I can skip verification," STOP — you're about to deliver unverified results that waste the user's time.
Before running runtime DQ checks, run the static analysis constraint check suite:
bash "${CLAUDE_SKILL_DIR}/../../scripts/check-all-ds.sh" "$(pwd)"
This runs all DS constraint check scripts (determinism, join audits, idempotency, error handling, schema contracts, standard errors, visualization integrity).
If any check FAILS: Report the failures in LEARNINGS.md. These are code quality issues in the analysis scripts that must be fixed before proceeding. Dispatch a fix subagent if needed.
If all checks PASS: Proceed to runtime DQ checks.
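The pass/fail gate on the static check suite can be scripted. This sketch assumes `check-all-ds.sh` exits with a non-zero status when any check FAILS (not stated here; confirm against the script), and `static_checks_pass` is a hypothetical helper name:

```python
import subprocess
from pathlib import Path


def static_checks_pass(script: Path, project_dir: Path) -> bool:
    """Run the DS static check suite and return True only on exit status 0.

    Assumption: check-all-ds.sh exits non-zero if any individual check FAILS.
    """
    result = subprocess.run(
        ["bash", str(script), str(project_dir)],
        capture_output=True,
        text=True,
    )
    # Echo the output so failing checks can be copied into LEARNINGS.md.
    print(result.stdout, end="")
    if result.returncode != 0:
        print(result.stderr, end="")
    return result.returncode == 0
```

For example, `static_checks_pass(Path(os.environ["CLAUDE_SKILL_DIR"]) / "../../scripts/check-all-ds.sh", Path.cwd())`, assuming `CLAUDE_SKILL_DIR` is set in the environment.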
Checkpoint type: decision (user confirms results — cannot auto-advance)
Before making ANY completion claim, follow this flowchart.
This flowchart IS the specification. If prose elsewhere and this diagram disagree, the diagram wins.
┌──────────────────────────────┐
│ 1. RE-RUN (fresh, not cached) │
└──────────────┬───────────────┘
▼
┌──────────────────────────────┐
│ 2. CHECK vs success criteria │
└──────────────┬───────────────┘
pass? │
┌───── no ──┴── yes ─────┐
▼ ▼
┌─────────────────┐ ┌──────────────────────────┐
│ NEEDS WORK → │ │ 3. REPRODUCE │
│ log + dispatch │ │ (same inputs→same outputs)│
│ fix subagent │ └────────────┬─────────────┘
└────────┬────────┘ match? │
│ ┌──── no ──────┴── yes ───┐
│ ▼ ▼
│ ┌─────────────────┐ ┌─────────────────────────┐
│ │ NEEDS WORK → │ │ 4. ASK — user │
│ │ non-determinism │ │ acceptance interview │
│ │ is a defect │ └───────────┬─────────────┘
│ └────────┬────────┘ accept? │
│ │ ┌── no/partial ────┴── yes ──┐
│ │ ▼ ▼
│ │ ┌──────────────────┐ ┌────────────────────┐
└───────────┴─▶│ loop: ds-fix / │ │ 5. CLAIM COMPLETE │
│ ds-implement, │ │ (only after 1-4) │
│ then re-verify │ └────────────────────┘
└──────────────────┘
Skipping any step is not verification. Reaching step 5 without passing 1-4 is a false completion claim.
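The order of the gates can be sketched as a pure function. `VerificationRun` and `verdict` are hypothetical names; a partial user acceptance is modeled as `False`, and the diagram remains the specification:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerificationRun:
    fresh_rerun: bool                    # step 1: re-run from scratch, not cached
    criteria_pass: bool                  # step 2: success criteria checked
    reproduced: Optional[bool] = None    # step 3: None means not attempted
    user_accepts: Optional[bool] = None  # step 4: None means not asked; partial -> False


def verdict(run: VerificationRun) -> str:
    """Walk steps 1-4 in order; only a run that clears all four may claim COMPLETE."""
    if not run.fresh_rerun:
        return "NEEDS WORK: no fresh re-run"
    if not run.criteria_pass:
        return "NEEDS WORK: success criteria not met; log and dispatch fix subagent"
    if run.reproduced is not True:
        return "NEEDS WORK: reproducibility not demonstrated; non-determinism is a defect"
    if run.user_accepts is not True:
        return "NEEDS WORK: user acceptance missing or partial; loop via ds-fix / ds-implement"
    return "COMPLETE"
```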
When presenting verification results to the user in the acceptance interview, generate diagnostic plots to support the decision:
| Verification Check | Diagnostic to Generate |
|---|---|
| Reproducibility comparison | Overlay plot of Run 1 vs Run 2 key outputs |
| Data integrity | Pipeline waterfall chart (input rows → cleaning → joins → final) |
| Distribution sanity | Histogram/density plots of key variables with expected ranges annotated |
| Model performance | ROC curve, residual plot, or coefficient comparison (as appropriate) |
Format: Inline plots in notebooks, or saved to scratch/diagnostics/ for script-based workflows. Present alongside the acceptance interview questions.
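The numbers behind the pipeline waterfall chart are per-stage row counts and the change between consecutive stages. A stdlib-only sketch (function names are hypothetical, the stage counts in the last line are illustrative, and plotting is left to whatever library the project already uses):

```python
def row_waterfall(stages: list[tuple[str, int]]) -> list[tuple[str, int, int]]:
    """Turn ordered (stage, row_count) pairs into (stage, rows, change_from_previous)."""
    table = []
    prev = stages[0][1] if stages else 0  # the input stage reports a change of 0
    for name, rows in stages:
        table.append((name, rows, rows - prev))
        prev = rows
    return table


def print_waterfall(stages: list[tuple[str, int]]) -> None:
    for name, rows, change in row_waterfall(stages):
        print(f"{name:<10} {rows:>10,} {change:>+10,}")


print_waterfall([("input", 120_000), ("cleaning", 114_500), ("joins", 114_500), ("final", 112_000)])
```

A positive change at a join stage usually signals duplicated keys (fan-out) and deserves a look before acceptance.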
Trace to Requirements: For each success criterion, reference its requirement ID (e.g., "DATA-01: Panel has 50K+ firm-years — VERIFIED with df.shape output"). End-to-end traceability from SPEC.md through PLAN.md through VALIDATION.md through verification.
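A minimal traceability check, assuming requirement IDs follow the `PREFIX-NN` shape of the `DATA-01` example (the regex and the function name are assumptions). It lists IDs that appear in SPEC.md but are never cited in the verification text:

```python
import re

# Assumed ID shape: uppercase prefix, hyphen, two or more digits (e.g. DATA-01).
REQ_ID = re.compile(r"\b[A-Z]+-\d{2,}\b")


def untraced_requirements(spec_text: str, verification_text: str) -> set[str]:
    """Return requirement IDs found in the spec but never cited in the verification report."""
    spec_ids = set(REQ_ID.findall(spec_text))
    verified_ids = set(REQ_ID.findall(verification_text))
    return spec_ids - verified_ids
```

For instance, `untraced_requirements(Path(".planning/SPEC.md").read_text(), report_text)` should return an empty set before any completion claim.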
CRITICAL: Before claiming completion, conduct user interview.
AskUserQuestion:
  question: "Were there specific methodology requirements I should have followed?"
  options:
    - label: "Yes, replicating existing analysis"
      description: "Results should match a reference"
    - label: "Yes, required methodology"
      description: "Specific methods were mandated"
    - label: "No constraints"
      description: "Methodology was flexible"
If replicating:
AskUserQuestion:
  question: "Do these results answer your original question?"
  options:
    - label: "Yes, fully"
      description: "Analysis addresses the core question"
    - label: "Partially"
      description: "Some aspects addressed, others missing"
    - label: "No"
      description: "Does not answer the question"
If "Partially" or "No":
Return to /ds-implement to address gaps.

AskUserQuestion:
  question: "Are the outputs in the format you need?"
  options:
    - label: "Yes"
      description: "Format is correct"
    - label: "Need adjustments"
      description: "Format needs modification"
AskUserQuestion:
  question: "Do you have any concerns about the methodology or results?"
  options:
    - label: "No concerns"
      description: "Comfortable with approach and results"
    - label: "Minor concerns"
      description: "Would like clarification on some points"
    - label: "Major concerns"
      description: "Significant issues need addressing"
MANDATORY: Demonstrate reproducibility before completion.
## Independent Verification Required

You MUST NOT verify your own work. Spawn a fresh Task agent for reproducibility.
The implementer shares biases and sunk-cost attachment. A fresh subagent sees only the spec and outputs — it verifies without context pollution.
If you're about to re-run the analysis yourself, STOP. Dispatch a Task agent.
Dispatch a fresh Task agent to run the reproducibility check:
All paths below are relative to this skill's base directory.
Agent(subagent_type="general-purpose",
allowed_tools=["Read", "Glob", "Grep", "Bash(read-only)"],
prompt="""
# Reproducibility Verification
**Tool Restrictions:** The verifier is READ-ONLY. It re-runs analyses and checks output but MUST NOT modify notebooks, scripts, or code. It MUST NOT use Write or Edit.
Verify this analysis produces consistent results from a fresh run.
## Context
- Read .planning/SPEC.md for objectives and success criteria
- Read .planning/PLAN.md for expected outputs
- Read .planning/LEARNINGS.md for pipeline documentation
## Shared Checks
Read the shared check definitions:
Read `${CLAUDE_SKILL_DIR}/../../skills/ds-implement/references/ds-checks.md` and follow its instructions.
Run checks: DQ1-DQ4, DQ6, M1, R1
## Reproducibility Protocol
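### For scripts:
```python
import hashlib

def fingerprint(result) -> str:
    # Built-in hash() of a str is salted per process (PYTHONHASHSEED), so its value differs
    # between fresh runs; a SHA-256 digest can be recorded in the report and compared.
    # For DataFrames, hash a full serialization such as result.to_csv(), since str() truncates.
    return hashlib.sha256(str(result).encode("utf-8")).hexdigest()

# run_analysis is a placeholder for the project's own entry point
# Run 1
result1 = run_analysis(seed=42)
hash1 = fingerprint(result1)
# Run 2
result2 = run_analysis(seed=42)
hash2 = fingerprint(result2)
# Verify
assert hash1 == hash2, "Results not reproducible!"
print(f"Reproducibility confirmed: {hash1} == {hash2}")
```

### For notebooks:
```bash
# Execute into a copy under scratch/verify/; --inplace would overwrite the notebook the verifier must not modify
jupyter nbconvert --to notebook --execute notebook.ipynb --output-dir scratch/verify
papermill notebook.ipynb output.ipynb -p seed 42
```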
Report:
**Post-subagent boundary (C5):** After verification agent returns, read its report only. Do NOT read source code, notebooks, or data files yourself. If FAIL, dispatch a fresh investigation subagent.
**If Task agent reports FAIL:** Dispatch a fresh Task agent to investigate the discrepancy. Do NOT investigate yourself — that violates the post-subagent boundary (C5 from ds-common-constraints.md).
## Claims Requiring Evidence
| Claim | Required Evidence |
|-------|-------------------|
| "Analysis complete" | All success criteria verified |
| "Results reproducible" | Same output from fresh run |
| "Matches reference" | Comparison showing match |
| "Data quality handled" | Documented cleaning steps |
| "Methodology appropriate" | Assumptions checked |
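"Matches reference" needs an explicit comparison. A sketch for named scalar results, assuming both sides can be reduced to `{name: float}` dicts (`compare_to_reference` is a hypothetical helper). The tolerances are assumptions to agree with the user up front, since bit-for-bit equality may not hold across machines or library versions:

```python
import math


def compare_to_reference(
    results: dict[str, float],
    reference: dict[str, float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> dict[str, str]:
    """Compare named scalar results against a reference: MATCH, MISMATCH, or MISSING."""
    report = {}
    for key, ref_value in reference.items():
        if key not in results:
            report[key] = "MISSING"
        elif math.isclose(results[key], ref_value, rel_tol=rel_tol, abs_tol=abs_tol):
            report[key] = "MATCH"
        else:
            report[key] = f"MISMATCH (got {results[key]!r}, expected {ref_value!r})"
    return report
```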
## Insufficient Evidence
These do NOT count as verification:
- Previous run results (must be fresh)
- "Should be reproducible" (demonstrate it)
- Visual inspection only (quantify where possible)
- Single run (need reproducibility check)
- Skipped user acceptance (must ask)
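One way to enforce "must be fresh" is to record a timestamp before the re-run and reject any output written earlier. File modification time is only a proxy (it cannot tell a recomputed file from a copied one), and `stale_outputs` is a hypothetical helper:

```python
from pathlib import Path


def stale_outputs(paths: list[Path], verification_started: float) -> list[Path]:
    """Return outputs that are missing or were last written before verification started.

    verification_started is a Unix timestamp, e.g. time.time() captured before the re-run.
    """
    stale = []
    for p in paths:
        if not p.exists() or p.stat().st_mtime < verification_started:
            stale.append(p)
    return stale
```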
## Required Output Structure
```markdown
## Verification Report: [Analysis Name]
### Technical Verification
#### Outputs Generated
- [ ] Output 1: [location] - verified [date/time]
- [ ] Output 2: [location] - verified [date/time]
#### Reproducibility Check
- Run 1 hash: [value]
- Run 2 hash: [value]
- Match: YES/NO
#### Environment
- Python: [version]
- Key packages: [list with versions]
- Random seed: [value]
### User Acceptance
#### Replication Check
- Constraint: [none/replicating/required methodology]
- Reference: [if applicable]
- Match status: [if applicable]
#### User Responses
- Results address question: [yes/partial/no]
- Output format acceptable: [yes/needs adjustment]
- Methodology concerns: [none/minor/major]
### Verdict
**COMPLETE** or **NEEDS WORK**
[If COMPLETE]
- All technical checks passed
- User accepted results
- Reproducibility demonstrated
[If NEEDS WORK]
- [List items requiring attention]
- Recommended next steps
```
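The Environment block of the report template can be generated instead of typed from memory. A sketch using only the standard library (`platform.python_version` and `importlib.metadata.version`); `environment_section` is a hypothetical name, and which packages count as "key" is left to the caller:

```python
import platform
from importlib import metadata


def environment_section(packages: list[str], seed: int) -> str:
    """Render the Environment block of the verification report."""
    lines = ["#### Environment", f"- Python: {platform.python_version()}"]
    versions = []
    for name in packages:
        try:
            versions.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            versions.append(f"{name} (not installed)")
    lines.append(f"- Key packages: {', '.join(versions)}")
    lines.append(f"- Random seed: {seed}")
    return "\n".join(lines)
```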
Maximum 3 verification cycles. If issues persist after 3 rounds, escalate to user with summary of blocking issues.
Chaining instruction (if NEEDS WORK). Discover and load ds-implement:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-implement/SKILL.md and follow its instructions.
Then fix the identified issues and re-run verification.
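The three-cycle cap can be sketched as a loop. `verify` and `fix` are hypothetical callables standing in for the verification subagent and the ds-implement fix pass; `verify` returns the list of blocking issues, empty when the verdict is COMPLETE:

```python
from typing import Callable

MAX_CYCLES = 3


def verify_with_cap(
    verify: Callable[[], list[str]],
    fix: Callable[[list[str]], None],
    max_cycles: int = MAX_CYCLES,
) -> tuple[str, list[str]]:
    """Alternate verify and fix until no issues remain or the cycle cap is reached."""
    issues: list[str] = []
    for cycle in range(1, max_cycles + 1):
        issues = verify()
        if not issues:
            return "COMPLETE", []
        if cycle < max_cycles:
            fix(issues)  # skip a final fix that no verification would follow
    return "ESCALATE", issues
```

On `ESCALATE`, the returned issues become the summary of blocking issues presented to the user.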
Only claim COMPLETE when ALL are true:
Both technical and user acceptance must pass. No shortcuts.
When user confirms all criteria are met:
Announce: "DS workflow complete. All 5 phases passed."
The /ds workflow is now finished. Offer to:
.planning/ files
/ds