Executes analysis tasks with output-first verification, enforcing visible output at every step. Phase 3 of the /ds workflow, called by ds-plan.
Trigger (by the user, by Claude, or both): slash command /workflows:ds-implement

This skill is limited to the following tools:

- Agent: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-pre-subagent-clear.py
- Read: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py
- Grep: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py
- Glob: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-read-after-subagent-guard.py
- Write: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py
- Edit: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py
- Bash: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-no-main-chat-code-guard.py
- Write|Edit|Agent|Workflow: GATE_ARTIFACT=.planning/PLAN_REVIEWED.md GATE_STATUS=APPROVED GATE_DESCRIPTION="Plan review" GATE_REMEDY="Return to ds-plan and run ds-plan-reviewer; implementation is gated until PLAN.md is APPROVED." GATE_BLOCKED_TOOLS=Write,Edit,Agent,Workflow uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/phase-gate-guard.py
- Agent: uv run python3 ${CLAUDE_PLUGIN_ROOT}/hooks/ds-post-subagent-guard.py

Skill-listing summary (what Claude sees when deciding whether to auto-load this skill):
Apply output-first verification at every step of analysis implementation. This is Phase 3 of the `/ds` workflow.
references/verification-patterns.md

Implement analysis with mandatory visible output at every step. NO TDD - instead, every code step MUST produce and verify output.

## The Iron Law of DS Implementation

EVERY CODE STEP MUST PRODUCE VISIBLE OUTPUT. This is not negotiable.
Before moving to the next step, you MUST:
.planning/LEARNINGS.md

This applies even when YOU think:
If you're about to write code without outputting results, STOP.
You orchestrate the ds-implement ultracode workflow, which reads the hardened PLAN.md Task Breakdown table, builds the data-flow DAG, and runs each dependency level's tasks output-first (one ds-analyst/ds-engineer per task, writing directly to the project). You drive the level loop; the workflow's implementers do the analysis/ETL.
0. Set the goal (once): /goal All tasks in PLAN.md are marked [x], each task's Verify
assertion exits 0, and .planning/VALIDATION.md status is `validated`. Stop after [N] turns.
LOOP (one turn per level, under the active /goal):
1. Workflow(name="ds-implement", args={
"projectDir": "<absolute project root (cwd)>",
"pluginRoot": "<resolve ${CLAUDE_SKILL_DIR}/../../workflows>"
})
→ runs the lowest level's pending tasks output-first, returns { overallPass, level,
tasksRemaining, tasks, findings, tasksThatFailed, reviews }. Outputs are already on disk.
2. GROUND-TRUTH: run ds-validate-coverage (or the full pipeline) on the level's outputs —
per-task Verify ran in isolation; this confirms requirement coverage / no regression.
3. If result.overallPass AND coverage clean: mark this level's PLAN rows [x], log to
LEARNINGS.md, END THE TURN (the /goal re-fires for the next level, or closes if
tasksRemaining=0). No pause.
4. If result.overallPass is false: read result.findings, fix the cause, re-invoke with
onlyChecks=result.tasksThatFailed + priorReviews=result.reviews. An R4 (schema change,
new data source, methodology pivot) is critical — STOP and escalate to the user.
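The per-level decision in steps 3 and 4 can be sketched as a small function. This is a minimal illustration, not the workflow's actual code: `next_action` and the `coverage_clean` flag are hypothetical names, and only the result fields listed above (`overallPass`, `tasksRemaining`, `tasksThatFailed`, `reviews`) come from the text. The text does not say what to do when tasks pass but coverage is not clean, so the sketch treats that case as "investigate", which is an assumption. R4 escalation is not modeled.

```python
def next_action(result: dict, coverage_clean: bool) -> dict:
    """Decide what the orchestrator does after one level has run.

    `result` is assumed to carry the fields named in the loop description:
    overallPass, tasksRemaining, tasksThatFailed, reviews.
    """
    if result["overallPass"] and coverage_clean:
        # Step 3: mark rows [x], log to LEARNINGS.md, end the turn, no pause.
        done = result["tasksRemaining"] == 0
        return {"action": "close_goal" if done else "end_turn"}
    if result["overallPass"]:
        # Assumption: passing tasks with unclean coverage need investigation.
        return {"action": "investigate_coverage"}
    # Step 4: re-invoke only the failed tasks, carrying prior reviews along.
    return {
        "action": "reinvoke",
        "onlyChecks": result["tasksThatFailed"],
        "priorReviews": result["reviews"],
    }
```

Returning a plain dict keeps the re-invocation arguments (`onlyChecks`, `priorReviews`) next to the decision, so the caller can pass them straight back into the next workflow call.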
The legacy per-task ds-delegate template is now embedded in the workflow's implementer prompt; ds-delegate remains for ad-hoc single-task dispatch outside this phase. If you're about to write analysis code directly, STOP — the workflow's implementers do that, and ds-no-main-chat-code-guard forbids you (you may only touch .planning/).
| Scenario | Action |
|---|---|
| You wrote > 3 lines of analysis code in main chat | DELETE immediately. Restart via Task agent. |
| You ran a cell, realized it should have been in Task agent | DELETE the cell output and cell. Re-do via Task agent. |
| You started a transformation in main chat | STOP. DELETE what you've done. Spawn Task agent instead. |
| "Just finish this quick analysis here" | STOP — if it's quick enough to finish, it's quick enough for a Task agent. Delete and restart. |
Helpfulness Check: If you kept main-chat code "because it worked," you bypassed the orchestration protocol. Working code written in the wrong place skips verification and review — it is anti-helpful to the user. Delete it.
| DO | DON'T |
|---|---|
| Print shape after each transform | Chain operations silently |
| Display sample rows | Trust transformations work |
| Show summary stats | Wait until end to check |
| Verify row counts | Assume merges worked |
| Check for unexpected nulls | Skip intermediate checks |
| Plot distributions | Move on without looking |
The Mantra: If not visible, it cannot be trusted.
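As a rough illustration of the DO column, the sketch below prints shape, null counts, and sample rows after each step. It is a dependency-free stand-in that uses lists of dicts. `describe_step` is a hypothetical helper, not part of this skill; a real pipeline would more likely use the equivalent checks from its dataframe library.

```python
def describe_step(name: str, rows: list[dict]) -> dict:
    """Print and return shape, null counts, and a sample for one step."""
    columns = sorted({key for row in rows for key in row})
    null_counts = {
        col: sum(1 for row in rows if row.get(col) is None) for col in columns
    }
    summary = {
        "step": name,
        "shape": (len(rows), len(columns)),
        "nulls": {col: n for col, n in null_counts.items() if n},
        "sample": rows[:3],
    }
    print(summary)  # the visible output every step must produce
    return summary


# Illustrative data only.
raw = [
    {"id": 1, "price": 120.0},
    {"id": 2, "price": None},
    {"id": 3, "price": 80.0},
]
describe_step("load", raw)
filtered = [row for row in raw if row["price"] is not None]
describe_step("drop missing price", filtered)
```

Calling the helper after every transform, rather than once at the end, is what makes a silent row drop or an unexpected null show up at the step that caused it.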
.planning/PLAN.md expected output.

.planning/LEARNINGS.md without verified output is a false claim that review inherits: the task may have silently failed, and the user acts on results that don't exist. Logging a verified completion takes 30 seconds; an unlogged step is invisible to review.

After prerequisites pass and PLAN.md is verified, check for parallelization potential:
Skip this choice when:
- after N with no independent groups)
- CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS is not available

Otherwise, ask the user:
AskUserQuestion(questions=[{
"question": "How should we implement the analysis tasks in PLAN.md?",
"header": "Strategy",
"options": [
{"label": "Sequential (Default)", "description": "One task at a time with output-first verification. Safest, most DS work is sequential."},
{"label": "Agent team (parallel)", "description": "Spawn analyst per independent task group. Only for truly independent analysis branches (descriptive stats by subgroup, model comparisons). Requires reconciliation."}
],
"multiSelect": false
}])
If Sequential: Proceed to Implementation Process below (current behavior).
If Agent team: Skip to Agent Team Implementation (Parallel).
If PLAN.md specifies Implementation Language: SAS or Mixed, load SAS enforcement BEFORE dispatching any SAS tasks. Paste the enforcement block into every SAS subagent prompt.
Full SAS enforcement rules: See references/sas-enforcement.md
┌─────────────────────────┐
│ Read PLAN.md + Load │
│ ds-delegate + ETL refs │
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ For each task in PLAN │◄──────────────────────┐
│ (in dependency order) │ │
└───────────┬─────────────┘ │
▼ │
┌─────────────────────────┐ │
│ Dispatch Task agent │ │
│ (per ds-delegate) │ │
└───────────┬─────────────┘ │
▼ │
┌─────────────────────────┐ ┌──────────────┐ │
│ Read agent output │────→│ Output wrong │ │
│ Verify output present │ │ or missing? │ │
│ + reasonable │ └──────┬───────┘ │
└───────────┬─────────────┘ │ │
│ OK ▼ │
│ ┌──────────────────┐ │
│ │ STOP. Investigate │ │
│ │ Log issue. Fix. │ │
│ │ Re-verify. │ │
│ └──────────────────┘ │
▼ │
┌─────────────────────────┐ │
│ Log to LEARNINGS.md │ │
│ (Task N: COMPLETE) │ │
└───────────┬─────────────┘ │
▼ │
More tasks? ──── YES ─────────────────────┘
│
NO
▼
┌─────────────────────────┐
│ Exit Gate: Compare │
│ PLAN.md vs LEARNINGS │
│ (all tasks accounted?) │
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ Invoke ds-validate │
└─────────────────────────┘
This flowchart IS the specification. If the narrative below and this flowchart disagree, the flowchart wins.
If the user sends an off-topic message during implementation, follow C6 from ds-common-constraints.md:
Do NOT silently switch context. Silent switches kill the implementation loop.
Auto-load all constraints matching applies-to: ds-implement:
!uv run python3 ${CLAUDE_SKILL_DIR}/../../scripts/load-constraints.py ds-implement
You MUST have these constraints loaded before proceeding. No claiming you "remember" them.
Read(".planning/PLAN.md")
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-delegate/SKILL.md and follow its instructions.
Follow the task order defined in the plan. Use ds-delegate's templates for every task.
ETL Strategy Enforcement — load domain-specific references based on PLAN.md:
If PLAN.md contains an ## ETL Strategy section, the user made decisions during planning that MUST be enforced during implementation. Check each subsection and load the corresponding enforcement:
| PLAN.md Section | Enforcement Reference | Inject Into |
|---|---|---|
| row_pk / event_key declared (any data with a grain) | ETL enforcement (skills/ds-implement/references/etl-enforcement.md) § Key & Grain Carry-Through | Every data load/transform subagent prompt |
| Filters & Parameters table present | parameter-transparency: reference the named config location by name, NO inline literals (see Parameter Centralization) | Every subagent prompt that filters/caps/winsorizes/windows data |
| Implementation Language: SAS or Mixed | SAS ETL enforcement (skills/wrds/references/sas-etl.md) | Every SAS subagent prompt |
| Filter Strategy table present | ETL enforcement (skills/ds-implement/references/etl-enforcement.md) § Filter Push-Down | Subagent prompts for data loading tasks |
| Parallelism Plan table present | ETL enforcement (skills/ds-implement/references/etl-enforcement.md) § Parallelism | Implementation strategy choice |
| Data Flow with intermediates | ETL enforcement (skills/ds-implement/references/etl-enforcement.md) § Caching | Subagent prompts for tasks producing/consuming intermediates |
| Scale-Up Testing Plan table present | ETL enforcement (skills/ds-implement/references/etl-enforcement.md) § Scale-Up + domain reference (e.g., gemini-batch/references/scale-up-testing.md) | Before any batch submission task |
To load these references, discover the plugin cache path first:
- Read ${CLAUDE_SKILL_DIR}/../../skills/wrds/references/sas-etl.md and follow its instructions.
- Read ${CLAUDE_SKILL_DIR}/../../skills/ds-implement/references/etl-enforcement.md and follow its instructions.

If PLAN.md has NO ETL Strategy section: Skip this and proceed directly to Step 2.
Before starting each task, check context availability:
| Level | Remaining Context | Action |
|---|---|---|
| Normal | >35% | Proceed with task |
| Warning | 25-35% | Complete current task, then invoke ds-handoff |
| Critical | ≤25% | Invoke ds-handoff immediately — no new tasks |
At Warning level: After current task completes, invoke:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-handoff/SKILL.md and follow its instructions.
Why: A multi-task analysis pipeline with 20% context remaining produces degraded output. Better to hand off cleanly and resume fresh.
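The threshold table maps directly onto a small lookup, sketched below. `context_level` is a hypothetical helper name. The table lists 25% under both Warning ("25-35%") and Critical ("≤25%"); the sketch resolves exactly 25% to Critical, which is the stricter reading and an assumption.

```python
def context_level(remaining_pct: float) -> tuple[str, str]:
    """Map remaining context (percent) to the level and action in the table."""
    if remaining_pct > 35:
        return ("Normal", "Proceed with task")
    if remaining_pct > 25:
        return ("Warning", "Complete current task, then invoke ds-handoff")
    # 25% and below: no new tasks are started.
    return ("Critical", "Invoke ds-handoff immediately")


for pct in (60, 30, 25, 10):
    print(pct, context_level(pct))
```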
For each task in PLAN.md:
Document every significant step:
## Task N: [Description] - COMPLETE
**Input:** [Describe input state]
**Operation:** [What was done]
**Output:**
- Shape: [final shape]
- Key findings: [observations]
**Verification:** [How you confirmed it worked]
**Next:** [What comes next]
After a task passes review, append a structured summary to LEARNINGS.md:
## Task N: [task description]
---
task: N
status: completed
implements: [DATA-01, STAT-03]
affects: [notebooks/analysis.py, data/processed/]
key-files:
created: [list of new files]
modified: [list of changed files]
deviations: {r1: 0, r2: 1, r3: 0, r4: 0}
---
One-liner: [SUBSTANTIVE summary — not "Task complete" but "Merged CRSP-Compustat panel with winsorized returns at 1%/99%"]
Changes: [what was added/modified and why]
Output: [output files produced and their contents]
One-liner rule: Must be SUBSTANTIVE. Good: "Panel regression with firm and year FE, clustered SEs, 3 robustness checks". Bad: "Completed task 3".
If PLAN.md has a ## Dataset Construction Diagram (the master-datasets mermaid flowchart: raw → merges → filters → master datasets → exhibits), it is a required doc deliverable the pipeline must keep current. PLAN.md's diagram is the intended construction; the docs carry the construction that actually ran.
- docs/INVESTIGATION or the analysis README/notebook header) with the real merge keys and the actual row-drops each filter produced (the profiled numbers, not the planned estimates).
- `[(rounded)]` and label every edge with its key or filter+row-drop. An edgeless box diagram hides the sample funnel, which is the one thing the diagram exists to show.

This is cheap to edit per-task and expensive to reconstruct from memory at write-up time. The final diagram is what ds-handoff records and ds-review checks against the code.
If PLAN.md has a ## Filters & Parameters table, it names a single config location (default: a plain src/config.py of named constants with rationale in inline comments). Every task subagent MUST read parameters from that location by name and write NO inline numeric literals for any analysis decision — filters, bands, caps, winsorization levels, date windows, min-obs counts.
- <config location> referenced by name. Do NOT hard-code numeric literals for analysis decisions; if a value you need is missing from the config, ADD it there (with a Filters & Parameters row), don't inline it."
- df[df.price > 100] is a magic number; df[df.price > MIN_PRICE] is correct. Loop indices, unit conversions (* 100 for percent), and array offsets are not parameters; leave them inline.
- `## Filters & Parameters` table (constant · value · applied in · rationale/source · principled? · disposition). A new convenience (⚠) parameter needs a disposition (robustness panel / verified-redundant / display-only): log it for ds-review; principled (✓) requires a cited source or a validation result.
- [Rule 2 - Missing Critical] deviation: centralize it (move to config, reference by name), verify the output is unchanged, and track it.

A magic number that reaches the final pipeline is a replication landmine the reviewer will flag: centralizing as you write costs one import; retrofitting after literals scatter costs an audit pass (the exact rework Edwin's muni magic-numbers audit is paying down).
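A minimal sketch of the centralization rule follows, with the config constant and the task code shown in one block for brevity. In practice the constant would live in the config location named in PLAN.md (for example src/config.py) and be imported. `filter_by_min_price` is a hypothetical helper, and the value 100.0 is only borrowed from the magic-number example above, not a recommended threshold.

```python
# --- config location (would be a separate module, e.g. src/config.py) ---
MIN_PRICE = 100.0  # hypothetical value; the rationale/source would go here


# --- task code: references the constant by name, no inline literal ---
def filter_by_min_price(rows: list[dict]) -> list[dict]:
    kept = [row for row in rows if row["price"] > MIN_PRICE]
    # Output-first: show the row drop this filter produced.
    print(f"MIN_PRICE={MIN_PRICE}: {len(rows)} -> {len(kept)} rows")
    return kept


# Illustrative data only.
quotes = [{"price": 50.0}, {"price": 150.0}, {"price": 100.0}]
kept = filter_by_min_price(quotes)

# A percent conversion is a unit conversion, so this literal stays inline.
share_kept_pct = len(kept) / len(quotes) * 100
print(f"share kept: {share_kept_pct:.1f}%")
```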
See references/verification-patterns.md for detailed code patterns for:
See references/etl-enforcement.md for ETL strategy enforcement:
Triggers when PLAN.md includes a Scale-Up Testing Plan table. NO FULL BATCH WITHOUT A SUCCESSFUL TEST BATCH. This is not negotiable.
Three stages: Test (~10 items, always required) -> Intermediate (~100, if total >500) -> Large (~1,000, if total >5,000). Each stage has quality gates that must pass before scaling up.
Full protocol, scale-up facts, and red flags: See references/scale-up-testing.md
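The stage thresholds above reduce to a short rule, sketched here. `scale_up_stages` is a hypothetical helper, and the stage sizes are the approximate figures from the text (~10, ~100, ~1,000). The quality gates between stages are not modeled.

```python
def scale_up_stages(total_items: int) -> list[tuple[str, int]]:
    """Return the (stage name, approximate batch size) pairs required."""
    stages = [("Test", 10)]  # always required
    if total_items > 500:
        stages.append(("Intermediate", 100))
    if total_items > 5000:
        stages.append(("Large", 1000))
    return stages


for total in (300, 2000, 20000):
    print(total, scale_up_stages(total))
```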
| Failure | Why It Happens | Prevention |
|---|---|---|
| Silent data loss | Merge drops rows | Print row counts before/after |
| Hidden nulls | Join introduces nulls | Check null counts after joins |
| Wrong aggregation | Groupby logic error | Display sample groups |
| Type coercion | Pandas silent conversion | Verify dtypes after load |
| Off-by-one | Date filtering edge cases | Print min/max dates |
Never hide failures. Bad output documented is better than silent failure.
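The first two rows of the table (silent data loss and hidden nulls after a join) can be illustrated with a dependency-free left join that reports its own counts. This is a sketch under assumptions: `left_join_with_counts` is a hypothetical helper and the data is made up. With a dataframe library, comparing row counts before and after the merge serves the same purpose.

```python
def left_join_with_counts(left: list[dict], right: list[dict], key: str):
    """Left-join two row lists on `key`, printing counts before and after."""
    right_by_key: dict = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)

    merged, unmatched = [], 0
    for row in left:
        matches = right_by_key.get(row[key])
        if not matches:
            unmatched += 1
            merged.append(dict(row))  # left row kept; right columns missing
            continue
        for match in matches:  # duplicate right keys fan out the left row
            merged.append({**row, **match})

    print(
        f"rows: left={len(left)} right={len(right)} "
        f"merged={len(merged)} unmatched_left={unmatched}"
    )
    return merged, unmatched


# Illustrative data: id 1 is duplicated on the right, ids 2 and 3 have no match.
left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 1, "y": 10}, {"id": 1, "y": 11}, {"id": 4, "y": 12}]
merged, unmatched = left_join_with_counts(left, right, "id")
```

Here the merged count (4) exceeds the left count (3) even though two left rows found no match, which is the kind of discrepancy that only shows up when the counts are printed.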
The user sees results at the END and is waiting for completion, not interim check-ins — a courtesy pause costs a full turn round-trip and delivers nothing. Pause only when ALL tasks are complete or you are blocked.
Your pausing between tasks is procrastination disguised as courtesy.
Dynamic plan re-read. After each task completes, RE-READ .planning/PLAN.md before starting the next task. A prior task (or a Rule-2/Rule-3 deviation) may have inserted, reordered, or removed tasks. Trusting a stale in-memory task list silently skips dynamically-added work. The on-disk PLAN.md is the source of truth.
Blocker handling. When a task cannot proceed (subagent reports failure it cannot auto-fix under R1-R3, missing dependency, environment error), do NOT silently stop. Present the blocker with three options and act on the choice:
| Option | When | Action |
|---|---|---|
| Retry | Transient / fixable cause | Re-dispatch the task subagent with the blocker context added |
| Skip | Task is non-blocking for downstream work | Mark the task blocked in PLAN.md, log to LEARNINGS.md, continue to the next independent task |
| Stop | Blocker invalidates the plan (R4-class) | Invoke ds-handoff, escalate to the user |
In autonomous/auto-advance mode, default to Retry once, then Skip if still blocked and the task is non-critical, then Stop. Record the chosen path in LEARNINGS.md.
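The auto-advance default (Retry once, then Skip if non-critical, then Stop) can be written as a tiny decision function. `auto_blocker_choice` and its parameters are hypothetical names chosen for this sketch.

```python
def auto_blocker_choice(retries_so_far: int, task_is_critical: bool) -> str:
    """Pick the default blocker option in autonomous/auto-advance mode."""
    if retries_so_far < 1:
        return "Retry"  # first attempt: re-dispatch with blocker context
    if not task_is_critical:
        return "Skip"  # still blocked, but downstream work can continue
    return "Stop"  # still blocked and critical: hand off and escalate


print(auto_blocker_choice(0, task_is_critical=True))
print(auto_blocker_choice(1, task_is_critical=False))
print(auto_blocker_choice(1, task_is_critical=True))
```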
When subagents encounter unplanned issues during implementation, follow this 4-rule system:
| Rule | Trigger | Action | Permission |
|---|---|---|---|
| R1: Bug | Data integrity bugs, wrong joins, type errors, off-by-one in date ranges, NaN propagation, index alignment errors | Fix → verify output with output-first protocol → track [Rule 1 - Bug] | Auto |
| R2: Missing Critical | Missing null handling, no dedup check after merge, missing row count verification, no dtype validation, missing outlier handling | Add → verify → track [Rule 2 - Missing Critical] | Auto |
| R3: Blocking | Missing dependency/package, wrong file path, data file unavailable, API rate limit, memory error on large data | Fix blocker → verify proceeds → track [Rule 3 - Blocking] | Auto |
| R4a: Data Assumption | Data doesn't match expected shape/schema/distribution — expected panel but got cross-section, unexpected nulls in key column, different date range than specified, unexpected categories | STOP → present finding with evidence → track [Rule 4a - Data Assumption] | Ask user |
| R4b: Methodology Change | Analysis approach needs changing — different model needed, different sample definition, different variable construction, need to add/remove control variables | STOP → present decision with alternatives → track [Rule 4b - Methodology] | Ask user |
Priority: R4a/R4b (STOP) > R1-R3 (auto) > unsure → escalate as R4.
Edge cases:
Tracking format per task:
Each task summary in .planning/LEARNINGS.md should end with:
Deviations: N auto-fixed (R1: X, R2: Y, R3: Z). R4 escalations: [list or "none"].
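One way to produce that tracking line from the `deviations` counts in the structured summary is sketched below. `deviation_line` is a hypothetical helper. It assumes the counts arrive as a dict keyed `r1` to `r4`, as in the frontmatter example, and that R4 escalations are passed as a list of short descriptions.

```python
def deviation_line(counts: dict[str, int], r4_escalations: list[str]) -> str:
    """Format the per-task deviation tracking line for LEARNINGS.md."""
    auto = {rule: counts.get(rule, 0) for rule in ("r1", "r2", "r3")}
    total = sum(auto.values())
    escalations = ", ".join(r4_escalations) if r4_escalations else "none"
    return (
        f"Deviations: {total} auto-fixed "
        f"(R1: {auto['r1']}, R2: {auto['r2']}, R3: {auto['r3']}). "
        f"R4 escalations: {escalations}."
    )


print(deviation_line({"r1": 0, "r2": 1, "r3": 0, "r4": 0}, []))
```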
Full protocol: See references/agent-team-protocol.md for prerequisites, spawn prompt template, lead monitoring, reconciliation (3 passes), and usage guidelines.
Key points:
Checkpoint type: human-verify (all tasks pass — machine-verifiable)
**You MUST NOT proceed to review without verifying ALL tasks are complete. This is not negotiable.**

Before proceeding to validation, execute this gate:

- .planning/PLAN.md: list every task by number and name
- .planning/LEARNINGS.md: find entries for each task

Staleness Check: LEARNINGS.md must be updated in THIS session, not reused from prior work.
Stale LEARNINGS.md = false gate pass = unverified work = the user gets results no one actually checked.
Write(".planning/IMPLEMENT_COMPLETE.md", """---
status: COMPLETE
tasks_total: [N]
date: [ISO 8601]
---
All PLAN.md tasks complete and verified in LEARNINGS.md. ds-validate may proceed.
""")
If ANY task is missing from LEARNINGS.md, implement it before proceeding. Do NOT write the sentinel until the task counts match.
Claiming all tasks are done without checking LEARNINGS.md against PLAN.md is NOT HELPFUL — missing tasks mean incomplete analysis the user relies on.
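The PLAN-versus-LEARNINGS comparison can be sketched as follows. It assumes LEARNINGS.md entries use the `## Task N:` heading shown in the templates above and that the planned task numbers have already been extracted from PLAN.md. `missing_tasks` is a hypothetical helper, and the staleness check is not covered.

```python
import re

# Matches headings such as "## Task 3: Panel regression - COMPLETE".
TASK_HEADING = re.compile(r"^## Task (\d+):", re.MULTILINE)


def missing_tasks(planned: list[int], learnings_text: str) -> list[int]:
    """Return planned task numbers that have no entry in LEARNINGS.md."""
    logged = {int(n) for n in TASK_HEADING.findall(learnings_text)}
    return [n for n in planned if n not in logged]


# Illustrative LEARNINGS.md content only.
learnings = (
    "## Task 1: Load panel - COMPLETE\n"
    "**Output:**\n"
    "## Task 3: Regressions - COMPLETE\n"
)
gaps = missing_tasks([1, 2, 3], learnings)
print("missing:", gaps)
```

The sentinel should only be written when the returned list is empty.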
After passing the exit gate (sentinel written), IMMEDIATELY discover and read the validation phase:
Read ${CLAUDE_SKILL_DIR}/../../skills/ds-validate/SKILL.md and follow its instructions to validate outputs before review.
This gate is hook-enforced: ds-validate declares a PreToolUse phase-gate-guard.py hook that blocks its validator dispatch until .planning/IMPLEMENT_COMPLETE.md exists with status: COMPLETE.
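For intuition, a status check of this kind could look like the sketch below. This is not the actual phase-gate-guard.py, whose implementation is not shown here. `sentinel_status` is a hypothetical helper that reads the `status:` field from the sentinel's frontmatter, and the `tasks_total` and `date` values in the sample are placeholders.

```python
from typing import Optional


def sentinel_status(text: str) -> Optional[str]:
    """Return the `status` value from a '---' delimited frontmatter block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter without a status field
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip()
    return None


# Placeholder sentinel content for illustration.
sentinel = (
    "---\n"
    "status: COMPLETE\n"
    "tasks_total: 12\n"
    "date: 2025-01-01\n"
    "---\n"
    "All PLAN.md tasks complete and verified in LEARNINGS.md.\n"
)
gate_open = sentinel_status(sentinel) == "COMPLETE"
print("gate open:", gate_open)
```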