REQUIRED Phase 2 of /ds workflow. Profiles data and creates analysis task breakdown.
/plugin marketplace add edwinhu/workflows
/plugin install workflows@edwinhu-plugins

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Announce: "Using ds-plan (Phase 2) to profile data and create task breakdown."
Profile the data and create an analysis plan based on the spec.
Requires .claude/SPEC.md from /ds-brainstorm first.
<EXTREMELY-IMPORTANT>
SPEC MUST EXIST BEFORE PLANNING. This is not negotiable.
Before exploring data or creating tasks, you MUST have:
- .claude/SPEC.md with objectives and constraints

If .claude/SPEC.md doesn't exist, run /ds-brainstorm first.
</EXTREMELY-IMPORTANT>
| Excuse | Reality | Do Instead |
|---|---|---|
| "Data looks clean, profiling unnecessary" | Your data is never clean | PROFILE to discover issues |
| "I can profile as I go" | You'll miss systemic issues | PROFILE comprehensively NOW |
| "Quick .head() is enough" | .head() hides problems in the tail | RUN full profiling checklist |
| "Missing values won't affect my analysis" | They always do | DOCUMENT and plan handling |
| "I'll handle data issues during analysis" | Your issues will derail your analysis | FIX data issues FIRST |
| "User didn't mention data quality" | They assume YOU'LL check | QUALITY check is YOUR job |
| "Profiling takes too long" | Skipping it costs you days later | INVEST time now |
Creating an analysis plan without profiling the data is LYING about understanding the data.
You cannot plan analysis steps without knowing what the data actually contains: its shape, column types, missing values, and duplicates.
Profiling costs you minutes. A wrong plan costs you hours of rework and incorrect results.
After writing .claude/PLAN.md, IMMEDIATELY invoke:
Skill(skill="workflows:ds-implement")
DO NOT stop after writing the plan. The workflow phases are SEQUENTIAL: complete the plan, then immediately start implementation.
| DO | DON'T |
|---|---|
| Read .claude/SPEC.md | Skip brainstorm phase |
| Profile data (shape, types, stats) | Skip to analysis |
| Identify data quality issues | Ignore missing/duplicate data |
| Create ordered task list | Write final analysis code |
| Write .claude/PLAN.md | Make completion claims |
Brainstorm answers: WHAT and WHY
Plan answers: HOW and DATA QUALITY
cat .claude/SPEC.md # verify-spec: read SPEC file to confirm it exists
If missing, stop and run /ds-brainstorm first.
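
The spec check can also be done programmatically before any profiling starts. The following is a minimal sketch, not part of the workflow itself; the helper name `spec_is_ready` is hypothetical, and treating a whitespace-only file as missing is an assumption.

```python
from pathlib import Path


def spec_is_ready(spec_path=".claude/SPEC.md"):
    """Return True if the spec file exists and has non-whitespace content.

    Hypothetical helper: the workflow only requires that the file exists;
    rejecting an empty file is an extra assumption made here.
    """
    path = Path(spec_path)
    if not path.is_file():
        return False
    return path.read_text(encoding="utf-8").strip() != ""


if __name__ == "__main__":
    if not spec_is_ready():
        print("SPEC missing: run /ds-brainstorm first.")
```

If the function returns False, stop and return to the brainstorm phase instead of profiling.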
For multiple data sources: Profile in parallel using background Task agents.
MANDATORY profiling steps:
```python
import pandas as pd

# Placeholders: substitute the real path and column names
df = pd.read_csv("/path/to/dataset.csv")
col = "category_column"
date_col = "date_column"

# Basic structure
df.shape      # (rows, columns)
df.dtypes     # Column types
df.head(10)   # Sample data
df.tail(5)    # End of data

# Summary statistics
df.describe()                  # Numeric summaries
df.describe(include='object')  # Categorical summaries
df.info()                      # Memory, non-null counts

# Data quality checks
df.isnull().sum()              # Missing values per column
df.duplicated().sum()          # Duplicate rows
df[col].value_counts()         # Distribution of categories

# For time series
df[date_col].min(), df[date_col].max()  # Date range
df.groupby(date_col).size()             # Records per period
```
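
The profiling steps above can be bundled into one function so that every data source gets the same checks. This is a sketch under assumptions: the function name `profile_dataframe`, its parameters, and the example frame are all made up for illustration, and the caller names the categorical columns explicitly instead of having them inferred from dtypes.

```python
import pandas as pd


def profile_dataframe(df, category_cols=(), date_col=None):
    """Collect the core profiling facts for one DataFrame into a dict.

    Hypothetical helper; category_cols and date_col are supplied by the caller.
    """
    report = {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }
    # Distribution of each categorical column, keeping nulls visible.
    report["value_counts"] = {
        col: df[col].value_counts(dropna=False).to_dict() for col in category_cols
    }
    # Date range, only when a date column is named.
    if date_col is not None:
        dates = pd.to_datetime(df[date_col])
        report["date_range"] = (dates.min(), dates.max())
    return report


# Made-up example data, for illustration only.
example = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "category": ["a", "b", "b", None],
    "date": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-05"],
})
report = profile_dataframe(example, category_cols=["category"], date_col="date")
print(report["shape"], report["duplicate_rows"], report["missing"])
```

Returning a plain dict keeps the result easy to paste into the data profile section of the plan.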
<EXTREMELY-IMPORTANT>
Use run_in_background: true for parallel execution.
When profiling 2+ data sources, launch agents in parallel:
</EXTREMELY-IMPORTANT>
```
# PARALLEL + BACKGROUND: All Task calls in ONE message
Task(
    subagent_type="general-purpose",
    description="Profile dataset 1",
    run_in_background=true,
    prompt="""
    Profile this dataset and return a data quality report.

    Dataset: /path/to/dataset1.csv

    Required checks:
    1. Shape: rows x columns
    2. Data types: df.dtypes
    3. Missing values: df.isnull().sum()
    4. Duplicates: df.duplicated().sum()
    5. Summary statistics: df.describe()
    6. Unique value counts for categorical columns
    7. Date range if time series
    8. Memory usage: df.info()

    Output format:
    - Markdown table with column summary
    - List of data quality issues found
    - Recommendations for cleaning

    Tools denied: Write, Edit, NotebookEdit (read-only profiling)
    """)

Task(
    subagent_type="general-purpose",
    description="Profile dataset 2",
    run_in_background=true,
    prompt="""
    [Same template for dataset 2]
    """)

Task(
    subagent_type="general-purpose",
    description="Profile dataset 3",
    run_in_background=true,
    prompt="""
    [Same template for dataset 3]
    """)
```
After launching agents, check on them with the /tasks command, then collect the profiling results:

```
# Collect profiling results
TaskOutput(task_id="task-abc123", block=true, timeout=30000)
TaskOutput(task_id="task-def456", block=true, timeout=30000)
TaskOutput(task_id="task-ghi789", block=true, timeout=30000)
```
Benefits:
CRITICAL: Document ALL issues before proceeding:
| Check | What to Look For |
|---|---|
| Missing values | Null counts, patterns of missingness |
| Duplicates | Exact duplicates, key-based duplicates |
| Outliers | Extreme values, impossible values |
| Type issues | Strings in numeric columns, date parsing |
| Cardinality | Unexpected unique values |
| Distribution | Skewness, unexpected patterns |
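
Several of the checks in this table can be turned into issue lines with a few pandas calls. This is a minimal sketch under assumptions: the function `quality_issues`, its parameters, and the example frame are hypothetical, and the 1.5 × IQR rule for outliers is one common convention chosen here for illustration, not a rule this workflow prescribes.

```python
import pandas as pd


def quality_issues(df, key_cols, numeric_col):
    """Return human-readable issue lines for four of the checks in the table."""
    issues = []

    # Missing values: one line per column that has nulls.
    for col, n_missing in df.isnull().sum().items():
        if n_missing > 0:
            issues.append(f"Missing: {col} has {n_missing} nulls")

    # Key-based duplicates: rows repeating an earlier key.
    n_dupes = int(df.duplicated(subset=key_cols).sum())
    if n_dupes:
        issues.append(f"Duplicates: {n_dupes} duplicate rows on {key_cols}")

    # Type issues: non-null values that fail to parse as numbers.
    parsed = pd.to_numeric(df[numeric_col], errors="coerce")
    n_bad = int((parsed.isnull() & df[numeric_col].notnull()).sum())
    if n_bad:
        issues.append(f"Type: {numeric_col} has {n_bad} non-numeric values")

    # Outliers: values outside 1.5 * IQR of the quartiles (assumed convention).
    q1, q3 = parsed.quantile(0.25), parsed.quantile(0.75)
    iqr = q3 - q1
    n_out = int(((parsed < q1 - 1.5 * iqr) | (parsed > q3 + 1.5 * iqr)).sum())
    if n_out:
        issues.append(f"Outliers: {numeric_col} has {n_out} values outside 1.5*IQR")

    return issues


# Made-up example data, for illustration only.
example = pd.DataFrame({
    "id": [1, 2, 2, 3, 4, 5],
    "amount": ["10", "12", "11", "n/a", "13", "5000"],
    "category": ["a", "b", "b", None, "a", "a"],
})
issues = quality_issues(example, key_cols=["id"], numeric_col="amount")
for line in issues:
    print("- [ ]", line)
```

Each line still needs a handling strategy (drop, impute, or flag) before it goes into the plan.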
Break analysis into ordered tasks:
Write to .claude/PLAN.md:
```markdown
# Analysis Plan: [Analysis Name]

> **For Claude:** REQUIRED SUB-SKILL: Use `Skill(skill="workflows:ds-implement")` to implement this plan with output-first verification.
>
> **Delegation:** Main chat orchestrates, Task agents implement. Use `Skill(skill="workflows:ds-delegate")` for subagent templates.

## Spec Reference
See: .claude/SPEC.md

## Data Profile

### Source 1: [name]
- Location: [path/connection]
- Shape: [rows] x [columns]
- Date range: [start] to [end]
- Key columns: [list]

#### Column Summary
| Column | Type | Non-null | Unique | Notes |
|--------|------|----------|--------|-------|
| col1 | int64 | 100% | 50 | Primary key |
| col2 | object | 95% | 10 | Category |

#### Data Quality Issues
- [ ] Missing: col2 has 5% nulls - [strategy: drop/impute/flag]
- [ ] Duplicates: 100 duplicate rows on [key] - [strategy]
- [ ] Outliers: col3 has values > 1000 - [strategy]

### Source 2: [name]
[Same structure]

## Task Breakdown

### Task 1: Data Cleaning (required first)
- Handle missing values in col2
- Remove duplicates
- Fix data types
- Output: Clean DataFrame, log of rows removed

### Task 2: [Analysis Step]
- Input: Clean DataFrame
- Process: [description]
- Output: [specific output to verify]
- Dependencies: Task 1

### Task 3: [Next Step]
[Same structure]

## Output Verification Plan
For each task, define what output proves completion:
- Task 1: "X rows cleaned, Y rows dropped"
- Task 2: "Visualization showing [pattern]"
- Task 3: "Model accuracy >= 0.8"

## Reproducibility Requirements
- Random seed: [value if needed]
- Package versions: [key packages]
- Data snapshot: [date/version]
```
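
The Column Summary table in the template can be generated from a DataFrame instead of being typed by hand. The sketch below assumes a hypothetical helper `column_summary_markdown` and made-up example data; the Notes column is left empty because it needs human judgment.

```python
import pandas as pd


def column_summary_markdown(df):
    """Render the template's Column Summary table as markdown text."""
    lines = [
        "| Column | Type | Non-null | Unique | Notes |",
        "|--------|------|----------|--------|-------|",
    ]
    n_rows = len(df)
    for col in df.columns:
        # Share of non-null values as a whole-number percentage.
        non_null_pct = 100 * df[col].notnull().sum() / n_rows if n_rows else 0
        lines.append(
            f"| {col} | {df[col].dtype} | {non_null_pct:.0f}% "
            f"| {df[col].nunique()} | |"
        )
    return "\n".join(lines)


# Made-up example data, for illustration only.
example = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": ["a", "b", None, "a"],
})
table = column_summary_markdown(example)
print(table)
```

Paste the printed table under the matching source in .claude/PLAN.md and fill in the Notes column manually.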
| Action | Why It's Wrong | Do Instead |
|---|---|---|
| Skip data profiling | Your data issues will break your analysis | Always profile first |
| Ignore missing values | You'll corrupt your results | Document and plan handling |
| Start analysis immediately | You haven't characterized your data | Complete profiling |
| Assume your data is clean | Never assume, you must verify | Run quality checks |
Complete the plan when .claude/SPEC.md has been read and .claude/PLAN.md has been written.

REQUIRED SUB-SKILL: After completing the plan, IMMEDIATELY invoke:
Skill(skill="workflows:ds-implement")