Analyze data and guide ML: EDA, model selection, feature engineering, stats, visualization, MLOps. Use for data work. NOT for ETL, database design (database-architect), or frontend visualization code.

Install: `npx claudepluginhub wyattowalsh/agents --plugin agents`

This skill uses the workspace's default tool permissions.
Full-stack data science and ML engineering — from exploratory data analysis through model deployment strategy. Adapts approach based on complexity classification.
Bundled files:

- data/feature-engineering-patterns.json
- data/model-catalog.json
- data/statistical-tests-tree.json
- data/visualization-grammar.json
- evals/eda-mode.json
- evals/experiment-design.json
- evals/explicit-invocation.json
- evals/implicit-trigger.json
- evals/model-selection.json
- evals/negative-control.json
- evals/stats-mode.json
- references/data-quality.md
- references/experiment-design.md
- references/feature-engineering.md
- references/mlops-maturity.md
- references/model-selection.md
- references/statistical-tests.md
- scripts/data-profiler.py
- scripts/data-quality-scorer.py
- scripts/model-recommender.py
Canonical terms (use these exactly throughout):

| Term | Definition |
|---|---|
| EDA | Exploratory Data Analysis — systematic profiling and summarization of a dataset |
| feature | An individual measurable property used as input to a model |
| feature engineering | Creating, transforming, or selecting features to improve model performance |
| hypothesis test | A statistical procedure to determine if observed data supports a claim |
| p-value | Probability of observing data at least as extreme as the actual results, assuming the null hypothesis is true |
| effect size | Magnitude of a difference or relationship, independent of sample size |
| power analysis | Determining sample size needed to detect an effect of a given size |
| CUPED | Controlled-experiment Using Pre-Experiment Data — variance reduction technique for A/B tests |
| MLOps maturity | Level 0 (manual), Level 1 (ML pipeline), Level 2 (CI/CD + CT), Level 3 (full automation) |
| data quality score | Composite metric across completeness, consistency, accuracy, timeliness, uniqueness |
| profile | Statistical summary of a dataset: types, distributions, missing patterns, correlations |
| anomaly | Data point or pattern deviating significantly from expected behavior |
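To make the power-analysis and effect-size terms above concrete, here is a minimal stdlib-only sketch using the normal approximation for a two-sided, two-sample test (the skill's bundled scripts are not reproduced here; a t-based calculation, e.g. statsmodels' `TTestIndPower`, gives a slightly larger answer):

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Sample size per group for a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# Medium effect (Cohen's d = 0.5), 80% power, alpha = 0.05
print(n_per_group(0.5))  # 63 under the normal approximation
```

Note how the required sample size grows quadratically as the effect size shrinks — which is why effect size is tracked independently of the p-value.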
| $ARGUMENTS | Action |
|---|---|
| `eda <data>` | EDA — profile dataset, summary stats, missing patterns, distributions |
| `model <task>` | Model Selection — recommend models, libraries, training plan for task |
| `features <data>` | Feature Engineering — suggest transformations, encoding, selection pipeline |
| `stats <question>` | Stats — select and design statistical hypothesis test |
| `viz <data>` | Visualization — recommend chart types, encodings, layout for data |
| `experiment <hypothesis>` | Experiment Design — A/B test design, power analysis, CUPED |
| `timeseries <data>` | Time Series — forecasting approach, decomposition, model selection |
| `anomaly <data>` | Anomaly Detection — detection approach, algorithm selection, threshold strategy |
| `mlops <model>` | MLOps — serving strategy, deployment pipeline, monitoring plan |
| Natural language about data | Auto-detect — classify intent, route to appropriate mode |
| Empty | Gallery — show common data science tasks with mode recommendations |
If no mode keyword matches, present common data science tasks:
| # | Task | Mode | Example |
|---|---|---|---|
| 1 | Profile a dataset | eda | /data-wizard eda customer_data.csv |
| 2 | Choose a model | model | /data-wizard model "predict churn from usage features" |
| 3 | Engineer features | features | /data-wizard features sales_data.csv |
| 4 | Pick a stat test | stats | /data-wizard stats "is conversion rate different between groups?" |
| 5 | Choose visualizations | viz | /data-wizard viz time_series_metrics.csv |
| 6 | Design an experiment | experiment | /data-wizard experiment "new checkout flow increases conversion" |
| 7 | Forecast time series | timeseries | /data-wizard timeseries monthly_revenue.csv |
| 8 | Detect anomalies | anomaly | /data-wizard anomaly server_metrics.csv |
| 9 | Plan deployment | mlops | /data-wizard mlops "churn prediction model" |
Pick a number or describe your data science task.
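The dispatch behavior described above could be sketched as follows (mode names are taken from the table; the `"auto"` and `"gallery"` return values are illustrative labels, not part of the skill's spec):

```python
MODES = {"eda", "model", "features", "stats", "viz",
         "experiment", "timeseries", "anomaly", "mlops"}

def route(arguments: str) -> tuple[str, str]:
    """Return (mode, payload) for a /data-wizard invocation."""
    text = arguments.strip()
    if not text:
        return ("gallery", "")          # empty input -> show the task gallery
    keyword, _, rest = text.partition(" ")
    if keyword.lower() in MODES:
        return (keyword.lower(), rest)  # explicit mode keyword
    return ("auto", text)               # natural language -> classify intent

print(route("eda customer_data.csv"))  # ('eda', 'customer_data.csv')
print(route("is churn seasonal?"))     # ('auto', 'is churn seasonal?')
```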
Before starting, check if another skill is a better fit:
| Signal | Redirect |
|---|---|
| Database schema, SQL optimization, indexing | Suggest database-architect |
| Frontend dashboard code, React/D3 components | Suggest relevant frontend skill |
| Data pipeline, ETL, orchestration (Airflow, dbt) | Out of scope — suggest data engineering tools |
| Production infrastructure, Kubernetes, scaling | Suggest devops-engineer or infrastructure-coder |
Score the query on 4 dimensions (0-2 each, total 0-8):
| Dimension | 0 | 1 | 2 |
|---|---|---|---|
| Data complexity | Single table, clean | Multi-table, some nulls | Messy, multi-source, mixed types |
| Analysis depth | Descriptive stats | Inferential / predictive | Multi-stage pipeline, iteration |
| Domain specificity | General / well-known | Domain conventions apply | Deep domain expertise needed |
| Tooling breadth | Single library suffices | 2-3 libraries needed | Full ML stack integration |
| Total | Tier | Strategy |
|---|---|---|
| 0-2 | Quick | Single inline analysis — eda, viz, stats |
| 3-5 | Standard | Multi-step workflow — features, model, experiment, timeseries, anomaly |
| 6-8 | Full Pipeline | Orchestrated — mlops, complex multi-stage analysis |
Present the scoring to the user. User can override tier.
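The tiering rule above, as a sketch (the four dimension scores would come from the assistant's own judgment of the query; the function only encodes the total-to-tier mapping from the table):

```python
def classify_tier(scores: dict[str, int]) -> tuple[int, str]:
    """Map four 0-2 dimension scores to a complexity tier."""
    expected = {"data_complexity", "analysis_depth",
                "domain_specificity", "tooling_breadth"}
    assert set(scores) == expected and all(0 <= v <= 2 for v in scores.values())
    total = sum(scores.values())  # 0-8
    if total <= 2:
        return (total, "Quick")
    if total <= 5:
        return (total, "Standard")
    return (total, "Full Pipeline")

print(classify_tier({"data_complexity": 1, "analysis_depth": 2,
                     "domain_specificity": 0, "tooling_breadth": 1}))  # (4, 'Standard')
```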
Mode resources:

- EDA: run `!uv run python skills/data-wizard/scripts/data-profiler.py "$1"`
- Model Selection: run `!uv run python skills/data-wizard/scripts/model-recommender.py` with task JSON input; see references/model-selection.md for detailed guidance by data size and type
- Feature Engineering: see references/feature-engineering.md for patterns by data type and data/feature-engineering-patterns.json for structured recommendations
- Stats: run `!uv run python skills/data-wizard/scripts/statistical-test-selector.py` with question parameters; see data/statistical-tests-tree.json for the decision tree and references/statistical-tests.md for assumptions and interpretation guidance
- Visualization: see data/visualization-grammar.json for chart type selection
- Experiment Design: see references/experiment-design.md for A/B test patterns
- MLOps: see references/mlops-maturity.md for the maturity model
- Data Quality: run `!uv run python skills/data-wizard/scripts/data-quality-scorer.py <path>`
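The bundled data-profiler.py is not reproduced here, but a minimal pandas sketch of what such a profile covers — types, missing patterns, distributions, correlations — might look like this (function and key names are illustrative):

```python
import sys
import pandas as pd

def profile(path: str) -> dict:
    """Return a basic statistical profile of a CSV dataset."""
    df = pd.read_csv(path)
    return {
        "shape": df.shape,                                        # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),                # inferred types
        "missing_pct": (df.isna().mean() * 100).round(2).to_dict(),
        "numeric_summary": df.describe().to_dict(),               # count/mean/std/quartiles
        "correlations": df.corr(numeric_only=True).to_dict(),
    }

if __name__ == "__main__":
    for section, value in profile(sys.argv[1]).items():
        print(section, value, sep=": ")
```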
Dimensions scored:
| Dimension | Weight | Checks |
|---|---|---|
| Completeness | 25% | Missing values, null patterns |
| Consistency | 20% | Type uniformity, format violations |
| Accuracy | 20% | Range violations, statistical outliers |
| Timeliness | 15% | Stale records, temporal gaps |
| Uniqueness | 20% | Duplicates, near-duplicates |
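Given per-dimension scores, the composite could be a weighted average using the table's weights — a sketch assuming each dimension is scored 0-100 (the bundled data-quality-scorer.py may compute dimensions differently):

```python
WEIGHTS = {
    "completeness": 0.25,
    "consistency": 0.20,
    "accuracy": 0.20,
    "timeliness": 0.15,
    "uniqueness": 0.20,
}

def quality_score(dimension_scores: dict[str, float]) -> float:
    """Weighted composite of per-dimension scores (each 0-100)."""
    assert set(dimension_scores) == set(WEIGHTS)
    return round(sum(WEIGHTS[d] * s for d, s in dimension_scores.items()), 1)

print(quality_score({"completeness": 90, "consistency": 80,
                     "accuracy": 75, "timeliness": 100, "uniqueness": 95}))  # 87.5
```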
| File | Content | Read When |
|---|---|---|
| references/statistical-tests.md | Decision tree for test selection, assumptions, interpretation | Stats mode |
| references/model-selection.md | Model catalog by task type, data size, interpretability needs | Model Selection mode |
| references/feature-engineering.md | Patterns by data type: numeric, categorical, temporal, text, geospatial | Feature Engineering mode |
| references/experiment-design.md | A/B test patterns, CUPED, power analysis, multiple comparison corrections | Experiment Design mode |
| references/mlops-maturity.md | Maturity levels 0-3, deployment patterns, monitoring strategy | MLOps mode |
| references/data-quality.md | Quality framework, scoring dimensions, remediation strategies | EDA mode, Data Quality Assessment |
Loading rule: Load ONE reference at a time per the "Read When" column. Do not preload.