From data-analyst
Comprehensive data analysis expert covering statistical insights, visualization, and machine learning
npx claudepluginhub dobachi/claude-skills-marketplace --plugin data-analyst

This skill uses the workspace's default tool permissions.
> **Language:** Respond in the user's language. If unclear, default to the language of the user's message.
Acts as a data analysis expert: extracts meaningful insights from data through systematic, CRISP-DM-compliant analysis to support decision-making.
| Phase | Key Tasks | Deliverables |
|---|---|---|
| Business Understanding | Goal setting, success criteria, constraint identification | Analysis requirements definition |
| Data Understanding | Data exploration, quality assessment, descriptive statistics | Data profile |
| Data Preparation | Cleansing, feature engineering | Analysis-ready dataset |
| Modeling | Method selection, model building, validation | Analysis model |
| Evaluation | Result verification, business value | Evaluation report |
| Deployment | Implementation plan, monitoring | Utilization guide |
| Item | Approach |
|---|---|
| Missing Values | Delete/impute/predict |
| Outliers | Identify and handle with IQR filtering |
| Data Types | Consistency verification |
| Scaling | Normalization/standardization |
| Features | Create/select/transform |
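
The preparation steps in the table above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a prescribed pipeline: the helper names (`impute_median`, `iqr_filter`, `standardize`), the sample values, and the conventional fence multiplier `k=1.5` are all assumptions made for the example.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical measurements with one missing value and one obvious outlier.
raw = [10.0, 12.0, None, 11.0, 13.0, 12.0, 11.0, 95.0]
filled = impute_median(raw)    # missing value -> median of observed values
cleaned = iqr_filter(filled)   # drops 95.0, which lies outside the fences
scaled = standardize(cleaned)  # zero mean, unit variance
```

Median imputation is used here because the median is not pulled toward the outlier; whether to delete, impute, or predict missing values still depends on why the data is missing.
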
Descriptive Analysis:
- Cross-tabulation
- Correlation analysis (Pearson)
- Time series analysis
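
As a rough illustration of the first two descriptive techniques, the sketch below computes a Pearson correlation coefficient and a simple cross-tabulation with the standard library. The function names and the sample data (`ad_spend`, `revenue`, `segment`, `purchased`) are hypothetical.

```python
import math
from collections import Counter

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def crosstab(rows, cols):
    """Count co-occurrences of two categorical variables."""
    return Counter(zip(rows, cols))

# Hypothetical sample data.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(ad_spend, revenue)

segment = ["new", "new", "returning", "returning", "new"]
purchased = ["yes", "no", "yes", "yes", "no"]
table = crosstab(segment, purchased)  # e.g. table[("new", "no")] is a cell count
```

Pearson's coefficient only measures linear association, so a scatter plot is worth checking before reading much into the number.
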
Inferential Statistics:
- Hypothesis testing
- Confidence intervals
- Effect size
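
The three inferential items above can be combined in one comparison of two independent groups. The sketch below is an assumption-laden illustration: it uses a large-sample normal approximation (a z-test) because the standard library has no t-distribution, and the function name `compare_means` and the sample data are hypothetical. For small samples, a t-based test such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` is the more appropriate choice.

```python
import math
import statistics
from statistics import NormalDist

def compare_means(a, b, confidence=0.95):
    """Compare two independent groups using a normal approximation."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    diff = mean_a - mean_b

    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(var_a / len(a) + var_b / len(b))

    # Hypothesis test: two-sided p-value for H0 "the means are equal".
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval for the difference in means.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # Effect size: Cohen's d with the pooled standard deviation.
    pooled_var = ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)
    cohens_d = diff / math.sqrt(pooled_var)

    return {"diff": diff, "p_value": p_value, "ci": ci, "cohens_d": cohens_d}

# Hypothetical measurements for a treatment and a control group.
treatment = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3, 5.7]
control = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4, 4.6, 4.3]
result = compare_means(treatment, control)
```

Reporting the interval and effect size alongside the p-value shows how large the difference is, not only whether it is detectable.
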
Predictive Analysis:
- Regression analysis
- Classification analysis
- Clustering
| Algorithm | Use Case | Strengths | Weaknesses |
|---|---|---|---|
| XGBoost/LightGBM | Structured data | Fast, strong accuracy on tabular data, feature importances | Cannot extrapolate beyond the training range, weak on unstructured data |
| Transformer | NLP/CV/time series | High accuracy, versatile | High compute cost |
| CNN | Image recognition | Spatial feature extraction | Requires large data |
| RNN/LSTM | Sequential data | Time series patterns | Long-term dependency issues |
| Method | Use Case | Key Techniques |
|---|---|---|
| Clustering | Data grouping | K-means, DBSCAN |
| Dimensionality Reduction | Visualization | PCA, t-SNE, UMAP |
| Generative Models | Data generation | GAN, VAE, diffusion models |
Classification:
- Accuracy, precision, recall, F1
- AUC-ROC (can be misleading on imbalanced data; compare with precision-recall AUC)
- Confusion matrix utilization
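
To show how these metrics relate to the confusion matrix, here is a minimal sketch for the binary case. The function name `binary_metrics` and the labels are hypothetical; in practice `sklearn.metrics` offers equivalents such as `confusion_matrix` and `precision_recall_fscore_support`.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)

    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 4 positives, 6 negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
report = binary_metrics(y_true, y_pred)
```
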
Regression:
- RMSE, MAE, R-squared
- Residual analysis
- Prediction intervals
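
The regression metrics above follow directly from the residuals. A minimal sketch, with a hypothetical function name and hypothetical values:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE and R-squared, and return the residuals for inspection."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)

    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)

    # R-squared: 1 minus the ratio of residual to total sum of squares.
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "r2": r2, "residuals": residuals}

# Hypothetical observed and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
```

RMSE penalizes large errors more heavily than MAE, so a wide gap between the two suggests a few large residuals worth examining.
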
Cross-Validation:
- Standard: K-Fold (5-10 splits)
- Time Series: Time Series Split
- Stratified: Stratified K-Fold
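
The difference between a standard split and a time series split is easiest to see in the indices they produce. The generators below are a simplified sketch with hypothetical names; they do not shuffle or stratify. scikit-learn's `KFold`, `StratifiedKFold`, and `TimeSeriesSplit` in `sklearn.model_selection` are the usual tools for real work.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled K-Fold splits."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def time_series_indices(n_samples, n_splits=5):
    """Yield expanding-window splits in which training data always precedes test data."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - i + 1) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))

kfold = list(k_fold_indices(10, n_splits=5))
ts = list(time_series_indices(12, n_splits=3))
# ts[0] trains on samples 0-2 and tests on 3-5; ts[2] trains on 0-8 and tests on 9-11.
```

In the K-Fold case every sample is tested exactly once, while the time series variant never lets the model see observations later than the ones it is evaluated on.
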
| Purpose | Appropriate Charts |
|---|---|
| Comparison | Bar charts, radar charts |
| Trends | Line charts, area charts |
| Composition | Pie charts, treemaps |
| Correlation | Scatter plots, heatmaps |
Executive Summary:
- Key insights (3-5 items)
- Recommended actions
- Expected impact
Detailed Analysis:
- Methodology
- Analysis process
- Technical details
Visuals:
- Dashboards
- Interactive elements
| Problem | Cause | Solution |
|---|---|---|
| Overfitting | Insufficient data | Regularization, data augmentation |
| Slow Training | Improper initialization, poorly tuned learning rate, unscaled inputs | Learning rate adjustment, normalization |
| Out of Memory | Large batch size | Smaller batches with gradient accumulation, mixed precision |
| Drift | Data distribution change | Enhanced monitoring, retraining |
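
One possible way to monitor for drift, offered here as an illustration rather than something this skill prescribes, is to compare a feature's distribution at training time against recent data with the two-sample Kolmogorov-Smirnov statistic. The function name, the sample data, and the alert threshold of 0.3 are hypothetical; `scipy.stats.ks_2samp` provides the same statistic together with a p-value.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    points = sorted(set(ref) | set(cur))
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in points
    )

# Hypothetical feature values: training-time sample vs. a shifted production sample.
reference = list(range(100))
shifted = [x + 50 for x in reference]

DRIFT_THRESHOLD = 0.3  # hypothetical alert level; tune per feature
no_drift = ks_statistic(reference, reference)
drift = ks_statistic(reference, shifted)
needs_retraining = drift > DRIFT_THRESHOLD
```

A statistic near 0 means the two samples look alike; values approaching 1 mean the distributions barely overlap.
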