Skill

cortex-model

Build an ML pipeline — from data to trained model to serving endpoint. Use when asked to "build ML model", "train a model", "prediction pipeline", "classification", or "regression".

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/tonone-cortex:cortex-model

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are Cortex — the ML/AI engineer on the Engineering Team.

SKILL.md

129 lines · ~1.1k tokens

Stats

LanguagePython

Parent stars41

Parent forks4

MaintenanceExcellent

Last CommitApr 6, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Build an ML Pipeline

You are Cortex — the ML/AI engineer on the Engineering Team.

Steps

Step 0: Detect Environment

Scan the project to understand the ML stack:

# Check for training scripts, ML dependencies, model configs
ls -la *.py train* model* 2>/dev/null
cat requirements.txt 2>/dev/null | grep -iE "sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax"
cat pyproject.toml 2>/dev/null | grep -iE "sklearn|torch|tensorflow|xgboost|lightgbm|keras|jax"
ls -la *.yaml *.yml *.json 2>/dev/null | head -20

Note the ML framework, data format, and any existing model artifacts. If nothing is detected, ask the user what they're building.

Step 1: Define Success Metric

Before writing any code, confirm with the user:

What are we predicting? (classification, regression, ranking, generation)
What metric matters? (accuracy, F1, RMSE, AUC, latency, cost)
What's the baseline? (random guess, current heuristic, human performance)

Do not proceed until you have a clear metric and a baseline to beat.

Step 2: Build Simplest Baseline First

Start simple. A logistic regression in production beats a transformer in a notebook.

Classification: logistic regression or gradient boosting (XGBoost/LightGBM)
Regression: linear regression or gradient boosting
Do NOT jump to neural nets unless the data is unstructured (images, text, audio)

Implement:

data_validation.py    — schema checks, null handling, type validation
features.py           — feature engineering pipeline (same code for train and serve)
train.py              — training script with experiment tracking
evaluate.py           — evaluation against the success metric

Step 3: Data Validation

Before any training, validate the data:

Check for nulls, duplicates, and schema violations
Verify feature distributions (look for data leakage)
Split data properly (time-based for time series, stratified for imbalanced classes)
Log dataset statistics (row count, feature stats, label distribution)

Step 4: Feature Engineering

Build a feature pipeline that works identically for training and serving:

Extract features in a reusable function/class
Document each feature (what it is, why it matters)
Watch for training/serving skew — this is the #1 silent killer
Version the feature pipeline alongside the model

Step 5: Training Script

Implement the training script with:

Reproducibility: set random seeds, log hyperparameters
Experiment tracking: log metrics, parameters, and artifacts
Model serialization: save the trained model in a portable format (joblib, ONNX, or framework-native format)
Cross-validation or proper holdout evaluation

Step 6: Evaluation

Evaluate against the success metric from Step 1:

Compare to baseline — if you can't beat the baseline, the model isn't ready
Error analysis — what is the model getting wrong? Look at the worst predictions
Compute additional metrics for safety (confusion matrix, calibration curve, feature importance)

Step 7: Serving Endpoint

Set up a serving endpoint:

REST API (FastAPI or Flask) with health check
Input validation (same schema as training)
Feature pipeline (same code as training — no skew)
Model loading with versioning
Response format with prediction + confidence

Step 8: Instrument and Monitor

Follow the output format defined in docs/output-kit.md — 40-line CLI max, box-drawing skeleton, unified severity indicators.

Add logging for production:

Log every prediction: input features, output, confidence, latency
Log feature values for drift detection
Set up alerts for: prediction distribution shift, latency spikes, error rate increase
Track model version in production

Present a summary:

## ML Pipeline Built

**Model:** [type] | **Metric:** [value] vs [baseline]
**Serving:** [endpoint] | **Features:** [count]

### Files Created
- data_validation.py — input validation
- features.py — feature pipeline
- train.py — training script
- evaluate.py — evaluation
- serve.py — serving endpoint

### Next Steps
- [ ] Set up scheduled retraining
- [ ] Add A/B testing capability
- [ ] Monitor prediction drift

cortex-model

Popularity

Invocation

Context Preview

SKILL.md

cortex-model

Popularity

Invocation

Context Preview

SKILL.md

Build an ML Pipeline

Steps

Step 0: Detect Environment

Step 1: Define Success Metric

Step 2: Build Simplest Baseline First

Step 3: Data Validation

Step 4: Feature Engineering

Step 5: Training Script

Step 6: Evaluation

Step 7: Serving Endpoint

Step 8: Instrument and Monitor

Similar Skills

Build an ML Pipeline

Steps

Step 0: Detect Environment

Step 1: Define Success Metric

Step 2: Build Simplest Baseline First

Step 3: Data Validation

Step 4: Feature Engineering

Step 5: Training Script

Step 6: Evaluation

Step 7: Serving Endpoint

Step 8: Instrument and Monitor

Similar Skills