MLflow, model versioning, experiment tracking, model registry, and production ML systems
Tracks ML experiments and manages model lifecycle with MLflow, including versioning, registry, and deployment pipelines. Use when training models, promoting to production, or monitoring for drift.
```
/plugin marketplace add pluginagentmarketplace/custom-plugin-data-engineer
/plugin install data-engineer-development-assistant@pluginagentmarketplace-data-engineer
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
- assets/config.yaml
- assets/schema.json
- references/GUIDE.md
- references/PATTERNS.md
- scripts/validate.py

Production machine learning systems with MLflow, model versioning, and deployment pipelines.
```python
import mlflow
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Configure MLflow
mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("customer-churn-prediction")

# Training with experiment tracking (assumes X_train, X_test, y_train, y_test exist)
with mlflow.start_run(run_name="rf-baseline"):
    # Log parameters
    params = {"n_estimators": 100, "max_depth": 10, "random_state": 42}
    mlflow.log_params(params)

    # Train model
    model = RandomForestClassifier(**params)
    model.fit(X_train, y_train)

    # Evaluate and log metrics
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred, average="weighted"),
    }
    mlflow.log_metrics(metrics)

    # Log model to the registry; pair the signature's inputs and outputs consistently
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="churn-classifier",
        signature=mlflow.models.infer_signature(X_test, y_pred),
    )

    print(f"Run ID: {mlflow.active_run().info.run_id}")
```
```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Promote model to production
client.transition_model_version_stage(
    name="churn-classifier",
    version=3,
    stage="Production",
)

# Archive old version
client.transition_model_version_stage(
    name="churn-classifier",
    version=2,
    stage="Archived",
)

# Load production model
model_uri = "models:/churn-classifier/Production"
model = mlflow.sklearn.load_model(model_uri)

# Compare logged metrics across registered model versions
def compare_model_versions(model_name: str, versions: list[int]) -> dict:
    results = {}
    for version in versions:
        run_id = client.get_model_version(model_name, str(version)).run_id
        run = client.get_run(run_id)
        results[version] = run.data.metrics
    return results
```
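For example, to check metrics side by side before a promotion (version numbers here are illustrative):

```python
# Compare candidate version 3 against current production version 2
for version, metrics in compare_model_versions("churn-classifier", [2, 3]).items():
    print(version, metrics)
```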
```python
from feast import FeatureStore

# Connect to the feature repo
store = FeatureStore(repo_path="feature_repo/")

# Get point-in-time-correct training features
# (entity_df is assumed: a DataFrame of entity keys plus event timestamps)
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_features:total_purchases",
        "customer_features:days_since_last_order",
        "customer_features:avg_order_value",
    ],
).to_df()

# Get online features for low-latency inference
feature_vector = store.get_online_features(
    features=[
        "customer_features:total_purchases",
        "customer_features:days_since_last_order",
    ],
    entity_rows=[{"customer_id": "12345"}],
).to_dict()
```
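The snippet above assumes `customer_features` is already defined in the feature repo. A sketch of what that definition might look like with a Feast 0.36-era API, where `Field` replaces the older `Feature` class (the source path, dtypes, and TTL are assumptions):

```python
from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity joining feature rows to customers
customer = Entity(name="customer_id", join_keys=["customer_id"])

# Hypothetical offline source for the features used above
source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
)

customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=1),
    schema=[
        Field(name="total_purchases", dtype=Int64),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
    ],
    source=source,
)
```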
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import mlflow
import numpy as np

app = FastAPI()

# Load the production model once at startup
model = mlflow.sklearn.load_model("models:/churn-classifier/Production")

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: int
    probability: float
    model_version: str

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    try:
        X = np.array(request.features).reshape(1, -1)
        prediction = model.predict(X)[0]
        probability = model.predict_proba(X)[0].max()
        return PredictionResponse(
            prediction=int(prediction),
            probability=float(probability),
            model_version="v3",
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    return {"status": "healthy", "model_loaded": model is not None}
```
```yaml
# .github/workflows/ml-pipeline.yml
name: ML Pipeline

on:
  push:
    paths:
      - 'src/**'
      - 'data/**'

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run tests
        run: pytest tests/
      - name: Train model
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_URI }}
        run: python src/train.py
      - name: Evaluate model
        run: python src/evaluate.py --threshold 0.85
      - name: Register model
        if: success()
        run: python src/register_model.py

  deploy:
    needs: train-and-evaluate
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to production
        run: |
          kubectl set image deployment/model-server \
            model-server=gcr.io/$PROJECT/model:${{ github.sha }}
```
| Tool | Purpose | Version (2025) |
|---|---|---|
| MLflow | Experiment tracking | 2.10+ |
| Feast | Feature store | 0.36+ |
| BentoML | Model serving | 1.2+ |
| Seldon | K8s model serving | 1.17+ |
| DVC | Data versioning | 3.40+ |
| Weights & Biases | Experiment tracking | Latest |
| Evidently | Model monitoring | 0.4+ |
| Issue | Symptoms | Root Cause | Fix |
|---|---|---|---|
| Model Drift | Accuracy drops | Data distribution change | Monitor, retrain |
| Slow Inference | High latency | Large model, no optimization | Quantize, distill |
| Version Mismatch | Prediction errors | Wrong model version | Pin versions |
| Feature Skew | Train/serve mismatch | Different preprocessing | Use feature store |
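For the drift row in particular, a scheduled check can catch distribution shift before accuracy visibly drops. A minimal sketch using Evidently 0.4-style presets (`reference_df` and `current_df` are assumed pandas DataFrames of training-time and recent production features):

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Compare recent production features against the training reference window
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)
report.save_html("drift_report.html")  # review, or alert on the drift result
```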
```python
# ✅ DO: Version everything (data, params, artifacts)
mlflow.log_artifact("data/train.csv")
mlflow.log_params({"data_version": "v2.3"})

# ✅ DO: Test model before deployment (evaluate_model is your own evaluation helper)
def test_model_performance(model, threshold=0.85):
    score = evaluate_model(model)
    assert score >= threshold, f"Model score {score} below threshold {threshold}"

# ✅ DO: Monitor in production
# ✅ DO: A/B test new models
# ❌ DON'T: Deploy without validation
# ❌ DON'T: Skip rollback strategy
```
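A rollback strategy can be as simple as re-promoting the previous registered version, reusing the same stage-transition API shown earlier (version numbers are illustrative):

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Roll back: demote the bad version and restore the previous one
client.transition_model_version_stage("churn-classifier", version=3, stage="Archived")
client.transition_model_version_stage("churn-classifier", version=2, stage="Production")
```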