Master ML experiment tracking - MLflow, W&B, Neptune, versioning, reproducibility
Automate ML experiment tracking with MLflow and Weights & Biases. Claude will log parameters, metrics, and artifacts during training runs, and manage model versions through staging to production.
```
/plugin marketplace add pluginagentmarketplace/custom-plugin-mlops
/plugin install custom-plugin-mlops@pluginagentmarketplace-mlops
```
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Files: assets/config.yaml, assets/schema.json, references/GUIDE.md, references/PATTERNS.md, scripts/validate.py

Learn: Master ML experiment tracking for reproducibility and collaboration.
| Attribute | Value |
|---|---|
| Bonded Agent | 02-experiment-tracking |
| Difficulty | Intermediate |
| Duration | 30 hours |
| Prerequisites | mlops-basics |
Platform Comparison:
| Feature | MLflow | W&B | Neptune |
|---|---|---|---|
| Self-hosted | ✅ | ❌ | ❌ |
| Free tier | ✅ | ✅ | ✅ |
| Real-time | ❌ | ✅ | ✅ |
| Git integration | ⚠️ | ✅ | ✅ |
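For quick orientation, a minimal sketch of initializing each platform; the project names, tracking URI, and workspace here are placeholders, and W&B/Neptune additionally expect an API key in the environment.

```python
import mlflow
import wandb
import neptune

# MLflow: point at a self-hosted tracking server (placeholder URI)
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("demo")

# W&B: hosted by default; reads WANDB_API_KEY from the environment
wandb.init(project="demo")

# Neptune: hosted; reads NEPTUNE_API_TOKEN from the environment
run = neptune.init_run(project="my-workspace/demo")
```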
Setup Exercises:
What to Log:
```python
import mlflow
import mlflow.pytorch

# Complete logging example (model and the training step come from your own code)
with mlflow.start_run():
    # 1. Parameters (hyperparameters, configs)
    mlflow.log_params({
        "learning_rate": 0.001,
        "batch_size": 32,
        "model_type": "transformer"
    })

    # 2. Metrics (per-step and final)
    for epoch in range(10):
        train_loss, val_loss = train_one_epoch(model)  # placeholder training step
        mlflow.log_metrics({
            "train_loss": train_loss,
            "val_loss": val_loss
        }, step=epoch)

    # 3. Artifacts (models, plots, configs)
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.pytorch.log_model(model, "model")

    # 4. Tags (for filtering)
    mlflow.set_tags({
        "experiment_type": "baseline",
        "dataset_version": "v2.1"
    })
```
Registry Workflow:
```
┌──────────────────────────────────────────────────────────────────┐
│                        MODEL REGISTRY FLOW                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Train → Log Model → Register → Staging → Production → Archive   │
│                          │          │           │                │
│                          ▼          ▼           ▼                │
│                      Version 1   Validate    Deploy              │
│                      Version 2   A/B Test    Monitor             │
│                      Version N   Approve     Rollback            │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```
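The same flow expressed against MLflow's registry API. This is a hedged sketch: the model name, `run_id`, and version numbers are illustrative, and while newer MLflow releases favor aliases over stages, `transition_model_version_stage` matches the stage-based flow in the diagram.

```python
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register the model logged during training (run_id comes from that run)
result = mlflow.register_model(f"runs:/{run_id}/model", "demo-model")

# Staging: validate / A/B test before promotion
client.transition_model_version_stage(
    name="demo-model", version=result.version, stage="Staging"
)

# Production: deploy and monitor after approval
client.transition_model_version_stage(
    name="demo-model", version=result.version, stage="Production"
)

# Archive superseded versions to keep rollback targets explicit
client.transition_model_version_stage(
    name="demo-model", version=1, stage="Archived"
)
```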
Exercises:
Naming Conventions:
```
experiments/
├── {project_name}/
│   ├── {experiment_type}_{date}/
│   │   ├── run_{config_hash}/
```
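One way to realize this scheme is to derive `run_{config_hash}` from a stable hash of the run config. The sketch below is an assumption about how to do that; the project and config values are illustrative.

```python
import hashlib
import json
from datetime import date

def run_name(project: str, experiment_type: str, config: dict) -> str:
    # Stable hash: serialize the config with sorted keys before hashing
    config_hash = hashlib.md5(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:8]
    return f"{project}/{experiment_type}_{date.today():%Y%m%d}/run_{config_hash}"

print(run_name("churn", "baseline", {"lr": 0.001, "batch_size": 32}))
# e.g. churn/baseline_20241215/run_<hash>
```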
Reproducibility Checklist:
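A minimal sketch of checklist items that can be captured in code, namely seed pinning and dependency freezing; the seed value and file name are placeholders, and an active MLflow run is assumed.

```python
import random
import subprocess

import mlflow
import numpy as np
import torch

SEED = 42  # placeholder; pin every RNG that affects training
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
mlflow.log_param("seed", SEED)  # assumes an active run

# Freeze the exact dependency set alongside the run
frozen = subprocess.check_output(["pip", "freeze"]).decode()
with open("requirements_frozen.txt", "w") as f:
    f.write(frozen)
mlflow.log_artifact("requirements_frozen.txt")
```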
```python
# templates/experiment_tracker.py
import subprocess
from datetime import datetime

import mlflow
import mlflow.pytorch


class ProductionExperimentTracker:
    """Production-ready experiment tracking wrapper."""

    def __init__(self, experiment_name: str, tracking_uri: str):
        mlflow.set_tracking_uri(tracking_uri)
        mlflow.set_experiment(experiment_name)
        self.run = None

    def start_run(self, run_name: str = None):
        """Start a new tracked run."""
        self.run = mlflow.start_run(run_name=run_name)
        # Auto-log environment info
        self._log_environment()
        return self

    def _log_environment(self):
        """Capture reproducibility information."""
        # Git info
        try:
            git_hash = subprocess.check_output(
                ["git", "rev-parse", "HEAD"]
            ).decode().strip()
            mlflow.set_tag("git_commit", git_hash)
        except (subprocess.CalledProcessError, FileNotFoundError):
            pass  # not a git repo, or git not installed
        # Timestamp
        mlflow.set_tag("run_timestamp", datetime.now().isoformat())

    def log_config(self, config: dict):
        """Log configuration as parameters."""
        # Flatten nested config so every leaf becomes a parameter
        flat_config = self._flatten_dict(config)
        mlflow.log_params(flat_config)

    def log_metrics(self, metrics: dict, step: int = None):
        """Log metrics with optional step."""
        mlflow.log_metrics(metrics, step=step)

    def log_model(self, model, artifact_path: str = "model"):
        """Log a PyTorch model artifact."""
        mlflow.pytorch.log_model(model, artifact_path)

    def end_run(self):
        """End the current run."""
        if self.run:
            mlflow.end_run()
            self.run = None

    def _flatten_dict(self, d: dict, parent_key: str = '') -> dict:
        """Flatten nested dictionary, joining keys with dots."""
        items = []
        for k, v in d.items():
            new_key = f"{parent_key}.{k}" if parent_key else k
            if isinstance(v, dict):
                items.extend(self._flatten_dict(v, new_key).items())
            else:
                items.append((new_key, v))
        return dict(items)
```
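A usage sketch for the wrapper above; the experiment name, tracking URI, and values are placeholders.

```python
tracker = ProductionExperimentTracker("churn-model", "http://localhost:5000")
tracker.start_run(run_name="baseline")
tracker.log_config({"optimizer": {"lr": 0.001}, "batch_size": 32})
tracker.log_metrics({"val_loss": 0.42}, step=0)
tracker.end_run()
```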
Troubleshooting:
| Issue | Cause | Solution |
|---|---|---|
| Runs not syncing | Network issue | Check connectivity, use offline mode |
| Large artifacts fail | Size limit | Use cloud storage for large files |
| Duplicate run names | No uniqueness | Add timestamp or hash to names |
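For the "runs not syncing" row, W&B's offline mode is one concrete fallback: setting `WANDB_MODE=offline` and later running the `wandb sync` CLI are real mechanisms, while the project name and paths here are placeholders.

```python
import os

os.environ["WANDB_MODE"] = "offline"  # buffer runs locally instead of syncing

import wandb

run = wandb.init(project="demo")
run.log({"loss": 0.1})
run.finish()

# Later, with connectivity restored, upload the buffered runs:
#   wandb sync wandb/offline-run-*
```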
Version History:
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2024-12 | Production-grade with templates |
| 1.0.0 | 2024-11 | Initial release |
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.