Master MLOps fundamentals - lifecycle, principles, tools, practices, and organizational adoption
Master MLOps fundamentals including lifecycle phases, core principles, and tool selection. Use this when you need to assess organizational MLOps maturity or design basic ML pipelines following best practices.
```
/plugin marketplace add pluginagentmarketplace/custom-plugin-mlops
/plugin install custom-plugin-mlops@pluginagentmarketplace-mlops
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.

- assets/config.yaml
- assets/schema.json
- references/GUIDE.md
- references/PATTERNS.md
- scripts/validate.py

Learn: Master the foundations of Machine Learning Operations for production ML systems.
| Attribute | Value |
|---|---|
| Bonded Agent | 01-mlops-fundamentals |
| Difficulty | Beginner to Intermediate |
| Duration | 40 hours |
| Prerequisites | Basic ML concepts, Git |
After completing this skill, you will be able to:
```
┌─────────────────────────────────────────────────────────────────┐
│ ML LIFECYCLE PHASES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Data │──▶│ Model │──▶│ Deploy │──▶│ Monitor │ │
│ │Collection│ │ Training │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ └──────────── Feedback Loop ◀──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
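The feedback loop is what separates an ML lifecycle from a one-shot training script: monitoring results feed back into data collection and retraining. A minimal, hypothetical sketch of that loop, with every stage stubbed so the shape is visible (none of these stubs are real tooling):

```python
# Hypothetical sketch of the lifecycle feedback loop; each stage is a stub.
def collect_data() -> list[float]:
    return [0.1, 0.2, 0.3]                     # Data Collection (stubbed)

def train(data: list[float]) -> float:
    return sum(data) / len(data)               # Model Training (stubbed as a mean)

def deploy(model: float) -> float:
    return model                               # Deploy (stubbed)

def drift_detected(live: list[float], reference: list[float]) -> bool:
    # Monitor: compare live inputs against the training distribution
    return abs(sum(live) / len(live) - sum(reference) / len(reference)) > 0.5

reference = collect_data()
model = deploy(train(reference))
if drift_detected([0.9, 1.1, 1.0], reference):
    model = deploy(train(collect_data()))      # feedback loop: retrain
```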
Topics:
Exercises:
Core Principles (2024-2025 Best Practices):
| Principle | Description | Implementation |
|---|---|---|
| Automation | Reduce manual intervention | CI/CD pipelines, auto-retraining |
| Reproducibility | Same inputs → same outputs | Version everything, seed randomness |
| Testing | Validate at every stage | Data tests, model tests, integration tests |
| Monitoring | Observe production behavior | Drift detection, performance metrics |
| Iteration | Continuous improvement | Feedback loops, experimentation |
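Of these, reproducibility is the one most often lost to small oversights. A minimal sketch of seeding every common source of randomness at the start of a training script (the PyTorch lines are optional and assume torch is installed):

```python
import os
import random

import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Pin common sources of randomness so reruns match."""
    random.seed(seed)                          # Python's built-in RNG
    np.random.seed(seed)                       # NumPy's global RNG
    os.environ["PYTHONHASHSEED"] = str(seed)   # affects subprocesses only;
                                               # set before launch for this process
    try:
        import torch
        torch.manual_seed(seed)                # CPU and CUDA RNGs
        torch.cuda.manual_seed_all(seed)       # all CUDA devices explicitly
    except ImportError:
        pass                                   # PyTorch not installed
```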
Exercises:
Tool Categories & Recommendations:
```python
MLOPS_TOOLS = {
    "experiment_tracking": {
        "open_source": ["MLflow", "DVC"],
        "managed": ["Weights & Biases", "Neptune", "Comet"],
    },
    "feature_stores": {
        "open_source": ["Feast"],
        "managed": ["Tecton", "Hopsworks", "Vertex Feature Store"],
    },
    "orchestration": {
        "open_source": ["Airflow", "Prefect", "Dagster", "Kubeflow"],
        "managed": ["Vertex Pipelines", "SageMaker Pipelines"],
    },
    "model_serving": {
        "open_source": ["TorchServe", "BentoML", "Seldon", "Triton"],
        "managed": ["SageMaker Endpoints", "Vertex Endpoints"],
    },
    "monitoring": {
        "open_source": ["Evidently", "NannyML"],
        "managed": ["WhyLabs", "Arize", "Fiddler"],
    },
}
```
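One way to use a registry like this is as input to a small selection helper. A trivial, hypothetical example that shortlists candidates by category and hosting preference:

```python
def shortlist(category: str, prefer_managed: bool = False) -> list[str]:
    """Return candidate tools for a category, preferred hosting first."""
    options = MLOPS_TOOLS[category]
    order = ("managed", "open_source") if prefer_managed else ("open_source", "managed")
    return [tool for key in order for tool in options[key]]

print(shortlist("experiment_tracking"))
# ['MLflow', 'DVC', 'Weights & Biases', 'Neptune', 'Comet']
```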
Exercises:
Design Patterns:
Training Pipeline Pattern
Data → Validate → Transform → Train → Evaluate → Register
Serving Pipeline Pattern
Request → Preprocess → Predict → Postprocess → Response
Monitoring Pattern
Production Data → Compare vs Reference → Detect Drift → Alert/Retrain
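Each pattern is essentially function composition with validation gates. A hypothetical sketch of the training pipeline pattern, with every stage as a small stub so stages can be tested and swapped independently:

```python
# Hypothetical sketch of the training pipeline pattern; all stages are stubs.
def validate(data: list[float]) -> list[float]:
    assert data, "empty dataset"                   # Validate: fail fast
    return data

def transform(data: list[float]) -> list[float]:
    peak = max(data)
    return [x / peak for x in data]                # Transform: scale features

def train(data: list[float]) -> float:
    return sum(data) / len(data)                   # Train: stubbed as a mean

def evaluate(model: float, data: list[float]) -> float:
    return abs(model - sum(data) / len(data))      # Evaluate: stub metric

def register(model: float, metric: float, threshold: float = 0.1) -> None:
    if metric <= threshold:                        # Register: gate on quality
        print(f"registered model={model:.3f} (metric={metric:.3f})")

data = transform(validate([1.0, 2.0, 3.0]))
model = train(data)
register(model, evaluate(model, data))
```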
Exercises:
```python
# templates/mlflow_setup.py
import mlflow
from mlflow.tracking import MlflowClient


def setup_experiment(
    experiment_name: str,
    tracking_uri: str = "sqlite:///mlflow.db",
) -> str:
    """Initialize MLflow experiment with best practices."""
    mlflow.set_tracking_uri(tracking_uri)

    # Create or get experiment
    client = MlflowClient()
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        experiment_id = client.create_experiment(
            name=experiment_name,
            tags={"version": "1.0", "team": "ml-platform"},
        )
    else:
        experiment_id = experiment.experiment_id

    mlflow.set_experiment(experiment_name)
    return experiment_id


def log_run(params: dict, metrics: dict, model_path: str):
    """Log a complete training run."""
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.log_artifact(model_path)
```
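A usage sketch of the two helpers above (the experiment name, parameters, and metrics are illustrative, and `model.pkl` is assumed to already exist on disk):

```python
experiment_id = setup_experiment("churn-prediction")  # hypothetical name
log_run(
    params={"max_depth": 6, "learning_rate": 0.1},
    metrics={"auc": 0.91, "logloss": 0.23},
    model_path="model.pkl",  # assumed to exist
)
```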
```python
# templates/maturity_assessment.py
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class MaturityLevel(IntEnum):
    AD_HOC = 0
    REPEATABLE = 1
    RELIABLE = 2
    SCALABLE = 3
    OPTIMIZED = 4


@dataclass
class AssessmentQuestion:
    dimension: str
    question: str
    level_0: str
    level_1: str
    level_2: str
    level_3: str
    level_4: str


ASSESSMENT_QUESTIONS = [
    AssessmentQuestion(
        dimension="Data Management",
        question="How do you manage training data?",
        level_0="Ad-hoc file storage",
        level_1="Versioned with DVC/Git LFS",
        level_2="Data validation in place",
        level_3="Feature store implemented",
        level_4="Automated data quality monitoring",
    ),
    # Add more questions for each dimension
]


def calculate_maturity_score(responses: List[int]) -> Tuple[float, MaturityLevel]:
    """Average the per-dimension scores and floor to a completed level."""
    avg_score = sum(responses) / len(responses)
    level = MaturityLevel(int(avg_score))  # int() floors: a level counts only once fully reached
    return avg_score, level
```
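A usage sketch, assuming one response per assessed dimension scored 0-4:

```python
responses = [1, 2, 2, 3, 1]  # illustrative scores, one per dimension
score, level = calculate_maturity_score(responses)
print(f"avg={score:.1f} -> {level.name}")  # avg=1.8 -> REPEATABLE
```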
| Issue | Symptom | Solution |
|---|---|---|
| MLflow UI not loading | Connection refused | Check tracking URI, start server |
| Experiment not found | Experiment doesn't exist | Verify experiment name, create if needed |
| Artifact upload fails | Storage permission denied | Check artifact location permissions |
| Model registration fails | Model name conflict | Use unique names or versioning |
□ 1. Verify MLflow server is running
□ 2. Check tracking URI configuration
□ 3. Confirm artifact storage accessible
□ 4. Validate experiment exists
□ 5. Test with minimal example first
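For step 5, a minimal smoke test (assumes a tracking server at localhost:5000; substitute your own URI):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumption: local server
mlflow.set_experiment("smoke-test")
with mlflow.start_run():
    mlflow.log_metric("ok", 1.0)  # if this logs, tracking works end to end
```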
To mark this skill as complete, you must:
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2024-12 | Production-grade upgrade with exercises and templates |
| 1.0.0 | 2024-11 | Initial release |
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.