Master MLOps fundamentals - lifecycle, principles, tools, practices, and organizational adoption
Master MLOps fundamentals including lifecycle phases, core principles, and tool selection. Use this when you need to assess organizational MLOps maturity or design basic ML pipelines following best practices.
```
/plugin marketplace add pluginagentmarketplace/custom-plugin-mlops
/plugin install custom-plugin-mlops@pluginagentmarketplace-mlops
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.

- assets/config.yaml
- assets/schema.json
- references/GUIDE.md
- references/PATTERNS.md
- scripts/validate.py

Learn: Master the foundations of Machine Learning Operations for production ML systems.
| Attribute | Value |
|---|---|
| Bonded Agent | 01-mlops-fundamentals |
| Difficulty | Beginner to Intermediate |
| Duration | 40 hours |
| Prerequisites | Basic ML concepts, Git |
After completing this skill, you will be able to:
```
┌─────────────────────────────────────────────────────────────────┐
│ ML LIFECYCLE PHASES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Data │──▶│ Model │──▶│ Deploy │──▶│ Monitor │ │
│ │Collection│ │ Training │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ └──────────── Feedback Loop ◀──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
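The feedback loop is what separates an ML lifecycle from a one-shot training script: monitoring results feed back into data collection and retraining. A minimal, hypothetical sketch of that loop, with every stage stubbed so the shape is visible (none of these stubs are real tooling):

```python
# Hypothetical sketch of the lifecycle feedback loop; each stage is a stub.
def collect_data() -> list[float]:
    return [0.1, 0.2, 0.3]                     # Data Collection (stubbed)

def train(data: list[float]) -> float:
    return sum(data) / len(data)               # Model Training (stubbed as a mean)

def deploy(model: float) -> float:
    return model                               # Deploy (stubbed)

def drift_detected(live: list[float], reference: list[float]) -> bool:
    # Monitor: compare live inputs against the training distribution
    return abs(sum(live) / len(live) - sum(reference) / len(reference)) > 0.5

reference = collect_data()
model = deploy(train(reference))
if drift_detected([0.9, 1.1, 1.0], reference):
    model = deploy(train(collect_data()))      # feedback loop: retrain
```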
Topics:
Exercises:
Core Principles (2024-2025 Best Practices):
| Principle | Description | Implementation |
|---|---|---|
| Automation | Reduce manual intervention | CI/CD pipelines, auto-retraining |
| Reproducibility | Same inputs → same outputs | Version everything, seed randomness |
| Testing | Validate at every stage | Data tests, model tests, integration tests |
| Monitoring | Observe production behavior | Drift detection, performance metrics |
| Iteration | Continuous improvement | Feedback loops, experimentation |
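Of these, reproducibility is the one most often lost to small oversights. A minimal sketch of seeding every common source of randomness at the start of a training script (the PyTorch lines are optional and assume torch is installed):

```python
import os
import random

import numpy as np

def seed_everything(seed: int = 42) -> None:
    """Pin common sources of randomness so reruns match."""
    random.seed(seed)                          # Python's built-in RNG
    np.random.seed(seed)                       # NumPy's global RNG
    os.environ["PYTHONHASHSEED"] = str(seed)   # affects subprocesses only;
                                               # set before launch for this process
    try:
        import torch
        torch.manual_seed(seed)                # CPU and CUDA RNGs
        torch.cuda.manual_seed_all(seed)       # all CUDA devices explicitly
    except ImportError:
        pass                                   # PyTorch not installed
```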
Exercises:
Tool Categories & Recommendations:
```python
MLOPS_TOOLS = {
    "experiment_tracking": {
        "open_source": ["MLflow", "DVC"],
        "managed": ["Weights & Biases", "Neptune", "Comet"],
    },
    "feature_stores": {
        "open_source": ["Feast"],
        "managed": ["Tecton", "Hopsworks", "Vertex Feature Store"],
    },
    "orchestration": {
        "open_source": ["Airflow", "Prefect", "Dagster", "Kubeflow"],
        "managed": ["Vertex Pipelines", "SageMaker Pipelines"],
    },
    "model_serving": {
        "open_source": ["TorchServe", "BentoML", "Seldon", "Triton"],
        "managed": ["SageMaker Endpoints", "Vertex Endpoints"],
    },
    "monitoring": {
        "open_source": ["Evidently", "NannyML"],
        "managed": ["WhyLabs", "Arize", "Fiddler"],
    },
}
```
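One way to use a registry like this is as input to a small selection helper. A trivial, hypothetical example that shortlists candidates by category and hosting preference:

```python
def shortlist(category: str, prefer_managed: bool = False) -> list[str]:
    """Return candidate tools for a category, preferred hosting first."""
    options = MLOPS_TOOLS[category]
    order = ("managed", "open_source") if prefer_managed else ("open_source", "managed")
    return [tool for key in order for tool in options[key]]

print(shortlist("experiment_tracking"))
# ['MLflow', 'DVC', 'Weights & Biases', 'Neptune', 'Comet']
```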
Exercises:
Design Patterns:
Training Pipeline Pattern
Data → Validate → Transform → Train → Evaluate → Register
Serving Pipeline Pattern
Request → Preprocess → Predict → Postprocess → Response
Monitoring Pattern
Production Data → Compare vs Reference → Detect Drift → Alert/Retrain
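Each pattern is essentially function composition with validation gates. A hypothetical sketch of the training pipeline pattern, with every stage as a small stub so stages can be tested and swapped independently:

```python
# Hypothetical sketch of the training pipeline pattern; all stages are stubs.
def validate(data: list[float]) -> list[float]:
    assert data, "empty dataset"                   # Validate: fail fast
    return data

def transform(data: list[float]) -> list[float]:
    peak = max(data)
    return [x / peak for x in data]                # Transform: scale features

def train(data: list[float]) -> float:
    return sum(data) / len(data)                   # Train: stubbed as a mean

def evaluate(model: float, data: list[float]) -> float:
    return abs(model - sum(data) / len(data))      # Evaluate: stub metric

def register(model: float, metric: float, threshold: float = 0.1) -> None:
    if metric <= threshold:                        # Register: gate on quality
        print(f"registered model={model:.3f} (metric={metric:.3f})")

data = transform(validate([1.0, 2.0, 3.0]))
model = train(data)
register(model, evaluate(model, data))
```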
Exercises:
```python
# templates/mlflow_setup.py
import mlflow
from mlflow.tracking import MlflowClient


def setup_experiment(
    experiment_name: str,
    tracking_uri: str = "sqlite:///mlflow.db",
) -> str:
    """Initialize MLflow experiment with best practices."""
    mlflow.set_tracking_uri(tracking_uri)

    # Create or get experiment
    client = MlflowClient()
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        experiment_id = client.create_experiment(
            name=experiment_name,
            tags={"version": "1.0", "team": "ml-platform"},
        )
    else:
        experiment_id = experiment.experiment_id

    mlflow.set_experiment(experiment_name)
    return experiment_id


def log_run(params: dict, metrics: dict, model_path: str):
    """Log a complete training run."""
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        mlflow.log_artifact(model_path)
```
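A usage sketch of the two helpers above (the experiment name, parameters, and metrics are illustrative, and `model.pkl` is assumed to already exist on disk):

```python
experiment_id = setup_experiment("churn-prediction")  # hypothetical name
log_run(
    params={"max_depth": 6, "learning_rate": 0.1},
    metrics={"auc": 0.91, "logloss": 0.23},
    model_path="model.pkl",  # assumed to exist
)
```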
```python
# templates/maturity_assessment.py
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Tuple


class MaturityLevel(IntEnum):
    AD_HOC = 0
    REPEATABLE = 1
    RELIABLE = 2
    SCALABLE = 3
    OPTIMIZED = 4


@dataclass
class AssessmentQuestion:
    dimension: str
    question: str
    level_0: str
    level_1: str
    level_2: str
    level_3: str
    level_4: str


ASSESSMENT_QUESTIONS = [
    AssessmentQuestion(
        dimension="Data Management",
        question="How do you manage training data?",
        level_0="Ad-hoc file storage",
        level_1="Versioned with DVC/Git LFS",
        level_2="Data validation in place",
        level_3="Feature store implemented",
        level_4="Automated data quality monitoring",
    ),
    # Add more questions for each dimension
]


def calculate_maturity_score(responses: List[int]) -> Tuple[float, MaturityLevel]:
    """Average the per-dimension scores and floor to a completed level."""
    avg_score = sum(responses) / len(responses)
    level = MaturityLevel(int(avg_score))  # int() floors: a level counts only once fully reached
    return avg_score, level
```
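A usage sketch, assuming one response per assessed dimension scored 0-4:

```python
responses = [1, 2, 2, 3, 1]  # illustrative scores, one per dimension
score, level = calculate_maturity_score(responses)
print(f"avg={score:.1f} -> {level.name}")  # avg=1.8 -> REPEATABLE
```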
| Issue | Symptom | Solution |
|---|---|---|
| MLflow UI not loading | Connection refused | Check tracking URI, start server |
| Experiment not found | Experiment doesn't exist | Verify experiment name, create if needed |
| Artifact upload fails | Storage permission denied | Check artifact location permissions |
| Model registration fails | Model name conflict | Use unique names or versioning |
□ 1. Verify MLflow server is running
□ 2. Check tracking URI configuration
□ 3. Confirm artifact storage accessible
□ 4. Validate experiment exists
□ 5. Test with minimal example first
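For step 5, a minimal smoke test (assumes a tracking server at localhost:5000; substitute your own URI):

```python
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # assumption: local server
mlflow.set_experiment("smoke-test")
with mlflow.start_run():
    mlflow.log_metric("ok", 1.0)  # if this logs, tracking works end to end
```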
To mark this skill as complete, you must:
| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2024-12 | Production-grade upgrade with exercises and templates |
| 1.0.0 | 2024-11 | Initial release |
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.