ML/AI system design — model lifecycle, feature stores, experiment tracking, model serving, MLOps pipelines. Use when the user asks to "design an ML system", "architect model serving", "set up experiment tracking", "design feature store", "plan MLOps pipeline", or mentions model registry, A/B testing, drift detection, or retraining triggers.
From `javimontano/mao-pm-apex` (install: `npx claudepluginhub javimontano/mao-pm-apex`). Bundled files:
- examples/README.md
- examples/sample-output.html
- examples/sample-output.md
- prompts/metaprompts.md
- prompts/use-case-prompts.md
- references/body-of-knowledge.md
- references/knowledge-graph.mmd
- references/ml-system-patterns.md
- references/state-of-the-art.md
Data science architecture defines how machine learning systems are structured end-to-end — from feature engineering through model training, serving, monitoring, and governance. This skill produces ML system documentation that enables teams to build reproducible, scalable, and responsible AI systems.
A model in a notebook is not a product; a model in production with monitoring is. Data science architecture designs the complete lifecycle: from feature engineering through model serving, drift monitoring, and governance. MLOps is not just DevOps for ML; it is the discipline that turns experiments into reliable products.
The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts.
Parameters:
{MODO}: piloto-auto (auto-pilot, default) | desatendido (unattended) | supervisado (supervised) | paso-a-paso (step-by-step)
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (executive, ~40%: S1 topology + S4 serving + S6 governance) | técnica (technical: full 6 sections, default)

Before generating the architecture, detect the codebase context:
!find . -name "*.py" -o -name "*.ipynb" -o -name "*.yaml" -o -name "*.toml" -o -name "Dockerfile" | head -30
Use detected frameworks (PyTorch, TensorFlow, scikit-learn, MLflow, Kubeflow, etc.) to tailor recommendations.
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/ml-system-patterns.md
Maps the overall ML system structure — training vs serving paths, online vs batch inference, data flow.
Includes:
MLOps maturity level (assess before designing):
Key decisions:
Defines how features are computed, stored, served, and reused across models.
Feature store comparison — select based on ecosystem:
| Criterion | Feast | Tecton | Hopsworks |
|---|---|---|---|
| Best for | Open-source, cloud-agnostic | Real-time features at scale | End-to-end ML platform |
| Online store | Redis, DynamoDB, Bigtable | Managed low-latency store | RonDB (sub-ms) |
| Offline store | BigQuery, Snowflake, Redshift, S3 | S3/Spark-based | Hive/S3 |
| Streaming transforms | Limited (requires external) | Native Spark/Flink | Native Spark/Flink |
| Cost model | Free + infra costs | Per-feature-read pricing | License + infra |
| Choose when | Budget-constrained, <50 features online | High-throughput real-time serving | Want unified ML platform |
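Whichever store is chosen, the core discipline is the same: one transform feeds both the offline and online paths. A minimal pure-Python sketch of training-serving consistency (function and field names are illustrative, not tied to any store's API):

```python
from datetime import datetime, timezone

# Shared transform: the same function computes the feature for both the
# offline (training) and online (serving) paths, preventing skew.
def days_since_signup(signup_ts: datetime, as_of: datetime) -> int:
    """Feature value as observed at `as_of` (point-in-time correct)."""
    return max((as_of - signup_ts).days, 0)

# Offline path: materialize against the historical label timestamp.
def build_training_row(user: dict, label_ts: datetime) -> dict:
    return {"days_since_signup": days_since_signup(user["signup_ts"], label_ts),
            "label_ts": label_ts}

# Online path: compute against "now", reusing the identical transform.
def build_serving_row(user: dict) -> dict:
    now = datetime.now(timezone.utc)
    return {"days_since_signup": days_since_signup(user["signup_ts"], now)}
```

The point-in-time parameter (`as_of`) is what prevents label leakage offline, while the online path simply passes the current time.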
Includes:
Key decisions:
Documents how experiments are tracked, models versioned, and lineage maintained.
Experiment tracking tool comparison:
| Criterion | MLflow | W&B (Weights & Biases) | Vertex AI Experiments |
|---|---|---|---|
| Best for | Open-source, self-hosted control | Collaboration, visualization | GCP-native teams |
| Hosting | Self-managed or Databricks | SaaS (cloud-hosted) | Managed on GCP |
| Strengths | Model registry, broad integrations | Real-time collab, sweep search, report sharing | Tight Vertex pipeline integration |
| Weaknesses | UI less polished, scaling effort | Vendor lock-in, cost at scale | GCP-only, less flexible |
| Cost model | Free (OSS) + infra | Free tier → $50/user/mo (Team) | Pay-per-use GCP pricing |
| Choose when | Multi-cloud, Databricks shop | Research-heavy teams needing rich comparison UI | Already on GCP Vertex |
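Regardless of the tool selected, a run is reproducible only if it captures parameters, metrics, the code version, and a fingerprint of the training data. A minimal pure-Python sketch of that record (field names are illustrative, not any tracker's actual schema):

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

# Minimal record of what any experiment tracker (MLflow, W&B, Vertex)
# must capture for a run to be reproducible.
@dataclass
class RunRecord:
    run_id: str
    params: dict                 # hyperparameters
    metrics: dict = field(default_factory=dict)
    code_version: str = ""       # e.g. git commit SHA
    data_fingerprint: str = ""   # hash of the training dataset snapshot
    started_at: float = field(default_factory=time.time)

def fingerprint_dataset(rows: list) -> str:
    """Deterministic hash so two runs can prove they saw the same data."""
    blob = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

run = RunRecord(run_id="exp-001",
                params={"lr": 3e-4, "epochs": 10},
                code_version="abc1234",
                data_fingerprint=fingerprint_dataset([{"x": 1, "y": 0}]))
run.metrics["val_auc"] = 0.91
```

The data fingerprint is the piece teams most often skip, and the one that makes "same params, different result" debuggable.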
Includes:
Key decisions:
Designs how models are deployed, scaled, and managed in production.
Model monitoring stack — select by need:
| Tool | Focus | Best For | Integration |
|---|---|---|---|
| Evidently | Data/model drift, test suites | OSS teams, CI/CD integration | Python lib, dashboards, Grafana |
| Arize | Production monitoring, embeddings | Real-time debugging, LLM observability | SDK, auto-instrumentation |
| WhyLabs | Continuous profiling, anomaly detection | High-volume streaming, privacy-preserving | whylogs agent, lightweight |
| Fiddler | Explainability + monitoring | Regulated industries, model cards | REST API, notebook SDK |
Selection criteria: choose Evidently for budget-conscious OSS; Arize for real-time + LLM workloads; WhyLabs for high-throughput with minimal overhead; Fiddler when explainability is a regulatory requirement.
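As a concrete example of what these tools compute, here is a minimal Population Stability Index (PSI) drift check in pure Python; the thresholds follow the common rule of thumb, and the equal-width binning is deliberately simplistic:

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a baseline and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 retrain."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def fractions(sample):
        counts = [0] * bins
        for v in sample:
            i = min(max(int((v - lo) / width), 0), bins - 1)  # clip outliers
            counts[i] += 1
        n = len(sample)
        # Small epsilon avoids log(0) for empty bins.
        return [max(c / n, 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In practice this runs on a schedule per feature and per prediction score, with breaches feeding the retraining triggers defined in S5.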
Includes:
Key decisions:
Defines CI/CD for ML — automated training, testing, deployment, monitoring, and retraining.
GPU cost optimization formulas:
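A back-of-envelope sketch of the spot-vs-on-demand comparison; the rates, discount, and interruption overhead below are illustrative assumptions, not provider quotes:

```python
# Back-of-envelope GPU cost model; all rates are illustrative assumptions.
def training_cost(gpu_hours: float, on_demand_rate: float,
                  spot_discount: float = 0.7,
                  interrupt_overhead: float = 0.15) -> dict:
    """Compare on-demand vs spot. `spot_discount` is the fraction saved;
    `interrupt_overhead` is extra hours re-run after preemptions."""
    on_demand = gpu_hours * on_demand_rate
    spot = (gpu_hours * (1 + interrupt_overhead)
            * on_demand_rate * (1 - spot_discount))
    return {"on_demand": on_demand, "spot": spot,
            "savings_pct": round(100 * (1 - spot / on_demand), 1)}

# Example: 100 GPU-hours at $2.50/hr on-demand.
# Spot = 100 * 1.15 * 2.50 * 0.30 = $86.25, roughly 65% savings.
```

The same shape extends to quantization (reduced GPU-hours per inference) and cascade routing (cheap model first, expensive model only on low confidence).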
Includes:
Key decisions:
Establishes guardrails for bias detection, explainability, audit trails, and compliance.
Responsible AI checklist (mandatory for production models):
Includes:
Key decisions:
| Decision | Enables | Constrains | Threshold |
|---|---|---|---|
| Centralized Feature Store | Feature reuse, consistency | Team autonomy, single point of failure | 3+ models sharing features |
| Real-Time Serving | Low latency, interactive UX | Infrastructure cost, complexity | Sub-second SLA required |
| Batch Inference | Cost efficiency, simpler infra | Stale predictions | Minutes-to-hours latency acceptable |
| Automated Retraining | Fresh models, reduced toil | Compute cost, bad deployment risk | Stable pipelines with drift monitoring |
| Shadow Deployment | Risk-free validation | Double compute cost | High-risk models, regulatory requirements |
| Model Ensemble | Higher accuracy, robustness | Latency, debugging difficulty | Single-model accuracy insufficient |
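The shadow deployment row above can be sketched as a simple router: the primary model answers the request while the candidate runs silently on the same input, and only disagreements are recorded (all names are illustrative):

```python
import logging

logger = logging.getLogger("shadow")

# Shadow deployment sketch: the primary model serves the user; the shadow
# model sees identical traffic but can never affect the response.
def serve(request: dict, primary, shadow, mismatches: list):
    prediction = primary(request)          # user-facing answer
    try:
        shadow_pred = shadow(request)      # never returned to the caller
        if shadow_pred != prediction:
            mismatches.append({"request": request,
                               "primary": prediction,
                               "shadow": shadow_pred})
    except Exception:
        # A failing shadow must be invisible to users.
        logger.exception("shadow model failed")
    return prediction
```

The mismatch log is the validation artifact: once disagreement rate and error direction are acceptable, the shadow is promoted.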
Notebook-to-Production Migration: Most ML starts in notebooks. Define the bridge: refactor into modular code, add tests, containerize, integrate with CI/CD. Provide incremental adoption path — expect resistance.
Multi-Model Systems: When multiple models interact (ensemble, pipeline, routing), address dependency management, versioning across models, and cascade failure scenarios. Pin model versions for reproducibility.
Edge/On-Device Deployment: Constraints: model size (<50MB typical), quantization (INT8/TFLite), OTA update mechanisms, offline capability. Architecture must address fallback when connectivity is lost.
Regulated Industries (Healthcare, Finance): Audit trail, explainability, and bias detection are mandatory from day one. Model cards, decision logging, human override mechanisms are not optional.
Cold Start / No Historical Data: Feature stores and training pipelines assume historical data. Include bootstrapping strategies, rule-based fallbacks, and progressive learning with feedback loops.
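A minimal sketch of the rule-based fallback with a feedback-count gate; the thresholds and the heuristic are illustrative assumptions:

```python
# Cold-start sketch: serve rule-based predictions until the model has seen
# enough feedback to be trusted, and keep the rules as a low-confidence
# fallback afterwards. Thresholds and rules are illustrative.
MIN_FEEDBACK = 500
CONFIDENCE_FLOOR = 0.7

def rule_based_score(request: dict) -> float:
    """Hand-written heuristic used before (and alongside) the model."""
    return 0.9 if request.get("returning_customer") else 0.3

def predict(request: dict, model, feedback_count: int):
    if feedback_count < MIN_FEEDBACK:
        return rule_based_score(request), "rules"       # bootstrap phase
    score, confidence = model(request)
    if confidence < CONFIDENCE_FLOOR:
        return rule_based_score(request), "rules-fallback"
    return score, "model"
```

Returning the source tag alongside the score lets the feedback loop measure rules vs model quality separately, which is what makes progressive learning auditable.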
LLM/Foundation Model Integration: When wrapping or fine-tuning LLMs, address: prompt versioning, evaluation frameworks (human + automated), cost per inference tracking, guardrails for hallucination and toxicity, RAG pipeline architecture if applicable.
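For the cost-per-inference tracking mentioned above, a minimal sketch (token prices are placeholder assumptions, not any provider's actual rates):

```python
# Per-inference cost tracking for an LLM wrapper.
# Prices are illustrative placeholders, per 1k tokens.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    return ((input_tokens / 1000) * PRICE_PER_1K["input"]
            + (output_tokens / 1000) * PRICE_PER_1K["output"])

class CostMeter:
    """Running total per feature/tenant enables budget alerts
    and unit-economics reporting."""
    def __init__(self):
        self.total = 0.0
        self.calls = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        cost = inference_cost(input_tokens, output_tokens)
        self.total += cost
        self.calls += 1
        return cost
```

Keying one meter per prompt version ties cost tracking back to prompt versioning, so a regression in either cost or quality can be attributed to a specific change.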
Before finalizing delivery, verify:
| Format | Default | Description |
|---|---|---|
markdown | ✅ | Rich Markdown + Mermaid diagrams. Token-efficient. |
html | On demand | Branded HTML (Design System). Visual impact. |
dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter.
Primary: A-01_Data_Science_Architecture.html — ML system topology, feature store design, experiment tracking, model serving architecture, MLOps pipeline, governance framework.
Secondary: Model cards (.md), feature registry catalog, MLOps pipeline DAG diagram, bias audit report template.
| Case | Handling Strategy |
|---|---|
| Notebook-to-production migration | Refactor into modular code, add tests, containerize, integrate with CI/CD. Incremental adoption path. Expect team resistance. |
| Multi-model systems (ensemble, pipeline, routing) | Dependency management between models, cross-model versioning, cascade failure scenarios. Pin model versions for reproducibility. |
| Edge/on-device deployment | Constraints: model size <50MB, quantization (INT8/TFLite), OTA updates, offline capability. Fallback when connectivity is lost. |
| Regulated industries (healthcare, finance) | Audit trail, explainability, and bias detection mandatory from day 1. Model cards, decision logging, and human override are not optional. |
| Cold start / no historical data | Feature stores and training pipelines assume historical data. Bootstrap with rules, rule-based fallbacks, progressive learning with feedback loops. |
| LLM/Foundation model integration | Prompt versioning, evaluation frameworks (human + automated), cost-per-inference tracking, guardrails for hallucination and toxicity, RAG pipeline where applicable. |
| Decision | Rejected Alternative | Rationale |
|---|---|---|
| Feature store as single source of truth | Features computed per model, duplicated across teams | Shared, versioned features with lineage prevent training-serving skew and eliminate rework. 3+ models sharing features justifies the investment. |
| Monitoring over accuracy as a principle | Optimize accuracy first, add monitoring later | A model at 95% accuracy drifting undetected is worse than a monitored one at 85%. Drift detection is a day-1 concern, not day N. |
| Mandatory Responsible AI checklist (7 items) | Optional or post-launch governance | Model card, bias metrics, explainability, data audit, human override, audit trail, and compliance mapping are production prerequisites, not nice-to-haves. |
| MLflow for multi-cloud, W&B for research-heavy | A single experiment tracking tool for everyone | MLflow (OSS, multi-cloud, broad integrations) for production. W&B (rich collaboration, sweep search) for research-heavy teams. Different teams may use different tools. |
graph TD
subgraph Core["Core Concepts"]
TOPO["ML System Topology"]
FEATURE["Feature Engineering & Store"]
EXPERIMENT["Experiment Tracking"]
SERVING["Model Serving"]
MLOPS["MLOps Pipeline"]
GOVERNANCE["Governance & Responsible AI"]
end
subgraph Inputs["Inputs"]
DATA["Training Data"]
USECASES["ML Use Cases"]
INFRA["GPU/CPU Infrastructure"]
REQS["Latency & SLA Requirements"]
end
subgraph Outputs["Outputs"]
ARCH["ML System Architecture"]
MODELCARD["Model Cards"]
REGISTRY["Feature Registry Catalog"]
DAGDIAG["MLOps Pipeline DAG"]
BIAS["Bias Audit Report Template"]
end
subgraph Related["Related Skills"]
DE["data-engineering"]
BIARCH["bi-architecture"]
SWARCH["software-architecture"]
INFRAARCH["infrastructure-architecture"]
end
DATA --> FEATURE
USECASES --> TOPO
INFRA --> SERVING
REQS --> SERVING
TOPO --> FEATURE
FEATURE --> EXPERIMENT
EXPERIMENT --> SERVING
SERVING --> MLOPS
MLOPS --> GOVERNANCE
ARCH --> MODELCARD
ARCH --> REGISTRY
ARCH --> DAGDIAG
GOVERNANCE --> BIAS
DE -.-> FEATURE
BIARCH -.-> SERVING
SWARCH -.-> TOPO
INFRAARCH -.-> SERVING
Markdown format (default):
# Data Science Architecture: {project}
## S1: ML System Topology
### MLOps Maturity Level: {0|1|2|3}
### Training Pipeline Topology (Mermaid)
### Serving Topology
## S2: Feature Engineering & Store Design
### Feature Store Selection: {Feast|Tecton|Hopsworks}
### Feature Pipeline Architecture
### Training-Serving Consistency Strategy
## S3: Experiment Tracking & Model Registry
### Tool Selection: {MLflow|W&B|Vertex}
### Promotion Workflow (Staging > Production)
## S4: Model Serving Architecture
### Serving Pattern: {real-time|batch|streaming}
### A/B Testing Infrastructure
### Monitoring Stack Selection
## S5: MLOps Pipeline Design
### CI/CD Stages for ML
### Retraining Triggers
### GPU Cost Optimization
## S6: Governance & Responsible AI
### Responsible AI Checklist (7 items)
### Model Risk Classification
HTML format (on demand):
A-01_Data_Science_Architecture_{project}_{WIP}.html
Self-contained branded HTML (Design System MetodologIA v5), Light-First Technical. Includes a visual MLOps maturity level, an interactive feature store comparison matrix, and a responsible AI checklist with per-item status. WCAG AA, responsive, print-ready.
DOCX format (on demand):
1. Executive Summary — ML system maturity + key architectural decisions
2. ML System Topology — training & serving paths, data flow diagram
3. Feature Store Design — offline/online stores, registry, consistency strategy
4. Experiment Tracking — tool selection, versioning, reproducibility framework
5. Model Serving — deployment patterns, A/B testing, monitoring stack
6. MLOps Pipeline — CI/CD stages, retraining triggers, cost optimization
7. Responsible AI Governance — checklist, bias metrics, explainability, compliance
Appendix A: Model Card Template
Appendix B: Bias Audit Report Template
Appendix C: GPU Cost Optimization Formulas
XLSX format (on demand):
{fase}_Data_Science_Architecture_{cliente}_{WIP}.xlsx
PPTX format (on demand):
{fase}_Data_Science_Architecture_{cliente}_{WIP}.pptx

| Dimension | Weight | Criterion |
|---|---|---|
| Trigger Accuracy | 10% | Correct activation on keywords for ML system, model serving, experiment tracking, feature store, MLOps, drift detection, retraining triggers. |
| Completeness | 25% | 6 sections cover topology, features, experiments, serving, MLOps, and governance. Responsible AI checklist with 7 mandatory items. |
| Clarity | 20% | Comparison matrices (feature stores, experiment trackers, monitoring tools) with clear selection criteria. MLOps maturity levels 0-3 defined. |
| Robustness | 20% | Edge cases (notebook-to-prod, multi-model, edge deployment, regulated, cold start, LLM) handled with specific strategies. |
| Efficiency | 10% | Executive variant reduces scope to S1+S4+S6 (~40%). GPU cost optimization with concrete formulas (spot, quantization, cascade routing). |
| Value Density | 15% | Actionable Responsible AI checklist. Feature store comparison per ecosystem. GPU cost formulas with savings estimates. Model card template included. |
Minimum threshold: 7/10. Below this threshold, review the feature store design and the completeness of the responsible AI checklist.
Author: Javier Montano · MetodologIA Community | Last updated: March 15, 2026