This skill should be used when the user asks to "design an ML system", "architect model serving", "set up experiment tracking", "design feature store", "plan MLOps pipeline", or mentions model registry, A/B testing, drift detection, or retraining triggers. It produces end-to-end ML/AI system architecture covering model lifecycle, feature stores, experiment tracking, model serving, and MLOps pipelines. Use this skill whenever the conversation involves machine learning infrastructure or productionizing models, even if the user does not explicitly ask for "data science architecture". [EXPLICIT]
This skill is limited to using the following bundled files:

- agents/guardian.md
- agents/lead.md
- agents/specialist.md
- agents/support.md
- evals/evals.json
- knowledge/body-of-knowledge.md
- knowledge/knowledge-graph.md
- prompts/meta.md
- prompts/primary.md
- prompts/variations/deep.md
- prompts/variations/quick.md
- references/ml-system-patterns.md
- templates/output.docx.md
- templates/output.html

Data science architecture defines how machine learning systems are structured end-to-end — from feature engineering through model training, serving, monitoring, and governance. This skill produces ML system documentation that enables teams to build reproducible, scalable, and responsible AI systems. [EXPLICIT]
A model in a notebook is not a product. A model in production with monitoring is. Data science architecture designs the complete cycle: from feature engineering through model serving, drift monitoring, and governance. MLOps is not DevOps for ML; it is the discipline that turns experiments into reliable products.
The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts. [EXPLICIT]
Parameters:
{MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (~40% — S1 topology + S4 serving + S6 governance) | técnica (full 6 sections, default)

Before generating architecture, detect the codebase context:
!find . -name "*.py" -o -name "*.ipynb" -o -name "*.yaml" -o -name "*.toml" -o -name "Dockerfile" | head -30
Use detected frameworks (PyTorch, TensorFlow, scikit-learn, MLflow, Kubeflow, etc.) to tailor recommendations. [EXPLICIT]
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/ml-system-patterns.md
Maps the overall ML system structure — training vs serving paths, online vs batch inference, data flow. [EXPLICIT]
Includes:
MLOps maturity level (assess before designing):
Key decisions:
Defines how features are computed, stored, served, and reused across models. [EXPLICIT]
Feature store comparison — select based on ecosystem:
| Criterion | Feast | Tecton | Hopsworks |
|---|---|---|---|
| Best for | Open-source, cloud-agnostic | Real-time features at scale | End-to-end ML platform |
| Online store | Redis, DynamoDB, Bigtable | Managed low-latency store | RonDB (sub-ms) |
| Offline store | BigQuery, Snowflake, Redshift, S3 | S3/Spark-based | Hive/S3 |
| Streaming transforms | Limited (requires external) | Native Spark/Flink | Native Spark/Flink |
| Cost model | Free + infra costs | Per-feature-read pricing | License + infra |
| Choose when | Budget-constrained, <50 features online | High-throughput real-time serving | Want unified ML platform |
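Whichever store is chosen, the core guarantee it must provide is point-in-time correctness: a training row may only see feature values that were known at the label's timestamp. The comparison above can be grounded with a minimal stdlib sketch of that lookup (illustrative only; real stores such as Feast implement this as a join over BigQuery/Spark, and all names below are hypothetical):

```python
from bisect import bisect_right

def point_in_time_lookup(feature_log, entity_id, as_of):
    """Return the latest feature value for entity_id with timestamp <= as_of.

    feature_log: dict mapping entity_id -> list of (timestamp, value),
    sorted by timestamp. Values written after as_of are never visible,
    which prevents label leakage into training data.
    """
    history = feature_log.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None

# Example: a driver's trip count was logged at t=1 and again at t=5.
log = {"driver_42": [(1, 3), (5, 8)]}
point_in_time_lookup(log, "driver_42", 4)   # -> 3 (the t=5 value is not yet known)
point_in_time_lookup(log, "driver_42", 6)   # -> 8
```

Offline (training) and online (serving) paths must agree on this semantics; skew between them is one of the most common production ML bugs.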
Includes:
Key decisions:
Documents how experiments are tracked, models versioned, and lineage maintained. [EXPLICIT]
Experiment tracking tool comparison:
| Criterion | MLflow | W&B (Weights & Biases) | Vertex AI Experiments |
|---|---|---|---|
| Best for | Open-source, self-hosted control | Collaboration, visualization | GCP-native teams |
| Hosting | Self-managed or Databricks | SaaS (cloud-hosted) | Managed on GCP |
| Strengths | Model registry, broad integrations | Real-time collab, sweep search, report sharing | Tight Vertex pipeline integration |
| Weaknesses | UI less polished, scaling effort | Vendor lock-in, cost at scale | GCP-only, less flexible |
| Cost model | Free (OSS) + infra | Free tier → $50/user/mo (Team) | Pay-per-use GCP pricing |
| Choose when | Multi-cloud, Databricks shop | Research-heavy teams needing rich comparison UI | Already on GCP Vertex |
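Whichever tracker is selected, every run should capture the same core record: parameters, metrics, code version, and a stable run ID. A minimal stdlib sketch of that contract (MLflow and W&B expose the same shape through their SDKs; the function and field names here are illustrative, not any tool's API):

```python
import hashlib
import json
import time

def log_run(params, metrics, git_sha, store):
    """Append an immutable experiment record; run_id is content-addressed
    over params + metrics + code version, so identical runs get identical IDs."""
    record = {
        "params": params,          # e.g. {"lr": 3e-4, "epochs": 10}
        "metrics": metrics,        # e.g. {"val_auc": 0.91}
        "git_sha": git_sha,        # code version, for reproducibility
        "logged_at": time.time(),
    }
    payload = json.dumps(
        {k: record[k] for k in ("params", "metrics", "git_sha")}, sort_keys=True
    )
    record["run_id"] = hashlib.sha256(payload.encode()).hexdigest()[:12]
    store.append(record)
    return record["run_id"]

runs = []
run_id = log_run({"lr": 3e-4}, {"val_auc": 0.91}, "abc123", runs)
```

Content-addressing the run ID makes lineage checks trivial: if two runs claim different results under the same ID, something outside the logged record (data, environment) changed.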
Includes:
Key decisions:
Designs how models are deployed, scaled, and managed in production. [EXPLICIT]
Model monitoring stack — select by need:
| Tool | Focus | Best For | Integration |
|---|---|---|---|
| Evidently | Data/model drift, test suites | OSS teams, CI/CD integration | Python lib, dashboards, Grafana |
| Arize | Production monitoring, embeddings | Real-time debugging, LLM observability | SDK, auto-instrumentation |
| WhyLabs | Continuous profiling, anomaly detection | High-volume streaming, privacy-preserving | whylogs agent, lightweight |
| Fiddler | Explainability + monitoring | Regulated industries, model cards | REST API, notebook SDK |
Selection criteria: choose Evidently for budget-conscious OSS; Arize for real-time + LLM workloads; WhyLabs for high-throughput with minimal overhead; Fiddler when explainability is a regulatory requirement. [EXPLICIT]
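Under the hood, all of these tools compute statistical distances between a reference (training) distribution and live traffic. Population Stability Index is a common choice; a self-contained sketch follows (the 0.2 alert threshold is a widely used heuristic, not a standard):

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over pre-binned histograms.

    PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where a_i and e_i
    are bin proportions. Rule of thumb: < 0.1 stable, 0.1-0.2 moderate
    shift, > 0.2 investigate.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_p = max(e / e_total, eps)   # clamp to avoid log(0) on empty bins
        a_p = max(a / a_total, eps)
        score += (a_p - e_p) * math.log(a_p / e_p)
    return score

psi([50, 30, 20], [50, 30, 20])   # identical distributions -> 0.0
```

Each term is non-negative, so PSI is always >= 0; Evidently and WhyLabs compute this (and KL/JS variants) per feature and alert on the worst offenders.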
Includes:
Key decisions:
Defines CI/CD for ML — automated training, testing, deployment, monitoring, and retraining. [EXPLICIT]
GPU cost optimization formulas:
Includes:
Key decisions:
Establishes guardrails for bias detection, explainability, audit trails, and compliance. [EXPLICIT]
Responsible AI checklist (mandatory for production models):
Includes:
Key decisions:
| Decision | Enables | Constrains | Threshold |
|---|---|---|---|
| Centralized Feature Store | Feature reuse, consistency | Team autonomy, single point of failure | 3+ models sharing features |
| Real-Time Serving | Low latency, interactive UX | Infrastructure cost, complexity | Sub-second SLA required |
| Batch Inference | Cost efficiency, simpler infra | Stale predictions | Minutes-to-hours latency acceptable |
| Automated Retraining | Fresh models, reduced toil | Compute cost, bad deployment risk | Stable pipelines with drift monitoring |
| Shadow Deployment | Risk-free validation | Double compute cost | High-risk models, regulatory requirements |
| Model Ensemble | Higher accuracy, robustness | Latency, debugging difficulty | Single-model accuracy insufficient |
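The Shadow Deployment row above can be made concrete: the challenger model receives the same traffic as the champion, but only the champion's answer is ever returned, and disagreements are logged for offline review. A minimal illustrative sketch (the credit-score rules are hypothetical):

```python
def serve_with_shadow(request, champion, challenger, disagreements):
    """Return the champion's prediction; run the challenger in shadow mode."""
    primary = champion(request)
    try:
        shadow = challenger(request)   # never affects the user-facing answer
        if shadow != primary:
            disagreements.append({
                "request": request, "champion": primary, "challenger": shadow,
            })
    except Exception:
        pass  # a shadow failure must never break serving
    return primary

log = []
champion = lambda score: "approve" if score >= 600 else "decline"
challenger = lambda score: "approve" if score >= 650 else "decline"
serve_with_shadow(620, champion, challenger, log)   # -> "approve"; 1 disagreement logged
```

This is exactly the "double compute cost" in the table: every request is scored twice, in exchange for risk-free validation on production traffic.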
Notebook-to-Production Migration: Most ML starts in notebooks. Define the bridge: refactor into modular code, add tests, containerize, integrate with CI/CD. Provide incremental adoption path — expect resistance. [EXPLICIT]
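The first refactoring step is usually mechanical: lift each notebook cell into a pure, testable function. For example, a hypothetical filtering cell might become:

```python
def clean_ages(ages, lower=0, upper=120):
    """A notebook cell like `df = df[df.age.between(0, 120)]`, lifted into
    a pure function (shown here over plain lists for illustration).

    Pure functions can be unit-tested and imported by both the training
    pipeline and the serving path, keeping the two code paths consistent.
    """
    return [a for a in ages if lower <= a <= upper]

# A unit test the notebook never had:
assert clean_ages([25, -1, 200, 40]) == [25, 40]
```

Once cells are functions with tests, containerization and CI/CD integration become ordinary software work rather than an ML-specific problem.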
Multi-Model Systems: When multiple models interact (ensemble, pipeline, routing), address dependency management, versioning across models, and cascade failure scenarios. Pin model versions for reproducibility. [EXPLICIT]
Edge/On-Device Deployment: Constraints: model size (<50MB typical), quantization (INT8/TFLite), OTA update mechanisms, offline capability. Architecture must address fallback when connectivity is lost. [EXPLICIT]
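In its simplest symmetric, per-tensor form, INT8 quantization maps float weights into [-127, 127] with a single scale factor; a stdlib sketch of the idea (production deployments use TFLite/ONNX Runtime tooling, which also handles activations and calibration):

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: w_q = round(w / scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.03]
q, scale = quantize_int8(weights)   # 4 bytes/weight -> 1 byte/weight
restored = dequantize(q, scale)     # each value off by at most ~scale/2
```

The 4x size reduction is what makes the <50MB on-device budget reachable; the per-weight rounding error (bounded by half the scale) is the accuracy cost to validate before shipping.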
Regulated Industries (Healthcare, Finance): Audit trail, explainability, and bias detection are mandatory from day one. Model cards, decision logging, human override mechanisms are not optional. [EXPLICIT]
Cold Start / No Historical Data: Feature stores and training pipelines assume historical data. Include bootstrapping strategies, rule-based fallbacks, and progressive learning with feedback loops. [EXPLICIT]
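A common bootstrapping pattern: wrap the model behind a predictor that serves hand-written rules until enough labeled feedback has accumulated, then switches over (the threshold, rule, and return values below are all illustrative):

```python
class BootstrappedPredictor:
    """Serve rules until min_samples of feedback exist, then use the model."""

    def __init__(self, rule_fn, model_fn, min_samples=1000):
        self.rule_fn = rule_fn
        self.model_fn = model_fn
        self.min_samples = min_samples
        self.feedback = []            # accumulated (features, label) pairs

    def predict(self, features):
        if len(self.feedback) < self.min_samples:
            return self.rule_fn(features)    # cold start: deterministic rules
        return self.model_fn(features)       # warm: learned model

    def record_feedback(self, features, label):
        self.feedback.append((features, label))

p = BootstrappedPredictor(rule_fn=lambda f: "low_risk",
                          model_fn=lambda f: "model_score",
                          min_samples=2)
p.predict({})   # -> "low_risk" (no feedback collected yet)
```

In practice the switchover should be gated on offline evaluation against the accumulated feedback, not on sample count alone.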
LLM/Foundation Model Integration: When wrapping or fine-tuning LLMs, address: prompt versioning, evaluation frameworks (human + automated), cost per inference tracking, guardrails for hallucination and toxicity, RAG pipeline architecture if applicable. [EXPLICIT]
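Prompt versioning can reuse the same content-addressing trick as code: store each template under a hash of its text, so every logged completion is traceable to the exact wording that produced it. A minimal sketch (class and prompt names are hypothetical):

```python
import hashlib

class PromptRegistry:
    """Content-addressed prompt store: identical text always yields the
    same version id, so a logged (name, version) pair pins exact wording."""

    def __init__(self):
        self.prompts = {}

    def register(self, name, template):
        version = hashlib.sha256(template.encode()).hexdigest()[:8]
        self.prompts[(name, version)] = template
        return version

    def get(self, name, version):
        return self.prompts[(name, version)]

reg = PromptRegistry()
v1 = reg.register("summarize", "Summarize in 3 bullets: {text}")
v2 = reg.register("summarize", "Summarize in 3 bullets: {text}")
v1 == v2   # True: identical wording, identical version id
```

Logging this version id alongside model name, token counts, and eval scores gives the lineage needed to attribute a quality regression to a prompt change rather than a model or data change.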
Before finalizing delivery, verify:
| Format | Default | Description |
|---|---|---|
| markdown | ✅ | Rich Markdown + Mermaid diagrams. Token-efficient. |
| html | On demand | Branded HTML (Design System). Visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter. [EXPLICIT]
Primary: A-01_Data_Science_Architecture.html — ML system topology, feature store design, experiment tracking, model serving architecture, MLOps pipeline, governance framework.
Secondary: Model cards (.md), feature registry catalog, MLOps pipeline DAG diagram, bias audit report template.
Author: Javier Montano | Last updated: March 18, 2026