Skill

ai-design-patterns

AI-specific design patterns and system tactics — Feature Store, Champion-Challenger, Shadow Deployment, Drift Detection, Explainability Wrapper, Canary Deployment, Bulkhead, and traditional patterns adapted for AI. This skill should be used when the user asks to "select AI design patterns", "apply ML patterns", "design drift detection", "implement feature store", "plan shadow deployment", "design champion-challenger", "select availability tactics for AI", or mentions AI anti-patterns, maintainability tactics, fault recovery for models, or pattern selection for ML systems. [EXPLICIT]

From jm-adk

Install

Run in your terminal

npx claudepluginhub javimontano/jm-adk-alfa

Tool Access

This skill is limited to using the following tools:

ReadWriteEditGlobGrepBash

Supporting Assets

View in Repository

agents/guardian.md

agents/lead.md

agents/specialist.md

agents/support.md

evals/evals.json

knowledge/body-of-knowledge.md

knowledge/knowledge-graph.md

prompts/meta.md

prompts/primary.md

prompts/variations/deep.md

prompts/variations/quick.md

references/ai-patterns-detail.md

references/anti-patterns.md

references/tactics-catalog.md

templates/output.docx.md

templates/output.html

Skill Content

AI Design Patterns: Patterns & Tactics for AI-Enabled Systems

AI design patterns define reusable solutions to recurring architectural problems in AI systems. This skill produces a pattern selection analysis covering maintainability tactics, availability tactics, AI-specific patterns (Feature Store, Champion-Challenger, Shadow Deployment, Drift Detection), traditional patterns adapted for AI, anti-pattern detection, and a decision framework that maps system requirements to recommended patterns. [EXPLICIT]

Principio Rector

Los patrones son decisiones arquitectónicas, no decoraciones. Cada patrón seleccionado tiene un costo (complejidad, infraestructura, operaciones) y un beneficio (resiliencia, mantenibilidad, seguridad). Seleccionar un patrón sin justificación es tan peligroso como no tener ningún patrón.

Filosofía de Design Patterns para IA

Tácticas primero, patrones después. Las tácticas (detección de fallas, recuperación, prevención) son los bloques constituyentes. Los patrones combinan tácticas en soluciones cohesivas. Seleccionar un patrón sin entender las tácticas subyacentes produce implementaciones frágiles. [EXPLICIT]
Anti-patrones como guía negativa. Conocer los anti-patrones (Training-Serving Skew, YOLO Deploy, Silent Degradation) es tan valioso como conocer los patrones. La detección de anti-patrones en el sistema existente informa qué patrones aplicar. [EXPLICIT]
El contexto determina el patrón, no la moda. Feature Store no es necesario para un modelo único con features simples. Champion-Challenger no aplica si solo hay un modelo. Shadow Deployment no justifica su costo para modelos de bajo riesgo. Cada patrón debe justificarse por las necesidades del sistema. [EXPLICIT]

Inputs

The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts. [EXPLICIT]

Parameters:

{MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
{FORMATO}: markdown (default) | html | dual
{ALCANCE}: ejecutiva (~40% — S3 AI patterns + S6 decision framework) | tecnica (full 6 sections, default)

Before generating pattern analysis, detect the codebase context:

Detección automática de contexto:
  Escanear el codebase por frameworks ML (PyTorch, TensorFlow, scikit-learn),
  orquestadores (Airflow, Dagster, Prefect, Kubeflow), y serving frameworks
  (TensorFlow Serving, TorchServe, Triton, vLLM) para adaptar recomendaciones. [EXPLICIT]

Detect existing patterns (feature stores, A/B testing, monitoring, model serving, drift detection). [EXPLICIT]

Load references:

Read ${CLAUDE_SKILL_DIR}/references/ai-patterns-detail.md
Read ${CLAUDE_SKILL_DIR}/references/tactics-catalog.md
Read ${CLAUDE_SKILL_DIR}/references/anti-patterns.md

When to Use

Selecting design patterns for new AI system architecture
Evaluating existing AI system against known patterns and anti-patterns
Designing fault detection and recovery for AI systems (drift, model failure, data quality)
Implementing feature store, champion-challenger, or shadow deployment
Defining maintainability and availability tactics for AI pipelines
Planning migration from ad-hoc ML code to pattern-based architecture
Reviewing anti-patterns and proposing remediation strategies

When NOT to Use

Internal module structure and layer architecture -> metodologia-ai-software-architecture
CONOPS and operational concept -> metodologia-ai-conops
Pipeline design and CI/CD -> metodologia-ai-pipeline-architecture
Testing strategy -> metodologia-ai-testing-strategy
GenAI/LLM-specific patterns (RAG, agents) -> metodologia-genai-architecture
Traditional software patterns without AI context -> metodologia-software-architecture

Delivery Structure: 6 Sections

S1: Maintainability Tactics for AI

Identifies tactics that enable the AI system to evolve, be tested, and be configured without fragility. [EXPLICIT]

Tactic categories:

Modifiability: Module decomposition, configuration externalization, abstract model interfaces, dependency injection for data sources and model versions
Testability: Observability hooks per pipeline stage, reproducible environments, data versioning, A/B testing infrastructure
Adaptability: Plugin architecture for models and features, feature toggles, multi-model support
Configurability: DAG-based pipeline orchestration, threshold management, configurable logging granularity

Key decisions:

Which tactics are mandatory vs. aspirational for the system's maturity level
Tactic priority ordering (what to implement first given constraints)
Tactic-to-component mapping (which architectural component realizes each tactic)

S2: Availability Tactics for AI

Identifies tactics that ensure the AI system detects, recovers from, and prevents failures. [EXPLICIT]

Fault detection tactics:

Model performance monitoring (accuracy, latency, throughput)
Data quality monitoring (schema, distribution, anomalies)
Drift detection (data drift, concept drift, prediction drift)
Pipeline health checks and resource monitoring

Fault recovery tactics:

Model rollback via registry
Prediction caching for model unavailability
Human-in-the-loop fallback
Pipeline checkpoint/restart
Circuit breaker for failing models

Fault prevention tactics:

Adversarial testing
Chaos engineering for ML
Canary analysis automation
Data validation gates at pipeline entry

Key decisions:

Recovery time objectives per component
Fallback hierarchy (cached prediction -> previous model -> human -> graceful denial)
Monitoring granularity vs. performance overhead

S3: AI-Specific Patterns Catalog

Catalogs the patterns purpose-built for AI system challenges. [EXPLICIT]

Patterns:

Feature Store: Centralized feature computation, training-serving consistency, feature reuse
Champion-Challenger: Evidence-based model promotion via live traffic comparison
Shadow Deployment: Risk-free model validation using production traffic
Drift Detection: Continuous distributional monitoring with automated response
Explainability Wrapper: Explanation generation alongside predictions for transparency and compliance
Canary Deployment: Gradual traffic shift with metric-driven rollback
Bulkhead: Fault isolation between pipeline components and model instances
Model Distillation: Compress large model into smaller, faster model preserving key capabilities
Prompt Caching: Store and reuse prompt-response pairs for repeated or similar queries
Guardrail Pattern: Input/output validation layer between user and model (content safety, PII, topic control)

For each pattern: intent, structure, key decisions, and trade-offs. Detail in references/ai-patterns-detail.md. [EXPLICIT]

Key decisions:

Which patterns are required vs. optional for the system
Pattern interactions and dependencies (Feature Store enables Champion-Challenger)
Implementation priority based on risk and value

S4: Traditional Patterns Adapted for AI

Maps traditional software patterns to AI-specific applications. [EXPLICIT]

Service-oriented:

Model-as-a-Service: Independent model deployment with versioned API
Feature-as-a-Service: Reusable feature computation endpoints
Explanation-as-a-Service: Centralized explanation generation

Load management:

Balancer: Distribute inference across model replicas
Throttle: Rate-limit inference to prevent overload
Batch Inference: Group requests for GPU efficiency

Resilience:

Fail-and-Repeat: Retry failed predictions with backoff
Circuit Breaker: Open circuit after consecutive model failures
Bulkhead: Isolate per-tenant or per-priority model instances

Consensus:

N-Party Voting: Ensemble multiple models for robustness
Confidence Thresholding: Serve only high-confidence predictions; escalate low-confidence

Key decisions:

Which traditional patterns need AI-specific adaptation vs. use as-is
Pattern composition (combining multiple patterns for compound behavior)

S5: Anti-Pattern Detection for AI Systems

Identifies known anti-patterns, detects their presence, and prescribes remediation. [EXPLICIT]

Anti-patterns:

Training-Serving Skew: Feature computation divergence between training and production
YOLO Deploy: Direct-to-production model deployment without staging
Notebook-to-Production: Copy-paste from Jupyter to production code
Monolithic Pipeline: Single deployment unit for entire ML pipeline
Silent Model Degradation: No drift monitoring, gradual accuracy loss undetected
Feature Sprawl: Hundreds of unmanaged, redundant features
Alert Fatigue: Too many false-positive ML alerts, team ignores them
God Model: Single model handling all use cases
Compliance Afterthought: Explainability and fairness added post-hoc
Unguarded LLM: LLM exposed without input/output validation, prone to injection and misuse
Token Budget Blindness: No cost controls on LLM calls, unbounded agent loops, surprise bills

For each: symptom, cause, detection signal, and recommended pattern fix. [EXPLICIT]

Key decisions:

Which anti-patterns are present in the current system
Remediation priority (risk-based ordering)
Pattern-to-anti-pattern mapping (which pattern fixes which anti-pattern)

S6: Pattern Selection Decision Framework

Provides a systematic approach for selecting patterns based on system requirements. [EXPLICIT]

Selection dimensions:

Risk level: High-risk systems need more patterns (Explainability, Drift Detection, N-Party Voting)
Scale: High-volume systems need load management patterns (Balancer, Throttle, Batch)
Model count: Multi-model systems need Champion-Challenger, Feature Store, Bulkhead
Regulatory: Regulated systems need Explainability Wrapper, audit trails, governance patterns
Maturity: Early-stage systems start with core patterns; mature systems add optimization patterns

Pattern dependency graph:

Feature Store -> Champion-Challenger -> Canary Deployment
Drift Detection -> Model Rollback -> Circuit Breaker
Explainability Wrapper -> Audit Trail -> Compliance
Shadow Deployment -> Canary Deployment -> Blue & Gold CI/CD

Implementation roadmap template:

Phase 1 (Foundation): Feature Store, Model Registry, basic monitoring
Phase 2 (Resilience): Drift Detection, Circuit Breaker, Bulkhead, Rollback
Phase 3 (Optimization): Champion-Challenger, Shadow Deployment, Canary
Phase 4 (Governance): Explainability Wrapper, N-Party Voting, compliance patterns

Trade-off Matrix

Pattern	Enables	Constrains	When to Use
Feature Store	Consistency, reuse, drift monitoring	Infrastructure overhead, governance cost	Multiple models sharing features
Champion-Challenger	Evidence-based promotion	Traffic split complexity, experiment duration	Multiple model candidates, sufficient traffic
Shadow Deployment	Risk-free validation with real traffic	Doubled infrastructure cost, no user feedback	High-risk model changes
Drift Detection	Early degradation warning	False positive alerts, monitoring overhead	All production AI systems
Explainability Wrapper	Transparency, compliance, trust	Latency overhead, explanation fidelity	Regulated domains, high-stakes decisions
Canary Deployment	Gradual risk mitigation	Slower deployment, routing complexity	All model deployments
Bulkhead	Fault isolation, independent scaling	Resource overhead, operational complexity	Multi-model or multi-tenant systems
N-Party Voting	Robustness, reduced variance	Latency, compute cost, model diversity	High-stakes or adversarial environments
Model Distillation	Lower latency, reduced cost	Quality loss, training effort	High-volume inference with latency constraints
Prompt Caching	Cost reduction, latency improvement	Cache invalidation, storage cost	High-volume systems with repeated query patterns
Guardrail Pattern	Safety, compliance, brand protection	Added latency, false positives	All production GenAI systems

Assumptions

System has or will build AI pipeline infrastructure (not running ad-hoc scripts)
Team understands the distinction between model development and model operations
Infrastructure supports the compute requirements for pattern implementation (e.g., shadow deployment doubles compute)
Monitoring and observability budget is allocated
Model registry exists or is planned (many patterns depend on it)

Limits

Focuses on patterns and tactics, not pipeline design (see metodologia-ai-pipeline-architecture)
Does not design internal module structure (see metodologia-ai-software-architecture)
Does not define testing strategy beyond pattern-specific validation (see metodologia-ai-testing-strategy)
GenAI-specific patterns (RAG, agent orchestration) require metodologia-genai-architecture
Does not cover infrastructure provisioning for patterns (see metodologia-infrastructure-architecture)

Edge Cases

Early-Stage System with One Model: Most patterns are overkill for a single-model MVP. Start with Drift Detection and basic monitoring. Add Feature Store only when a second model needs shared features. Add Champion-Challenger only when there are model alternatives to compare. [EXPLICIT]

Real-Time vs. Batch Pattern Selection: Some patterns behave differently in real-time vs. batch contexts. Champion-Challenger in real-time splits live traffic; in batch, it runs parallel jobs on the same dataset. Feature Store online store is for real-time; offline store is for batch. Ensure pattern documentation specifies both modes. [EXPLICIT]

Multi-Team Pattern Governance: Different teams may implement the same pattern differently (e.g., each team builds their own drift detection). Establish shared pattern implementations as platform capabilities. Feature Store and Model Registry are infrastructure-level patterns, not team-level. [EXPLICIT]

Regulated Environment Pattern Requirements: In finance and healthcare, Explainability Wrapper and audit trails are mandatory, not optional. Drift Detection thresholds may be set by regulators. Champion-Challenger experiments may require ethics board approval. Document regulatory pattern mandates separately from optional patterns. [EXPLICIT]

Manejo de Inputs Ambiguos

Si el nombre del sistema no se proporciona: solicitar antes de proceder.
Si el MODO no se especifica: usar piloto-auto (default).
Si el contexto es insuficiente para una sección: documentar como "[Requiere input adicional: {descripción}]" en lugar de inventar contenido.
Si no hay codebase para detectar anti-patrones: basar el análisis en la descripción del sistema y documentar que la detección requiere acceso al código para confirmación.
Si el sistema es demasiado temprano para patrones avanzados: recomendar solo patrones fundamentales (Drift Detection, Model Registry) y documentar cuándo activar los demás.

Validation Gate

Before finalizing delivery, verify:

El agente que ejecuta este skill debe verificar cada item antes de entregar el output al usuario.

Cross-References

metodologia-ai-software-architecture: Provides module structure context; patterns apply within modules
metodologia-ai-conops: Interaction level drives pattern selection (higher autonomy = more patterns needed)
metodologia-ai-pipeline-architecture: Patterns operate within pipeline stages; Feature Store is a pipeline component
metodologia-ai-testing-strategy: Tests validate pattern behavior (drift detection accuracy, rollback speed)
metodologia-genai-architecture: GenAI-specific patterns (RAG, agents) complement these general AI patterns
metodologia-aws-architecture-design: AWS-native pattern implementations (Bedrock Guardrails, Bedrock Agents, SageMaker A/B)
metodologia-infrastructure-architecture: Infrastructure supports pattern deployment (compute, storage, networking)
metodologia-devsecops-architecture: Security tactics complement availability tactics

Output Format Protocol

Format	Default	Description
`markdown`	Yes	Rich Markdown + Mermaid diagrams. Token-efficient.
`html`	On demand	Branded HTML (Design System). Visual impact.
`dual`	On demand	Both formats.

Output Artifact

Primary: A-03_AI_Design_Patterns_Deep.html — Maintainability tactics matrix, availability tactics matrix, AI patterns catalog with diagrams, traditional patterns adapted, anti-pattern detection results, pattern selection decision framework.

Secondary: Pattern decision records (.md), anti-pattern detection checklist, pattern dependency graph (Mermaid/PNG/SVG), implementation roadmap.

Fuente: Avila, R.D. & Ahmad, I. (2025). Architecting AI Software Systems. Packt.

Similar Skills

skill-lookup

Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.

prompts.chat

157.5k

prompt-lookup

Searches prompts.chat for AI prompt templates by keyword or category, retrieves by ID with variable handling, and improves prompts via AI. Use for discovering or enhancing prompts.

prompts.chat

157.5k

agent-eval

Compares coding agents like Claude Code and Aider on custom YAML-defined codebase tasks using git worktrees, measuring pass rate, cost, time, and consistency.

everything-claude-code

138.0k

Stats

Stars1

Forks0

Last CommitMar 28, 2026

Actions

View Source View Plugin View on GitHub View README

AI Design Patterns: Patterns & Tactics for AI-Enabled Systems

Principio Rector

Filosofía de Design Patterns para IA

Tácticas primero, patrones después. Las tácticas (detección de fallas, recuperación, prevención) son los bloques constituyentes. Los patrones combinan tácticas en soluciones cohesivas. Seleccionar un patrón sin entender las tácticas subyacentes produce implementaciones frágiles. [EXPLICIT]
Anti-patrones como guía negativa. Conocer los anti-patrones (Training-Serving Skew, YOLO Deploy, Silent Degradation) es tan valioso como conocer los patrones. La detección de anti-patrones en el sistema existente informa qué patrones aplicar. [EXPLICIT]
El contexto determina el patrón, no la moda. Feature Store no es necesario para un modelo único con features simples. Champion-Challenger no aplica si solo hay un modelo. Shadow Deployment no justifica su costo para modelos de bajo riesgo. Cada patrón debe justificarse por las necesidades del sistema. [EXPLICIT]

Inputs

The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts. [EXPLICIT]

Parameters:

{MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
{FORMATO}: markdown (default) | html | dual
{ALCANCE}: ejecutiva (~40% — S3 AI patterns + S6 decision framework) | tecnica (full 6 sections, default)

Before generating pattern analysis, detect the codebase context:

Detección automática de contexto:
  Escanear el codebase por frameworks ML (PyTorch, TensorFlow, scikit-learn),
  orquestadores (Airflow, Dagster, Prefect, Kubeflow), y serving frameworks
  (TensorFlow Serving, TorchServe, Triton, vLLM) para adaptar recomendaciones. [EXPLICIT]

Detect existing patterns (feature stores, A/B testing, monitoring, model serving, drift detection). [EXPLICIT]

Load references:

Read ${CLAUDE_SKILL_DIR}/references/ai-patterns-detail.md
Read ${CLAUDE_SKILL_DIR}/references/tactics-catalog.md
Read ${CLAUDE_SKILL_DIR}/references/anti-patterns.md

When to Use

Selecting design patterns for new AI system architecture
Evaluating existing AI system against known patterns and anti-patterns
Designing fault detection and recovery for AI systems (drift, model failure, data quality)
Implementing feature store, champion-challenger, or shadow deployment
Defining maintainability and availability tactics for AI pipelines
Planning migration from ad-hoc ML code to pattern-based architecture
Reviewing anti-patterns and proposing remediation strategies

When NOT to Use

Internal module structure and layer architecture -> metodologia-ai-software-architecture
CONOPS and operational concept -> metodologia-ai-conops
Pipeline design and CI/CD -> metodologia-ai-pipeline-architecture
Testing strategy -> metodologia-ai-testing-strategy
GenAI/LLM-specific patterns (RAG, agents) -> metodologia-genai-architecture
Traditional software patterns without AI context -> metodologia-software-architecture

Delivery Structure: 6 Sections

S1: Maintainability Tactics for AI

Identifies tactics that enable the AI system to evolve, be tested, and be configured without fragility. [EXPLICIT]

Tactic categories:

Modifiability: Module decomposition, configuration externalization, abstract model interfaces, dependency injection for data sources and model versions
Testability: Observability hooks per pipeline stage, reproducible environments, data versioning, A/B testing infrastructure
Adaptability: Plugin architecture for models and features, feature toggles, multi-model support
Configurability: DAG-based pipeline orchestration, threshold management, configurable logging granularity

Key decisions:

Which tactics are mandatory vs. aspirational for the system's maturity level
Tactic priority ordering (what to implement first given constraints)
Tactic-to-component mapping (which architectural component realizes each tactic)

S2: Availability Tactics for AI

Identifies tactics that ensure the AI system detects, recovers from, and prevents failures. [EXPLICIT]

Fault detection tactics:

Model performance monitoring (accuracy, latency, throughput)
Data quality monitoring (schema, distribution, anomalies)
Drift detection (data drift, concept drift, prediction drift)
Pipeline health checks and resource monitoring

Fault recovery tactics:

Model rollback via registry
Prediction caching for model unavailability
Human-in-the-loop fallback
Pipeline checkpoint/restart
Circuit breaker for failing models

Fault prevention tactics:

Adversarial testing
Chaos engineering for ML
Canary analysis automation
Data validation gates at pipeline entry

Key decisions:

Recovery time objectives per component
Fallback hierarchy (cached prediction -> previous model -> human -> graceful denial)
Monitoring granularity vs. performance overhead

S3: AI-Specific Patterns Catalog

Catalogs the patterns purpose-built for AI system challenges. [EXPLICIT]

Patterns:

Feature Store: Centralized feature computation, training-serving consistency, feature reuse
Champion-Challenger: Evidence-based model promotion via live traffic comparison
Shadow Deployment: Risk-free model validation using production traffic
Drift Detection: Continuous distributional monitoring with automated response
Explainability Wrapper: Explanation generation alongside predictions for transparency and compliance
Canary Deployment: Gradual traffic shift with metric-driven rollback
Bulkhead: Fault isolation between pipeline components and model instances
Model Distillation: Compress large model into smaller, faster model preserving key capabilities
Prompt Caching: Store and reuse prompt-response pairs for repeated or similar queries
Guardrail Pattern: Input/output validation layer between user and model (content safety, PII, topic control)

For each pattern: intent, structure, key decisions, and trade-offs. Detail in references/ai-patterns-detail.md. [EXPLICIT]

Key decisions:

Which patterns are required vs. optional for the system
Pattern interactions and dependencies (Feature Store enables Champion-Challenger)
Implementation priority based on risk and value

S4: Traditional Patterns Adapted for AI

Maps traditional software patterns to AI-specific applications. [EXPLICIT]

Service-oriented:

Model-as-a-Service: Independent model deployment with versioned API
Feature-as-a-Service: Reusable feature computation endpoints
Explanation-as-a-Service: Centralized explanation generation

Load management:

Balancer: Distribute inference across model replicas
Throttle: Rate-limit inference to prevent overload
Batch Inference: Group requests for GPU efficiency

Resilience:

Fail-and-Repeat: Retry failed predictions with backoff
Circuit Breaker: Open circuit after consecutive model failures
Bulkhead: Isolate per-tenant or per-priority model instances

Consensus:

N-Party Voting: Ensemble multiple models for robustness
Confidence Thresholding: Serve only high-confidence predictions; escalate low-confidence

Key decisions:

Which traditional patterns need AI-specific adaptation vs. use as-is
Pattern composition (combining multiple patterns for compound behavior)

S5: Anti-Pattern Detection for AI Systems

Identifies known anti-patterns, detects their presence, and prescribes remediation. [EXPLICIT]

Anti-patterns:

Training-Serving Skew: Feature computation divergence between training and production
YOLO Deploy: Direct-to-production model deployment without staging
Notebook-to-Production: Copy-paste from Jupyter to production code
Monolithic Pipeline: Single deployment unit for entire ML pipeline
Silent Model Degradation: No drift monitoring, gradual accuracy loss undetected
Feature Sprawl: Hundreds of unmanaged, redundant features
Alert Fatigue: Too many false-positive ML alerts, team ignores them
God Model: Single model handling all use cases
Compliance Afterthought: Explainability and fairness added post-hoc
Unguarded LLM: LLM exposed without input/output validation, prone to injection and misuse
Token Budget Blindness: No cost controls on LLM calls, unbounded agent loops, surprise bills

For each: symptom, cause, detection signal, and recommended pattern fix. [EXPLICIT]

Key decisions:

Which anti-patterns are present in the current system
Remediation priority (risk-based ordering)
Pattern-to-anti-pattern mapping (which pattern fixes which anti-pattern)

S6: Pattern Selection Decision Framework

Provides a systematic approach for selecting patterns based on system requirements. [EXPLICIT]

Selection dimensions:

Risk level: High-risk systems need more patterns (Explainability, Drift Detection, N-Party Voting)
Scale: High-volume systems need load management patterns (Balancer, Throttle, Batch)
Model count: Multi-model systems need Champion-Challenger, Feature Store, Bulkhead
Regulatory: Regulated systems need Explainability Wrapper, audit trails, governance patterns
Maturity: Early-stage systems start with core patterns; mature systems add optimization patterns

Pattern dependency graph:

Feature Store -> Champion-Challenger -> Canary Deployment
Drift Detection -> Model Rollback -> Circuit Breaker
Explainability Wrapper -> Audit Trail -> Compliance
Shadow Deployment -> Canary Deployment -> Blue & Gold CI/CD

Implementation roadmap template:

Phase 1 (Foundation): Feature Store, Model Registry, basic monitoring
Phase 2 (Resilience): Drift Detection, Circuit Breaker, Bulkhead, Rollback
Phase 3 (Optimization): Champion-Challenger, Shadow Deployment, Canary
Phase 4 (Governance): Explainability Wrapper, N-Party Voting, compliance patterns

Trade-off Matrix

Pattern	Enables	Constrains	When to Use
Feature Store	Consistency, reuse, drift monitoring	Infrastructure overhead, governance cost	Multiple models sharing features
Champion-Challenger	Evidence-based promotion	Traffic split complexity, experiment duration	Multiple model candidates, sufficient traffic
Shadow Deployment	Risk-free validation with real traffic	Doubled infrastructure cost, no user feedback	High-risk model changes
Drift Detection	Early degradation warning	False positive alerts, monitoring overhead	All production AI systems
Explainability Wrapper	Transparency, compliance, trust	Latency overhead, explanation fidelity	Regulated domains, high-stakes decisions
Canary Deployment	Gradual risk mitigation	Slower deployment, routing complexity	All model deployments
Bulkhead	Fault isolation, independent scaling	Resource overhead, operational complexity	Multi-model or multi-tenant systems
N-Party Voting	Robustness, reduced variance	Latency, compute cost, model diversity	High-stakes or adversarial environments
Model Distillation	Lower latency, reduced cost	Quality loss, training effort	High-volume inference with latency constraints
Prompt Caching	Cost reduction, latency improvement	Cache invalidation, storage cost	High-volume systems with repeated query patterns
Guardrail Pattern	Safety, compliance, brand protection	Added latency, false positives	All production GenAI systems

Assumptions

System has or will build AI pipeline infrastructure (not running ad-hoc scripts)
Team understands the distinction between model development and model operations
Infrastructure supports the compute requirements for pattern implementation (e.g., shadow deployment doubles compute)
Monitoring and observability budget is allocated
Model registry exists or is planned (many patterns depend on it)

Limits

Focuses on patterns and tactics, not pipeline design (see metodologia-ai-pipeline-architecture)
Does not design internal module structure (see metodologia-ai-software-architecture)
Does not define testing strategy beyond pattern-specific validation (see metodologia-ai-testing-strategy)
GenAI-specific patterns (RAG, agent orchestration) require metodologia-genai-architecture
Does not cover infrastructure provisioning for patterns (see metodologia-infrastructure-architecture)

Edge Cases

Manejo de Inputs Ambiguos

Si el nombre del sistema no se proporciona: solicitar antes de proceder.
Si el MODO no se especifica: usar piloto-auto (default).
Si el contexto es insuficiente para una sección: documentar como "[Requiere input adicional: {descripción}]" en lugar de inventar contenido.
Si no hay codebase para detectar anti-patrones: basar el análisis en la descripción del sistema y documentar que la detección requiere acceso al código para confirmación.
Si el sistema es demasiado temprano para patrones avanzados: recomendar solo patrones fundamentales (Drift Detection, Model Registry) y documentar cuándo activar los demás.

Validation Gate

Before finalizing delivery, verify:

El agente que ejecuta este skill debe verificar cada item antes de entregar el output al usuario.

Cross-References

metodologia-ai-software-architecture: Provides module structure context; patterns apply within modules
metodologia-ai-conops: Interaction level drives pattern selection (higher autonomy = more patterns needed)
metodologia-ai-pipeline-architecture: Patterns operate within pipeline stages; Feature Store is a pipeline component
metodologia-ai-testing-strategy: Tests validate pattern behavior (drift detection accuracy, rollback speed)
metodologia-genai-architecture: GenAI-specific patterns (RAG, agents) complement these general AI patterns
metodologia-aws-architecture-design: AWS-native pattern implementations (Bedrock Guardrails, Bedrock Agents, SageMaker A/B)
metodologia-infrastructure-architecture: Infrastructure supports pattern deployment (compute, storage, networking)
metodologia-devsecops-architecture: Security tactics complement availability tactics

Output Format Protocol

Format	Default	Description
`markdown`	Yes	Rich Markdown + Mermaid diagrams. Token-efficient.
`html`	On demand	Branded HTML (Design System). Visual impact.
`dual`	On demand	Both formats.

Output Artifact

Secondary: Pattern decision records (.md), anti-pattern detection checklist, pattern dependency graph (Mermaid/PNG/SVG), implementation roadmap.

Fuente: Avila, R.D. & Ahmad, I. (2025). Architecting AI Software Systems. Packt.