AI software architecture design — modules, layers, boundaries, design patterns, ADRs, quality attributes, and technical debt strategy for AI-enabled systems. This skill should be used when the user asks to "design AI system structure", "define AI module boundaries", "select AI architecture patterns", "document AI architecture decisions", "evaluate AI code architecture", or mentions AI pipelines, feature stores, model serving, drift detection, ML quality attributes, explainability architecture, or AI technical debt.
AI software architecture defines how code is organized internally in systems that combine traditional software components with AI/ML capabilities — module boundaries spanning data pipelines, model serving, and feature stores; layer separation across the 6-layer AI stack; design patterns bridging traditional and AI-specific concerns; and the reasoning behind technical decisions. This skill produces comprehensive architecture documentation that enables teams to understand, maintain, and evolve AI-enabled systems.
In AI systems, architecture carries two kinds of debt: code debt and model debt. Ignoring either guarantees silent degradation. That is why architecture is documented BEFORE implementation, validated against measurable quality attributes (including fairness, explainability, and robustness), and every decision lives in an ADR with alternatives and trade-offs.
The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts.
Parameters:
{MODO}: piloto-auto (auto-pilot, default) | desatendido (unattended) | supervisado (supervised) | paso-a-paso (step-by-step)
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (~40% — S1 module view + S3 patterns + S5 ADRs) | tecnica (full 6 sections, default)
Before generating architecture, detect the codebase context:
!find . -name "*.py" -o -name "*.ts" -o -name "*.java" -o -name "*.go" | head -30
Use detected languages, frameworks, and ML libraries (PyTorch, TensorFlow, scikit-learn, LangChain, Hugging Face) to tailor pattern recommendations and component naming.
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/ai-architecture-stack.md
Read ${CLAUDE_SKILL_DIR}/references/ai-quality-attributes.md
Read ${CLAUDE_SKILL_DIR}/references/ai-patterns-catalog.md
Maps the internal module structure through the 6-layer AI architecture stack: Hardware, Data, Model, Inference, Application, Monitoring & Control.
Includes:
Key decisions:
Decomposes selected modules into components — what they do, interfaces exposed, dependencies. Focus on the five core production pipeline components.
Components:
Key decisions:
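As one illustration of what such a decomposition can look like in code, the sketch below defines interface contracts for pipeline components using structural typing. The component names (feature engineering, model serving) and method signatures are hypothetical — the section above does not name the five components — so treat this as a shape, not a prescription:

```python
from typing import Any, Mapping, Protocol


class PipelineComponent(Protocol):
    """Surface every pipeline stage exposes to the orchestrator."""
    name: str

    def health(self) -> bool: ...


class FeatureEngineering(PipelineComponent, Protocol):
    """Turns raw records into model-ready feature vectors."""

    def transform(self, raw: Mapping[str, Any]) -> Mapping[str, float]: ...


class ModelServing(PipelineComponent, Protocol):
    """Scores a feature vector with the currently deployed model."""

    def predict(self, features: Mapping[str, float]) -> float: ...
```

Because these are `Protocol`s, any concrete class with matching members satisfies them without inheriting — which keeps module boundaries in the type system rather than in a shared base-class hierarchy.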
Documents selected patterns with justification, detected anti-patterns, and alternatives. Bridges traditional and AI-specific pattern catalogs.
AI-specific patterns (Feature Store, Champion-Challenger, Shadow Deployment, Drift Detection, Explainability Wrapper, Canary Deployment, Bulkhead):
Traditional patterns adapted for AI (Service-Oriented, Load Balancer, Circuit Breaker, Throttling, N-Party Voting):
Anti-patterns detected (Training-serving skew, Pipeline jungle, Dead feature columns, Undeclared consumers):
Principle: Patterns serve quality attributes. A Feature Store without multiple consumers is overhead. Shadow Deployment without evaluation criteria is wasted compute.
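To make the last point concrete, here is a minimal sketch of a Shadow Deployment wrapper that carries its evaluation criterion with it: the shadow model is scored on live traffic, disagreements beyond a tolerance are logged, and the user always receives the champion's answer. Function names and the tolerance-based criterion are illustrative assumptions:

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("shadow")


def with_shadow(champion: Callable[[Any], float],
                shadow: Callable[[Any], float],
                tolerance: float = 0.1) -> Callable[[Any], float]:
    """Serve the champion's prediction; run the shadow on the same input
    and record disagreement against an explicit evaluation criterion."""
    def predict(features: Any) -> float:
        result = champion(features)
        try:
            shadow_result = shadow(features)
            if abs(shadow_result - result) > tolerance:
                logger.warning("shadow disagreement: %.4f vs %.4f",
                               shadow_result, result)
        except Exception:
            # A failing shadow must never affect the served prediction.
            logger.exception("shadow model failed; champion unaffected")
        return result
    return predict
```

Without the tolerance check and the logged disagreements, this wrapper would be exactly the "wasted compute" the principle warns about.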
ATAM-style scenarios extended with AI-specific quality attributes: Stimulus → Response → Measure
| Quality Attribute | Example Scenario |
|---|---|
| Performance | Inference request completes within 200ms (p95) under 1000 concurrent users |
| Accuracy | ML model accuracy >= 0.88 (threshold), 0.94 (objective) |
| Fairness | Model fairness across demographic groups >= 90% parity (threshold), 95% (objective) |
| Explainability | Explainability score >= 0.7 (threshold), 0.8 (objective); top-5 features explain >60% |
| Robustness | Model accuracy change under perturbation <= +/-10% (threshold), +/-5% (objective) |
| Drift Resilience | Drift detected within <1 hour (threshold), <10 minutes (objective) |
| Availability | Service remains available during model failure with <30s failover to fallback |
| Modifiability | Swapping model algorithm requires changes to <=3 modules |
| Deployability | New model version deployed in <15 min with instant rollback |
| Compliance | All model decisions have complete audit trails; governance review workflows enforced |
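The threshold/objective pairs above can be encoded directly, so scenarios are testable rather than aspirational. A minimal sketch (for "higher is better" measures such as accuracy or fairness parity; latency- and time-style measures would invert the comparisons):

```python
from dataclasses import dataclass


@dataclass
class QualityScenario:
    """ATAM-style scenario: stimulus -> response -> measure,
    with a hard threshold and an aspirational objective."""
    attribute: str
    stimulus: str
    response: str
    threshold: float   # minimum acceptable measure
    objective: float   # target measure

    def evaluate(self, measured: float) -> str:
        if measured >= self.objective:
            return "objective met"
        if measured >= self.threshold:
            return "threshold met"
        return "failed"


accuracy = QualityScenario(
    attribute="Accuracy",
    stimulus="Nightly batch of labeled production samples scored",
    response="Model accuracy computed against labels",
    threshold=0.88,
    objective=0.94,
)
```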
Captures significant AI architecture decisions with context, decision, consequences, and alternatives.
AI-specific ADR topics:
ADR structure: Title, Status, Context (business + technical constraints), Decision (what + why), Consequences (positive/negative/neutral), Alternatives considered, Related decisions.
Scope: Decisions affecting multiple pipeline stages, requiring significant refactoring if changed, or trading off AI quality attributes.
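A filled-in example following the structure above may help; the ADR number, system details, and related decisions here are entirely hypothetical:

```markdown
# ADR-012: Adopt a Feature Store for shared customer features

Status: Accepted

## Context
Three models consume overlapping customer features, each recomputing them
independently. Training-serving skew has been observed twice in production.

## Decision
Introduce a central Feature Store serving both offline training and online
inference from the same feature definitions.

## Consequences
- Positive: feature consistency, reuse, single point for drift monitoring.
- Negative: new infrastructure to operate; feature governance overhead.
- Neutral: teams must register features before use.

## Alternatives considered
- Shared feature library (code-only): no online/offline consistency guarantee.
- Per-model pipelines (status quo): skew risk persists.

## Related decisions
- ADR-007 (model registry)
```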
Identifies current architectural debt — including AI-specific debt — and a strategy for evolution.
AI-specific debt types:
Evolution strategy:
| Decision | Enables | Constrains | When to Use |
|---|---|---|---|
| Monolithic Pipeline | Simple deployment, easy debugging | Tight coupling, hard to scale components independently | Early-stage, single-model systems |
| Microservice-per-Model | Independent scaling, tech diversity | Distributed complexity, network overhead | Multi-model, multi-team, high-scale |
| Feature Store | Consistency, reuse, drift monitoring | Infrastructure overhead, governance cost | Multiple models share features |
| Champion-Challenger | Data-driven updates, risk management | Doubled compute, statistical significance needed | Production model updates |
| Shadow Deployment | Real-world validation without risk | Doubled inference compute, no user signal | High-stakes, regulated predictions |
| Drift Detection | Proactive model updates, early warning | Monitoring infrastructure, threshold calibration | All production AI systems |
| Explainability Wrapper | Transparency, compliance, trust | Added latency, explanation fidelity trade-off | Regulated industries, user-facing |
| Bulkhead Isolation | Fault containment, independent scaling | Resource overhead per compartment | Multi-model serving, critical availability |
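Since the table recommends Drift Detection for all production AI systems, a concrete starting point may be useful. One common statistic is the Population Stability Index (PSI) per feature; the sketch below is a minimal NumPy implementation, with the usual rule-of-thumb thresholds noted in the docstring (bin count and clipping constant are tunable assumptions):

```python
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a training (expected) and a
    live (actual) sample of one feature. Common rule of thumb:
    < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 significant drift."""
    # Quantile bin edges from the training distribution.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # cover out-of-range values
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)         # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

In a monitoring loop, PSI would be computed per feature per window and compared against the calibrated threshold; exceedances feed the "proactive model updates" the table promises.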
Greenfield AI System: No existing structure; risk of over-engineering for hypothetical scale. Start with monolithic pipeline, defer microservice decomposition. Use ADRs for reversible decisions. Prioritize monitoring from day one.
Legacy System Adding AI: Existing architecture not designed for AI workloads. Watch for: impedance mismatch between request-response web app and batch training pipelines, data access patterns that don't support feature engineering, deployment processes that can't handle model artifacts. Use strangler fig for gradual AI integration.
Multi-Model System: Multiple models serving different use cases from shared data. Risk of resource contention, dependency conflicts, cascade failures. Apply Bulkhead pattern. Feature Store becomes essential. Model registry is non-optional.
Real-Time AI System: Latency requirements constrain model complexity, feature computation, and explanation depth. May need edge inference, model compression, or prediction caching. Quality attribute trade-offs between accuracy and latency must be explicit in ADRs.
Regulated AI System (Finance, Healthcare): Compliance requirements (audit trails, explainability, fairness) are architectural constraints, not afterthoughts. Explainability Wrapper pattern is mandatory. Data lineage tracking at every pipeline stage. Model governance workflows built into CI/CD.
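For the regulated scenario, the Explainability Wrapper and audit-trail requirements can be combined in one decorator-style component: every decision emits an audit record carrying its explanation before the result is returned. The record fields and the callable-based `sink` are illustrative assumptions; in production the sink would be an append-only, tamper-evident log:

```python
import json
import time
import uuid
from typing import Callable, Mapping


def audited(predict: Callable[[Mapping[str, float]], float],
            explain: Callable[[Mapping[str, float]], Mapping[str, float]],
            sink: Callable[[str], None]) -> Callable[[Mapping[str, float]], float]:
    """Wrap a model so every decision emits an audit record, including
    its explanation (e.g. per-feature attributions), before returning."""
    def wrapped(features: Mapping[str, float]) -> float:
        decision = predict(features)
        record = {
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "features": dict(features),
            "decision": decision,
            "explanation": dict(explain(features)),
        }
        sink(json.dumps(record))   # append-only audit log in production
        return decision
    return wrapped
```

Note the latency trade-off flagged in the pattern table: `explain` runs on the request path, so its cost must fit the system's latency budget.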
Before finalizing delivery, verify:
| Format | Default | Description |
|---|---|---|
| markdown | Yes | Rich Markdown + Mermaid diagrams. Token-efficient. |
| html | On demand | Branded HTML (Design System). Visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter.
Primary: A-01_AI_Software_Architecture_Deep.html — Executive summary, 6-layer module view, component cards, hybrid design patterns, AI quality attribute scenarios, ADRs, debt inventory and evolution roadmap.
Secondary: ADR repository (.md files, version-controlled), AI architecture stack diagram (Mermaid/PNG/SVG), quality attribute scenario checklist, pattern selection decision tree.
Source: Avila, R.D. & Ahmad, I. (2025). Architecting AI Software Systems. Packt. | Bass, L., Clements, P., & Kazman, R. (2021). Software Architecture in Practice (4th ed.). Addison-Wesley.