Designs AWS cloud architectures for AI and GenAI workloads applying the Well-Architected Framework GenAI Lens (6 pillars: GENOPS, GENSEC, GENREL, GENPERF, GENCOST, GENSUS), AWS service selection matrices, RAG/Agent/Fine-Tuning patterns, cost optimization strategies, and enterprise reference architectures. Activated when designing, evaluating, or migrating AI systems on AWS.
This skill is limited to using the following tools:
- references/aws-genai-patterns.md
- references/aws-service-catalog.md
- references/well-architected-genai-lens.md
Design, evaluate, and optimize AWS architectures for artificial intelligence and generative AI systems, applying the Well-Architected Framework with the GenAI Lens, proven architectural patterns, and service selection matrices that balance performance, cost, security, and sustainability.
Well-Architected First — Every architectural decision is evaluated against the 6 GenAI Lens pillars (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability). No design that violates any pillar is approved without an explicit ADR documenting the trade-off.
Service-Native over Custom — Prefer AWS managed services (Bedrock, SageMaker, OpenSearch Serverless, Bedrock Guardrails) over custom implementations. Justify a custom build only when requirements exceed the managed service's capabilities, backed by measurable evidence.
Cost-Aware by Design — Cost is not a later optimization; it is a design constraint from the first iteration. Model tiering (Haiku→Sonnet→Opus), batch inference (50% savings), semantic caching, and provisioned throughput are architectural decisions, not operational ones.
Parameters:
MODO: [assessment | design | migration | optimization]
FORMATO: [ejecutivo | técnico | híbrido]
VARIANTE: [rag | agents | fine-tuning | multi-model | streaming | batch | full]
ESCALA: [startup | growth | enterprise]
Auto-detection:
- If infrastructure/, cdk/, or cloudformation/ exists → MODO=optimization
- If sagemaker/ or bedrock/ exists → VARIANTE detected from the code
- If the input mentions "migrar" or "mover a AWS" → MODO=migration
- Default: MODO=design, VARIANTE=full, ESCALA=growth
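The auto-detection rules above can be sketched as a small helper. This is a hypothetical illustration of the precedence implied by the list (explicit migration intent overrides directory hints, defaults otherwise); the function name and the English trigger variants are assumptions.

```python
from pathlib import Path

def detect_params(repo: Path, user_input: str = "") -> dict:
    """Hypothetical helper mirroring the auto-detection rules above."""
    params = {"MODO": "design", "VARIANTE": "full", "ESCALA": "growth"}
    if any((repo / d).is_dir() for d in ("infrastructure", "cdk", "cloudformation")):
        params["MODO"] = "optimization"
    # VARIANTE detection from sagemaker/ or bedrock/ code needs deeper inspection;
    # only the directory hint is checked here.
    triggers = ("migrar", "mover a aws", "migrate", "move to aws")
    if any(t in user_input.lower() for t in triggers):
        params["MODO"] = "migration"  # migration intent overrides directory hints
    return params
```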
ai-software-architecture · ai-pipeline-architecture · ai-design-patterns · genai-architecture · cloud-native-architecture

Evaluate the workload against the 6 GenAI Lens pillars.
Load references:
Read ${CLAUDE_SKILL_DIR}/references/well-architected-genai-lens.md
Deliverable: Scorecard per pillar with categorized findings.
| Pillar | Code | Evaluation | Findings |
|---|---|---|---|
| Operational Excellence | GENOPS01-05 | Score 1-5 | List of gaps |
| Security | GENSEC01-06 | Score 1-5 | List of gaps |
| Reliability | GENREL01-06 | Score 1-5 | List of gaps |
| Performance Efficiency | GENPERF01-04 | Score 1-5 | List of gaps |
| Cost Optimization | GENCOST01-05 | Score 1-5 | List of gaps |
| Sustainability | GENSUS01-03 | Score 1-5 | List of gaps |
For each finding: classify it as HIGH / MEDIUM / LOW risk and propose a remediation with a specific AWS service.
Key metrics per pillar:
Select the optimal AWS services for each component of the AI system.
Load references:
Read ${CLAUDE_SKILL_DIR}/references/aws-service-catalog.md
Read ${CLAUDE_SKILL_DIR}/references/aws-genai-patterns.md
Deliverable: Service mapping table + architecture diagram.
Selection process per component:
| Componente | Decisión | Criterios |
|---|---|---|
| Foundation Model Access | Bedrock vs SageMaker endpoint | Control level, model availability, pricing |
| Compute | Inferentia2 vs GPU vs Lambda | Latency requirements, volume, cost sensitivity |
| Vector Store | OpenSearch vs pgvector vs MemoryDB | Existing stack, query patterns, scale |
| Orchestration | Step Functions vs Bedrock Agents | Complexity, human-in-loop, custom logic |
| API Layer | API Gateway vs ALB+ECS | WebSocket needs, throttling, auth |
| Security | Bedrock Guardrails + WAF + IAM | Compliance requirements, PII sensitivity |
| Monitoring | CloudWatch + X-Ray + Model Monitor | Observability depth, custom metrics |
For each decision: document the discarded alternative and the reason for rejecting it, with evidence.
Architectural patterns (select according to VARIANTE):
Design the detailed RAG and/or Agents architecture using native AWS services.
Deliverable: Component diagram + data flow + configuration specs.
RAG Architecture Decision Tree:
Agent Architecture Decision Tree:
Data Flow Template:
Ingestion: S3 → [Lambda trigger] → Bedrock Embeddings → Vector Store
Query: Client → API GW → Lambda → Vector Search → Bedrock Generation → Response
Agent: Client → API GW → Bedrock Agent → [Action Group Lambda] → [KB RAG] → Response
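The generation step of the query flow can be illustrated with a payload builder. This is a minimal sketch: the prompt wording and function name are assumptions, but the body shape follows the Anthropic Messages format that Bedrock's `InvokeModel` expects for Claude models.

```python
import json

def build_rag_request(question: str, chunks: list[str], max_tokens: int = 512) -> str:
    """Assemble a Bedrock InvokeModel body (Anthropic Messages format) that
    grounds the answer in retrieved chunks. Prompt wording is illustrative."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": "Answer only from the provided context and cite chunk numbers.",
        "messages": [
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    })
```

The query Lambda would pass the resulting string as `body` to the `bedrock-runtime` client's `invoke_model` call, alongside the chosen `modelId`.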
Design the security posture for the GenAI workload on AWS.
Deliverable: Security architecture diagram + controls matrix.
Security layers (defense in depth):
| Layer | AWS Service | Control |
|---|---|---|
| Perimetral | WAF + CloudFront | Prompt injection filtering, rate limiting |
| API | API Gateway + Cognito | AuthN/AuthZ, API keys, usage plans |
| Network | PrivateLink + VPC Endpoints | No internet exposure for model endpoints |
| Data in transit | TLS 1.2+ | Encrypted API calls to Bedrock/SageMaker |
| Data at rest | KMS (CMK) | Model artifacts, training data, vector stores |
| Content safety | Bedrock Guardrails | PII redaction, topic denial, content filtering |
| PII detection | Amazon Macie | Scan training data, S3 buckets |
| Audit | CloudTrail + Security Hub | All model invocations logged, centralized findings |
| Agent scope | IAM + Action Groups | Least privilege per Lambda, tool whitelisting |
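The "least privilege per Lambda" control in the last row can be made concrete with a scoped IAM policy. A minimal sketch (the helper name is an assumption; the action and the account-less foundation-model ARN form follow AWS's IAM reference for Bedrock):

```python
import json

def scoped_invoke_policy(region: str, model_id: str) -> str:
    """Illustrative least-privilege IAM policy for an action-group Lambda:
    it may invoke exactly one foundation model and nothing else in Bedrock.
    Foundation-model ARNs carry no account id (hence the double colon)."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": [f"arn:aws:bedrock:{region}::foundation-model/{model_id}"],
        }],
    }, indent=2)
```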
OWASP Top 10 for LLMs — AWS Mitigations:
Design the cost optimization strategy for GenAI workloads on AWS.
Deliverable: Cost model + optimization roadmap.
Cost Levers (highest to lowest impact):
| Lever | Savings | Implementation |
|---|---|---|
| Model tiering | 60-90% | Haiku for simple, Sonnet for standard, Opus for complex |
| Batch inference | 50% | Bedrock batch for non-real-time workloads |
| Provisioned throughput | 30-50% | Committed capacity for predictable workloads |
| Semantic caching | 20-40% | ElastiCache similarity cache for repeated queries |
| Prompt optimization | 10-30% | Compression, fewer examples, shorter system prompts |
| AI silicon | 40-50% | Inferentia2 (inference), Trainium (training) vs GPU |
| Vector quantization | 50-75% storage | Reduce embedding dimensions, binary quantization |
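The model-tiering lever in the table above can be sketched as a routing heuristic. The thresholds and complexity markers are placeholder assumptions, not a recommendation; the model IDs are illustrative Bedrock identifiers.

```python
def pick_model_tier(prompt: str) -> str:
    """Illustrative model-tiering router: cheap model for short/simple
    requests, escalating on rough complexity signals. Thresholds and
    marker words are placeholder assumptions."""
    words = len(prompt.split())
    complex_markers = ("analyze", "compare", "multi-step", "reason")
    if words < 40 and not any(m in prompt.lower() for m in complex_markers):
        return "anthropic.claude-3-haiku-20240307-v1:0"     # simple tier
    if words < 400:
        return "anthropic.claude-3-5-sonnet-20240620-v1:0"  # standard tier
    return "anthropic.claude-3-opus-20240229-v1:0"          # complex tier
```

In practice the router would sit in the API-layer Lambda, in front of the Bedrock call, with routing decisions logged for later tuning.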
Cost Tracking Architecture:
Bedrock/SageMaker → CloudWatch Metrics → Cost Explorer (per-model tags)
→ AWS Budgets (alerts per model/team/project)
→ Cost & Usage Report (per-invocation attribution)
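Per-invocation attribution reduces to a small token-based estimate that can be emitted as a CloudWatch custom metric alongside the cost tags. A minimal sketch (function name is an assumption; prices are per-1K-token placeholders, not current AWS pricing):

```python
def invocation_cost(in_tokens: int, out_tokens: int,
                    in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Rough per-invocation cost estimate for attribution dashboards.
    Prices are illustrative placeholders, not current AWS pricing."""
    return round(in_tokens / 1000 * in_price_per_1k
                 + out_tokens / 1000 * out_price_per_1k, 6)
```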
FinOps Cadence:
Design for high availability, consistent performance, and scalability on AWS.
Deliverable: Reliability architecture + performance baselines + scaling plan.
Reliability Patterns:
| Pattern | AWS Implementation |
|---|---|
| Multi-AZ inference | SageMaker multi-AZ endpoints, OpenSearch multi-AZ |
| Multi-region | Bedrock cross-region inference profiles, S3 CRR |
| Fallback cascade | Primary model → Secondary → Cache → Graceful degradation |
| Circuit breaker | Lambda + DynamoDB state, API Gateway throttling |
| Request queuing | SQS for async inference, dead-letter for failures |
| Health monitoring | Route 53 health checks, CloudWatch alarms, auto-recovery |
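The fallback-cascade pattern from the table above can be sketched as follows. A minimal illustration with injected callables (names are assumptions): each invoker wraps a model call and raises on failure; the cascade tries them in order, then the cache, then degrades gracefully.

```python
from typing import Callable, Optional

def fallback_cascade(prompt: str,
                     invokers: list[Callable[[str], str]],
                     cache_lookup: Callable[[str], Optional[str]],
                     degraded_message: str = "Service busy; try again.") -> str:
    """Primary model → secondary → cache → graceful degradation."""
    for invoke in invokers:
        try:
            return invoke(prompt)
        except Exception:
            continue  # try the next tier; real code would also record the failure
    cached = cache_lookup(prompt)
    return cached if cached is not None else degraded_message
```

In production each tier would also trip a circuit breaker (e.g. failure counts in DynamoDB) so a degraded primary is skipped outright instead of being retried on every request.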
Performance Baselines (establish before production):
| Metric | Target | Measurement |
|---|---|---|
| Inference latency P50 | < 500ms | CloudWatch Bedrock metrics |
| Inference latency P99 | < 2s | CloudWatch Bedrock metrics |
| RAG retrieval latency | < 200ms | X-Ray trace segments |
| Embedding generation | < 100ms | CloudWatch custom metric |
| Availability | 99.9%+ | Route 53 + CloudWatch composite alarm |
| Throughput | Per SLA | Provisioned throughput or auto-scaling |
Scaling Strategy:
| Dimension | Bedrock Managed | SageMaker Custom | Hybrid |
|---|---|---|---|
| Time-to-market | ★★★★★ | ★★☆☆☆ | ★★★☆☆ |
| Model flexibility | ★★★☆☆ | ★★★★★ | ★★★★★ |
| Operational burden | ★★★★★ (zero-ops) | ★★☆☆☆ | ★★★☆☆ |
| Cost control | ★★★☆☆ | ★★★★★ | ★★★★☆ |
| Customization | ★★☆☆☆ | ★★★★★ | ★★★★☆ |
| Security posture | ★★★★★ | ★★★★☆ | ★★★★☆ |
| Multi-region | ★★★★☆ | ★★★☆☆ | ★★★☆☆ |
Model not available in the region: Document a cross-region inference profile as the solution. If compliance prohibits cross-region, evaluate SageMaker with a self-hosted model as an alternative.
Workload with extreme spikes (100x baseline): Design with an SQS queue + async inference. Provisioned throughput for the baseline, on-demand for spikes, batch for the backlog.
Migration from another cloud (GCP/Azure): Map equivalent services (Vertex AI→Bedrock, Azure OpenAI→Bedrock), identify vendor lock-in points, and design an abstraction layer if multi-cloud is a future requirement.
Strict regulation (HIPAA, PCI-DSS, SOX): Activate the BAA for Bedrock, VPC-only deployment, KMS CMK, CloudTrail with log file validation, and Macie continuous scanning. Document a compliance matrix.
Extremely limited budget: Prioritize Bedrock on-demand (no upfront commitment), Lambda for compute, Aurora Serverless v2 with pgvector (if it already exists), and Bedrock batch inference for everything latency-tolerant.
Before delivering, verify:
| Skill | Relationship |
|---|---|
| ai-software-architecture | Internal architecture of the AI system (this skill designs the AWS infra) |
| genai-architecture | Cloud-agnostic GenAI patterns (this skill implements them on AWS) |
| ai-pipeline-architecture | Conceptual AI pipelines (this skill maps them to AWS services) |
| ai-design-patterns | AI patterns and tactics (this skill deploys them on AWS) |
| ai-testing-strategy | Testing strategy (this skill defines the AWS infra for testing) |
| ai-conops | System CONOPS (this skill implements the operational modes on AWS) |
| cloud-migration | Cloud migration (this skill specializes in AI workloads) |
| finops | General FinOps (this skill specializes in GenAI costs) |
| security-architecture | General security (this skill adds GENSEC for AI) |
| solutions-architecture | Solution design (this skill specializes in AI on AWS) |
if FORMATO == "ejecutivo":
1-page summary + architecture diagram + cost summary + top 5 recommendations
Audience: C-level, investment decision-makers
if FORMATO == "técnico":
Full 6-section delivery + service mapping tables + configuration specs
Audience: Architects, DevOps, ML Engineers
if FORMATO == "híbrido":
Executive summary (1 page) + technical deep-dive (S1-S6 in full)
Audience: Technical leads reporting to C-level
## {System Name} — AWS Architecture for AI/GenAI
### Executive Summary
[1 paragraph: objective, selected pattern, key services, order-of-magnitude cost estimate]
### Well-Architected Scorecard
[S1 scorecard table — 6 pillars with scores and top findings]
### Architecture Diagram
[ASCII or Mermaid — all layers of the reference architecture]
### Service Selection
[S2 service mapping — each component with service, alternative, and justification]
### {Pattern} Architecture Detail
[S3 — specific RAG/Agent/Fine-Tuning design]
### Security Controls
[S4 — defense in depth layers + OWASP LLM mitigations]
### Cost Model
[S5 — cost levers, FinOps cadence, optimization roadmap]
### Reliability & Performance
[S6 — reliability patterns, performance baselines, scaling strategy]
### Validation Checklist
[Completed checklist with evidence]
### ADRs
[Key architectural decisions with context and trade-offs]