Designs AWS cloud architectures for AI and GenAI workloads applying the Well-Architected Framework GenAI Lens (6 pillars: GENOPS, GENSEC, GENREL, GENPERF, GENCOST, GENSUS), AWS service selection matrices, RAG/Agent/Fine-Tuning patterns, cost optimization strategies, and enterprise reference architectures. Activated when designing, evaluating, or migrating AI systems on AWS. [EXPLICIT]
From jm-adknpx (claudepluginhub, javimontano/jm-adk-alfa). This skill is limited to using the following tools:
- agents/guardian.md, agents/lead.md, agents/specialist.md, agents/support.md
- evals/evals.json
- knowledge/body-of-knowledge.md, knowledge/knowledge-graph.md
- prompts/meta.md, prompts/primary.md, prompts/variations/deep.md, prompts/variations/quick.md
- references/aws-genai-patterns.md, references/aws-service-catalog.md, references/well-architected-genai-lens.md
- templates/output.docx.md, templates/output.html

Design, evaluate, and optimize AWS architectures for AI and generative AI systems, applying the Well-Architected Framework with the GenAI Lens, proven architectural patterns, and service selection matrices that balance performance, cost, security, and sustainability. [EXPLICIT]
Well-Architected First — Every architectural decision is evaluated against the 6 pillars of the GenAI Lens (Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability). No design that violates any pillar is approved without an explicit ADR documenting the trade-off. [EXPLICIT]
Service-Native over Custom — Prefer AWS managed services (Bedrock, SageMaker, OpenSearch Serverless, Bedrock Guardrails) over custom implementations. Justify custom builds only when requirements exceed the managed service's capabilities, with measurable evidence. [EXPLICIT]
Cost-Aware by Design — Cost is not an afterthought optimization; it is a design constraint from the first iteration. Model tiering (Haiku→Sonnet→Opus), batch inference (50% savings), semantic caching, and provisioned throughput are architectural decisions, not operational ones. [EXPLICIT]
Parameters:
MODO: [assessment | design | migration | optimization]
FORMATO: [ejecutivo | técnico | híbrido]
ALCANCE: [rag | agents | fine-tuning | multi-model | streaming | batch | full]
ESCALA: [startup | growth | enterprise]
Automatic detection:
- If infrastructure/, cdk/, or cloudformation/ exists → MODO=optimization
- If sagemaker/ or bedrock/ exists → ALCANCE detected from the code
- If the input mentions "migrar" or "mover a AWS" → MODO=migration
- Default: MODO=design, ALCANCE=full, ESCALA=growth
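A minimal sketch of these detection rules, assuming a hypothetical `detect_params` helper — the directory names and trigger keywords come from the rules above; the specific ALCANCE values inferred from `sagemaker/` and `bedrock/` are illustrative mappings, not defined by the skill:

```python
from pathlib import Path

def detect_params(repo_root: str, user_input: str) -> dict:
    """Apply the auto-detection rules above; returns the resolved parameters."""
    root = Path(repo_root)
    params = {"MODO": "design", "ALCANCE": "full", "ESCALA": "growth"}  # defaults
    # Existing IaC directories suggest an optimization pass on a live stack
    if any((root / d).is_dir() for d in ("infrastructure", "cdk", "cloudformation")):
        params["MODO"] = "optimization"
    # Explicit migration language in the request overrides MODO
    if any(kw in user_input.lower() for kw in ("migrar", "mover a aws")):
        params["MODO"] = "migration"
    # AI service directories let ALCANCE be inferred from the code itself
    if (root / "sagemaker").is_dir():
        params["ALCANCE"] = "fine-tuning"  # illustrative mapping
    elif (root / "bedrock").is_dir():
        params["ALCANCE"] = "rag"  # illustrative mapping
    return params
```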
Related skill tags: ai-software-architecture, ai-pipeline-architecture, ai-design-patterns, genai-architecture, cloud-native-architecture

Evaluate the workload against the 6 pillars of the GenAI Lens. [EXPLICIT]
Load references:
Read ${CLAUDE_SKILL_DIR}/references/well-architected-genai-lens.md
Deliverable: per-pillar scorecard with categorized findings.
| Pillar | Code | Score | Findings |
|---|---|---|---|
| Operational Excellence | GENOPS01-05 | Score 1-5 | List of gaps |
| Security | GENSEC01-06 | Score 1-5 | List of gaps |
| Reliability | GENREL01-06 | Score 1-5 | List of gaps |
| Performance Efficiency | GENPERF01-04 | Score 1-5 | List of gaps |
| Cost Optimization | GENCOST01-05 | Score 1-5 | List of gaps |
| Sustainability | GENSUS01-03 | Score 1-5 | List of gaps |
For each finding: classify as HIGH / MEDIUM / LOW risk and propose a remediation with a specific AWS service. [EXPLICIT]
Key metrics per pillar:
Select the optimal AWS services for each component of the AI system. [EXPLICIT]
Load references:
Read ${CLAUDE_SKILL_DIR}/references/aws-service-catalog.md
Read ${CLAUDE_SKILL_DIR}/references/aws-genai-patterns.md
Deliverable: service mapping table + architecture diagram.
Selection process per component:
| Componente | Decisión | Criterios |
|---|---|---|
| Foundation Model Access | Bedrock vs SageMaker endpoint | Control level, model availability, pricing |
| Compute | Inferentia2 vs GPU vs Lambda | Latency requirements, volume, cost sensitivity |
| Vector Store | OpenSearch vs pgvector vs MemoryDB | Existing stack, query patterns, scale |
| Orchestration | Step Functions vs Bedrock Agents | Complexity, human-in-loop, custom logic |
| API Layer | API Gateway vs ALB+ECS | WebSocket needs, throttling, auth |
| Security | Bedrock Guardrails + WAF + IAM | Compliance requirements, PII sensitivity |
| Monitoring | CloudWatch + X-Ray + Model Monitor | Observability depth, custom metrics |
For each decision: document the discarded alternative and the reason, with evidence. [EXPLICIT]
Architectural patterns (select according to ALCANCE):
Design the detailed RAG and/or Agent architecture using native AWS services. [EXPLICIT]
Deliverable: component diagram + data flow + configuration specs.
RAG Architecture Decision Tree:
Agent Architecture Decision Tree:
Data Flow Template:
Ingestion: S3 → [Lambda trigger] → Bedrock Embeddings → Vector Store
Query: Client → API GW → Lambda → Vector Search → Bedrock Generation → Response
Agent: Client → API GW → Bedrock Agent → [Action Group Lambda] → [KB RAG] → Response
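The query path in the template above can be sketched end to end. This is a minimal sketch only: the `embed`, `search`, and `generate` callables stand in for Bedrock Embeddings, the vector store query, and Bedrock generation, and all names are illustrative rather than part of the skill's runtime:

```python
def rag_query(question, embed, search, generate, top_k=4):
    """Query flow: embed the question, retrieve top-k chunks, generate a grounded answer."""
    query_vector = embed(question)                 # Bedrock Embeddings
    chunks = search(query_vector, top_k)           # OpenSearch / pgvector / MemoryDB
    context = "\n\n".join(chunk["text"] for chunk in chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)                        # Bedrock Generation
```

In production each callable wraps a service client (e.g. `bedrock-runtime` for embeddings and generation), keeping the flow itself testable without AWS credentials.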
Design the security posture for the GenAI workload on AWS. [EXPLICIT]
Deliverable: security architecture diagram + controls matrix.
Security layers (defense in depth):
| Layer | AWS Service | Control |
|---|---|---|
| Perimetral | WAF + CloudFront | Prompt injection filtering, rate limiting |
| API | API Gateway + Cognito | AuthN/AuthZ, API keys, usage plans |
| Network | PrivateLink + VPC Endpoints | No internet exposure for model endpoints |
| Data in transit | TLS 1.2+ | Encrypted API calls to Bedrock/SageMaker |
| Data at rest | KMS (CMK) | Model artifacts, training data, vector stores |
| Content safety | Bedrock Guardrails | PII redaction, topic denial, content filtering |
| PII detection | Amazon Macie | Scan training data, S3 buckets |
| Audit | CloudTrail + Security Hub | All model invocations logged, centralized findings |
| Agent scope | IAM + Action Groups | Least privilege per Lambda, tool whitelisting |
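The agent-scope row above hinges on a resource-based policy that allows only Bedrock to invoke each action-group Lambda. A hedged sketch of that policy document follows — the function and ARN values are hypothetical, and in practice the grant is attached via Lambda `AddPermission` rather than hand-built JSON:

```python
import json

def action_group_lambda_policy(lambda_arn: str, agent_arn: str) -> str:
    """Least-privilege resource policy: one Bedrock Agent may invoke one Lambda."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBedrockAgentOnly",
                "Effect": "Allow",
                "Principal": {"Service": "bedrock.amazonaws.com"},
                "Action": "lambda:InvokeFunction",
                "Resource": lambda_arn,
                # Scope the grant to a single agent, not all of Bedrock
                "Condition": {"ArnLike": {"AWS:SourceArn": agent_arn}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```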
OWASP Top 10 for LLMs — AWS Mitigations:
Design the cost optimization strategy for GenAI workloads on AWS. [EXPLICIT]
Deliverable: cost model + optimization roadmap.
Cost Levers (highest to lowest impact):
| Lever | Savings | Implementation |
|---|---|---|
| Model tiering | 60-90% | Haiku for simple, Sonnet for standard, Opus for complex |
| Batch inference | 50% | Bedrock batch for non-real-time workloads |
| Provisioned throughput | 30-50% | Committed capacity for predictable workloads |
| Semantic caching | 20-40% | ElastiCache similarity cache for repeated queries |
| Prompt optimization | 10-30% | Compression, fewer examples, shorter system prompts |
| AI silicon | 40-50% | Inferentia2 (inference), Trainium (training) vs GPU |
| Vector quantization | 50-75% storage | Reduce embedding dimensions, binary quantization |
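A toy router for the model-tiering lever — the word-count heuristic and tier names are illustrative only; production routers typically use a lightweight classifier model or task metadata rather than string matching:

```python
def route_model(prompt: str) -> str:
    """Pick the cheapest tier plausibly able to handle the request (toy heuristic)."""
    complex_markers = ("analyze", "multi-step", "plan", "reason")
    words = len(prompt.split())
    if words > 400 or any(m in prompt.lower() for m in complex_markers):
        return "opus"      # complex reasoning → largest model
    if words > 50:
        return "sonnet"    # standard requests → mid tier
    return "haiku"         # simple/short requests → cheapest tier
```

The 60-90% savings figure in the table comes from most traffic landing on the cheapest tier; the router's misclassification rate is what bounds how aggressive the tiering can be.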
Cost Tracking Architecture:
Bedrock/SageMaker → CloudWatch Metrics → Cost Explorer (per-model tags)
→ AWS Budgets (alerts per model/team/project)
→ Cost & Usage Report (per-invocation attribution)
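Per-invocation attribution from the Cost & Usage Report can be cross-checked with a simple token-based estimate. The prices below are illustrative placeholders, not current rates — always read the live Bedrock pricing page:

```python
# Illustrative USD prices per 1K tokens (input, output) — NOT current pricing
PRICE_PER_1K = {
    "haiku": (0.00025, 0.00125),
    "sonnet": (0.003, 0.015),
    "opus": (0.015, 0.075),
}

def invocation_cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one invocation's cost from token counts."""
    price_in, price_out = PRICE_PER_1K[tier]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out
```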
FinOps Cadence:
Design for high availability, consistent performance, and scalability on AWS. [EXPLICIT]
Deliverable: reliability architecture + performance baselines + scaling plan.
Reliability Patterns:
| Pattern | AWS Implementation |
|---|---|
| Multi-AZ inference | SageMaker multi-AZ endpoints, OpenSearch multi-AZ |
| Multi-region | Bedrock cross-region inference profiles, S3 CRR |
| Fallback cascade | Primary model → Secondary → Cache → Graceful degradation |
| Circuit breaker | Lambda + DynamoDB state, API Gateway throttling |
| Request queuing | SQS for async inference, dead-letter for failures |
| Health monitoring | Route 53 health checks, CloudWatch alarms, auto-recovery |
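The fallback-cascade pattern from the table can be sketched as a small control-flow helper; the `invokers` callables are stand-ins for wrappers around the primary and secondary model endpoints (everything here is an illustrative sketch, not the skill's prescribed implementation):

```python
def invoke_with_fallback(prompt, invokers, cache_lookup=None,
                         degraded_reply="Service temporarily degraded; please retry."):
    """Fallback cascade: primary model → secondary → cache → graceful degradation."""
    for invoke in invokers:
        try:
            return invoke(prompt)          # first healthy backend wins
        except Exception:
            continue                       # on error, fall through to the next tier
    if cache_lookup is not None:
        cached = cache_lookup(prompt)      # e.g. a semantic cache hit
        if cached is not None:
            return cached
    return degraded_reply                  # final graceful degradation
```

In practice each tier would also feed a circuit-breaker state store (the Lambda + DynamoDB pattern above) so that a persistently failing primary is skipped without paying its timeout.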
Performance Baselines (establish before production):
| Metric | Target | Measurement |
|---|---|---|
| Inference latency P50 | < 500ms | CloudWatch Bedrock metrics |
| Inference latency P99 | < 2s | CloudWatch Bedrock metrics |
| RAG retrieval latency | < 200ms | X-Ray trace segments |
| Embedding generation | < 100ms | CloudWatch custom metric |
| Availability | 99.9%+ | Route 53 + CloudWatch composite alarm |
| Throughput | Per SLA | Provisioned throughput or auto-scaling |
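The P50/P99 targets above can be validated offline against a latency sample before wiring the CloudWatch alarms; this is just the quantile arithmetic, with thresholds taken from the baseline table:

```python
import statistics

def check_latency_baselines(latencies_ms):
    """Compare measured latencies against the P50 < 500ms and P99 < 2s targets."""
    cuts = statistics.quantiles(sorted(latencies_ms), n=100)  # 99 percentile cut points
    p50, p99 = cuts[49], cuts[98]
    return {"p50_ms": p50, "p99_ms": p99,
            "p50_ok": p50 < 500, "p99_ok": p99 < 2000}
```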
Scaling Strategy:
| Dimension | Bedrock Managed | SageMaker Custom | Hybrid |
|---|---|---|---|
| Time-to-market | ★★★★★ | ★★☆☆☆ | ★★★☆☆ |
| Model flexibility | ★★★☆☆ | ★★★★★ | ★★★★★ |
| Operational burden | ★★★★★ (zero-ops) | ★★☆☆☆ | ★★★☆☆ |
| Cost control | ★★★☆☆ | ★★★★★ | ★★★★☆ |
| Customization | ★★☆☆☆ | ★★★★★ | ★★★★☆ |
| Security posture | ★★★★★ | ★★★★☆ | ★★★★☆ |
| Multi-region | ★★★★☆ | ★★★☆☆ | ★★★☆☆ |
Model not available in the region: document a cross-region inference profile as the solution. If compliance rules out cross-region, evaluate SageMaker with a self-hosted model as an alternative. [EXPLICIT]
Workload with extreme spikes (100x baseline): design with an SQS queue + async inference. Provisioned throughput for the baseline, on-demand for spikes, batch for backlog. [EXPLICIT]
Migration from another cloud (GCP/Azure): map equivalent services (Vertex AI→Bedrock, Azure OpenAI→Bedrock), identify vendor lock-in points, and design an abstraction layer if multi-cloud is a future requirement. [EXPLICIT]
Strict regulation (HIPAA, PCI-DSS, SOX): activate the BAA for Bedrock, VPC-only deployment, KMS CMK, CloudTrail with log file validation, Macie continuous scanning. Document the compliance matrix. [EXPLICIT]
Extremely limited budget: prioritize Bedrock on-demand (no upfront commitment), Lambda for compute, Aurora Serverless v2 with pgvector (if it already exists), and Bedrock batch inference for everything that tolerates latency. [EXPLICIT]
Parameter defaults: MODO=design (default). ALCANCE=rag if RAG signals are present, agents if agents are mentioned; with no signals, use full. ESCALA=growth (default); adjust if the context suggests startup (< 10 people, MVP) or enterprise (multi-account, strict compliance).

Before delivering, verify: the agent executing this skill must check each item before delivering the output to the user.
| Skill | Relationship |
|---|---|
| ai-software-architecture | Internal architecture of the AI system (this skill designs the AWS infra) |
| genai-architecture | Cloud-agnostic GenAI patterns (this skill implements them on AWS) |
| ai-pipeline-architecture | Conceptual AI pipelines (this skill maps them to AWS services) |
| ai-design-patterns | AI patterns and tactics (this skill deploys them on AWS) |
| ai-testing-strategy | Testing strategy (this skill defines the AWS infra for testing) |
| ai-conops | System CONOPS (this skill implements the operational modes on AWS) |
| cloud-migration | Cloud migration (this skill specializes in AI workloads) |
| finops | General FinOps (this skill specializes in GenAI costs) |
| security-architecture | General security (this skill adds GENSEC for AI) |
| solutions-architecture | Solution design (this skill specializes in AI on AWS) |
if FORMATO == "ejecutivo":
1-page summary + architecture diagram + cost summary + top 5 recommendations
Audience: C-Level, investment decision-makers
if FORMATO == "técnico":
Full 6-section delivery + service mapping tables + configuration specs
Audience: Architects, DevOps, ML Engineers
if FORMATO == "híbrido":
Executive summary (1 page) + technical deep-dive (S1-S6 complete)
Audience: Technical leads reporting to C-Level
## {System Name} — AWS Architecture for AI/GenAI
### Executive Summary
[1 paragraph: objective, selected pattern, key services, order-of-magnitude cost estimate]
### Well-Architected Scorecard
[S1 scorecard table — 6 pillars with scores and top findings]
### Architecture Diagram
[ASCII or Mermaid — all layers of the reference architecture]
### Service Selection
[S2 service mapping — each component with service, alternative, justification]
### {Pattern} Architecture Detail
[S3 — specific RAG/Agent/Fine-Tuning design]
### Security Controls
[S4 — defense in depth layers + OWASP LLM mitigations]
### Cost Model
[S5 — cost levers, FinOps cadence, optimization roadmap]
### Reliability & Performance
[S6 — reliability patterns, performance baselines, scaling strategy]
### Validation Checklist
[Completed checklist with evidence]
### ADRs
[Key architectural decisions with context and trade-offs]
Source: AWS Well-Architected Framework — Generative AI Lens (2024). | Avila, R.D. & Ahmad, I. (2025). Architecting AI Software Systems. Packt.