This skill should be used when the user asks to "design cloud infrastructure", "plan network topology", "define HA/DR strategy", "set up cloud landing zones", or "optimize cloud costs". [EXPLICIT] Also triggers on mentions of VPC, Kubernetes, serverless, multi-AZ, IAM, reserved instances, chaos testing, or any compute/network/storage platform design. Use this skill even if the user only mentions a single infrastructure concern — the full platform context is always relevant. [EXPLICIT]
From jm-adk. Install: npx claudepluginhub javimontano/jm-adk-alfa
Skill files: agents/guardian.md, agents/lead.md, agents/specialist.md, agents/support.md, evals/evals.json, knowledge/body-of-knowledge.md, knowledge/knowledge-graph.md, prompts/meta.md, prompts/primary.md, prompts/variations/deep.md, prompts/variations/quick.md, references/infra-arch-patterns.md, templates/output.docx.md, templates/output.html
Infrastructure architecture designs where and how software runs: compute resources, network topology, data storage, high availability, disaster recovery, identity management, and cost optimization. It answers: "How do we provide a platform for applications?"
Invisible infrastructure is the best infrastructure. The platform exists so that applications can run, not to be admired. It is designed for reliability, cost-efficiency, and self-service. If developers have to file tickets to deploy, the infrastructure has failed its mission.
The user provides a system or platform name as $ARGUMENTS. Parse $1 as the platform/system name used throughout all output artifacts. [EXPLICIT]
Parameters:
{MODO}: piloto-auto (autopilot, default) | desatendido (unattended) | supervisado (supervised) | paso-a-paso (step-by-step)
{FORMATO}: markdown (default) | html | dual
{VARIANTE}: ejecutiva (executive, ~40%: S1 network topology + S4 HA/DR + S7 cost optimization) | técnica (technical, full 7 sections, default)
Before generating architecture, detect infrastructure context:
!find . -name "*.tf" -o -name "*.yaml" -o -name "Dockerfile" -o -name "*.hcl" | head -20
If reference materials exist, load them:
Read ${CLAUDE_SKILL_DIR}/references/cloud-patterns.md
Read ${CLAUDE_SKILL_DIR}/references/cost-models.md
Network architecture design that ensures connectivity, segmentation, security, and resilience. [EXPLICIT]
VPC/Network Architecture: Subnets by tier:
Connectivity: Intra-region, inter-region, VPN, Direct Connect/dedicated circuits
Firewalls & Security Groups: Network ACLs (stateless), Security Groups (stateful), least privilege
Load Balancing: L4 (NLB), L7 (ALB), global/geographic routing (Route 53 latency or geolocation routing; CloudFront at the edge)
DNS & CDN: Public DNS, private DNS (service discovery), CDN (cache globally)
DDoS & WAF: Shield/Cloudflare for DDoS, WAF for application-layer attacks
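Tiered subnets are usually carved from the VPC CIDR per tier and per availability zone. Below is a minimal sketch using Python's standard `ipaddress` module. It assumes three hypothetical tiers (`public`, `private-app`, `data`), three hypothetical AZ labels, and equally sized /20 subnets. Real plans often size tiers differently and keep spare ranges for growth.

```python
import ipaddress

# Hypothetical tier and AZ labels: replace with your own naming convention.
TIERS = ["public", "private-app", "data"]
AZS = ["az-a", "az-b", "az-c"]


def plan_subnets(vpc_cidr: str, new_prefix: int = 20) -> dict:
    """Assign one equally sized, non-overlapping subnet to each (tier, AZ) pair."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # lazily yields consecutive subnets
    plan = {}
    for tier in TIERS:
        for az in AZS:
            block = next(blocks, None)
            if block is None:
                raise ValueError(
                    f"{vpc_cidr} cannot hold {len(TIERS) * len(AZS)} /{new_prefix} subnets"
                )
            plan[(tier, az)] = block
    return plan


if __name__ == "__main__":
    for (tier, az), net in plan_subnets("10.0.0.0/16").items():
        # num_addresses is the raw block size; providers reserve a few (AWS: 5 per subnet).
        print(f"{tier:<12} {az}  {net}  {net.num_addresses} addresses")
```

With 10.0.0.0/16, this uses 9 of the 16 available /20 blocks. The remaining blocks stay free for new tiers or AZs.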
Strategy for running workloads — VMs, containers, or serverless. [EXPLICIT]
VMs: Full control; best for legacy/compliance workloads. Trade-off: more management overhead.
Containers (Docker/K8s): Standardized and portable; best for microservices. Trade-off: orchestration complexity.
Serverless: No infrastructure management, pay per invocation; best for event-driven workloads. Trade-off: cold starts, vendor lock-in, cost at scale.
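A break-even estimate makes the serverless "cost at scale" trade-off concrete. Below is a minimal sketch. It assumes a per-request plus per-GB-second pricing model (the shape AWS Lambda uses) and placeholder prices, not real quotes. It compares raw compute only and ignores free tiers, VM operations effort, and whether one VM can actually absorb the load.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


@dataclass(frozen=True)
class ServerlessPricing:
    # Placeholder rates, not real quotes: substitute your provider's current price list.
    per_million_requests: float
    per_gb_second: float


def cost_per_invocation(duration_s: float, memory_gb: float, p: ServerlessPricing) -> float:
    """Request charge plus compute charge (duration x memory x GB-second rate)."""
    return p.per_million_requests / 1_000_000 + duration_s * memory_gb * p.per_gb_second


def serverless_monthly_cost(invocations: float, duration_s: float, memory_gb: float,
                            p: ServerlessPricing) -> float:
    return invocations * cost_per_invocation(duration_s, memory_gb, p)


def breakeven_invocations(vm_hourly: float, duration_s: float, memory_gb: float,
                          p: ServerlessPricing) -> float:
    """Monthly invocations above which one always-on VM is cheaper on raw compute."""
    return vm_hourly * HOURS_PER_MONTH / cost_per_invocation(duration_s, memory_gb, p)


if __name__ == "__main__":
    pricing = ServerlessPricing(per_million_requests=0.20, per_gb_second=0.00002)  # hypothetical
    n = breakeven_invocations(vm_hourly=0.10, duration_s=0.2, memory_gb=0.5, p=pricing)
    print(f"Break-even: ~{n / 1e6:.1f}M invocations/month")
```

Above the break-even volume, an always-on VM or container is cheaper on paper. Below it, or with spiky traffic, pay-per-invocation usually wins.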
Kubernetes Architecture (if containers):
Auto-Scaling: Horizontal (stateless), vertical (stateful); metrics: CPU, memory, custom, queue depth
Resource Limits: Requests (reserved by the scheduler, guaranteed) and limits (hard maximum); set them to balance predictability against flexibility
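Horizontal auto-scaling on a metric such as CPU or queue depth usually follows a target-tracking rule. Below is a minimal sketch modeled on the Kubernetes Horizontal Pod Autoscaler formula, `desired = ceil(current * metric / target)`. It adds a tolerance band to avoid flapping (0.1 mirrors the HPA default) and clamps the result to min/max replicas. The function and parameter names are illustrative.

```python
import math


def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Target-tracking scale decision in the style of the Kubernetes HPA."""
    if target <= 0:
        raise ValueError("target must be positive")
    ratio = metric / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: do nothing, avoid flapping
    else:
        desired = math.ceil(current * ratio)
    # Never scale outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target -> scale out to 6.
    print(desired_replicas(current=4, metric=90, target=60))
```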
Data persistence — performance, reliability, cost. [EXPLICIT]
Block Storage: Virtual hard drives for IOPS-intensive workloads (databases)
Object Storage: Distributed, durable, cheap at scale (backups, logs, media, data lake)
File Storage: Shared filesystem (NFS) for multi-instance access
Database Hosting: Managed (RDS/Cloud SQL: less ops, more cost) vs. self-managed (full control, more ops)
Backup & DR:
Data Tiering: Hot (SSD), warm (standard), cold (archive/Glacier); lifecycle policies for automatic transitions
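Lifecycle policies move objects between tiers by age. Below is a minimal sketch of that decision, assuming hypothetical thresholds of 30 days (hot to warm) and 90 days (warm to cold) measured from object creation. In practice the provider's lifecycle engine applies such rules. Archive tiers typically add retrieval fees and minimum-storage-duration charges, and those costs should inform the thresholds.

```python
from datetime import date

# Hypothetical thresholds in days since creation: younger than 30 -> hot (SSD),
# younger than 90 -> warm (standard), otherwise cold (archive). Tune from real access patterns.
LIFECYCLE_RULES = [(30, "hot"), (90, "warm")]
FINAL_TIER = "cold"


def tier_for(created: date, today: date) -> str:
    """Return the storage tier an object should be in, based on its age."""
    age_days = (today - created).days
    for max_age_days, tier in LIFECYCLE_RULES:
        if age_days < max_age_days:
            return tier
    return FINAL_TIER


if __name__ == "__main__":
    today = date(2026, 3, 12)
    for created in (date(2026, 3, 1), date(2026, 1, 15), date(2025, 6, 1)):
        print(created, "->", tier_for(created, today))
```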
Strategy for surviving failures and maintaining continuity. [EXPLICIT]
Failure Modes:
Multi-Region: Active-passive (lower cost, longer RTO) vs. active-active (higher cost, low RTO, eventual consistency)
Failover Mechanisms: DNS-based, load balancer, database replica promotion; automatic vs. manual
Chaos Testing: Regularly kill instances, fail services, simulate zone failures. Tools: Gremlin, LitmusChaos, Chaos Monkey. Goal: validate assumptions before production incidents.
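Two quick numerical checks support the HA/DR decisions above. Below is a minimal sketch. The availability math assumes independent failures, which is optimistic: shared control planes, bad deployments, and regional events correlate failures, and that is exactly what chaos testing probes. The zone-loss check uses a hypothetical instances-per-AZ layout.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(a: float, n: int) -> float:
    """n redundant copies (e.g. AZs), any one of which can serve; assumes independent failures."""
    return 1 - (1 - a) ** n


def serial_availability(*components: float) -> float:
    """Components in series: every one must be up, so availabilities multiply."""
    return math.prod(components)


def downtime_minutes_per_year(a: float) -> float:
    return (1 - a) * MINUTES_PER_YEAR


def survives_any_single_az_loss(instances_per_az: dict, required: int) -> bool:
    """Static N-1 check: after losing any one AZ, does remaining capacity still meet demand?"""
    total = sum(instances_per_az.values())
    return all(total - lost >= required for lost in instances_per_az.values())


if __name__ == "__main__":
    two_azs = parallel_availability(0.99, 2)
    print(f"{two_azs:.4%} available, ~{downtime_minutes_per_year(two_azs):.0f} min/yr down")
    print(survives_any_single_az_loss({"az-a": 3, "az-b": 3, "az-c": 3}, required=6))
```

Two zones at 99% each give 99.99% only if their failures are truly independent. A dependency in series, such as a single-AZ database, caps the whole path at or below that component's availability.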
Identity and access management for infrastructure resources. [EXPLICIT]
Foundation for safe, scalable, compliant cloud deployment. [EXPLICIT]
Account Structure: Management (billing/guardrails), shared services (logging/monitoring/security), workload accounts (dev/staging/prod per app or team)
Guardrails: Preventive (SCPs: no public S3 buckets) + Detective (Config/Security Hub: monitor violations)
Tagging Strategy: Owner, environment, cost center, application, compliance. Enables: cost allocation, resource discovery, compliance audits.
Billing & Cost Allocation: Tag-based allocation, budgets & alerts, reserved instances, savings plans
Network: Hub-and-spoke (centralized shared VPC), Transit Gateway, central DNS
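A few lines of code can check the tagging strategy and tag-based cost allocation. Below is a minimal sketch. It assumes hypothetical tag keys (`owner`, `environment`, `cost-center`, `application`, `compliance`, matched case-insensitively) and cost line items already exported as dictionaries. It is not tied to any provider's billing API.

```python
from collections import defaultdict

# Hypothetical tag keys derived from the tagging strategy (matched case-insensitively).
REQUIRED_TAGS = {"owner", "environment", "cost-center", "application", "compliance"}


def missing_tags(tags: dict) -> set:
    """Detective check: required tag keys that are absent or have blank values."""
    present = {key.lower() for key, value in tags.items() if value and value.strip()}
    return REQUIRED_TAGS - present


def allocate_costs(line_items: list, tag_key: str = "cost-center") -> dict:
    """Sum costs per tag value; items lacking the tag fall into an 'untagged' bucket."""
    totals = defaultdict(float)
    for item in line_items:
        tags = {k.lower(): v for k, v in item.get("tags", {}).items()}
        bucket = tags.get(tag_key) or "untagged"
        totals[bucket] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    print(missing_tags({"Owner": "team-a", "Environment": "prod"}))
    items = [
        {"cost": 120.0, "tags": {"cost-center": "cc-100"}},
        {"cost": 30.0, "tags": {}},
        {"cost": 15.5},
    ]
    print(allocate_costs(items))
```

A growing `untagged` bucket signals that the detective check should be backed by a preventive guardrail.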
Strategies for reducing cloud spend without sacrificing performance or reliability. [EXPLICIT]
| Decision | Enables | Constrains | When to Use |
|---|---|---|---|
| Multi-AZ | Survive zone failure | ~2x cost, complexity | Critical workloads, availability SLA |
| Multi-Region | Survive region failure, global low latency | Very high cost, eventual consistency | Global app, strict RPO/RTO |
| RDS Managed DB | Less ops overhead | Higher cost, less control | Most workloads, HA required |
| Self-Managed DB | Control, potentially lower cost | High ops burden, backup responsibility | Specialized needs, sufficient ops team |
| Kubernetes | Flexibility, standard, portable | Ops complexity | Polyglot, stateless, K8s-experienced teams |
| Serverless | No infra management | Cold start, vendor lock-in, cost at scale | Event-driven, unpredictable load |
| Reserved Instances | 30-70% discount | Inflexibility, upfront cost | Predictable, steady-state workloads |
| Spot Instances | 70-90% discount | Interruption risk | Fault-tolerant batch, non-critical |
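The Reserved Instances row hinges on utilization. A commitment is billed every hour whether or not the instance runs, while on-demand is billed only for hours used. A commitment with discount d therefore beats on-demand only when utilization exceeds 1 - d. Below is a minimal sketch with a hypothetical hourly rate and a 40% discount (within the 30-70% range in the table). It ignores upfront-payment options and the interruption risk of spot.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


def breakeven_utilization(discount: float) -> float:
    """Minimum fraction of hours an instance must run for a commitment to beat on-demand.

    committed cost per hour = (1 - discount) * rate    (billed every hour)
    on-demand cost per hour = utilization * rate       (averaged; billed only when running)
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def monthly_cost(rate: float, utilization: float, discount: float = 0.0,
                 committed: bool = False, hours: float = HOURS_PER_MONTH) -> float:
    """Committed capacity is billed for every hour; on-demand and spot only for hours used."""
    billed_hours = hours if committed else hours * utilization
    return rate * (1.0 - discount) * billed_hours


if __name__ == "__main__":
    rate = 0.10  # hypothetical on-demand $/hour
    for u in (0.5, 0.8):
        on_demand = monthly_cost(rate, u)
        reserved = monthly_cost(rate, u, discount=0.4, committed=True)
        print(f"utilization {u:.0%}: on-demand ${on_demand:.2f} vs reserved ${reserved:.2f}")
```

At 50% utilization, on-demand is cheaper. At 80%, the commitment wins, consistent with a 60% break-even.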
On-Premises to Cloud Migration: Existing workloads must move with minimal disruption. Hybrid period: on-prem and cloud coexist. Approach: strangler fig, VPN connectivity, staged migration. [EXPLICIT]
Multi-Cloud (AWS + Azure + GCP): No unified API; complexity increases significantly. Solution: abstraction layer (Kubernetes), consistent tagging, multi-cloud governance. [EXPLICIT]
Highly Regulated (Financial, Healthcare): Data residency: data cannot leave country/region. Dedicated accounts, encryption, audit trails, periodic assessment. [EXPLICIT]
Extreme Scale (Millions of Users): Handle 10x-100x load without degradation. Cost critical from start. Global infrastructure, caching at every level, spot for batch. [EXPLICIT]
Cost-Constrained Startup: Limited budget, unpredictable growth. Serverless where possible, auto-scaling, spot instances, avoid reserved instances initially. [EXPLICIT]
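The strangler fig approach from the migration case routes traffic through a facade. The facade sends already-migrated functionality to the cloud and everything else to the legacy system. Below is a minimal sketch of that routing decision, assuming hypothetical path prefixes and hostnames (the `.example` names are placeholders). In practice the facade is usually a load balancer, API gateway, or proxy rule set rather than application code.

```python
# Hypothetical prefixes already cut over to the cloud; the tuple grows each migration stage.
MIGRATED_PREFIXES = ("/api/orders", "/api/catalog")

ON_PREM_ORIGIN = "https://legacy.onprem.example"  # hypothetical
CLOUD_ORIGIN = "https://app.cloud.example"        # hypothetical


def route(path: str, migrated: tuple = MIGRATED_PREFIXES) -> str:
    """Return the origin that should serve a request path during the hybrid period."""
    for prefix in migrated:
        # Match whole path segments so "/api/orders-legacy" is not captured by "/api/orders".
        if path == prefix or path.startswith(prefix + "/"):
            return CLOUD_ORIGIN
    return ON_PREM_ORIGIN


if __name__ == "__main__":
    for p in ("/api/orders/42", "/api/payments", "/api/catalog"):
        print(p, "->", route(p))
```

Rolling back a stage is then a one-line change to the migrated set, which keeps disruption low.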
Before finalizing delivery, verify:
| Format | Default | Description |
|---|---|---|
| markdown | Yes | Rich Markdown + Mermaid diagrams. Token-efficient. |
| html | On demand | Branded HTML (Design System). Visual impact. |
| dual | On demand | Both formats. |
Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter. [EXPLICIT]
Primary: A-04_Infrastructure_Architecture_Deep.html — Executive summary, network topology, compute strategy, storage/database architecture, HA/DR strategy, IAM/security, cloud landing zone, cost optimization.
Secondary: Network diagram (VPC topology), auto-scaling policy, backup/recovery runbook, security compliance checklist, cost optimization quick wins.
Author: Javier Montaño | Last updated: 2026-03-12