Infrastructure Architecture
Design production infrastructure: compute topology, network architecture, storage strategy, high availability and disaster recovery, IAM governance, and cloud landing zones.
Guiding Principle
"Infrastructure is code, but good infrastructure architecture is about the decisions behind the code — every resource is a trade-off between cost, performance, and resilience."
Procedure
Step 1 — Compute Architecture
- Classify workloads by compute pattern: long-running, burst, batch, event-driven
- Select compute primitives per workload: VMs, containers, serverless, bare metal
- Design autoscaling strategy: metrics, thresholds, cool-down periods
- Plan capacity for steady state and peak load (with headroom percentage)
- Define instance families, sizes, and spot/reserved mix for cost optimization
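The capacity and autoscaling bullets above can be turned into a small calculation. The sketch below is illustrative only. The `ScalingPolicy` fields, the function names, and every number are assumptions rather than values from this document. The scaling rule is a simplified proportional rule in the style of target tracking, not any specific cloud provider's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical autoscaling policy; field names are illustrative."""
    target_utilization: float  # desired average utilization, e.g. 0.5 = 50%
    min_instances: int
    max_instances: int
    cooldown_seconds: int      # minimum time between two scaling actions


def instances_for_load(peak_rps: float, rps_per_instance: float, headroom: float) -> int:
    """Instances needed to serve peak_rps plus a fractional headroom (0.25 = 25%)."""
    return math.ceil(peak_rps * (1 + headroom) / rps_per_instance)


def desired_count(policy: ScalingPolicy, current: int, utilization: float,
                  seconds_since_last_scale: int) -> int:
    """Return the instance count the group should move to."""
    # Inside the cool-down window: hold steady to avoid flapping.
    if seconds_since_last_scale < policy.cooldown_seconds:
        return current
    # Scale proportionally so average utilization moves toward the target.
    wanted = math.ceil(current * utilization / policy.target_utilization)
    # Clamp to the configured floor and ceiling.
    return max(policy.min_instances, min(policy.max_instances, wanted))


if __name__ == "__main__":
    policy = ScalingPolicy(target_utilization=0.5, min_instances=2,
                           max_instances=20, cooldown_seconds=300)
    print(instances_for_load(peak_rps=1000, rps_per_instance=120, headroom=0.25))  # 11
    print(desired_count(policy, current=4, utilization=0.75,
                        seconds_since_last_scale=600))  # 6
```

Keeping the headroom as an explicit parameter makes the cost-versus-resilience trade-off visible in review instead of burying it in an instance count.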
Step 2 — Network Architecture
- Design VPC/VNET topology: CIDR planning, subnet strategy, availability zones
- Define connectivity: VPN, Direct Connect/ExpressRoute, peering, transit gateway
- Design DNS strategy: public/private zones, split-horizon, failover records
- Implement network security: NACLs, security groups, WAF, DDoS protection
- Plan for hybrid connectivity if required (on-prem to cloud)
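CIDR planning from the list above can be prototyped with Python's standard `ipaddress` module before any network is created. This is a minimal sketch: the function names, the tier and zone labels, and the example address range are assumptions chosen for illustration.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, zones: list[str], tiers: list[str], new_prefix: int) -> dict:
    """Assign one equally sized subnet per (tier, zone) pair from the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = vpc.subnets(new_prefix=new_prefix)  # iterator over non-overlapping blocks
    plan = {}
    for tier in tiers:
        for zone in zones:
            try:
                plan[(tier, zone)] = next(blocks)
            except StopIteration:
                raise ValueError(
                    f"{vpc_cidr} cannot hold {len(tiers) * len(zones)} /{new_prefix} subnets"
                ) from None
    return plan


def spare_blocks(vpc_cidr: str, new_prefix: int, used: int) -> int:
    """Number of unallocated blocks left for future growth."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return 2 ** (new_prefix - vpc.prefixlen) - used


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", zones=["a", "b", "c"],
                        tiers=["public", "private", "data"], new_prefix=20)
    for (tier, zone), net in plan.items():
        print(f"{tier:8} zone {zone}: {net}")
    print("spare /20 blocks:", spare_blocks("10.0.0.0/16", 20, len(plan)))  # 7
```

Reserving the spare blocks up front is one way to leave room for new tiers or zones without re-addressing later.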
Step 3 — Storage & Data Architecture
- Classify data by access pattern: hot, warm, cold, archive
- Select storage services per tier: block, object, file, database-specific
- Design backup strategy: frequency, retention, cross-region replication
- Define data lifecycle policies: automatic tiering, expiration, compliance holds
- Plan encryption at rest: key management, key rotation, and key access policies
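The tiering and lifecycle bullets above can be expressed as a small rule set. In the sketch below the age thresholds (30, 90, and 365 days), the function names, and the `legal_hold` flag are all assumptions for illustration. Real thresholds depend on access statistics and compliance requirements.

```python
from datetime import date

# Assumed thresholds: maximum days since last access for each tier.
TIER_RULES = [(30, "hot"), (90, "warm"), (365, "cold")]


def storage_tier(last_access: date, today: date) -> str:
    """Classify an object by how long ago it was last accessed."""
    age_days = (today - last_access).days
    for max_age, tier in TIER_RULES:
        if age_days <= max_age:
            return tier
    return "archive"


def may_expire(created: date, today: date, retention_days: int, legal_hold: bool) -> bool:
    """An object may be deleted only after retention ends and with no compliance hold."""
    if legal_hold:
        return False
    return (today - created).days > retention_days


if __name__ == "__main__":
    today = date(2024, 6, 30)
    print(storage_tier(date(2024, 6, 20), today))   # hot
    print(storage_tier(date(2022, 1, 1), today))    # archive
    print(may_expire(date(2023, 1, 1), today, retention_days=365, legal_hold=True))  # False
```

Checking the compliance hold before the retention age keeps a held object from being expired no matter how old it is.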
Step 4 — HA/DR & Landing Zone
- Define availability and recovery targets per workload tier: uptime objective, RTO (maximum acceptable downtime), and RPO (maximum acceptable data loss)
- Design multi-AZ and multi-region topology for critical workloads
- Plan failover mechanisms: active-active, active-passive, pilot light, warm standby
- Design landing zone structure: account/subscription hierarchy, OU policies, guardrails
- Implement infrastructure as code with drift detection and compliance scanning
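Drift detection, the last item above, comes down to comparing the state declared in code with the state observed in the environment. The sketch below shows that comparison on plain dictionaries. The resource identifiers, the attribute names, and the `detect_drift` function are hypothetical, and a real setup would typically rely on the drift features of the IaC tool in use rather than hand-written comparison.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare declared and observed resources; return a drift report per resource id."""
    drift = {}
    for resource_id in sorted(desired.keys() | actual.keys()):
        want = desired.get(resource_id)
        have = actual.get(resource_id)
        if want is None:
            # Exists in the environment but not in code (e.g. a manual console change).
            drift[resource_id] = "unmanaged"
        elif have is None:
            # Declared in code but absent from the environment.
            drift[resource_id] = "missing"
        elif want != have:
            changed = sorted(k for k in want.keys() | have.keys()
                             if want.get(k) != have.get(k))
            drift[resource_id] = "changed: " + ", ".join(changed)
    return drift


if __name__ == "__main__":
    desired = {
        "vpc-main": {"cidr": "10.0.0.0/16"},
        "sg-web": {"ingress_ports": [443]},
        "bucket-logs": {"versioning": True},
    }
    actual = {
        "vpc-main": {"cidr": "10.0.0.0/16"},
        "sg-web": {"ingress_ports": [22, 443]},
        "vm-debug": {"size": "large"},
    }
    for resource_id, status in detect_drift(desired, actual).items():
        print(resource_id, "->", status)
```

An empty report is the condition a scheduled compliance job would assert on. Any entry points to either an unreviewed manual change or a failed deployment.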
Quality Criteria
- Network CIDR plan accommodates growth to 3x the current workload without re-addressing
- HA design validated with chaos engineering (AZ failure simulation)
- All infrastructure defined as code with no manual console changes
- DR runbook tested quarterly, with achieved RTO and RPO documented against targets
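The DR testing criterion above implies measuring each drill against its targets. The sketch below shows one way to record that. The `DrillResult` fields, the `evaluate_drill` function, and the example timestamps are assumptions for illustration. It treats achieved RTO as the time from outage start to service restoration, and achieved RPO as the gap between the last recoverable write and the outage start.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Hypothetical record of one DR drill."""
    outage_start: datetime
    service_restored: datetime
    last_recoverable_write: datetime  # newest data present after recovery


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured recovery time and data loss against the targets."""
    achieved_rto = result.service_restored - result.outage_start
    achieved_rpo = result.outage_start - result.last_recoverable_write
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto,
        "rpo_met": achieved_rpo <= rpo,
    }


if __name__ == "__main__":
    drill = DrillResult(
        outage_start=datetime(2024, 3, 1, 10, 0),
        service_restored=datetime(2024, 3, 1, 10, 45),
        last_recoverable_write=datetime(2024, 3, 1, 9, 58),
    )
    report = evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=5))
    print(report["achieved_rto"], report["rto_met"])  # 0:45:00 True
    print(report["achieved_rpo"], report["rpo_met"])  # 0:02:00 True
```

Storing the report for each drill gives an auditable history of whether each workload tier actually met its objectives.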
Anti-Patterns
- Manual infrastructure provisioning without IaC version control
- Single-AZ deployments for production workloads
- Over-sized instances running 24/7 for burst workloads
- Landing zones without guardrails or compliance automation