From azure
Assesses and improves Azure Functions reliability posture: zone redundancy, ZRS storage, health probes, multi-region failover. Scans resources, presents checklist, drives remediation.
Install: `npx claudepluginhub anthropics/claude-plugins-official --plugin azure`

Slash command: `/azure:azure-reliability`
Reference files:

- references/configure-health-probes.md
- references/configure-multi-region.md
- references/configure-storage.md
- references/configure-zone-redundancy.md
- references/health-probe-checks.md
- references/iac-patching-bicep.md
- references/iac-patching-terraform.md
- references/multi-region-checks.md
- references/services/functions/reliability.md
- references/storage-redundancy-checks.md
- references/zone-redundancy-checks.md
| Property | Details |
|---|---|
| Best for | Reliability posture assessment, zone redundancy enablement, multi-region failover setup |
| Primary capabilities | Reliability assessment table, Zone Redundancy Configuration, Multi-Region IaC Generation |
| Supported services | Azure Functions (App Service and Container Apps planned for a future version) |
| MCP tools | Azure Resource Graph queries, Azure CLI commands |
Activate this skill when user wants to:
Scope note: This skill currently covers Azure Functions only. If the user asks about Azure App Service or Azure Container Apps reliability, acknowledge that support is planned but not yet available, and only proceed with the parts that apply to Functions resources in scope.
Before running queries, sign in and install the Resource Graph extension:

- `az login`
- `az extension add --name resource-graph`

| Tool | Purpose |
|---|---|
| `mcp_azure_mcp_extension_cli_generate` | Generate az CLI commands for resource queries and configuration |
| `mcp_azure_mcp_subscription_list` | List available subscriptions |
| `mcp_azure_mcp_group_list` | List resource groups |
Primary query method: Azure Resource Graph via az graph query (requires az extension add --name resource-graph).
Important: Always scope queries to the user's specified resource group or subscription. Add these filters to every Resource Graph query:
- `| where resourceGroup =~ '<rg-name>'`
- `--subscriptions <sub-id>` flag on `az graph query`
- `| where name =~ '<app-name>'`

Two-step assessment: platform-level discovery first, then per-service deep dive.

Step 1: Platform discovery (find what's there). Use these references to enumerate resources in scope and detect cross-cutting reliability gaps:

| Platform check | Reference |
|---|---|
| Zone redundancy: discovery | references/zone-redundancy-checks.md |
| Storage redundancy (cross-service) | references/storage-redundancy-checks.md |
| Multi-region & global load balancers | references/multi-region-checks.md |
| Front Door / Traffic Manager / App Insights probes | references/health-probe-checks.md |
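Every discovery query carries the scoping filters described earlier. As a minimal sketch of how those filters could be attached, assuming hypothetical helper names (`kql_quote`, `build_scoped_query`, `build_graph_command`) that are not part of the skill itself:

```python
def kql_quote(value: str) -> str:
    """Escape a value for use inside a single-quoted KQL string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_scoped_query(base_query, resource_group=None, app_name=None):
    """Append the resource-group and name filters to a Resource Graph query."""
    parts = [base_query.strip()]
    if resource_group:
        parts.append(f"| where resourceGroup =~ '{kql_quote(resource_group)}'")
    if app_name:
        parts.append(f"| where name =~ '{kql_quote(app_name)}'")
    return " ".join(parts)


def build_graph_command(query, subscription_id=None):
    """Return the argv list for `az graph query`, optionally scoped to one subscription."""
    argv = ["az", "graph", "query", "-q", query]
    if subscription_id:
        argv += ["--subscriptions", subscription_id]
    return argv
```

Returning an argv list rather than a shell string keeps the query text out of shell quoting; the list can be passed directly to `subprocess.run`.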
Step 2: Per-service deep dive. For each compute resource discovered in Step 1, load the matching service reference. The service reference is the single source of truth for that service's plan/SKU rules, assessment queries, CLI commands, IaC patches (Bicep + Terraform + AVM), and reporting hints.
This skill version ships only the Azure Functions per-service reference. Other compute services are listed below explicitly so the dispatch logic is unambiguous: if a resource matches an unsupported row, do not attempt to load a reference, fabricate CLI commands, or generate IaC patches for it.
| Service detected | Reference |
|---|---|
| Azure Functions (microsoft.web/serverfarms with kind contains 'functionapp') | references/services/functions/reliability.md |
| Azure App Service (non-Functions sites: microsoft.web/sites without kind contains 'functionapp', microsoft.web/serverfarms without kind contains 'functionapp') | ⚪ Not yet shipped; planned for a future version |
| Azure Container Apps (microsoft.app/containerapps, microsoft.app/managedenvironments) | ⚪ Not yet shipped; planned for a future version |
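The dispatch table above can be read as a lookup from resource type and `kind` to a reference file. The following is a sketch only; the function name `dispatch` is hypothetical, and it assumes that `microsoft.web/sites` resources whose kind contains `functionapp` are treated as Functions alongside their plans:

```python
FUNCTIONS_REFERENCE = "references/services/functions/reliability.md"

# Resource types named in the dispatch table; only Functions ships a reference.
WEB_TYPES = {"microsoft.web/sites", "microsoft.web/serverfarms"}
CONTAINER_APPS_TYPES = {
    "microsoft.app/containerapps",
    "microsoft.app/managedenvironments",
}


def dispatch(resource_type, kind=""):
    """Return (service, reference) for a discovered resource.

    reference is None when this skill version ships no per-service file,
    which means: report the resource, but do not remediate it.
    """
    rtype = resource_type.lower()
    is_function = "functionapp" in (kind or "").lower()
    if rtype in WEB_TYPES:
        if is_function:
            return ("Azure Functions", FUNCTIONS_REFERENCE)
        return ("Azure App Service", None)
    if rtype in CONTAINER_APPS_TYPES:
        return ("Azure Container Apps", None)
    return ("Unknown", None)
```

A `None` reference is the signal to skip remediation, which keeps the "do not fabricate commands" rule in one place.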
Handling unsupported services: If a resource matches an unsupported row above, surface it in the discovery summary, mark it as `⚪ not assessed (planned)` in the Phase 3 table, and skip the per-service remediation steps for it. Do not attempt to fabricate CLI commands or IaC patches for those services.
Present findings as a feature-pivoted table: one row per reliability feature (Zone redundancy on compute, Zone-redundant storage, Health probes, Multi-region failover), with a single status indicator and the specific resources that are relevant to that feature. This avoids the noise of one-row-per-resource with mostly n/a cells. Do not assign numeric scores or grades.

    Reliability Assessment: {scope}
    ────────────────────────────────────────────────────────────────────────────────────────
    Reliability Feature          Status     Resources
    ────────────────────────────────────────────────────────────────────────────────────────
    Zone redundancy: compute     🔴 OFF     • plan-ii5trxva2ark4 (FC1)
    Zone-redundant storage       🔴 GRS     • stii5trxva2ark4 (defaulted; no SKU set in IaC)
    Health probes                🔴 OFF     • func-api-ii5trxva2ark4: needs code change (FC1)
    Multi-region failover        🔴 OFF     • Single region (eastus) only; Front Door not configured
    ────────────────────────────────────────────────────────────────────────────────────────
    Want me to fix the 🔴 items? I'll do the quick wins first (Function App
    plan zone redundancy + health checks on supported plans), then ask before
    storage migration and multi-region setup. (yes/no)
Rules for the table:

- 🟢 ON: feature is fully enabled across all relevant resources in scope
- 🟡 PARTIAL: some resources have it, some don't (or partial config like liveness-only)
- 🔴 OFF: feature is missing on all relevant resources
- For storage, replace ON/OFF with the current SKU when relevant (🔴 LRS, 🔴 GRS, 🟢 ZRS, 🟢 GZRS). When no SKU is set in IaC, label as 🔴 GRS (ARM/AVM default) and note that in the resource line.
- In the Resources column, add short qualifiers where they help (for example (FC1), (defaulted; no SKU set), liveness only, needs code change (FC1)).
- Include rows for features that are already ON so the user sees credit for what's right.
- Do not use n/a, dashes, or empty cells. If a feature doesn't apply to any resource in scope, drop the row.

UX Note: If the assessment finds the app already has all core reliability features (zone redundancy, ZRS/GZRS storage, health probes), skip the fix-it question and jump straight to Configuration Workflow Step 3 (Multi-region follow-up). Do NOT start any multi-region work without explicit consent.
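The status rules above reduce to a small roll-up over per-resource results. This is an illustrative sketch, not the skill's implementation; `feature_status` and `storage_status` are hypothetical names, and only the four SKUs named in the rules are handled, with anything else rejected rather than guessed:

```python
RED_SKUS = {"LRS", "GRS"}
GREEN_SKUS = {"ZRS", "GZRS"}


def feature_status(enabled_flags):
    """Roll per-resource booleans up into ON / PARTIAL / OFF.

    Returns None when no resource is relevant, so the caller drops the row.
    """
    flags = list(enabled_flags)
    if not flags:
        return None
    if all(flags):
        return "ON"
    if any(flags):
        return "PARTIAL"
    return "OFF"


def storage_status(sku_name):
    """Return (colour, label) for the storage row, labelled by replication SKU."""
    if not sku_name:
        # No SKU set in IaC: treated as the GRS default described in the rules.
        return ("red", "GRS")
    replication = sku_name.split("_")[-1].upper()  # "Standard_ZRS" -> "ZRS"
    if replication in GREEN_SKUS:
        return ("green", replication)
    if replication in RED_SKUS:
        return ("red", replication)
    raise ValueError(f"SKU not covered by the table rules: {sku_name}")
```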
When the user wants to fix findings from the assessment:

ALWAYS confirm with the user before executing changes. Show what will change, any cost implications, and any destructive actions (e.g., environment recreation).

After assessment, if the user says "fix it" / "improve my reliability" / "enable zone redundancy":

    I'll start with the quick wins (no downtime, fast):
      1. Enable zone redundancy on plan-ii5trxva2ark4 (Flex Consumption, no cost change)
      2. Set health check path to /api/health on func-api-ii5trxva2ark4

    Then, separately, I'll ask if you want to upgrade storage:
      3. Upgrade stii5trxva2ark4 from LRS → ZRS (small cost increase, migration takes hours)
         Required for full zone redundancy, but I'll confirm with you before starting.

    How would you like to apply these changes?
      A) Fix now: run az CLI commands against your live resources (immediate, one-time)
      B) Patch my IaC: update your Bicep/Terraform files so changes persist across deploys

    (If you use azd or Terraform, option B is recommended so `azd up` won't overwrite changes.)
Run fixes against live resources using az CLI commands. Quick wins first, then ask before the slow storage migration.
The exact CLI commands per service live in the per-service references. Pick the one(s) matching the resources discovered in Phase 2:
| Fix | Reference |
|---|---|
| Enable zone redundancy / configure health probes (Functions) | references/services/functions/reliability.md |
| Upgrade storage replication (cross-service) | references/configure-storage.md |
| Set up multi-region (cross-service) | references/configure-multi-region.md |
| Platform overview / verification | references/configure-zone-redundancy.md, references/configure-health-probes.md |
Execution order (always quick wins first):

1. Zone redundancy on compute (fast, in-place property update on the Function App's plan).
2. Health probes (Premium / Dedicated only, in-place; for FC1 / Consumption, follow the consent gate in configure-health-probes.md).
3. Verify the compute changes succeeded before doing anything else.
4. STOP and ask about the storage upgrade. Compute is now zone-redundant, but storage may still be LRS or GRS. Ask the user explicitly:

        ✅ Compute is now zone-redundant.

        To be **fully zone-redundant**, your storage account also needs to be upgraded:
          • stii5trxva2ark4: currently `Standard_LRS` → needs `Standard_ZRS`

        ⚠️ This is a live storage redundancy conversion:
          • Takes hours to days depending on data volume
          • Small ongoing cost increase (~$0.01/GB/month more)
          • Only supported for Standard general-purpose v2 accounts

        Do you want me to start the storage migration now? (yes / no / later)

5. On yes: run `az storage account update --sku Standard_ZRS` (or `az storage account migration start` if needed); poll `az storage account show --query sku.name` until it reports `Standard_ZRS`.
6. Multi-region: do NOT auto-run. Handled in Step 3 below as an explicit follow-up after re-assessment.

⚠️ Warning: If the user runs `azd up` or `terraform apply` later, CLI-only changes may be overwritten by the IaC definitions. Recommend also patching IaC after CLI fixes.
Update the user's Bicep or Terraform files so reliability settings are persistent.
Step 1: Detect IaC type
- `infra/` folder in project root
- `*.bicep` or `*.tf` files
- `*.bicep` files → use Bicep patching
- `*.tf` files → use Terraform patching

Step 2: Classify each fix by risk level

| Fix | Risk Level | What Happens |
|---|---|---|
| Zone redundancy (Function App plan) | 🟢 Safe patch | In-place property update on next deploy |
| Storage LRS → ZRS | 🟡 Pre-migration required | Live storage migration must complete before the IaC SKU change can deploy. Never bundle with safe patches; use the two-deploy flow in Steps 3-5. |
| Health check path (Premium / Dedicated) | 🟢 Safe patch | In-place update, but causes app restart |
| Health check path (FC1 / Consumption) | ⚪ Code-only, ask first | healthCheckPath is unsupported. Adding a health endpoint requires adding an HTTP-triggered /api/health function to app code. Always ask the user for explicit consent before touching source code. Do not patch IaC. |
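The risk table above determines which fixes go into which deploy. A minimal sketch of that grouping follows, with hypothetical fix identifiers (`zone_redundancy`, `storage_zrs`, `health_check`) and plan labels chosen for illustration; any plan type other than Premium or Dedicated is treated conservatively as code-only, which is an assumption of this sketch:

```python
PLANS_WITH_HEALTH_CHECK_PATH = {"premium", "dedicated"}


def classify_fix(fix, plan_type=""):
    """Map a fix identifier to its risk class from the table."""
    if fix == "zone_redundancy":
        return "safe"
    if fix == "storage_zrs":
        return "pre_migration"
    if fix == "health_check":
        if plan_type.lower() in PLANS_WITH_HEALTH_CHECK_PATH:
            return "safe"
        # FC1 / Consumption (and, in this sketch, unknown plans): ask first.
        return "code_only"
    raise ValueError(f"unknown fix: {fix}")


def split_deploys(fixes, plan_type=""):
    """Group fixes: 'safe' for Deploy 1, 'pre_migration' for Deploy 2, 'code_only' to ask about."""
    groups = {"safe": [], "pre_migration": [], "code_only": []}
    for fix in fixes:
        groups[classify_fix(fix, plan_type)].append(fix)
    return groups
```

Keeping the storage fix in its own group makes it hard to bundle it with the safe patches by accident.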
Step 3: Apply patches in two deploys (quick wins first)
The IaC patching framework (detection, AVM-module guidance, deploy-order rule, storage SKU patch) lives in:
| IaC Type | Framework reference |
|---|---|
| Bicep | references/iac-patching-bicep.md |
| Terraform | references/iac-patching-terraform.md |
The actual per-service compute patches (Function App plan ZR, etc.) live in the per-service references. Load the matching service file from Phase 2 for the exact Bicep / Terraform / AVM snippets. Only Azure Functions has a per-service reference in this skill version; non-Functions compute (App Service / Container Apps) is out of scope.

Deploy 1: Quick wins only. Patch the 🟢 Safe items (zone redundancy on the Function App plan, health probes on Premium / Dedicated). Do NOT include the storage SKU patch in this deploy.
After patching, the skill runs the deploy itself (do not stop and tell the user to run it). Detect the deployment tool and confirm once before executing:

    📦 Patches applied to your IaC. Ready to deploy:
      Tool detected: azd (found azure.yaml)
      Command: azd up
    Proceed with deployment? (yes / no)

On yes, run the appropriate command, stream output back to the user, and continue to the next step on success:

- azd (found `azure.yaml`): `azd up`
- Bicep: `az deployment group create --resource-group <rg> --template-file infra/main.bicep --parameters @infra/main.parameters.json`
- Terraform: `terraform plan -out tfplan` → (show plan summary) → `terraform apply tfplan`

On no, stop and report the patched files; do not proceed to Step 4 / Re-Assess.
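Tool detection for the deploy step can be sketched as a file check in the project root. The function name `detect_deploy_commands` is hypothetical, and the precedence (azd, then Bicep, then Terraform) follows the order of the command list above rather than anything the skill states explicitly:

```python
from pathlib import Path


def detect_deploy_commands(project_root, resource_group="<rg>"):
    """Return (tool, commands), where commands is a list of argv lists to run in order."""
    root = Path(project_root)
    infra = root / "infra"
    if (root / "azure.yaml").is_file():
        return "azd", [["azd", "up"]]
    if infra.is_dir() and any(infra.glob("*.bicep")):
        return "bicep", [[
            "az", "deployment", "group", "create",
            "--resource-group", resource_group,
            "--template-file", "infra/main.bicep",
            "--parameters", "@infra/main.parameters.json",
        ]]
    if infra.is_dir() and any(infra.glob("*.tf")):
        # Plan first so the summary can be shown before applying.
        return "terraform", [
            ["terraform", "plan", "-out", "tfplan"],
            ["terraform", "apply", "tfplan"],
        ]
    return None, []
```

The function only builds the commands; the confirmation prompt and the decision to run them stay with the caller.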
If deployment fails, surface the error and stop. Do not continue to the storage step.

STOP and ask about the storage upgrade before Deploy 2. After Deploy 1 succeeds, ask the user explicitly:

    ✅ Quick-win patches deployed. Compute is now zone-redundant.

    To be **fully zone-redundant**, your storage account also needs to be upgraded:
      • stii5trxva2ark4: currently `Standard_LRS` → needs `Standard_ZRS`

    ⚠️ This is a two-part change:
      1. Live storage migration (`az storage account migration start`): takes hours to days
      2. A second deploy to update your IaC's storage SKU to match

    Do you want me to start the storage migration now? (yes / no / later)
Step 4: Storage migration (only if user said yes in Step 3)
The skill runs these commands itself; do not ask the user to run them. Show progress as you go:

    Starting storage migration (this can take up to 72 hours)...
      az storage account migration start --account-name stii5trxva2ark4 \
        --resource-group rg-example --sku Standard_ZRS --no-wait
    Polling: az storage account show --name stii5trxva2ark4 --query sku.name
      ...
    ✅ Migration complete: sku.name = Standard_ZRS
For very long migrations, you may surface a checkpoint to the user ("this is still running, check back later") rather than blocking the entire conversation.
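The polling loop with a checkpoint can be sketched as below. This is an assumption-laden illustration: `current_sku` and `wait_for_sku` are hypothetical helpers, the interval and poll count are placeholders, and `current_sku` requires a logged-in az CLI.

```python
import subprocess
import time


def current_sku(name, resource_group):
    """Read sku.name for a storage account via the az CLI."""
    result = subprocess.run(
        ["az", "storage", "account", "show", "--name", name,
         "--resource-group", resource_group,
         "--query", "sku.name", "--output", "tsv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def wait_for_sku(get_sku, target="Standard_ZRS", interval_s=300,
                 max_polls=12, sleep=time.sleep):
    """Poll get_sku() until it returns target or max_polls is reached.

    Returns (done, last_sku). done=False is the cue to surface a
    "still running, check back later" checkpoint instead of blocking.
    """
    last = None
    for attempt in range(max_polls):
        last = get_sku()
        if last == target:
            return True, last
        if attempt < max_polls - 1:
            sleep(interval_s)
    return False, last
```

A call might look like `wait_for_sku(lambda: current_sku("stii5trxva2ark4", "rg-example"))`. Injecting `get_sku` and `sleep` keeps the loop testable without touching Azure.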
Step 5: Deploy 2 (storage SKU patch)
After the migration completes, the skill patches the storage SKU in IaC and runs the same deploy command as Step 3 (e.g. azd up). This deploy is a no-op confirmation that the IaC matches the live state. Confirm once with the user before executing, then run it directly.
After changes are applied (CLI) or deployed (IaC), automatically re-run the assessment and show the same feature-pivoted table as Phase 3, with each feature row's status updated to reflect the new state. Briefly call out what changed since the previous run.

    Reliability Re-Assessment: rg-eventhubs-python-jan13 (eastus)
    ────────────────────────────────────────────────────────────────────────────────────────
    Reliability Feature          Status     Resources
    ────────────────────────────────────────────────────────────────────────────────────────
    Zone redundancy: compute     🟢 ON      • plan-ii5trxva2ark4 (FC1): now ON
    Zone-redundant storage       🟢 ZRS     • stii5trxva2ark4: GRS → ZRS
    Health probes                🔴 OFF     • func-api-ii5trxva2ark4: still off (FC1, code change declined)
    Multi-region failover        🔴 OFF     • Single region (eastus) only
    ────────────────────────────────────────────────────────────────────────────────────────
    What changed: Function App plan zone redundancy and storage replication.
    (Multi-region offered next; see Step 3.)
Multi-region is a significant cost/complexity step. Do NOT start it automatically. After re-assessment, only if all core single-region reliability features are 🟢 ON (zone-redundant compute, ZRS/GZRS storage, health probes), explicitly ask the user and wait for their response before doing anything:

    🟢 Your app is now fully zone-redundant in {region}.

    The next step (optional) is multi-region failover with Azure Front Door:
      • Deploys compute + storage in a second region (paired region recommended)
      • Adds Azure Front Door for global load balancing with health-probe-driven failover
      • Protects against full region outages
      • Estimated additional cost: ~2x compute (active-passive); Front Door ~$35/month base

    Do you want me to set up multi-region failover now? (yes / no / later)

After the multi-region IaC is generated, confirm once before deploying:

    📦 Multi-region IaC generated. Ready to deploy with `azd up`. Proceed? (yes / no)

On yes, the skill runs the deploy command (`azd up` / `az deployment group create` / `terraform apply`) and streams output. Do not stop and tell the user to run it.

Do not skip the wait. Do not generate multi-region IaC, deploy a Front Door, or modify any files until the user has explicitly said yes. If core reliability is not yet all 🟢, do not ask about multi-region; finish the core gaps first.
| Priority | Criteria | Action |
|---|---|---|
| Critical | No zone redundancy AND production workload | Fix immediately |
| High | LRS storage on zone-redundant compute | Fix within days |
| Medium | No multi-region (single region but zone-redundant) | Plan for next sprint |
| Low | Missing health probes or monitoring gaps | Track and fix |
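The priority criteria above can be applied mechanically to an assessment result. The sketch below is illustrative; `prioritize` and its boolean parameters are hypothetical, and combinations the table does not cover (for example, no zone redundancy on a non-production workload) are deliberately left unassigned rather than guessed:

```python
def prioritize(zone_redundant, production, storage_sku, multi_region, health_probes):
    """Return a list of (priority, finding) pairs, highest priority first."""
    findings = []
    if not zone_redundant and production:
        findings.append(("Critical", "No zone redundancy on a production workload"))
    if zone_redundant and storage_sku.upper().endswith("LRS"):
        findings.append(("High", "LRS storage on zone-redundant compute"))
    if zone_redundant and not multi_region:
        findings.append(("Medium", "Single region, but zone-redundant"))
    if not health_probes:
        findings.append(("Low", "Missing health probes"))
    return findings
```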
| Error | Message | Remediation |
|---|---|---|
| Authentication required | "Please login" | Run az login and retry |
| Access denied | "Forbidden" | Confirm Reader/Contributor role assignment |
| Plan doesn't support ZR | "Upgrade required" | Inform user of plan upgrade path + cost delta |
| Region doesn't support AZ | "Region limitation" | Suggest supported regions |

| Action | This skill does | Hand off to |
|---|---|---|
| Assess reliability posture | ✅ Yes | - |
| Recommend improvements | ✅ Yes | - |
| Enable zone redundancy (CLI commands) | ✅ Yes | - |
| Patch Bicep/Terraform for reliability | ✅ Yes | - |
| Generate multi-region IaC | ✅ Yes (additions for the secondary region + Front Door) | azure-prepare for full new-app IaC scaffolding |
| Deploy IaC for reliability changes | ✅ Yes (runs azd up / terraform apply / az deployment itself, after user confirmation) | azure-deploy for general/non-reliability deploys |
| Validate pre-deployment | Reliability checks only | azure-validate for full validation |