FinOps and cloud cost engineering — visibility and attribution (tagging taxonomy), Infracost in CI/CD, OpenCost for Kubernetes, rightsizing recommendations, storage/network cost optimization, anomaly detection, and FinOps maturity model.
FinOps is the practice of bringing financial accountability to cloud spending. Engineering, Finance, and Business collaborate to understand and optimize cloud costs without sacrificing performance or speed.
Inform → Optimize → Operate
Inform: Visibility — who spends what, where, why
Optimize: Rightsizing, reservations, waste elimination
Operate: Continuous governance, anomaly detection, forecasting
| Role | Responsibility |
|---|---|
| Engineering | Architecting cost-efficient solutions, tagging resources, rightsizing |
| Finance | Budgeting, forecasting, chargeback to business units |
| Leadership | Setting targets, approving commitments (Reserved Instances, Savings Plans) |
| Platform/FinOps | Tooling, visibility dashboards, anomaly alerts |
Cost per Customer = Total Cloud Cost / Active Customers
Cost per Transaction = Total Cloud Cost / Monthly Transactions
Cost per Feature = Cloud Cost allocated to feature / Feature users
Track these in your observability dashboards alongside latency and error rate.
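As a rough illustration of the formulas above, the following sketch computes two of the unit metrics from monthly totals. The `MonthlyUsage` type, the `safe_ratio` helper, and all figures are hypothetical and exist only for this example; real inputs would come from your billing export and product analytics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonthlyUsage:
    total_cloud_cost: float  # USD for the month (hypothetical input)
    active_customers: int
    transactions: int


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return None if denominator == 0 else numerator / denominator


def unit_economics(usage: MonthlyUsage) -> dict:
    """Apply the Cost per Customer and Cost per Transaction formulas."""
    return {
        "cost_per_customer": safe_ratio(usage.total_cloud_cost, usage.active_customers),
        "cost_per_transaction": safe_ratio(usage.total_cloud_cost, usage.transactions),
    }


# Illustrative numbers only, not real data.
metrics = unit_economics(
    MonthlyUsage(total_cloud_cost=50_000.0, active_customers=2_000, transactions=10_000_000)
)
print(metrics)
```

Returning `None` for a zero denominator keeps a month with no customers or transactions from crashing the dashboard job; whether to plot that as a gap or as zero is a design choice left to the reader.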
Define a consistent tagging standard before creating any new resource:
# Terraform: define required tags once in locals and merge them into every resource
locals {
  required_tags = {
    project     = var.project     # e.g., "payments-service"
    team        = var.team        # e.g., "platform-eng"
    environment = var.environment # dev | staging | production
    service     = var.service     # e.g., "api-gateway"
    owner       = var.owner       # e.g., "jane.doe@company.com"
    cost-center = var.cost_center # e.g., "CC-12345"
  }
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type
  tags          = merge(local.required_tags, var.extra_tags)
}

# AWS Config rule: flag resources that are missing required tags (detective, not preventive)
resource "aws_config_config_rule" "required_tags" {
  name = "required-tags"

  source {
    owner             = "AWS"
    source_identifier = "REQUIRED_TAGS"
  }

  input_parameters = jsonencode({
    tag1Key = "project"
    tag2Key = "team"
    tag3Key = "environment"
    tag4Key = "owner"
  })
}
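# GCP Organization Policy: schematic only. gcp.resourceLocations restricts resource
# locations; requiring labels generally needs a custom constraint, and the field and
# function names below should be checked against the Org Policy documentation.
name: projects/my-project/policies/gcp.resourceLocations
spec:
  rules:
    - condition:
        expression: "resource.matchedTagValue('team') == ''"
      deny: {}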
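Cloud-side rules catch untagged resources after the fact; the same check can also run earlier, for example in a pre-merge script. The sketch below validates a tag map against the four keys used in the AWS Config rule above and the environment values listed in the Terraform comment. The function name `tag_violations` and the constant names are hypothetical.

```python
# Required keys mirror the AWS Config rule parameters in this document.
REQUIRED_TAG_KEYS = {"project", "team", "environment", "owner"}
# Allowed values mirror the "dev | staging | production" comment in the Terraform locals.
ALLOWED_ENVIRONMENTS = {"dev", "staging", "production"}


def tag_violations(tags: dict) -> list:
    """Return human-readable problems for one resource's tags (empty list = compliant)."""
    present = set(tags)
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAG_KEYS - present)]
    problems += [
        f"empty tag: {key}"
        for key in sorted(REQUIRED_TAG_KEYS & present)
        if not tags[key].strip()
    ]
    environment = tags.get("environment", "")
    if environment.strip() and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid environment: {environment}")
    return problems


# Hypothetical resource tags for illustration.
print(tag_violations({"project": "payments-service", "environment": "qa", "owner": " "}))
```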
| Model | Description | When to Use |
|---|---|---|
| Showback | Show teams their costs informally | Early stage, build awareness |
| Chargeback | Allocate costs to team P&L | Mature FinOps, cost accountability needed |
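Both models need a rule for costs that no single team owns (shared clusters, networking, support plans). One common option, sketched here as an assumption and not as a prescribed method, is to split the shared amount in proportion to each team's direct spend. The function name and the figures are hypothetical.

```python
def allocate_shared_cost(direct_costs: dict, shared_cost: float) -> dict:
    """Add a proportional share of shared_cost to each team's direct cost."""
    if not direct_costs:
        return {}
    total_direct = sum(direct_costs.values())
    if total_direct == 0:
        # No usage signal to weight by, so fall back to an even split.
        even_share = shared_cost / len(direct_costs)
        return {team: even_share for team in direct_costs}
    return {
        team: cost + shared_cost * cost / total_direct
        for team, cost in direct_costs.items()
    }


# Illustrative monthly figures in USD, not real data.
report = allocate_shared_cost(
    {"payments": 6000.0, "search": 3000.0, "platform": 1000.0}, shared_cost=2000.0
)
print(report)
```

In a showback setting the result is only reported to teams; in a chargeback setting the same numbers would feed each team's P&L.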
Infracost estimates the monthly cost of Terraform changes and posts a diff to PRs.
# Install Infracost
brew install infracost
infracost auth login
# Estimate cost for current Terraform config
infracost breakdown --path=.
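# Generate diff against a baseline JSON (e.g., produced on the main branch with `infracost breakdown --format=json --out-file=/tmp/infracost-base.json`)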
infracost diff --path=. --compare-to=/tmp/infracost-base.json
# .github/workflows/infracost.yml
name: Infracost
on:
  pull_request:
    paths:
      - '**/*.tf'
      - '**/*.tfvars'
jobs:
  infracost:
    name: Cost Estimate
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - name: Generate base cost (main branch)
        run: |
          git fetch origin main
          git checkout origin/main
          infracost breakdown --path=. \
            --format=json \
            --out-file=/tmp/infracost-base.json
          git checkout -
      - name: Generate PR cost diff
        run: |
          infracost diff \
            --path=. \
            --format=json \
            --compare-to=/tmp/infracost-base.json \
            --out-file=/tmp/infracost-diff.json
      - name: Post comment to PR
        run: |
          infracost comment github \
            --path=/tmp/infracost-diff.json \
            --repo=$GITHUB_REPOSITORY \
            --github-token=${{ secrets.GITHUB_TOKEN }} \
            --pull-request=${{ github.event.pull_request.number }} \
            --behavior=update
      - name: Fail if cost increase > 20%
        run: |
          # Cost fields may be JSON strings, so convert them before dividing;
          # a zero baseline is reported as 0% to avoid dividing by zero
          PERCENT=$(infracost output \
            --path=/tmp/infracost-diff.json \
            --format=json | \
            jq -r '(.pastTotalMonthlyCost // "0" | tonumber) as $past
              | (.diffTotalMonthlyCost // "0" | tonumber) as $diff
              | if $past == 0 then 0 else ($diff / $past * 100) end')
          if (( $(echo "$PERCENT > 20" | bc -l) )); then
            echo "Cost increase of ${PERCENT}% exceeds threshold!"
            exit 1
          fi
Monthly cost estimate:
  aws_db_instance.main             $146.00/month  (+$73.00 vs main)
  aws_instance.app[0]               $72.00/month  (no change)
  aws_elasticache_cluster.redis     $24.00/month  (+$24.00 new resource)
  Total                            $242.00/month  (+$97.00, +67%)
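The percentage in the sample output and the 20% gate in the workflow come from the same arithmetic: the cost difference divided by the previous total. A minimal sketch follows, using the sample figures above (previous total $242.00 - $97.00 = $145.00). The function names are hypothetical, and treating any new cost on a zero baseline as a failure is an assumption that you may want to relax.

```python
from typing import Optional


def percent_increase(past_monthly: float, diff_monthly: float) -> Optional[float]:
    """Percentage change relative to the previous monthly total; None if there was no baseline."""
    if past_monthly == 0:
        return None
    return diff_monthly / past_monthly * 100


def exceeds_threshold(past_monthly: float, diff_monthly: float, threshold_percent: float = 20.0) -> bool:
    """True when the change should fail the cost gate."""
    pct = percent_increase(past_monthly, diff_monthly)
    if pct is None:
        # Assumption: with no baseline, any added cost trips the gate.
        return diff_monthly > 0
    return pct > threshold_percent


past_total = 242.00 - 97.00  # previous total implied by the sample output
print(round(percent_increase(past_total, 97.00)))
print(exceeds_threshold(past_total, 97.00))
```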
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm repo update
helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set opencost.prometheus.internal.enabled=true
# Assumes the OpenCost API is reachable on localhost:9003 (e.g., via kubectl port-forward)
# Cost per namespace over last 24h
curl -s "http://localhost:9003/allocation?window=24h&aggregate=namespace" | jq
# Cost per deployment label
curl -s "http://localhost:9003/allocation?window=7d&aggregate=label:app" | jq
# Cost breakdown: CPU + Memory + Storage
# .data is a list of maps keyed by allocation name, hence the double iteration
curl -s "http://localhost:9003/allocation?window=30d&aggregate=controller" | jq '
  .data[][] | {
    name: .name,
    cpuCost: .cpuCost,
    ramCost: .ramCost,
    pvCost: .pvCost,
    totalCost: .totalCost
  }
'
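For reporting outside the shell, the same response can be ranked in a few lines of Python. This sketch assumes the allocation response has the shape `{"data": [{name: {"totalCost": ...}, ...}]}`; verify that against the OpenCost version you run. The sample payload is made up and `top_allocations` is a hypothetical helper.

```python
def top_allocations(response: dict, limit: int = 3) -> list:
    """Sum totalCost per allocation name across windows and return the most expensive first."""
    totals = {}
    for window in response.get("data", []):
        # A window can be empty or null when no data exists for it.
        for name, allocation in (window or {}).items():
            totals[name] = totals.get(name, 0.0) + float(allocation.get("totalCost", 0.0))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical payload for illustration; real values come from the /allocation endpoint.
sample = {
    "code": 200,
    "data": [
        {
            "payments": {"cpuCost": 3.0, "ramCost": 1.5, "pvCost": 0.5, "totalCost": 5.0},
            "search": {"cpuCost": 1.0, "ramCost": 1.0, "pvCost": 0.25, "totalCost": 2.25},
            "batch": {"cpuCost": 6.0, "ramCost": 2.0, "pvCost": 0.0, "totalCost": 8.0},
        }
    ],
}
print(top_allocations(sample, limit=2))
```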
# Import dashboard ID 12465 from grafana.com for OpenCost
# Or build custom panels:
# Panel: Top 10 most expensive namespaces
# PromQL (metric name as given here; confirm it against the metrics your OpenCost version exports):
topk(10, sum by (namespace) (opencost_allocation_namespace_cost_total) / 30)
# Shows daily average cost per namespace, assuming the metric holds a 30-day total
| Feature | OpenCost (OSS) | Kubecost (Commercial) |
|---|---|---|
| Cost attribution | Yes | Yes |
| Multi-cluster | Manual | Built-in |
| Savings recommendations | No | Yes |
| Alerts | No | Yes |
| Price | Free | $0–$699+/cluster/month |
# AWS Compute Optimizer recommendations (CLI)
aws compute-optimizer get-ec2-instance-recommendations \
  --filters name=Finding,values=Overprovisioned \
  --output table
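# Kubernetes: kubectl top reports usage only, so list CPU/memory requests alongside it
kubectl top pods -A --no-headers
kubectl get pods -A -o custom-columns='NAMESPACE:.metadata.namespace,POD:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
# Pods using well under 20% of their CPU request are rightsizing candidates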
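The 20% slack heuristic above compares actual usage with the requested amount, which requires parsing Kubernetes CPU quantities first. A minimal sketch, handling only the `m` (millicore) suffix and plain core counts; other suffixes are out of scope here, and both function names are hypothetical.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m', '2', or '0.5' to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return float(quantity[:-1]) / 1000
    return float(quantity)


def is_over_provisioned(usage: str, request: str, threshold: float = 0.2) -> bool:
    """True when usage is below `threshold` (default 20%) of the request."""
    requested = parse_cpu(request)
    if requested == 0:
        # No request set, so there is nothing to compare against.
        return False
    return parse_cpu(usage) / requested < threshold


# Illustrative values: 40m used of a 500m request is 8% utilization.
print(is_over_provisioned("40m", "500m"))
```

A single `kubectl top` reading is a point-in-time sample, so a real check would look at usage over days before shrinking a request.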
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off" # "Off" = recommendations only, no auto-apply
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 4Gi
# View VPA recommendations
kubectl describe vpa my-app-vpa | grep -A 20 "Recommendation"
Rule of thumb:
- Baseline steady-state workload (always running) → Reserved Instances (1yr) = ~40% savings
- Variable but predictable → Compute Savings Plans = ~30% savings
- Batch / fault-tolerant → Spot / Preemptible = ~60-90% savings
Analysis:
- AWS: Cost Explorer → Savings Plans → Recommendations
- GCP: Committed Use Discounts (CUD) Analyzer
- Azure: Azure Advisor → Cost recommendations
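The rule of thumb hides a break-even point: a commitment is billed for every hour, while on-demand is billed only for hours used. Under that simplified model, a commitment with discount d pays off once utilization exceeds 1 - d. The sketch below uses the ~40% figure from the list above; the hourly rate, the 730-hour month, and the function names are assumptions for illustration.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of the month a workload must run for the commitment to be cheaper."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


def monthly_costs(on_demand_hourly: float, hours_used: float, discount: float,
                  hours_in_month: float = 730.0) -> tuple:
    """Return (on-demand cost, committed cost) for one month."""
    on_demand = on_demand_hourly * hours_used
    # The commitment is paid for the whole month regardless of usage.
    committed = on_demand_hourly * (1 - discount) * hours_in_month
    return on_demand, committed


# Hypothetical $0.10/hour instance with a 40% commitment discount.
print(break_even_utilization(0.40))
print(monthly_costs(0.10, hours_used=730, discount=0.40))  # always-on workload
print(monthly_costs(0.10, hours_used=365, discount=0.40))  # runs half the month
```

At 50% utilization the on-demand bill is lower than the committed one, which is why the list reserves commitments for steady-state baselines.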
# Lifecycle policy: move objects to cheaper storage classes automatically
import boto3

s3 = boto3.client('s3')
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'cost-optimization',
            'Status': 'Enabled',
            # A filter is required; an empty prefix applies the rule to every object
            'Filter': {'Prefix': ''},
            'Transitions': [
                {'Days': 30, 'StorageClass': 'STANDARD_IA'},    # about -40% cost
                {'Days': 90, 'StorageClass': 'GLACIER_IR'},     # about -68% cost
                {'Days': 365, 'StorageClass': 'DEEP_ARCHIVE'},  # about -95% cost
            ],
            # Only has an effect on versioned buckets
            'NoncurrentVersionExpiration': {'NoncurrentDays': 90},
        }],
    },
)
Key rules:
1. Keep compute in the same AZ/region as the data it processes
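2. Use VPC endpoints for AWS services (gateway endpoints for S3 and DynamoDB, PrivateLink interface endpoints for most others) so that traffic bypasses the NAT Gateway and its data processing charges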
3. CDN for static assets — S3 → CloudFront reduces origin egress by 60-90%
4. Compress API responses (gzip/brotli) — reduces data transfer volume
5. Same-region replicas preferred over cross-region for frequently accessed data
resource "aws_ce_anomaly_monitor" "service_monitor" {
  name              = "service-anomaly-monitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "alert" {
  name             = "cost-anomaly-alert"
  threshold        = 10          # Alert if anomaly impact > $10
  frequency        = "IMMEDIATE" # SNS subscribers require IMMEDIATE; DAILY and WEEKLY are for email
  monitor_arn_list = [aws_ce_anomaly_monitor.service_monitor.arn]

  subscriber {
    type    = "SNS"
    address = aws_sns_topic.cost_alerts.arn
  }
}
resource "google_billing_budget" "monthly_budget" {
  billing_account = var.billing_account_id
  display_name    = "Monthly Budget Alert"

  amount {
    specified_amount {
      currency_code = "USD"
      units         = "10000" # $10,000 budget
    }
  }

  threshold_rules {
    threshold_percent = 0.5 # Alert at 50% of actual spend
    spend_basis       = "CURRENT_SPEND"
  }

  threshold_rules {
    threshold_percent = 0.8 # Alert at 80% (spend_basis defaults to CURRENT_SPEND)
  }

  threshold_rules {
    threshold_percent = 1.0 # Alert when forecasted spend reaches 100%
    spend_basis       = "FORECASTED_SPEND"
  }

  all_updates_rule {
    pubsub_topic = google_pubsub_topic.budget_alerts.id
  }
}
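Managed anomaly detection and budgets cover the cloud provider's bill; for cost series you compute yourself (for example, per-namespace totals), a simple statistical check can serve as a first pass. The sketch below flags days whose cost sits more than a chosen number of standard deviations above the trailing window. This is a plain rolling z-score, not the algorithm either provider uses, and the window, threshold, and sample data are assumptions.

```python
from statistics import mean, stdev


def find_anomalies(daily_costs: list, window: int = 7, z_threshold: float = 3.0) -> list:
    """Return indices of days whose cost spikes above the trailing window's mean."""
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any increase counts as a spike.
            if daily_costs[i] > mu:
                anomalies.append(i)
            continue
        if (daily_costs[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Illustrative daily costs in USD; day index 8 is an obvious spike.
costs = [100, 102, 98, 101, 99, 100, 103, 101, 180, 100]
print(find_anomalies(costs))
```

Only upward deviations are flagged because a sudden drop in spend is rarely an incident; weekly seasonality would need a longer window or a per-weekday baseline.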
| Level | Characteristics |
|---|---|
| Crawl | Basic cost visibility, some tagging, monthly reviews |
| Walk | Infracost in CI, cost per team, anomaly alerts, rightsizing recommendations |
| Run | Unit economics, automated rightsizing, showback/chargeback, savings plan governance |
Minimum required tag keys: `project`, `team`, `environment`, `owner`.