From castai-pack
Optimize CAST AI autoscaler performance, node provisioning speed, and API efficiency. Use when nodes take too long to provision, the autoscaler reacts too slowly, or you are optimizing API call patterns for multi-cluster dashboards. Trigger with phrases like "cast ai performance", "cast ai slow", "cast ai node provisioning", "cast ai autoscaler speed".
```shell
npx claudepluginhub flight505/skill-forge --plugin castai-pack
```
Tune CAST AI for faster node provisioning, more responsive autoscaling, and efficient API usage. Covers headroom configuration, instance family selection, and API caching for multi-cluster dashboards.
```shell
# Configure headroom for proactive scaling (avoids waiting for pending pods)
curl -X PUT -H "X-API-Key: ${CASTAI_API_KEY}" \
  -H "Content-Type: application/json" \
  "https://api.cast.ai/v1/kubernetes/clusters/${CASTAI_CLUSTER_ID}/policies" \
  -d '{
    "enabled": true,
    "unschedulablePods": {
      "enabled": true,
      "headroom": {
        "enabled": true,
        "cpuPercentage": 15,
        "memoryPercentage": 15
      }
    }
  }'
```
Headroom pre-provisions spare capacity so pods schedule immediately instead of waiting 2-5 minutes for new nodes.
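To pick a headroom percentage, estimate it from the pod burst you need to absorb relative to current cluster allocatable capacity. A minimal sketch (the burst and capacity figures are illustrative; this is plain arithmetic, not a CAST AI API):

```typescript
// Estimate the headroom percentage needed to absorb a pod burst
// without waiting for new nodes. Both inputs use the same unit
// (e.g. millicores for CPU, MiB for memory).
function headroomPercent(burstDemand: number, clusterAllocatable: number): number {
  if (clusterAllocatable <= 0) throw new Error("allocatable must be positive");
  // Round up so the reserved capacity always covers the burst.
  return Math.ceil((burstDemand / clusterAllocatable) * 100);
}

// Example: a deploy burst needs 6 vCPU on a 40-vCPU cluster.
headroomPercent(6000, 40000); // 15
```

A 15% result here matches the `cpuPercentage`/`memoryPercentage` values in the policy example above; recompute it whenever cluster size or deploy patterns change.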
```hcl
# Terraform: prefer instance families with fast launch times
resource "castai_node_template" "fast_launch" {
  cluster_id = castai_eks_cluster.this.id
  name       = "fast-launch-workers"

  constraints {
    spot                          = true
    use_spot_fallbacks            = true
    fallback_restore_rate_seconds = 300

    # Newer instance types launch faster and have better availability
    instance_families {
      include = ["m6i", "m7i", "c6i", "c7i", "r6i", "r7i"]
    }

    # Enable spot diversity for faster provisioning
    spot_diversity_price_increase_limit_percent = 25

    architectures = ["amd64"]
  }
}
```
```shell
# Reduce empty node delay for dev/staging (faster downscale)
helm upgrade castai-evictor castai-helm/castai-evictor \
  -n castai-agent \
  --reuse-values \
  --set evictor.aggressiveMode=true \
  --set evictor.cycleInterval=120

# For production, use non-aggressive mode with longer intervals:
# --set evictor.aggressiveMode=false
# --set evictor.cycleInterval=600
```
```typescript
import { LRUCache } from "lru-cache";

const cache = new LRUCache<string, unknown>({ max: 100, ttl: 60_000 });

// Minimal GET helper (an assumption for this sketch, not part of a
// CAST AI SDK): calls the REST API with the key from the environment
// and parses the JSON body.
async function castaiGet(path: string): Promise<any> {
  const res = await fetch(`https://api.cast.ai${path}`, {
    headers: { "X-API-Key": process.env.CASTAI_API_KEY ?? "" },
  });
  if (!res.ok) throw new Error(`CAST AI API ${res.status} for ${path}`);
  return res.json();
}

interface ClusterSummary {
  id: string;
  name: string;
  savings: number;
  savingsPercent: number;
  nodeCount: number;
  spotPercent: number;
}

async function getClusterSummary(clusterId: string): Promise<ClusterSummary> {
  const cacheKey = `summary:${clusterId}`;
  const cached = cache.get(cacheKey) as ClusterSummary | undefined;
  if (cached) return cached;

  const [cluster, savings, nodes] = await Promise.all([
    castaiGet(`/v1/kubernetes/external-clusters/${clusterId}`),
    castaiGet(`/v1/kubernetes/clusters/${clusterId}/savings`),
    castaiGet(`/v1/kubernetes/external-clusters/${clusterId}/nodes`),
  ]);

  const spotNodes = nodes.items.filter(
    (n: { lifecycle: string }) => n.lifecycle === "spot"
  ).length;

  const summary: ClusterSummary = {
    id: clusterId,
    name: cluster.name,
    savings: savings.monthlySavings,
    savingsPercent: savings.savingsPercentage,
    nodeCount: nodes.items.length,
    spotPercent:
      nodes.items.length > 0 ? (spotNodes / nodes.items.length) * 100 : 0,
  };
  cache.set(cacheKey, summary);
  return summary;
}

// Aggregate across all clusters
async function getDashboardData(
  clusterIds: string[]
): Promise<ClusterSummary[]> {
  return Promise.all(clusterIds.map(getClusterSummary));
}
```
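An unbounded `Promise.all` fires three requests per cluster simultaneously, which can trip API rate limits on dashboards spanning dozens of clusters. A small concurrency limiter keeps the fan-out bounded; this is a hand-rolled sketch, not part of any CAST AI SDK:

```typescript
// Map over items with at most `limit` tasks in flight at once.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker synchronously claims the next index, then awaits the
  // task, so no index is processed twice.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

`getDashboardData` could then call `mapWithLimit(clusterIds, 5, getClusterSummary)` instead of the unbounded `Promise.all`, trading a little latency for predictable request pressure.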
```yaml
# Faster resource adjustment with shorter cooldown
# (use with caution in production)
metadata:
  annotations:
    autoscaling.cast.ai/cpu-headroom: "10"       # Lower headroom = tighter fit
    autoscaling.cast.ai/memory-headroom: "15"
    autoscaling.cast.ai/apply-type: "immediate"  # Apply without waiting
```
| Metric | Default | Tuned |
|---|---|---|
| Node provision time | 3-5 min | 1-3 min (with headroom) |
| Empty node removal | 5 min | 2 min (aggressive evictor) |
| Workload resize | 5 min cooldown | Immediate |
| API response (cached) | 200ms | <5ms |

| Issue | Cause | Solution |
|---|---|---|
| Headroom over-provisioning | Percentage too high | Reduce to 5-10% |
| Aggressive evictor causing disruptions | PDB not set | Add PodDisruptionBudgets |
| Cache stale data | TTL too long | Reduce cache TTL to 30s |
| Instance type unavailable | Too narrow constraints | Add more instance families |
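The "Add PodDisruptionBudgets" fix above is a standard Kubernetes resource, not CAST AI-specific. A minimal sketch, with the name and `app: web` selector as illustrative placeholders:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # Evictor leaves at least 2 replicas running
  selector:
    matchLabels:
      app: web
```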
For cost optimization strategies, see `castai-cost-tuning`.