Auto-activate for kubectl commands, k8s/ directory, Helm charts. Kubernetes on GCP expertise for GKE. Produces Kubernetes deployments, Helm charts, cluster configurations, GPU/TPU workloads, AlloyDB/Cloud SQL Auth Proxy sidecars, and batch job patterns for GKE on GCP. Use when: running kubectl, Helm charts, pod/node pool management, workload identity, Kubernetes deployments, cluster scaling, GPU node pools, database sidecars, or any GKE troubleshooting. Not for Cloud Run (see cloud-run), generic Kubernetes outside GCP, or local k8s (minikube/kind).
Reference documents in `references/`:

- references/alloydb-on-gke.md
- references/autoscaling.md
- references/batch-workloads.md
- references/cloudsql-on-gke.md
- references/cluster.md
- references/gpu.md
- references/helm_deployment.md
- references/kubectl.md
- references/networking.md
- references/node_pools.md
- references/saq_workers.md
- references/security.md
- references/terraform.md
- references/troubleshooting.md
- references/workload_identity.md
GKE is Google Cloud's managed Kubernetes service, handling cluster management, upgrades, scaling, GPU workloads, and production database connectivity via Auth Proxy sidecars.
```yaml
resources:
  limits:
    nvidia.com/gpu: "1"  # GPU in limits ONLY; never in requests
```
Add a toleration for tainted GPU nodes:

```yaml
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
```
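To land on nodes with a specific accelerator, a `nodeSelector` on GKE's accelerator node label can be combined with the toleration; the label key is the standard GKE one, and the value shown here (`nvidia-tesla-t4`) is just one example:

```yaml
# Pin the pod to nodes with a specific GPU type (value is illustrative)
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-tesla-t4
```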
```bash
# 1. Annotate the KSA with the GCP SA email
kubectl annotate serviceaccount KSA_NAME \
  --namespace=NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com

# 2. Bind GCP SA to allow KSA impersonation
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"
```
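Pods then pick up the GCP identity by running as the annotated KSA; a minimal pod spec fragment (same KSA_NAME placeholder as above):

```yaml
# Pod spec fragment: run as the Workload Identity KSA
spec:
  serviceAccountName: KSA_NAME
```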
```yaml
- name: alloydb-auth-proxy
  image: gcr.io/alloydb-connectors/alloydb-auth-proxy:latest
  args:
    - "projects/PROJECT_ID/locations/REGION/clusters/CLUSTER/instances/INSTANCE"
    - "--port=5432"
  securityContext:
    allowPrivilegeEscalation: false
    runAsNonRoot: true
    runAsUser: 65532
    capabilities:
      drop: [ALL]
```
See alloydb-on-gke.md for the full production pattern.
```bash
# Cluster access
gcloud container clusters get-credentials CLUSTER --region=REGION
kubectl config use-context CONTEXT_NAME

# Core operations
kubectl get nodes
kubectl get pods -A
kubectl logs -f POD_NAME -n NAMESPACE
kubectl exec -it POD_NAME -n NAMESPACE -- /bin/sh
kubectl apply -f manifest.yaml
```
Deploy with `kubectl apply` or a Helm chart with per-component values (web, workers). Debug with `kubectl logs`, `kubectl describe`, and `kubectl top`.

```
chart/
  Chart.yaml
  values.yaml
  templates/
    _helpers.tpl
    web-deployment.yaml
    web-service.yaml
    worker-deployment.yaml
    migration-job.yaml
```
Structure values.yaml with separate sections per component (web, workers), each specifying replicaCount, image, command, resources, and port.
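A sketch of that layout; the keys mirror the structure described above, while the image paths and commands are illustrative placeholders:

```yaml
# values.yaml sketch: one section per component (values illustrative)
web:
  replicaCount: 3
  image: us-central1-docker.pkg.dev/PROJECT/repo/app:v1.0.0
  command: ["gunicorn", "app:app"]   # hypothetical entrypoint
  port: 8080
  resources:
    requests: {cpu: 250m, memory: 256Mi}
    limits: {cpu: "1", memory: 1Gi}
workers:
  replicaCount: 2
  image: us-central1-docker.pkg.dev/PROJECT/repo/app:v1.0.0
  command: ["python", "-m", "worker"]  # hypothetical entrypoint
  resources:
    requests: {cpu: 500m, memory: 512Mi}
    limits: {cpu: "1", memory: 1Gi}
```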
Connect to AlloyDB via the Auth Proxy sidecar + Workload Identity. The proxy runs as a sidecar and listens on localhost:5432; the application connects to `postgresql://user:password@localhost:5432/dbname`.
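One way to wire that DSN into the app container is via a Kubernetes Secret; the Secret name and key below are hypothetical:

```yaml
# App container fragment: DSN points at the proxy on localhost
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: db-credentials   # hypothetical Secret
        key: database-url      # e.g. postgresql://user:password@localhost:5432/dbname
```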
Key roles for GSA: roles/alloydb.client, roles/secretmanager.secretAccessor, roles/storage.objectAdmin, roles/logging.logWriter.
See alloydb-on-gke.md for full deployment, HPA with queue-depth metrics, CronJob queue monitor, and Job patterns.
Connect to Cloud SQL via the cloud-sql-proxy sidecar. Same Workload Identity pattern; GSA needs roles/cloudsql.client.
See cloudsql-on-gke.md for pod spec and connection string format.
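In outline, the sidecar mirrors the AlloyDB pattern; a sketch assuming the Cloud SQL Auth Proxy v2 image, with the instance connection name in `PROJECT:REGION:INSTANCE` format:

```yaml
- name: cloud-sql-proxy
  image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:latest  # pin a version in production
  args:
    - "PROJECT_ID:REGION:INSTANCE"
    - "--port=5432"
  securityContext:
    runAsNonRoot: true
```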
| GPU Type | Machine Series | Notes |
|---|---|---|
| NVIDIA T4 | N1 | Cost-effective inference |
| NVIDIA L4 | G2 | Efficient inference/fine-tuning |
| NVIDIA A100 (40/80GB) | A2 | Large-scale training, MIG support |
| NVIDIA H100 (80GB) | A3 | Highest throughput, MIG support |
Autopilot GPU: automatic driver install, pay-per-pod billing, MIG enabled by default (v1.29.3+). Simpler operations.
Standard GPU: manual driver install via DaemonSet or GPU Operator (helm install gpu-operator nvidia/gpu-operator). Full node control.
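For Standard mode, a GPU node pool might be created along these lines; the machine type and accelerator count are illustrative, and `gpu-driver-version=default` asks GKE to handle driver installation on newer versions:

```shell
gcloud container node-pools create gpu-pool \
  --cluster=CLUSTER --region=REGION \
  --machine-type=n1-standard-8 \
  --accelerator=type=nvidia-tesla-t4,count=1,gpu-driver-version=default \
  --num-nodes=1
```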
```yaml
# Minimal GPU pod spec
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3
      resources:
        limits:
          nvidia.com/gpu: "1"  # GPU in limits only; limits == requests for GPU
```
See gpu.md for time-sharing, MIG, NAP, Spot GPU, and TPU patterns.
Choose Autopilot (Google-managed nodes, pay-per-pod) or Standard (full node control). Use regional clusters for production HA. Enable Workload Identity at cluster creation.
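Illustrative creation commands for the two modes; Autopilot enables Workload Identity by default, while Standard needs the workload pool passed explicitly:

```shell
# Autopilot: regional by default, Workload Identity on
gcloud container clusters create-auto CLUSTER --region=REGION

# Standard: regional, Workload Identity enabled at creation
gcloud container clusters create CLUSTER \
  --region=REGION \
  --workload-pool=PROJECT_ID.svc.id.goog \
  --release-channel=regular
```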
```bash
# Create GSA + grant permissions
gcloud iam service-accounts create GSA_NAME
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:GSA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

# Create KSA + bind to GSA
kubectl create serviceaccount KSA_NAME --namespace NAMESPACE
gcloud iam service-accounts add-iam-policy-binding \
  GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]"

# Annotate KSA
kubectl annotate serviceaccount KSA_NAME \
  --namespace=NAMESPACE \
  iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```
Apply manifests or install Helm chart. Set resource requests/limits on every container. Add PodDisruptionBudgets for availability during upgrades.
Run kubectl get pods -n NAMESPACE to confirm healthy rollout. Check logs and events for errors.
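The rollout check can use standard kubectl verbs; the deployment name here is illustrative:

```shell
kubectl rollout status deployment/web-app -n NAMESPACE
kubectl get events -n NAMESPACE --sort-by=.lastTimestamp | tail -n 20
kubectl describe pod POD_NAME -n NAMESPACE
```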
Key rules:

- Never set `nvidia.com/gpu` in requests; limits implicitly equal requests for GPU resources.
- Taint GPU node pools with `nvidia.com/gpu=present:NoSchedule` to prevent non-GPU pods from landing on expensive GPU nodes.
- Harden every container: `runAsNonRoot: true`, `runAsUser: 65532`, `runAsGroup: 65532`, `fsGroup: 65532`, `allowPrivilegeEscalation: false`, `capabilities.drop: [ALL]`.

Before delivering GKE configurations, verify:

- GPU resources appear in limits only (not requests)
- GPU node pools carry the `nvidia.com/gpu=present:NoSchedule` taint
- Security contexts set `runAsNonRoot: true`, `runAsUser: 65532`, `capabilities.drop: [ALL]`

Task: Deploy a web application with a Service on GKE.
```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      serviceAccountName: web-app-ksa  # Workload Identity KSA
      containers:
        - name: web
          image: us-central1-docker.pkg.dev/my-project/repo/web-app:v1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 1Gi
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
---
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app
  namespace: production
spec:
  selector:
    app: web-app
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
---
# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
```
No Gemini CLI extension exists for GKE -- this skill provides unique value for GKE cluster management, GPU workloads, and production database connectivity patterns.
For detailed guides and configuration examples, refer to the documents in `references/`.