Auto-activate for Cloud Run service.yaml, gcloud run commands. Google Cloud Run serverless platform: Dockerfile, containerized services, Cloud Run Jobs, cold starts, traffic splitting. Produces Cloud Run service configurations, Dockerfiles, and deployment workflows for containerized serverless apps on GCP. Use when: deploying containers to Cloud Run, writing Dockerfiles for serverless, or tuning scaling/concurrency. Not for GKE (see gke), Cloud Functions, or non-containerized deployments.
From the flow plugin (cofin/flow --plugin flow). This skill uses the workspace's default tool permissions.
Reference documents: references/cloudbuild.md, references/dockerfile.md, references/gpu.md, references/iap.md, references/jobs.md, references/networking.md, references/performance.md, references/services.md, references/terraform.md, references/troubleshooting.md, references/volumes.md
Cloud Run is a fully managed serverless platform for running containerized applications. It automatically scales from zero to N based on incoming requests and charges only for resources used during request processing.
Core commands:

gcloud builds submit --tag gcr.io/PROJECT/IMAGE:TAG
gcloud run deploy SERVICE --image=IMAGE_URL --region=REGION
gcloud run services update-traffic SERVICE --to-latest

Resource settings:

| Setting | Flag | Recommendation |
|---|---|---|
| CPU | --cpu=N | 1-8 vCPUs; start with 1 |
| Memory | --memory=NGi | 256Mi-32Gi; match to workload |
| Concurrency | --concurrency=N | 80 default; lower for memory-heavy handlers |
| Min instances | --min-instances=N | 1+ for production to avoid cold starts |
| Max instances | --max-instances=N | Set a ceiling to control costs |
| Timeout | --timeout=N | Up to 3600s for services, 86400s for jobs |
| CPU allocation | --cpu-throttling=false | Use for WebSockets, background tasks |
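Putting the table's flags together, a sketch of a production deploy — the service name, image path, and values are placeholders to adapt to your workload:

```shell
# Example: memory-heavy handler, so concurrency is lowered from the default 80
gcloud run deploy my-service \
  --image=us-docker.pkg.dev/my-project/repo/my-service:abc1234 \
  --region=us-central1 \
  --cpu=2 \
  --memory=1Gi \
  --concurrency=40 \
  --min-instances=1 \
  --max-instances=20 \
  --timeout=300
```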
Services vs Jobs:

| Feature | Services | Jobs |
|---|---|---|
| Purpose | HTTP request handling | Batch/scheduled tasks |
| Scaling | Auto-scales with traffic | Runs to completion |
| Timeout | Up to 60 minutes | Up to 24 hours |
| Command | gcloud run deploy | gcloud run jobs deploy |
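A minimal Jobs counterpart to the table above (job and image names hypothetical) — a job is deployed once, then executed on demand or on a schedule:

```shell
# Deploy the job definition (does not run it)
gcloud run jobs deploy nightly-export \
  --image=us-docker.pkg.dev/my-project/repo/exporter:abc1234 \
  --region=us-central1 \
  --tasks=1 \
  --task-timeout=3600

# Execute on demand
gcloud run jobs execute nightly-export --region=us-central1
```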
GPU workloads:

gcloud run deploy SERVICE \
--gpu=1 \
--gpu-type=nvidia-l4 \
--cpu=8 \
--memory=32Gi \
--concurrency=4
Minimum: 4 CPU, 16 GiB. Recommended: 8 CPU, 32 GiB. Set --concurrency explicitly — no GPU-based autoscaling. See references/gpu.md for RTX PRO 6000 Blackwell, driver details, and ML inference patterns.
Direct VPC Egress — route to AlloyDB/Cloud SQL private IPs without VPC connector overhead:
gcloud run deploy SERVICE \
--vpc-egress=private-ranges-only \
--network=NETWORK \
--subnet=SUBNET
Secret mounting:
--set-secrets=KEY=SECRET_NAME:latest
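The same flag covers both delivery modes: a key name exposes the secret as an environment variable, while a path as the key mounts it as a file. A sketch with hypothetical secret names:

```shell
gcloud run deploy myapp \
  --set-secrets=API_KEY=api-key:latest,/etc/secrets/config=app-config:latest
```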
Env var separator trick — use ^||^ when values contain commas (e.g., JSON arrays in CORS origins):
--set-env-vars=^||^CORS_ORIGINS=["https://app.example.com","https://api.example.com"]||OTHER_KEY=value
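To see why the custom delimiter matters: the JSON value reaches the container verbatim, and the application parses it at startup. A minimal check of that parsing step (assumes python3 is available; the variable names mirror the example above):

```shell
# Simulate the env var Cloud Run injects
export CORS_ORIGINS='["https://app.example.com","https://api.example.com"]'

# An app would parse it at startup; count the allowed origins
count=$(python3 -c 'import json, os; print(len(json.loads(os.environ["CORS_ORIGINS"])))')
echo "$count"   # prints 2
```

Without the `^||^` prefix, gcloud would split the value at the commas inside the JSON array and the deploy would fail or misconfigure the variable.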
CORS origin reconciliation workflow:
1. Inspect the current configuration (gcloud run services describe)
2. Add the updated value as a new secret version: gcloud secrets versions add SECRET_NAME --data-file=-

IAP setup summary:
gcloud iap oauth-brands create --application_title=APP --support_email=EMAIL
gcloud projects add-iam-policy-binding PROJECT --member=serviceAccount:service-PROJECT@gcp-sa-iap.iam.gserviceaccount.com --role=roles/run.invoker
Grant users access with --member=user:EMAIL --role=roles/iap.httpsResourceAccessor
Grant roles/iap.httpsResourceAccessor before enabling IAP.
See references/iap.md for full IAP configuration.
<workflow>Use multi-stage builds (base, builder, runner). Install dependencies in the builder stage, copy only the runtime artifacts to the runner stage. Run as a non-root user. Use tini as PID 1 for proper signal handling.
Use Cloud Build (gcloud builds submit) or a CI pipeline to build and push to Artifact Registry or Container Registry. Tag images with the git SHA for traceability.
Deploy with gcloud run deploy, setting CPU, memory, concurrency, and min/max instances. Use --no-traffic for initial test deployments, then shift traffic with --to-latest or percentage-based splits.
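The deploy-then-shift flow described above can be sketched as (service name and region hypothetical):

```shell
# Deploy a new revision without routing any traffic to it
gcloud run deploy myapp --image=IMAGE_URL --region=us-central1 --no-traffic

# Canary: send 10% of traffic to the latest revision
gcloud run services update-traffic myapp --region=us-central1 --to-revisions=LATEST=10

# Promote once the canary is healthy
gcloud run services update-traffic myapp --region=us-central1 --to-latest
```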
Use --allow-unauthenticated for public APIs. For internal services, use IAM-based auth. Set up IAP (Identity-Aware Proxy) for user-facing apps that need Google login. Use Direct VPC egress or a Serverless VPC Access connector to reach private resources.
Set --min-instances=1 in production. Enable --cpu-boost for faster startup. Lazy-load heavy dependencies in application code. Pre-compile bytecode for Python.
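The bytecode pre-compilation step happens at image build time, so the first request after a cold start skips compilation. A runnable sketch of the idea, using a throwaway module in a temporary directory (paths hypothetical — in a Dockerfile this would be a RUN step against your source tree):

```shell
# Create a throwaway module and pre-compile it, as a build step would
mkdir -p /tmp/cr_precompile_demo
printf 'X = 1\n' > /tmp/cr_precompile_demo/mymod.py
python3 -m compileall -q /tmp/cr_precompile_demo

# Cached bytecode now exists, so the first import skips compilation
ls /tmp/cr_precompile_demo/__pycache__/
```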
- Set --min-instances=1 for latency-sensitive production services; use --cpu-boost for faster startup
- Set --max-instances to prevent runaway scaling and unexpected billing spikes
- Tune --concurrency to match your application's per-instance capacity — too high causes memory pressure, too low wastes resources
- Prefer --vpc-egress=private-ranges-only for direct routing to AlloyDB/Cloud SQL private IPs with lower latency and no connector overhead
- Set --concurrency explicitly for GPU workloads — Cloud Run cannot auto-scale on GPU utilization; the default of 80 will OOM a GPU instance

Before delivering configurations, verify:
- --memory and --cpu are explicitly set in the deploy command
- --min-instances is set for production services
- --max-instances is set to prevent unbounded scaling
- Public access is intentional (--allow-unauthenticated)
</workflow>

<example>
Minimal Dockerfile and deploy command for a Python web service:
# Dockerfile
FROM python:3.13-slim-bookworm AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
COPY src ./src
RUN uv sync --frozen --no-dev
FROM python:3.13-slim-bookworm AS runner
RUN apt-get update && apt-get install -y --no-install-recommends tini \
&& rm -rf /var/lib/apt/lists/*
RUN useradd --create-home appuser
USER appuser
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8080
ENTRYPOINT ["tini", "--"]
CMD ["uvicorn", "myapp.main:app", "--host", "0.0.0.0", "--port", "8080"]
Deploy command:
# Build and push
gcloud builds submit --tag gcr.io/my-project/myapp:latest
# Deploy with production settings
gcloud run deploy myapp \
--image=gcr.io/my-project/myapp:latest \
--region=us-central1 \
--cpu=1 \
--memory=512Mi \
--concurrency=80 \
--min-instances=1 \
--max-instances=10 \
--cpu-boost \
--service-account=myapp-sa@my-project.iam.gserviceaccount.com \
--allow-unauthenticated
</example>
Note: No Gemini CLI extension exists for Cloud Run — this skill provides unique value for Cloud Run deployments, GPU workloads, and production networking patterns not covered by other tooling.
For detailed guides and configuration examples, refer to the documents in references/.