Auto-activate for Cloud Run service.yaml, gcloud run commands. Google Cloud Run serverless platform: Dockerfile, containerized services, Cloud Run Jobs, cold starts, traffic splitting. Produces Cloud Run service configurations, Dockerfiles, and deployment workflows for containerized serverless apps on GCP. Use when: deploying containers to Cloud Run, writing Dockerfiles for serverless, or tuning scaling/concurrency. Not for GKE (see gke), Cloud Functions, or non-containerized deployments.
From the flow plugin (cofin/flow --plugin flow). This skill uses the workspace's default tool permissions.
Reference documents: references/cloudbuild.md, references/dockerfile.md, references/gpu.md, references/iap.md, references/jobs.md, references/networking.md, references/performance.md, references/services.md, references/terraform.md, references/troubleshooting.md, references/volumes.md
Cloud Run is a fully managed serverless platform for running containerized applications. It automatically scales from zero to N based on incoming requests and charges only for resources used during request processing.
Core commands:

gcloud builds submit --tag gcr.io/PROJECT/IMAGE:TAG
gcloud run deploy SERVICE --image=IMAGE_URL --region=REGION
gcloud run services update-traffic SERVICE --to-latest

Resource settings:

| Setting | Flag | Recommendation |
|---|---|---|
| CPU | --cpu=N | 1-8 vCPUs; start with 1 |
| Memory | --memory=NGi | 256Mi-32Gi; match to workload |
| Concurrency | --concurrency=N | 80 default; lower for memory-heavy handlers |
| Min instances | --min-instances=N | 1+ for production to avoid cold starts |
| Max instances | --max-instances=N | Set a ceiling to control costs |
| Timeout | --timeout=N | Up to 3600s for services, 86400s for jobs |
| CPU allocation | --cpu-throttling=false | Use for WebSockets, background tasks |
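Putting the table's flags together, a sketch of a production deploy — the service name, image path, and values are placeholders to adapt to your workload:

```shell
# Example: memory-heavy handler, so concurrency is lowered from the default 80
gcloud run deploy my-service \
  --image=us-docker.pkg.dev/my-project/repo/my-service:abc1234 \
  --region=us-central1 \
  --cpu=2 \
  --memory=1Gi \
  --concurrency=40 \
  --min-instances=1 \
  --max-instances=20 \
  --timeout=300
```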
Services vs Jobs:

| Feature | Services | Jobs |
|---|---|---|
| Purpose | HTTP request handling | Batch/scheduled tasks |
| Scaling | Auto-scales with traffic | Runs to completion |
| Timeout | Up to 60 minutes | Up to 24 hours |
| Command | gcloud run deploy | gcloud run jobs deploy |
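A minimal Jobs counterpart to the table above (job and image names hypothetical) — a job is deployed once, then executed on demand or on a schedule:

```shell
# Deploy the job definition (does not run it)
gcloud run jobs deploy nightly-export \
  --image=us-docker.pkg.dev/my-project/repo/exporter:abc1234 \
  --region=us-central1 \
  --tasks=1 \
  --task-timeout=3600

# Execute on demand
gcloud run jobs execute nightly-export --region=us-central1
```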
GPU workloads:

gcloud run deploy SERVICE \
--gpu=1 \
--gpu-type=nvidia-l4 \
--cpu=8 \
--memory=32Gi \
--concurrency=4
Minimum: 4 CPU, 16 GiB. Recommended: 8 CPU, 32 GiB. Set --concurrency explicitly — no GPU-based autoscaling. See references/gpu.md for RTX PRO 6000 Blackwell, driver details, and ML inference patterns.
Direct VPC Egress — route to AlloyDB/Cloud SQL private IPs without VPC connector overhead:
gcloud run deploy SERVICE \
--vpc-egress=private-ranges-only \
--network=NETWORK \
--subnet=SUBNET
Secret mounting:
--set-secrets=KEY=SECRET_NAME:latest
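The same flag covers both delivery modes: a key name exposes the secret as an environment variable, while a path as the key mounts it as a file. A sketch with hypothetical secret names:

```shell
gcloud run deploy myapp \
  --set-secrets=API_KEY=api-key:latest,/etc/secrets/config=app-config:latest
```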
Env var separator trick — use ^||^ when values contain commas (e.g., JSON arrays in CORS origins):
--set-env-vars=^||^CORS_ORIGINS=["https://app.example.com","https://api.example.com"]||OTHER_KEY=value
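To see why the custom delimiter matters: the JSON value reaches the container verbatim, and the application parses it at startup. A minimal check of that parsing step (assumes python3 is available; the variable names mirror the example above):

```shell
# Simulate the env var Cloud Run injects
export CORS_ORIGINS='["https://app.example.com","https://api.example.com"]'

# An app would parse it at startup; count the allowed origins
count=$(python3 -c 'import json, os; print(len(json.loads(os.environ["CORS_ORIGINS"])))')
echo "$count"   # prints 2
```

Without the `^||^` prefix, gcloud would split the value at the commas inside the JSON array and the deploy would fail or misconfigure the variable.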
CORS origin reconciliation workflow:
1. Inspect the current configuration (gcloud run services describe)
2. Add the updated value as a new secret version: gcloud secrets versions add SECRET_NAME --data-file=-

IAP setup summary:
gcloud iap oauth-brands create --application_title=APP --support_email=EMAIL
gcloud projects add-iam-policy-binding PROJECT --member=serviceAccount:service-PROJECT@gcp-sa-iap.iam.gserviceaccount.com --role=roles/run.invoker
Grant users access with --member=user:EMAIL --role=roles/iap.httpsResourceAccessor
Grant roles/iap.httpsResourceAccessor before enabling IAP.
See references/iap.md for full IAP configuration.
<workflow>Use multi-stage builds (base, builder, runner). Install dependencies in the builder stage, copy only the runtime artifacts to the runner stage. Run as a non-root user. Use tini as PID 1 for proper signal handling.
Use Cloud Build (gcloud builds submit) or a CI pipeline to build and push to Artifact Registry or Container Registry. Tag images with the git SHA for traceability.
Deploy with gcloud run deploy, setting CPU, memory, concurrency, and min/max instances. Use --no-traffic for initial test deployments, then shift traffic with --to-latest or percentage-based splits.
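The deploy-then-shift flow described above can be sketched as (service name and region hypothetical):

```shell
# Deploy a new revision without routing any traffic to it
gcloud run deploy myapp --image=IMAGE_URL --region=us-central1 --no-traffic

# Canary: send 10% of traffic to the latest revision
gcloud run services update-traffic myapp --region=us-central1 --to-revisions=LATEST=10

# Promote once the canary is healthy
gcloud run services update-traffic myapp --region=us-central1 --to-latest
```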
Use --allow-unauthenticated for public APIs. For internal services, use IAM-based auth. Set up IAP (Identity-Aware Proxy) for user-facing apps that need Google login. Use Direct VPC egress or a Serverless VPC Access connector to reach private resources.
Set --min-instances=1 in production. Enable --cpu-boost for faster startup. Lazy-load heavy dependencies in application code. Pre-compile bytecode for Python.
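The bytecode pre-compilation step happens at image build time, so the first request after a cold start skips compilation. A runnable sketch of the idea, using a throwaway module in a temporary directory (paths hypothetical — in a Dockerfile this would be a RUN step against your source tree):

```shell
# Create a throwaway module and pre-compile it, as a build step would
mkdir -p /tmp/cr_precompile_demo
printf 'X = 1\n' > /tmp/cr_precompile_demo/mymod.py
python3 -m compileall -q /tmp/cr_precompile_demo

# Cached bytecode now exists, so the first import skips compilation
ls /tmp/cr_precompile_demo/__pycache__/
```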
- Set --min-instances=1 for latency-sensitive production services; use --cpu-boost for faster startup
- Set --max-instances to prevent runaway scaling and unexpected billing spikes
- Tune --concurrency to match your application's per-instance capacity — too high causes memory pressure, too low wastes resources
- Prefer --vpc-egress=private-ranges-only for direct routing to AlloyDB/Cloud SQL private IPs with lower latency and no connector overhead
- Set --concurrency explicitly for GPU workloads — Cloud Run cannot auto-scale on GPU utilization; the default of 80 will OOM a GPU instance

Before delivering configurations, verify:
- --memory and --cpu are explicitly set in the deploy command
- --min-instances is set for production services
- --max-instances is set to prevent unbounded scaling
- Public access is intentional (--allow-unauthenticated)
</workflow>

<example>
Minimal Dockerfile and deploy command for a Python web service:
# Dockerfile
FROM python:3.13-slim-bookworm AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev --no-install-project
COPY src ./src
RUN uv sync --frozen --no-dev
FROM python:3.13-slim-bookworm AS runner
RUN apt-get update && apt-get install -y --no-install-recommends tini \
&& rm -rf /var/lib/apt/lists/*
RUN useradd --create-home appuser
USER appuser
COPY --from=builder /app /app
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8080
ENTRYPOINT ["tini", "--"]
CMD ["uvicorn", "myapp.main:app", "--host", "0.0.0.0", "--port", "8080"]
Deploy command:
# Build and push
gcloud builds submit --tag gcr.io/my-project/myapp:latest
# Deploy with production settings
gcloud run deploy myapp \
--image=gcr.io/my-project/myapp:latest \
--region=us-central1 \
--cpu=1 \
--memory=512Mi \
--concurrency=80 \
--min-instances=1 \
--max-instances=10 \
--cpu-boost \
--service-account=myapp-sa@my-project.iam.gserviceaccount.com \
--allow-unauthenticated
</example>
Note: No Gemini CLI extension exists for Cloud Run — this skill provides unique value for Cloud Run deployments, GPU workloads, and production networking patterns not covered by other tooling.
For detailed guides and configuration examples, refer to the documents in references/.