From gcp-iot
This skill provides Cloud Run debugging techniques and troubleshooting patterns. Use it when debugging Cloud Run services, investigating errors, analyzing logs, or resolving deployment issues. It triggers when the user mentions "Cloud Run error", "container crash", "cold start", "timeout", "502 error", or "deployment failed", or needs help with Cloud Run troubleshooting.
Slash command: `/gcp-iot:cloud-run-debugging`
Quick diagnostics:

```bash
# List all services with status
gcloud run services list --project=$PROJECT_ID

# Get detailed service info
gcloud run services describe $SERVICE --region=$REGION --project=$PROJECT_ID

# Check recent revisions and their readiness
gcloud run revisions list --service=$SERVICE --region=$REGION --project=$PROJECT_ID --limit=5

# Recent errors (textPayload is empty for JSON logs; those land in jsonPayload)
gcloud logging read "resource.type=cloud_run_revision AND severity>=ERROR" \
  --limit=50 --project=$PROJECT_ID \
  --format="table(timestamp,severity,textPayload)"

# Specific service logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=$SERVICE" \
  --limit=100 --project=$PROJECT_ID

# Request failures (4xx/5xx)
gcloud logging read "resource.type=cloud_run_revision AND httpRequest.status>=400" \
  --limit=20 --project=$PROJECT_ID \
  --format="table(timestamp,httpRequest.status,httpRequest.requestUrl,httpRequest.latency)"
```
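When these queries are generated from a script, building the filter in code keeps value quoting consistent and wraps any OR alternatives in parentheses. A minimal sketch; `buildRunLogFilter` and its option names are hypothetical, not part of any Google library:

```javascript
// Hypothetical helper: compose a Cloud Logging filter for Cloud Run revisions.
// OR alternatives are parenthesized so the query never relies on precedence rules.
function buildRunLogFilter({ service, minSeverity, minStatus, textAny = [] } = {}) {
  const clauses = ['resource.type="cloud_run_revision"'];
  if (service) clauses.push(`resource.labels.service_name="${service}"`);
  if (minSeverity) clauses.push(`severity>=${minSeverity}`);
  if (minStatus) clauses.push(`httpRequest.status>=${minStatus}`);
  if (textAny.length > 0) {
    // textPayload:"x" uses the "has" operator (substring match)
    const alternatives = textAny.map((t) => `textPayload:"${t}"`);
    clauses.push(`(${alternatives.join(' OR ')})`);
  }
  return clauses.join(' AND ');
}

// prints resource.type="cloud_run_revision" AND resource.labels.service_name="api" AND (textPayload:"crash" OR textPayload:"OOM")
console.log(buildRunLogFilter({ service: 'api', textAny: ['crash', 'OOM'] }));
```

The printed filter can be passed as the first argument to `gcloud logging read`.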
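Issue: Cold starts

Symptoms: The first request served by a new instance (for example after a period of inactivity) takes 5-30+ seconds.

Diagnosis:

```bash
# Check whether min instances is 0: look for the autoscaling.knative.dev/minScale
# annotation; if it is absent, no instances are kept warm
gcloud run services describe $SERVICE --region=$REGION \
  --format="yaml(spec.template.metadata.annotations)"
```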
Solutions:

```bash
# Keep one instance warm (minimum instances are billed even when idle)
gcloud run services update $SERVICE --min-instances=1 --region=$REGION

# Optimize container startup
# - Use smaller base images (alpine, distroless)
# - Lazy load heavy dependencies
# - Reduce container size
```
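Code optimization:

```javascript
// Lazy load heavy modules: defer require() until first use so that
// startup does not pay for modules only some requests need
let heavyModule;
function getHeavyModule() {
  if (!heavyModule) {
    heavyModule = require('heavy-module'); // placeholder module name
  }
  return heavyModule;
}
```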
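To confirm that slow requests really are cold starts, one option is to flag the first request each instance serves and log how long the process had already been running. A sketch assuming an Express-style `(req, res, next)` middleware; `coldStartMiddleware` and the `coldStart` log field are hypothetical names, and "first request on the instance" only approximates the request that triggered the cold start:

```javascript
// Hypothetical middleware factory: flags the first request served by this
// container instance so cold starts can be filtered in Cloud Logging.
function coldStartMiddleware(log = (entry) => console.log(JSON.stringify(entry))) {
  let served = false; // per-process flag: each instance logs once
  return (req, res, next) => {
    if (!served) {
      served = true;
      log({
        severity: 'INFO',
        message: 'First request on this instance (likely cold start)',
        coldStart: true,
        // Seconds since this Node.js process started
        processUptimeSec: process.uptime(),
        path: req.path
      });
    }
    next();
  };
}
```

Registered early with `app.use(coldStartMiddleware())`, these entries can then be found with the filter `jsonPayload.coldStart=true`.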
Issue: Container crashes

Symptoms: Service unavailable errors, container restarts.

Diagnosis:

```bash
# Check for crash logs (parentheses make the AND/OR grouping explicit)
gcloud logging read "resource.type=cloud_run_revision AND (textPayload:crash OR textPayload:killed OR textPayload:OOM)" \
  --limit=20 --project=$PROJECT_ID

# Check memory usage (OOM kills are logged as "Memory limit of ... exceeded")
gcloud logging read "resource.type=cloud_run_revision AND textPayload:memory" \
  --limit=10 --project=$PROJECT_ID
```
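Common causes: the container exceeds its memory limit (OOM kill), the app does not listen on the port given in `process.env.PORT`, or an uncaught exception terminates the process.

Solutions: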
```bash
# Increase memory
gcloud run services update $SERVICE --memory=1Gi --region=$REGION
```
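```javascript
const express = require('express');
const app = express();

// Listen on the port Cloud Run injects via $PORT (8080 by default)
const PORT = process.env.PORT || 8080;
app.listen(PORT, () => console.log(`Listening on ${PORT}`));

// Log uncaught errors before exiting so the cause shows up in Cloud Logging;
// Cloud Run then starts a fresh instance
process.on('uncaughtException', (err) => {
  console.error('Uncaught exception:', err);
  process.exit(1);
});
```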
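OOM kills give the process no chance to log a stack trace, so it can help to log memory usage periodically and warn before the limit is reached. A sketch under two assumptions: Cloud Run does not pass the memory limit to the container as an environment variable, so `MEMORY_LIMIT_MIB` is a hypothetical variable you would set yourself (for example with `--set-env-vars`); and RSS covers only this process, while files written to the container's in-memory filesystem also count toward the limit. `checkMemory` and `startMemoryWatchdog` are hypothetical helpers:

```javascript
// Hypothetical check: compare resident memory (RSS) with the configured limit.
function checkMemory(limitMiB, warnRatio = 0.8, rssBytes = process.memoryUsage().rss) {
  const usedMiB = rssBytes / (1024 * 1024);
  const ratio = usedMiB / limitMiB;
  return { usedMiB: Math.round(usedMiB), limitMiB, ratio, warn: ratio >= warnRatio };
}

// Hypothetical watchdog: emit a structured WARNING when usage nears the limit.
function startMemoryWatchdog(limitMiB, intervalMs = 30000) {
  const timer = setInterval(() => {
    const status = checkMemory(limitMiB);
    if (status.warn) {
      console.log(JSON.stringify({ severity: 'WARNING', message: 'High memory usage', ...status }));
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for this timer
  return timer;
}

// Usage (MEMORY_LIMIT_MIB is an assumed, self-set env var, e.g. 1024 for --memory=1Gi):
// startMemoryWatchdog(Number(process.env.MEMORY_LIMIT_MIB));
```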
Issue: Timeouts

Symptoms: Requests fail (typically with a 504) once they exceed the request timeout, which defaults to 300 seconds.

Diagnosis:

```bash
# Check current timeout
gcloud run services describe $SERVICE --region=$REGION \
  --format="value(spec.template.spec.timeoutSeconds)"

# Find slow requests
gcloud logging read "resource.type=cloud_run_revision AND httpRequest.latency>5s" \
  --limit=20 --project=$PROJECT_ID
```
Solutions:

```bash
# Increase timeout (default 300s, max 3600s)
gcloud run services update $SERVICE --timeout=900 --region=$REGION
```
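Better approach: acknowledge quickly and process asynchronously.

```javascript
const { randomUUID } = require('crypto');
const { PubSub } = require('@google-cloud/pubsub');
const pubsub = new PubSub();
// Return quickly and let a Pub/Sub subscriber do the work (assumes an Express `app`)
app.post('/long-task', async (req, res) => {
  const taskId = randomUUID();
  try {
    // Enqueue BEFORE responding: with default CPU allocation, Cloud Run throttles
    // CPU after the response is sent, so work started later may never finish
    await pubsub.topic('tasks').publishMessage({ data: Buffer.from(JSON.stringify({ taskId, ...req.body })) });
  } catch (err) {
    console.error('Failed to enqueue task:', err);
    return res.status(500).json({ error: 'Failed to enqueue task' });
  }
  res.status(202).json({ taskId, status: 'processing' });
});
```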
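For work that must stay synchronous, a complementary safeguard is an application-level deadline set somewhat below the Cloud Run request timeout, so the handler can log and answer with its own error instead of having the request cut off. A minimal sketch; `withDeadline` and `DeadlineError` are hypothetical names, and the helper only stops waiting, it does not cancel the underlying work:

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withDeadline(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`Deadline of ${ms} ms exceeded`);
      err.name = 'DeadlineError';
      reject(err);
    }, ms);
  });
  // Clear the timer whichever promise wins so it does not linger
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

With a 300-second request timeout, a handler might wrap its work as `await withDeadline(doWork(req.body), 280000)` and return an error response when it catches a `DeadlineError`; `doWork` stands in for your own function.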
Issue: Auth errors

Symptoms: "IAM permission denied" errors, or 401/403 responses.

Diagnosis:

```bash
# Check ingress settings (which network sources may reach the service)
gcloud run services describe $SERVICE --region=$REGION \
  --format="yaml(metadata.annotations.'run.googleapis.com/ingress')"

# Check IAM bindings (who may invoke the service)
gcloud run services get-iam-policy $SERVICE --region=$REGION
```
Solutions:

```bash
# Allow unauthenticated access (public API); an org policy such as
# domain-restricted sharing can block allUsers bindings
gcloud run services add-iam-policy-binding $SERVICE \
  --member="allUsers" \
  --role="roles/run.invoker" \
  --region=$REGION

# Or allow only a specific caller service account
gcloud run services add-iam-policy-binding $SERVICE \
  --member="serviceAccount:client@project.iam.gserviceaccount.com" \
  --role="roles/run.invoker" \
  --region=$REGION
```
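When the service stays private, a caller running on Google Cloud (for example another Cloud Run service whose service account holds `roles/run.invoker`) must send an ID token whose audience is the receiving service's URL. A sketch using the metadata server and the global `fetch` of Node.js 18+; `identityTokenUrl` and `callPrivateService` are hypothetical helpers, and outside Google Cloud the metadata server is unavailable, so a library such as google-auth-library would be used instead:

```javascript
// Metadata server endpoint that mints ID tokens for the instance's service account
const METADATA_IDENTITY_URL =
  'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity';

// Hypothetical helper: token request URL for a given audience
function identityTokenUrl(audience) {
  return `${METADATA_IDENTITY_URL}?audience=${encodeURIComponent(audience)}`;
}

// Hypothetical helper: call a private Cloud Run service (works only on Google Cloud)
async function callPrivateService(serviceUrl, path = '/') {
  const tokenRes = await fetch(identityTokenUrl(serviceUrl), {
    headers: { 'Metadata-Flavor': 'Google' } // required by the metadata server
  });
  if (!tokenRes.ok) throw new Error(`Token request failed: ${tokenRes.status}`);
  const idToken = await tokenRes.text();
  // The token's audience must match the receiving service URL or it is rejected
  return fetch(new URL(path, serviceUrl), {
    headers: { Authorization: `Bearer ${idToken}` }
  });
}
```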
Issue: Deployment failures

Symptoms: The deploy command fails.

Diagnosis:

```bash
# Check Cloud Build logs
gcloud builds list --limit=5
gcloud builds describe $BUILD_ID
gcloud builds log $BUILD_ID
```
Common Causes:
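Debug locally:

```bash
# Build for linux/amd64 (Cloud Run runs amd64 images; matters on Apple Silicon)
docker build --platform=linux/amd64 -t test-image .
docker run --rm -p 8080:8080 -e PORT=8080 test-image

# In another terminal
curl http://localhost:8080/health
```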
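If the real image fails to start, a dependency-free server that only honors the container contract (listen on `$PORT` on all interfaces, answer `/health`) can help separate platform problems from application problems. A minimal sketch using only Node's built-in `http` module; `startServer` is a hypothetical name:

```javascript
const http = require('http');

// Minimal server for Cloud Run's contract: listen on $PORT (default 8080)
// on all interfaces and answer requests promptly.
function startServer(port = Number(process.env.PORT) || 8080) {
  const server = http.createServer((req, res) => {
    if (req.url === '/health') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'ok' }));
      return;
    }
    res.writeHead(404);
    res.end();
  });
  // Bind 0.0.0.0, not 127.0.0.1, or Cloud Run cannot reach the container
  server.listen(port, '0.0.0.0', () => {
    console.log(`Listening on ${server.address().port}`);
  });
  return server;
}

// Container entrypoint: startServer();
```

Deploying this in a throwaway image with the same service settings shows whether the failure follows the configuration or the application code.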
Issue: Networking (VPC) issues

Symptoms: Can't connect to other services or to resources on a VPC network.

Diagnosis:

```bash
# Check VPC connector
gcloud run services describe $SERVICE --region=$REGION \
  --format="yaml(spec.template.metadata.annotations.'run.googleapis.com/vpc-access-connector')"

# Check egress settings
gcloud run services describe $SERVICE --region=$REGION \
  --format="yaml(spec.template.metadata.annotations.'run.googleapis.com/vpc-access-egress')"
```
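Solutions:

```bash
# Add a VPC connector for internal resources; private-ranges-only sends only
# traffic to private IP ranges through it (all-traffic routes everything)
gcloud run services update $SERVICE \
  --vpc-connector=my-connector \
  --vpc-egress=private-ranges-only \
  --region=$REGION
```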
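To tell a routing problem from an application problem, a TCP reachability probe run from inside the service can help: a timeout often points at egress settings or a missing connector, while a refused connection means the host answered but nothing listens on that port. A sketch with Node's built-in `net` module; `probeTcp` is a hypothetical helper and the reading of each outcome is a heuristic, not a definitive diagnosis:

```javascript
const net = require('net');

// Hypothetical diagnostic: attempt a TCP connection and classify the outcome.
//   'ok'      connection established
//   'refused' host answered but nothing listens on the port (ECONNREFUSED)
//   'timeout' no answer within timeoutMs (often routing or egress configuration)
//   'error'   any other failure (detail holds the error code, e.g. ENOTFOUND)
function probeTcp(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    let done = false;
    const socket = net.createConnection({ host, port });
    const finish = (result, detail) => {
      if (done) return;
      done = true;
      socket.destroy();
      resolve({ host, port, result, detail });
    };
    socket.setTimeout(timeoutMs, () => finish('timeout'));
    socket.once('connect', () => finish('ok'));
    socket.on('error', (err) =>
      finish(err.code === 'ECONNREFUSED' ? 'refused' : 'error', err.code)
    );
  });
}

// Usage (illustrative private IP, e.g. a Redis instance):
// probeTcp('10.0.0.3', 6379).then((r) => console.log(JSON.stringify(r)));
```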
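Structured logging:

```javascript
const winston = require('winston');

// Cloud Logging reads `severity` (not winston's `level`) from JSON on stdout
const SEVERITY = { error: 'ERROR', warn: 'WARNING', info: 'INFO', debug: 'DEBUG' };
const addSeverity = winston.format((info) => {
  info.severity = SEVERITY[info.level] || 'DEFAULT';
  return info;
});

const logger = winston.createLogger({
  level: 'info',
  format: winston.format.combine(addSeverity(), winston.format.json()),
  transports: [new winston.transports.Console()]
});

// Structured log that Cloud Logging understands (inside a request handler;
// req.id and req.userId are assumed to be set by earlier middleware)
logger.info('Request received', {
  requestId: req.id,
  path: req.path,
  method: req.method,
  userId: req.userId
});
```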
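Without a logging library, the same effect comes from writing one JSON object per line to stdout; Cloud Logging treats top-level fields such as `severity` and `message` specially. A minimal sketch; `logEntry` is a hypothetical helper:

```javascript
// Hypothetical helper: one JSON object per line on stdout becomes a
// structured Cloud Logging entry.
function logEntry(severity, message, fields = {}) {
  // Spread `fields` first so it cannot overwrite severity or message
  const line = JSON.stringify({ ...fields, severity, message });
  console.log(line);
  return line;
}

// prints {"path":"/telemetry","method":"POST","severity":"INFO","message":"Request received"}
logEntry('INFO', 'Request received', { path: '/telemetry', method: 'POST' });
```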
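Request tracing:

```javascript
// Project ID for trace names; assumed to be provided as an env var at deploy time
const PROJECT_ID = process.env.PROJECT_ID;

app.use((req, res, next) => {
  // Extract Cloud Trace ID (header format: TRACE_ID/SPAN_ID;o=OPTIONS)
  const traceHeader = req.headers['x-cloud-trace-context'];
  if (traceHeader) {
    const [traceId] = traceHeader.split('/');
    req.traceId = traceId;
    // The trace field lets Cloud Logging group this entry with its request;
    // include it in every log written while handling the request
    console.log(JSON.stringify({
      severity: 'INFO',
      message: 'Request received',
      'logging.googleapis.com/trace': `projects/${PROJECT_ID}/traces/${traceId}`
    }));
  }
  next();
});
```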
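The `X-Cloud-Trace-Context` header has the form `TRACE_ID/SPAN_ID;o=OPTIONS`, where `o=1` marks a sampled request. The span ID is a decimal integer, while Cloud Logging's `logging.googleapis.com/spanId` field expects 16 hexadecimal characters, so it needs converting. A sketch that parses the header and builds the special logging fields; `parseCloudTraceContext` and `traceFields` are hypothetical helpers:

```javascript
// Hypothetical parser for X-Cloud-Trace-Context: "TRACE_ID/SPAN_ID;o=OPTIONS".
// TRACE_ID is 32 hex characters; SPAN_ID is a decimal 64-bit integer.
function parseCloudTraceContext(header) {
  const match = /^([0-9a-fA-F]{32})(?:\/(\d+))?(?:;o=(\d))?$/.exec(header || '');
  if (!match) return null;
  const [, traceId, spanDecimal, options] = match;
  return {
    traceId,
    // BigInt avoids precision loss for 64-bit span IDs
    spanId: spanDecimal ? BigInt(spanDecimal).toString(16).padStart(16, '0') : undefined,
    sampled: options === '1'
  };
}

// Special Cloud Logging fields to merge into each log entry for the request
function traceFields(projectId, header) {
  const ctx = parseCloudTraceContext(header);
  if (!ctx) return {};
  const fields = {
    'logging.googleapis.com/trace': `projects/${projectId}/traces/${ctx.traceId}`,
    'logging.googleapis.com/trace_sampled': ctx.sampled
  };
  if (ctx.spanId) fields['logging.googleapis.com/spanId'] = ctx.spanId;
  return fields;
}
```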
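Health checks:

```javascript
const { Firestore } = require('@google-cloud/firestore');
const { PubSub } = require('@google-cloud/pubsub');
const db = new Firestore();
const pubsub = new PubSub();

// Comprehensive health check (assumes an Express `app`). If this path is used as a
// liveness probe, a 503 during a dependency outage restarts otherwise healthy
// instances; a separate dependency-free liveness path avoids that.
app.get('/health', async (req, res) => {
  const health = {
    status: 'healthy',
    timestamp: Date.now(),
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    checks: {}
  };
  try {
    // Check database (a Firestore document read)
    await db.collection('health').doc('check').get();
    health.checks.database = 'ok';
  } catch (err) {
    health.checks.database = 'failed';
    health.status = 'unhealthy';
  }
  try {
    // Check Pub/Sub (topic metadata lookup)
    await pubsub.topic('telemetry').get();
    health.checks.pubsub = 'ok';
  } catch (err) {
    health.checks.pubsub = 'failed';
    health.status = 'unhealthy';
  }
  const statusCode = health.status === 'healthy' ? 200 : 503;
  res.status(statusCode).json(health);
});
```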
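A slow dependency can make a sequential health check hang until the request timeout. One refinement is to run the checks in parallel, each with its own time limit. A sketch; `runHealthChecks` is a hypothetical helper, and the check functions are supplied by the caller:

```javascript
// Hypothetical helper: run named async checks in parallel, each bounded by
// timeoutMs, and return per-check results plus an overall status.
async function runHealthChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map((name) => {
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
      });
      // Promise.resolve().then(fn) also turns a synchronous throw into a rejection
      const check = Promise.resolve().then(checks[name]);
      return Promise.race([check, timeout]).finally(() => clearTimeout(timer));
    })
  );
  const report = { status: 'healthy', checks: {} };
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      report.checks[names[i]] = 'ok';
    } else {
      const reason = result.reason instanceof Error ? result.reason.message : String(result.reason);
      report.checks[names[i]] = `failed: ${reason}`;
      report.status = 'unhealthy';
    }
  });
  return report;
}
```

In the handler above, the two try/catch blocks could become `runHealthChecks({ database: () => db.collection('health').doc('check').get(), pubsub: () => pubsub.topic('telemetry').get() })`, with the status code chosen from `report.status`.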
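Performance tuning:

```bash
# Check current concurrency (default 80)
gcloud run services describe $SERVICE --region=$REGION \
  --format="value(spec.template.spec.containerConcurrency)"

# Adjust to the workload: higher for I/O-bound handlers, lower for CPU-bound ones
gcloud run services update $SERVICE --concurrency=100 --region=$REGION

# Startup CPU boost: extra CPU while instances start (reduces cold start latency)
gcloud run services update $SERVICE --cpu-boost --region=$REGION

# Or always-allocated CPU (no throttling between requests; billed for the instance lifetime)
gcloud run services update $SERVICE --no-cpu-throttling --region=$REGION
```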
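A rough way to reason about concurrency is Little's law: the number of requests in flight is about the request rate times the average latency, and dividing by the per-instance concurrency gives an approximate count of busy instances. It ignores CPU saturation and traffic bursts, so treat the result as a starting point. `estimateInstances` is a hypothetical helper:

```javascript
// Little's law: in-flight requests ≈ arrival rate (req/s) × average latency (s).
// Dividing by per-instance concurrency estimates how many instances are busy.
function estimateInstances(requestsPerSecond, avgLatencySeconds, concurrency = 80) {
  const inFlight = requestsPerSecond * avgLatencySeconds;
  return Math.ceil(inFlight / concurrency);
}

// 200 req/s at 0.5 s each -> 100 in flight -> 2 instances at concurrency 80
console.log(estimateInstances(200, 0.5));
```

Raising concurrency to 100 in the same example brings the estimate down to 1 instance, which is why I/O-bound services often benefit from higher values.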
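Quick reference:

| Issue | Quick fix (flag) |
|---|---|
| Cold starts | `--min-instances=1` |
| OOM crashes | `--memory=1Gi` |
| Timeouts | `--timeout=900` (default is 300) |
| Requests rejected at max scale (429) | raise `--max-instances` (e.g. `--max-instances=10`) |
| CPU throttling | `--no-cpu-throttling` |
| Auth errors (public service) | `--allow-unauthenticated` (on `gcloud run deploy`) |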
Install the plugin with `npx claudepluginhub maxcogar/agent-armory --plugin gcp-iot`.