gcp-cloud-run

Production-ready serverless applications on GCP Cloud Run with containerized services and event-driven functions. Covers Cloud Run Services (containerized web apps and APIs) and Cloud Run Functions (HTTP, Pub/Sub, and Cloud Storage event handlers). Includes cold start optimization techniques: startup CPU boost, minimum instances, distroless images, lazy dependency loading, and memory tuning. Provides multi-stage Docker builds, graceful shutdown patterns, and Cloud Build deployment pipelines. Documents anti-patterns and sharp edges, including /tmp memory limits, concurrency configuration, and CPU throttling during idle periods.

INSTALLATION
npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill gcp-cloud-run
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

GCP Cloud Run

Specialized skill for building production-ready serverless applications on GCP.

Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.

Principles

  • Cloud Run for containers, Functions for simple event handlers
  • Optimize for cold starts with startup CPU boost and min instances
  • Set concurrency based on workload (start with the default of 80, then adjust)
  • Memory includes /tmp filesystem - plan accordingly
  • Use VPC Connector only when needed (adds latency)
  • Containers should start fast and be stateless
  • Handle signals gracefully for clean shutdown

Patterns

Cloud Run Service Pattern

Containerized web service on Cloud Run

When to use: Web applications and APIs, Services that need any runtime or library, Complex services with multiple endpoints, Stateless containerized workloads

# Dockerfile - Multi-stage build for smaller image

FROM node:20-slim AS builder

WORKDIR /app

COPY package*.json ./

RUN npm ci --omit=dev

FROM node:20-slim

WORKDIR /app

# Copy only production dependencies

COPY --from=builder /app/node_modules ./node_modules

COPY src ./src

COPY package.json ./

# Cloud Run uses PORT env variable

ENV PORT=8080

EXPOSE 8080

# Run as non-root user

USER node

CMD ["node", "src/index.js"]
// src/index.js

const express = require('express');

const app = express();

app.use(express.json());

// Health check endpoint

app.get('/health', (req, res) => {

  res.status(200).send('OK');

});

// API routes

app.get('/api/items/:id', async (req, res) => {

  try {

    const item = await getItem(req.params.id);

    res.json(item);

  } catch (error) {

    console.error('Error:', error);

    res.status(500).json({ error: 'Internal server error' });

  }

});

// Graceful shutdown

process.on('SIGTERM', () => {

  console.log('SIGTERM received, shutting down gracefully');

  server.close(() => {

    console.log('Server closed');

    process.exit(0);

  });

});

const PORT = process.env.PORT || 8080;

const server = app.listen(PORT, () => {

  console.log(`Server listening on port ${PORT}`);

});
# cloudbuild.yaml

steps:

  # Build the container image

  - name: 'gcr.io/cloud-builders/docker'

    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA', '.']

  # Push the container image

  - name: 'gcr.io/cloud-builders/docker'

    args: ['push', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA']

  # Deploy to Cloud Run

  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'

    entrypoint: gcloud

    args:

      - 'run'

      - 'deploy'

      - 'my-service'

      - '--image=gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'

      - '--region=us-central1'

      - '--platform=managed'

      - '--allow-unauthenticated'

      - '--memory=512Mi'

      - '--cpu=1'

      - '--min-instances=1'

      - '--max-instances=100'

      - '--concurrency=80'

      - '--cpu-boost'

images:

  - 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'

Structure

project/

├── Dockerfile

├── .dockerignore

├── src/

│ ├── index.js

│ └── routes/

├── package.json

└── cloudbuild.yaml

gcloud Deploy

Direct gcloud deployment

gcloud run deploy my-service \

  --source . \

  --region us-central1 \

  --allow-unauthenticated \

  --memory 512Mi \

  --cpu 1 \

  --min-instances 1 \

  --max-instances 100 \

  --concurrency 80 \

  --cpu-boost

Cloud Run Functions Pattern

Event-driven functions (formerly Cloud Functions)

When to use: Simple event handlers, Pub/Sub message processing, Cloud Storage triggers, HTTP webhooks

// HTTP Function

// index.js

const functions = require('@google-cloud/functions-framework');

functions.http('helloHttp', (req, res) => {

  const name = req.query.name || req.body.name || 'World';

  res.send(`Hello, ${name}!`);

});
// Pub/Sub Function

const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processPubSub', (cloudEvent) => {

  // Decode Pub/Sub message

  const message = cloudEvent.data.message;

  const data = message.data

    ? JSON.parse(Buffer.from(message.data, 'base64').toString())

    : {};

  console.log('Received message:', data);

  // Process message

  processMessage(data);

});
// Cloud Storage Function

const functions = require('@google-cloud/functions-framework');

functions.cloudEvent('processStorageEvent', async (cloudEvent) => {

  const file = cloudEvent.data;

  console.log(`Event: ${cloudEvent.type}`);

  console.log(`Bucket: ${file.bucket}`);

  console.log(`File: ${file.name}`);

  if (cloudEvent.type === 'google.cloud.storage.object.v1.finalized') {

    await processUploadedFile(file.bucket, file.name);

  }

});
# Deploy HTTP function

gcloud functions deploy hello-http \

  --gen2 \

  --entry-point helloHttp \

  --runtime nodejs20 \

  --trigger-http \

  --allow-unauthenticated \

  --region us-central1

# Deploy Pub/Sub function

gcloud functions deploy process-messages \

  --gen2 \

  --entry-point processPubSub \

  --runtime nodejs20 \

  --trigger-topic my-topic \

  --region us-central1

# Deploy Cloud Storage function

gcloud functions deploy process-uploads \

  --gen2 \

  --entry-point processStorageEvent \

  --runtime nodejs20 \

  --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \

  --trigger-event-filters="bucket=my-bucket" \

  --region us-central1

Cold Start Optimization Pattern

Minimize cold start latency for Cloud Run

When to use: Latency-sensitive applications, User-facing APIs, High-traffic services

1. Enable Startup CPU Boost

gcloud run deploy my-service \

  --cpu-boost \

  --region us-central1

2. Set Minimum Instances

gcloud run deploy my-service \

  --min-instances 1 \

  --region us-central1

3. Optimize Container Image

# Use distroless for minimal image

FROM node:20-slim AS builder

WORKDIR /app

COPY package*.json ./

RUN npm ci --omit=dev

FROM gcr.io/distroless/nodejs20-debian12

WORKDIR /app

COPY --from=builder /app/node_modules ./node_modules

COPY src ./src

CMD ["src/index.js"]

4. Lazy Initialize Heavy Dependencies

// Lazy load heavy libraries

let bigQueryClient = null;

function getBigQueryClient() {

  if (!bigQueryClient) {

    const { BigQuery } = require('@google-cloud/bigquery');

    bigQueryClient = new BigQuery();

  }

  return bigQueryClient;

}

// Only initialize when needed

app.get('/api/analytics', async (req, res) => {

  const client = getBigQueryClient();

  const results = await client.query({...});

  res.json(results);

});

5. Allocate More CPU and Memory

# CPU and memory are set separately: more CPU shortens startup, more memory adds headroom

gcloud run deploy my-service \

  --memory 1Gi \

  --cpu 2 \

  --region us-central1

Optimization Impact

  • Startup CPU boost: Faster cold starts (up to about 50% for some workloads; measure your own)
  • Min instances: Keeps warm instances ready, so cold starts occur only when traffic exceeds them
  • Distroless image: Smaller attack surface, faster pull
  • Lazy init: Defers heavy loading to first request

Concurrency Configuration Pattern

Proper concurrency settings for Cloud Run

When to use: Need to optimize instance utilization, Handle traffic spikes efficiently, Reduce cold starts

Understanding Concurrency

# Default concurrency is 80

# Adjust based on your workload

# For I/O-bound workloads (most web apps)

gcloud run deploy my-service \

  --concurrency 80 \

  --cpu 1

# For CPU-bound workloads

gcloud run deploy my-service \

  --concurrency 1 \

  --cpu 1

# For memory-intensive workloads

gcloud run deploy my-service \

  --concurrency 10 \

  --memory 2Gi

Node.js Concurrency

// Node.js is single-threaded but handles I/O concurrently

// Use async/await for all I/O operations

// GOOD - async I/O

app.get('/api/data', async (req, res) => {

  const [users, products] = await Promise.all([

    fetchUsers(),

    fetchProducts()

  ]);

  res.json({ users, products });

});

// BAD - blocking operation

app.get('/api/compute', (req, res) => {

  const result = heavyCpuOperation(); // Blocks other requests!

  res.json(result);

});

Python Concurrency with Gunicorn

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# 4 workers x 2 threads = 8 requests in flight per instance; keep --concurrency in line with this

CMD exec gunicorn --bind :$PORT --workers 4 --threads 2 main:app
# main.py

from flask import Flask

app = Flask(__name__)

@app.route('/api/data')

def get_data():

    return {'status': 'ok'}

Concurrency Guidelines

  • Concurrency=1: Only for CPU-bound or thread-unsafe code
  • Concurrency=8-20: Memory-intensive workloads
  • Concurrency=80: Default, good for I/O-bound
  • Concurrency=250: Very lightweight handlers (Cloud Run allows up to 1000)

Pub/Sub Integration Pattern

Event-driven processing with Cloud Pub/Sub

When to use: Asynchronous message processing, Decoupled microservices, Event-driven architecture

Push Subscription to Cloud Run

# Create topic

gcloud pubsub topics create orders

# Create push subscription to Cloud Run

gcloud pubsub subscriptions create orders-push \

  --topic orders \

  --push-endpoint https://my-service-xxx.run.app/pubsub \

  --ack-deadline 600
// Handle Pub/Sub push messages

const express = require('express');

const app = express();

app.use(express.json());

app.post('/pubsub', async (req, res) => {

  // Basic shape check only; this does not authenticate the sender

  if (!req.body.message) {

    return res.status(400).send('Invalid Pub/Sub message');

  }

  try {

    // Decode message data

    const message = req.body.message;

    const data = message.data

      ? JSON.parse(Buffer.from(message.data, 'base64').toString())

      : {};

    console.log('Processing order:', data);

    await processOrder(data);

    // Return 200 to acknowledge

    res.status(200).send('OK');

  } catch (error) {

    console.error('Processing failed:', error);

    // Return 500 to trigger retry

    res.status(500).send('Processing failed');

  }

});
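For services written in Python, the same push-handling logic can be sketched without any web framework. This is a minimal illustration, not the skill's own code: `decode_push_envelope` and `handle_push` are hypothetical names, and the status codes simply mirror the Node.js handler above (400 for a malformed envelope, 200 to acknowledge, 500 to trigger a retry).

```python
import base64
import json
from typing import Any, Callable


def decode_push_envelope(envelope: Any) -> Any:
    """Return the JSON payload of a Pub/Sub push envelope.

    Raises ValueError when the envelope has no message or the data
    is not base64-encoded JSON.
    """
    if not isinstance(envelope, dict) or not isinstance(envelope.get("message"), dict):
        raise ValueError("missing Pub/Sub message")
    data = envelope["message"].get("data")
    if not data:
        return {}  # messages may carry attributes only
    try:
        return json.loads(base64.b64decode(data).decode("utf-8"))
    except ValueError as exc:  # covers base64, UTF-8, and JSON decoding errors
        raise ValueError("message data is not base64-encoded JSON") from exc


def handle_push(envelope: Any, process: Callable[[Any], None]) -> int:
    """Return the HTTP status code the push endpoint should send."""
    try:
        payload = decode_push_envelope(envelope)
    except ValueError:
        return 400
    try:
        process(payload)
    except Exception:
        return 500  # non-success response, so Pub/Sub redelivers
    return 200
```

A framework route would pass the parsed request body to `handle_push` and return the resulting status code.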

Publishing Messages

const { PubSub } = require('@google-cloud/pubsub');

const pubsub = new PubSub();

async function publishOrder(order) {

  const topic = pubsub.topic('orders');

  const messageBuffer = Buffer.from(JSON.stringify(order));

  const messageId = await topic.publishMessage({

    data: messageBuffer,

    attributes: {

      type: 'order_created',

      priority: 'high'

    }

  });

  console.log(`Published message ${messageId}`);

  return messageId;

}

Dead Letter Queue

# Create DLQ topic

gcloud pubsub topics create orders-dlq

# Update subscription with DLQ

gcloud pubsub subscriptions update orders-push \

  --dead-letter-topic orders-dlq \

  --max-delivery-attempts 5

Cloud SQL Connection Pattern

Connect Cloud Run to Cloud SQL securely

When to use: Need relational database, Migrating existing applications, Complex queries and transactions

# Deploy with Cloud SQL connection

gcloud run deploy my-service \

  --add-cloudsql-instances PROJECT:REGION:INSTANCE \

  --set-env-vars INSTANCE_CONNECTION_NAME="PROJECT:REGION:INSTANCE" \

  --set-env-vars DB_NAME="mydb" \

  --set-env-vars DB_USER="myuser"
// Using Unix socket connection

const { Pool } = require('pg');

const pool = new Pool({

  user: process.env.DB_USER,

  password: process.env.DB_PASS,

  database: process.env.DB_NAME,

  // Cloud SQL connector uses Unix socket

  host: `/cloudsql/${process.env.INSTANCE_CONNECTION_NAME}`,

  max: 5,  // Connection pool size

  idleTimeoutMillis: 30000,

  connectionTimeoutMillis: 10000,

});

app.get('/api/users', async (req, res) => {

  const client = await pool.connect();

  try {

    const result = await client.query('SELECT * FROM users LIMIT 100');

    res.json(result.rows);

  } finally {

    client.release();

  }

});
# Python with SQLAlchemy

import os

from sqlalchemy import create_engine

def get_engine():

    instance_connection_name = os.environ["INSTANCE_CONNECTION_NAME"]

    db_user = os.environ["DB_USER"]

    db_pass = os.environ["DB_PASS"]

    db_name = os.environ["DB_NAME"]

    engine = create_engine(

        f"postgresql+pg8000://{db_user}:{db_pass}@/{db_name}",

        connect_args={

            "unix_sock": f"/cloudsql/{instance_connection_name}/.s.PGSQL.5432"

        },

        pool_size=5,

        max_overflow=2,

        pool_timeout=30,

        pool_recycle=1800,

    )

    return engine

Best Practices

  • Use connection pooling (max 5-10 per instance)
  • Set appropriate idle timeouts
  • Handle connection errors gracefully
  • Consider the Cloud SQL Auth Proxy for local development

Secret Manager Integration

Securely manage secrets in Cloud Run

When to use: API keys, Database passwords, Service account keys, Any sensitive configuration

# Create secret

echo -n "my-secret-value" | gcloud secrets create my-secret --data-file=-

# Mount as environment variable

gcloud run deploy my-service \

  --update-secrets=API_KEY=my-secret:latest

# Mount as file volume

gcloud run deploy my-service \

  --update-secrets=/secrets/api-key=my-secret:latest
// Access mounted as environment variable

const apiKey = process.env.API_KEY;

// Access mounted as file

const fs = require('fs');

const apiKeyFromFile = fs.readFileSync('/secrets/api-key', 'utf8');

// Access via Secret Manager API (when not mounted)

const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');

const client = new SecretManagerServiceClient();

async function getSecret(projectId, name) {

  // projectId is supplied by the caller (your GCP project ID)

  const [version] = await client.accessSecretVersion({

    name: `projects/${projectId}/secrets/${name}/versions/latest`

  });

  return version.payload.data.toString();

}
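A Python service can read the same two mount styles with one small helper. The sketch below is illustrative only: `read_secret` is a hypothetical name, and stripping trailing whitespace from the file contents is an assumption that suits most API keys but not every secret format.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(env_name: str, file_path: str) -> Optional[str]:
    """Return a secret from an environment variable, else from a mounted file.

    Returns None when neither source is present.
    """
    value = os.environ.get(env_name)
    if value is not None:
        return value
    path = Path(file_path)
    if path.is_file():
        # Assumption: trailing newlines are not part of the secret
        return path.read_text(encoding="utf-8").strip()
    return None
```

For the deployments above, `read_secret("API_KEY", "/secrets/api-key")` would cover both the environment variable and the file volume.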

Sharp Edges

/tmp Filesystem Counts Against Memory

Severity: HIGH

Situation: Writing files to /tmp directory in Cloud Run

Symptoms:

Container killed with OOM error.

Memory usage spikes unexpectedly.

File operations cause container restarts.

"Container memory limit exceeded" in logs.

Why this breaks:

Cloud Run uses an in-memory filesystem for /tmp. Any files written

to /tmp consume memory from your container's allocation.

Common scenarios:

  • Downloading files temporarily
  • Creating temp processing files
  • Libraries caching to /tmp
  • Large log buffers

A 512MB container that downloads a 200MB file to /tmp only has

~300MB left for the application.
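The arithmetic above can be checked at runtime with a rough helper that sums what is currently stored under /tmp. This is a sketch under stated assumptions: `directory_size_bytes` and `remaining_memory_mb` are hypothetical names, and the result ignores the application's own heap, so it is an upper bound on what is left.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file disappeared between listing and stat
    return total


def remaining_memory_mb(memory_limit_mb: float, tmp_dir: str = "/tmp") -> float:
    """Memory limit minus what files in tmp_dir already consume, in MB."""
    used_mb = directory_size_bytes(tmp_dir) / (1024 * 1024)
    return memory_limit_mb - used_mb
```

Logging `remaining_memory_mb(512)` before a large download gives an early warning when a temporary file would push the container past its limit.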

Recommended fix:

Calculate memory including /tmp usage

# cloudbuild.yaml

steps:

  - name: 'gcr.io/cloud-builders/gcloud'

    args:

      - 'run'

      - 'deploy'

      - 'my-service'

      - '--memory=1Gi'  # Include /tmp overhead

      - '--image=gcr.io/$PROJECT_ID/my-service'

Stream instead of buffering

from google.cloud import storage

# BAD - buffers entire file in /tmp

def process_large_file(bucket_name, blob_name):

    bucket = storage.Client().bucket(bucket_name)

    blob = bucket.blob(blob_name)

    blob.download_to_filename('/tmp/large_file')

    with open('/tmp/large_file', 'rb') as f:

        process(f.read())

# GOOD - stream processing

def process_large_file(bucket_name, blob_name):

    bucket = storage.Client().bucket(bucket_name)

    blob = bucket.blob(blob_name)

    with blob.open('rb') as f:

        # Read fixed-size chunks so memory use stays flat

        for chunk in iter(lambda: f.read(8192), b''):

            process_chunk(chunk)

Use Cloud Storage for large files

from google.cloud import storage

def process_with_gcs(bucket_name, input_blob, output_blob):

    client = storage.Client()

    bucket = client.bucket(bucket_name)

    # Process directly to/from GCS

    input_blob = bucket.blob(input_blob)

    output_blob = bucket.blob(output_blob)

    with input_blob.open('rb') as reader:

        with output_blob.open('wb') as writer:

            for chunk in iter(lambda: reader.read(65536), b''):

                processed = transform(chunk)

                writer.write(processed)

Monitor memory usage

import psutil

import logging

def log_memory():

    memory = psutil.virtual_memory()

    logging.info(f"Memory: {memory.percent}% used, "

                f"{memory.available / 1024 / 1024:.0f}MB available")

Concurrency=1 Causes Scaling Bottlenecks

Severity: HIGH

Situation: Setting concurrency to 1 for request isolation

Symptoms:

Auto-scaling creates many container instances.

High latency during traffic spikes.

Increased cold starts.

Higher costs from more instances.

Why this breaks:

Setting concurrency to 1 means each container handles only one

request at a time. During traffic spikes:

  • 100 concurrent requests = 100 container instances
  • Each instance has cold start overhead
  • More instances = higher costs
  • Scaling takes time, requests queue up

This should only be used when:

  • Processing is truly single-threaded
  • Memory-heavy per-request processing
  • Using thread-unsafe libraries
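The scaling arithmetic above (100 concurrent requests at concurrency 1 means 100 instances) can be sketched as a one-line estimate. `instances_needed` is a hypothetical helper; it ignores autoscaler headroom and max-instances caps, so treat the result as a lower bound.

```python
import math


def instances_needed(concurrent_requests: int, concurrency: int) -> int:
    """Minimum instances required to serve the given concurrent requests."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return math.ceil(concurrent_requests / concurrency)


print(instances_needed(100, 1))   # 100
print(instances_needed(100, 80))  # 2
```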

Recommended fix:

Set appropriate concurrency

# For I/O-bound workloads (most web apps)

gcloud run deploy my-service \

  --concurrency=80 \

  --max-instances=100

# For CPU-bound workloads

gcloud run deploy my-service \

  --concurrency=4 \

  --cpu=2

# Only use 1 when absolutely necessary

gcloud run deploy my-service \

  --concurrency=1 \

  --max-instances=1000  # Be prepared for many instances

Node.js - use async properly

// With high concurrency, ensure async operations

const express = require('express');

const app = express();

app.get('/api/data', async (req, res) => {

  // All I/O should be async

  const data = await fetchFromDatabase();

  const enriched = await enrichData(data);

  res.json(enriched);

});

// Concurrency 80+ is safe for async I/O workloads

Python - use async framework

from fastapi import FastAPI

import asyncio

import httpx

app = FastAPI()

@app.get("/api/data")

async def get_data():

    # Async I/O allows high concurrency

    async with httpx.AsyncClient() as client:

        response = await client.get("https://api.example.com/data")

        return response.json()

# Concurrency 80+ safe with async framework

Calculate concurrency

concurrency = memory_limit / per_request_memory

Example:

- 512MB container

- 20MB per request overhead

- Safe concurrency: ~25
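The formula above can be turned into a small estimator. This is a sketch, not a sizing tool: `safe_concurrency` is a hypothetical name, the `baseline_mb` parameter (memory the app uses before serving any request) is an added assumption, and the cap of 1000 reflects Cloud Run's per-instance concurrency limit.

```python
def safe_concurrency(memory_limit_mb: float, per_request_mb: float,
                     baseline_mb: float = 0.0, max_allowed: int = 1000) -> int:
    """Estimate how many concurrent requests fit in one instance's memory."""
    if per_request_mb <= 0:
        raise ValueError("per_request_mb must be positive")
    usable_mb = memory_limit_mb - baseline_mb
    if usable_mb <= 0:
        return 1  # nothing to spare; fall back to one request at a time
    estimate = int(usable_mb // per_request_mb)
    return max(1, min(estimate, max_allowed))


print(safe_concurrency(512, 20))  # 25, matching the example above
```

Passing a measured baseline, for example `safe_concurrency(512, 20, baseline_mb=112)`, gives a more conservative figure of 20.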

CPU Throttled When Not Handling Requests

Severity: HIGH

Situation: Running background tasks or processing between requests

Symptoms:

Background tasks run extremely slowly.

Scheduled work doesn't complete.

Metrics collection fails.

Connection keep-alive breaks.

Why this breaks:

By default, Cloud Run throttles CPU to near-zero when not actively

handling a request. This is "CPU only during requests" mode.

Affected operations:

  • Background threads
  • Connection pool maintenance
  • Metrics/telemetry emission
  • Scheduled tasks within container
  • Cleanup operations after response

Recommended fix:

Enable CPU always allocated

# CPU allocated even outside requests

gcloud run deploy my-service \

  --no-cpu-throttling \

  --min-instances=1

# Note: This increases costs but enables background work

Use startup CPU boost for initialization

# Boost CPU during cold start only

gcloud run deploy my-service \

  --cpu-boost \

  --cpu-throttling  # Default: CPU is throttled outside requests

Move background work to Cloud Tasks

from google.cloud import tasks_v2

import json

def create_background_task(payload):

    client = tasks_v2.CloudTasksClient()

    parent = client.queue_path(

        "my-project", "us-central1", "my-queue"

    )

    task = {

        "http_request": {

            "http_method": tasks_v2.HttpMethod.POST,

            "url": "https://my-service.run.app/process",

            "body": json.dumps(payload).encode(),

            "headers": {"Content-Type": "application/json"}

        }

    }

    client.create_task(parent=parent, task=task)

# Handle response immediately, background via Cloud Tasks

@app.post("/api/order")

async def create_order(order: Order):

    order_id = await save_order(order)

    # Queue background processing

    create_background_task({"order_id": order_id})

    return {"order_id": order_id, "status": "processing"}

Use Pub/Sub for async processing

# Move heavy processing to separate service

steps:

  # Main service - responds quickly

  - name: 'gcr.io/cloud-builders/gcloud'

    args: ['run', 'deploy', 'api-service',

           '--cpu-throttling']

  # Worker service - processes messages

  - name: 'gcr.io/cloud-builders/gcloud'

    args: ['run', 'deploy', 'worker-service',

           '--no-cpu-throttling',

           '--min-instances=1']

VPC Connector 10-Minute Idle Timeout

Severity: MEDIUM

Situation: Cloud Run service connecting to VPC resources

Symptoms:

Connection errors after period of inactivity.

"Connection reset" or "Connection refused" errors.

Sporadic failures to VPC resources.

Database connections drop unexpectedly.

Why this breaks:

Cloud Run's VPC connector has a 10-minute idle timeout on connections.

If a connection is idle for 10 minutes, it's silently closed.

Affects:

  • Database connection pools
  • Redis connections
  • Internal API connections
  • Any persistent VPC connection

Recommended fix:

Configure connection pool with keep-alive

# SQLAlchemy with connection recycling

from sqlalchemy import create_engine

engine = create_engine(

    DATABASE_URL,

    pool_size=5,

    max_overflow=2,

    pool_recycle=300,  # Recycle connections every 5 minutes

    pool_pre_ping=True  # Validate connection before use

)

TCP keep-alive for custom connections

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)

sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)

Redis with connection validation

import socket

import redis

pool = redis.ConnectionPool(

    host=REDIS_HOST,

    port=6379,

    socket_keepalive=True,

    socket_keepalive_options={

        socket.TCP_KEEPIDLE: 60,

        socket.TCP_KEEPINTVL: 60,

        socket.TCP_KEEPCNT: 5

    },

    health_check_interval=30

)

client = redis.Redis(connection_pool=pool)

Use the Cloud SQL Python Connector

# The Cloud SQL Python Connector handles reconnection

# requirements.txt

cloud-sql-python-connector[pg8000]

# main.py

from google.cloud.sql.connector import Connector

import sqlalchemy

connector = Connector()

def getconn():

    return connector.connect(

        "project:region:instance",

        "pg8000",

        user="user",

        password="password",

        db="database"

    )

engine = sqlalchemy.create_engine(

    "postgresql+pg8000://",

    creator=getconn

)

Container Startup Timeout (4 minutes max)

Severity: HIGH

Situation: Deploying containers with slow initialization

Symptoms:

Deployment fails with "Container failed to start".

Service never becomes healthy.

"Revision failed to become ready" errors.

Works locally but fails on Cloud Run.

Why this breaks:

Cloud Run expects your container to start listening on PORT within

4 minutes (240 seconds). If it doesn't, the instance is killed.

Common causes:

  • Heavy framework initialization (ML models, etc.)
  • Waiting for external dependencies at startup
  • Large dependency loading
  • Database migrations on startup

Recommended fix:

Enable startup CPU boost

gcloud run deploy my-service \

  --cpu-boost

Lazy initialization

from functools import lru_cache

from fastapi import FastAPI

app = FastAPI()

# Don't load at import time

model = None

@lru_cache()

def get_model():

    global model

    if model is None:

        # Load on first request, not at startup

        model = load_heavy_model()

    return model

@app.get("/predict")

async def predict(data: dict):

    model = get_model()  # Loads on first call only

    return model.predict(data)

# Startup is fast - model loads on first request

Start listening immediately

import asyncio

from fastapi import FastAPI, HTTPException

import uvicorn

app = FastAPI()

# Global state for async initialization

initialized = asyncio.Event()

@app.on_event("startup")

async def startup():

    # Start background initialization

    asyncio.create_task(async_init())

async def async_init():

    # Heavy initialization happens after server starts

    await load_models()

    await warm_up_connections()

    initialized.set()

@app.get("/ready")

async def ready():

    if not initialized.is_set():

        raise HTTPException(503, "Still initializing")

    return {"status": "ready"}

@app.get("/health")

async def health():

    # Always respond - health check passes

    return {"status": "healthy"}

Use multi-stage builds

# Build stage - slow

FROM python:3.11 as builder

WORKDIR /app

COPY requirements.txt .

RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# Runtime stage - fast startup

FROM python:3.11-slim

WORKDIR /app

COPY --from=builder /wheels /wheels

RUN pip install --no-cache /wheels/* && rm -rf /wheels

COPY . .

# Read the port from the PORT variable that Cloud Run sets; fall back to 8080 locally

CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}

Run migrations separately

# Don't migrate on startup - use Cloud Build

steps:

  # Run migrations first

  - name: 'gcr.io/cloud-builders/gcloud'

    entrypoint: 'bash'

    args:

      - '-c'

      - |

        gcloud run jobs execute migrate-job --wait

  # Then deploy

  - name: 'gcr.io/cloud-builders/gcloud'

    args: ['run', 'deploy', 'my-service', ...]

Second Generation Execution Environment Differences

Severity: MEDIUM

Situation: Migrating to or using Cloud Run second-gen execution environment

Symptoms:

Network behavior changes.

Different syscall support.

File system behavior differences.

Container behaves differently than in first-gen.

Why this breaks:

Cloud Run's second-generation execution environment provides full Linux

compatibility instead of the gVisor sandbox used by first gen, so it behaves differently:

  • More Linux syscalls supported
  • Full /proc and /sys access
  • Different network stack
  • No automatic HTTPS redirect
  • Different tmp filesystem behavior

Recommended fix:

Explicitly set execution environment

# First generation (legacy)

gcloud run deploy my-service \

  --execution-environment=gen1

# Second generation (recommended for most)

gcloud run deploy my-service \

  --execution-environment=gen2

Handle network differences

# Second-gen doesn't auto-redirect HTTP to HTTPS

from fastapi import FastAPI, Request

from fastapi.responses import RedirectResponse

app = FastAPI()

@app.middleware("http")

async def redirect_https(request: Request, call_next):

    # Check X-Forwarded-Proto header

    if request.headers.get("X-Forwarded-Proto") == "http":

        url = request.url.replace(scheme="https")

        return RedirectResponse(url, status_code=301)

    return await call_next(request)

GPU access (second-gen only)

# GPUs only available in second-gen

gcloud run deploy ml-service \

  --execution-environment=gen2 \

  --gpu=1 \

  --gpu-type=nvidia-l4

Check execution environment

import os

def get_execution_environment():

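    # Heuristic, not an official API: gVisor is the first-gen sandbox, so a

    # gVisor marker in /proc/version suggests gen1. Verify on your own service.

    try:

        with open('/proc/version', 'r') as f:

            version = f.read()

    except OSError:

        return 'unknown'

    return 'gen1' if 'gVisor' in version else 'gen2'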

Request Timeout Configuration Mismatch

Severity: MEDIUM

Situation: Long-running requests or background processing

Symptoms:

Requests terminated before completion.

504 Gateway Timeout errors.

Processing stops unexpectedly.

Inconsistent timeout behavior.

Why this breaks:

Cloud Run has multiple timeout configurations that must align:

  • Request timeout (default 300s, max 3600s, which is 60 minutes, for both HTTP and gRPC)
  • Client timeout
  • Downstream service timeouts
  • Load balancer timeout (for external access)
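One way to catch misaligned timeouts before deployment is a small consistency check. The sketch below rests on an assumption that is not stated in the list above: each outer layer should wait at least as long as the layer inside it (client, then load balancer, then Cloud Run request timeout, then downstream calls). `Timeouts` and `find_mismatches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

CLOUD_RUN_MAX_TIMEOUT_S = 3600  # maximum request timeout listed above


@dataclass
class Timeouts:
    client_s: float         # how long the caller waits
    load_balancer_s: float  # external load balancer timeout
    request_s: float        # Cloud Run --timeout value
    downstream_s: float     # longest call the service makes to a dependency


def find_mismatches(t: Timeouts) -> List[str]:
    """Return a description of every timeout that is out of order."""
    problems: List[str] = []
    if not 0 < t.request_s <= CLOUD_RUN_MAX_TIMEOUT_S:
        problems.append("request timeout must be between 1 and 3600 seconds")
    if t.downstream_s >= t.request_s:
        problems.append("downstream timeout should be shorter than the request timeout")
    if t.load_balancer_s < t.request_s:
        problems.append("load balancer gives up before the request timeout")
    if t.client_s < t.load_balancer_s:
        problems.append("client gives up before the load balancer")
    return problems
```

An empty list means the four values are nested consistently under this assumption; any returned message names the layer to adjust.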

Recommended fix:

Set consistent timeouts

# Increase request timeout (max 3600s for HTTP)

gcloud run deploy my-service \

  --timeout=900  # 15 minutes

Handle long-running with webhooks

from fastapi import FastAPI, BackgroundTasks

import httpx

app = FastAPI()

@app.post("/process")

async def process(data: dict, background_tasks: BackgroundTasks):

    task_id = create_task_id()

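    # Start background processing. It runs after the response is sent, so it

    # needs CPU allocated outside requests (see the CPU throttling section)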

    background_tasks.add_task(

        long_running_process,

        task_id,

        data,

        data.get("callback_url")

    )

    # Return immediately

    return {"task_id": task_id, "status": "processing"}

async def long_running_process(task_id, data, callback_url):

    result = await heavy_computation(data)

    # Callback when done

    if callback_url:

        async with httpx.AsyncClient() as client:

            await client.post(callback_url, json={

                "task_id": task_id,

                "result": result

            })

Use Cloud Tasks for reliable long-running

import json

from google.cloud import tasks_v2

# PROJECT and REGION are placeholders for your project ID and queue location

def create_long_running_task(data):

    client = tasks_v2.CloudTasksClient()

    parent = client.queue_path(PROJECT, REGION, "long-tasks")

    task = {

        "http_request": {

            "http_method": tasks_v2.HttpMethod.POST,

            "url": "https://worker.run.app/process",

            "body": json.dumps(data).encode(),

            "headers": {"Content-Type": "application/json"}

        },

        "dispatch_deadline": {"seconds": 1800}  # 30 min

    }

    return client.create_task(parent=parent, task=task)

Streaming for long responses

from fastapi import FastAPI

from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/large-report")

async def large_report():

    async def generate():

        for chunk in process_large_data():

            yield chunk

    return StreamingResponse(generate(), media_type="text/plain")

Validation Checks

Hardcoded GCP Credentials

Severity: ERROR

GCP credentials must never be hardcoded in source code

Message: Hardcoded GCP service account credentials. Use Secret Manager or Workload Identity.

GCP API Key in Source Code

Severity: ERROR

API keys should use Secret Manager

Message: Hardcoded GCP API key. Use Secret Manager.

Credentials JSON File in Repository

Severity: ERROR

Service account JSON files should not be in source control

Message: Credentials file detected. Add to .gitignore and use Secret Manager.

Running as Root User

Severity: WARNING

Containers should not run as root for security

Message: Dockerfile runs as root. Add USER directive for security.

Missing Health Check in Dockerfile

Severity: INFO

Cloud Run uses HTTP health checks, Dockerfile HEALTHCHECK is optional

Message: No HEALTHCHECK in Dockerfile. Cloud Run uses its own health checks.

Hardcoded Port in Application

Severity: WARNING

Port should come from PORT environment variable

Message: Hardcoded port. Use PORT environment variable for Cloud Run.

Large File Writes to /tmp

Severity: WARNING

/tmp uses container memory, large writes can cause OOM

Message: /tmp writes consume memory. Consider Cloud Storage for large files.

Synchronous File Operations

Severity: WARNING

Sync file ops block the event loop in async apps

Message: Synchronous file operations. Use async versions for better concurrency.

Global Mutable State

Severity: WARNING

Global state issues with concurrent requests

Message: Global mutable state may cause issues with concurrent requests.

Thread-Unsafe Singleton Pattern

Severity: WARNING

Singletons need thread safety for concurrency > 1

Message: Singleton pattern - ensure thread safety if using concurrency > 1.
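As an illustration of how one of these checks might be implemented, the sketch below approximates the "Running as Root User" rule. It is not the skill's actual validator: `runs_as_root` is a hypothetical name, and it assumes a stage without a USER directive runs as root, which is not true for every base image.

```python
import re


def runs_as_root(dockerfile_text: str) -> bool:
    """True if the final build stage has no USER directive or sets USER to root."""
    user = None
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if re.match(r"(?i)^FROM\s", stripped):
            user = None  # each stage starts from its base image's default user
            continue
        match = re.match(r"(?i)^USER\s+(\S+)", stripped)
        if match:
            user = match.group(1)
    if user is None:
        return True  # assumption: the base image defaults to root
    name = user.split(":")[0]  # drop an optional ":group" suffix
    return name in ("root", "0")
```

Run against the Dockerfile from the Cloud Run Service Pattern, which ends with `USER node`, the function returns `False`.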

Collaboration

Delegation Triggers

  • user needs AWS serverless -> aws-serverless (Lambda, API Gateway, SAM)
  • user needs Azure containers -> azure-functions (Azure Container Apps, Functions)
  • user needs database design -> postgres-wizard (Cloud SQL design, AlloyDB)
  • user needs authentication -> auth-specialist (Firebase Auth, Identity Platform)
  • user needs AI integration -> llm-architect (Vertex AI, Cloud Run + LLM)
  • user needs workflow orchestration -> workflow-automation (Cloud Workflows, Eventarc)

When to Use

Use this skill when the request clearly matches the capabilities and patterns described above.

Limitations

  • Use this skill only when the task clearly matches the scope described above.
  • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
  • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.