Adding Observability to a Containerized App on Google Cloud Run

#cloud #container #observability

Observability is the ability to understand what’s happening inside a system by looking at the data it produces, so you can detect issues, debug faster, and improve performance.

Let’s look at this simple scenario: A user opens your app and clicks “Pay” but behind the scenes:

  • The app receives the request
  • It talks to a payment service
  • It saves data in a database
  • It sends a response back to the user

When something goes wrong, all the user can tell you is “My payment failed.”

Observability is how you figure out why.

Pillars of Observability

To achieve observability, systems generally rely on three specific types of data:

1. Logs

Logs are timestamped records of events: messages your system writes to say what it did at a specific moment. They answer questions like “What happened?” and “What went wrong?”, and they provide the detail and context behind a failure. For example: an incoming HTTP request, an error when calling a downstream API, or a warning about a slow database query.

2. Metrics

Metrics are numbers that show how your system is behaving over time. They show data like request count, error rate, and request latency. They answer questions like: Is the service healthy? Is performance getting worse?
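A toy sketch of what a metrics pipeline tracks, using only the standard library (real systems use a metrics SDK and a backend such as Cloud Monitoring; the numbers here are invented):

```python
import statistics

# Toy in-memory metrics: request count, error count, latency samples
request_count = 0
error_count = 0
latencies_ms: list[float] = []

def record_request(latency_ms: float, ok: bool) -> None:
    global request_count, error_count
    request_count += 1
    latencies_ms.append(latency_ms)
    if not ok:
        error_count += 1

# Simulate four requests, one of which failed and was slow
for latency, ok in [(12.0, True), (15.0, True), (480.0, False), (11.0, True)]:
    record_request(latency, ok)

error_rate = error_count / request_count
p95 = statistics.quantiles(latencies_ms, n=20)[-1]  # rough 95th percentile
print(f"requests={request_count} error_rate={error_rate:.0%} p95={p95:.0f}ms")
```

An error rate creeping up or a p95 latency drifting higher answers “Is performance getting worse?” long before any single log line would.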

3. Traces

Traces follow a single request as it moves through different parts of the system, showing each step and how long it took. A trace is made up of spans, where each span represents a unit of work (for example, an HTTP request or a database call). Traces answer questions like: Why is this request slow? Which service caused the failure?

In short, logs explain events, metrics show patterns and impact, traces show the flow and bottlenecks. Together, they make a system understandable instead of a black box.

What you will learn

This guide is a hands-on, step-by-step walkthrough showing how to add observability to your application.

In this guide, you will:

  • Add logs, metrics, and traces to a Python app
  • Deploy an observable service to Google Cloud Run
  • Verify observability data in Cloud Logging, Cloud Monitoring, and Cloud Trace

How to add observability to a containerized app on Google Cloud Run

Prerequisites

Before starting, make sure you have:

  • A Google Cloud project
  • gcloud CLI installed and authenticated
  • Docker installed
  • Python 3.10 or later

Step 1: Create the Python App

Every observable system starts with a simple function. Before we add complex tracking, we need a working application. We’ll start with a basic FastAPI server that says “hello”.

  1. Create a new directory:
mkdir cloud-run-observability-python
cd cloud-run-observability-python
  2. Create this structure:
cloud-run-observability-python/
├── main.py
├── otel.py
├── requirements.txt
└── Dockerfile
  3. Create main.py with these contents:
from fastapi import FastAPI, Request
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

app = FastAPI()

@app.get("/hello")
async def hello(request: Request):
    logger.info("hello endpoint called")
    return {"message": "Hello from Cloud Run"}

At this point:

  • The app has basic logging
  • No observability tooling yet

Step 2: Run the app locally

Before moving to the cloud, we need to ensure our “engine” actually starts. This step confirms that your environment is configured correctly and the basic logic of your app is sound.

  1. Install dependencies:
pip install fastapi uvicorn
  2. Run the app:
uvicorn main:app --host 0.0.0.0 --port 8080
  3. Test it:
curl http://localhost:8080/hello

You should see:

{"message": "Hello from Cloud Run"}

Step 3: Add OpenTelemetry dependencies

To make an app observable, it needs a “voice.” OpenTelemetry is an industry-standard toolkit that allows your app to describe its own behavior. By installing these packages, we are giving our app the vocabulary it needs to talk to Google Cloud.

  1. Stop the app and install OpenTelemetry packages:
pip install \
  opentelemetry-sdk \
  opentelemetry-api \
  opentelemetry-exporter-otlp \
  opentelemetry-instrumentation \
  opentelemetry-instrumentation-fastapi \
  opentelemetry-instrumentation-logging
  2. Freeze dependencies:
pip freeze > requirements.txt

Step 4: Configure OpenTelemetry (otel.py)

Installing the OpenTelemetry libraries isn’t enough on its own; they need to be configured before the application starts. This otel.py file sets up who we are (the service name), what we collect (traces and metrics), and where that data goes. Think of it as the observability boot sequence for the app.

from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.logging import LoggingInstrumentor

resource = Resource.create({
    "service.name": "cloud-run-python-demo"
})

trace.set_tracer_provider(TracerProvider(resource=resource))
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter())
)

metrics.set_meter_provider(
    MeterProvider(
        resource=resource,
        metric_readers=[
            PeriodicExportingMetricReader(OTLPMetricExporter())
        ],
    )
)

LoggingInstrumentor().instrument()

# This function will be called from main.py
def instrument_app(app):
    FastAPIInstrumentor.instrument_app(app)

Step 5: Enable instrumentation in the app

With the configuration ready, you can now connect the observability setup to the actual application. By updating the main file, we’re telling OpenTelemetry to automatically track incoming requests, response times, and errors, without manually adding tracing code to every endpoint.

  1. Update main.py:
from fastapi import FastAPI, Request
import logging
from otel import instrument_app

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("app")

app = FastAPI()

instrument_app(app)

@app.get("/hello")
async def hello(request: Request):
    logger.info("hello endpoint called")
    return {"message": "Hello from Cloud Run"}

Now your app automatically produces:

  • HTTP traces
  • Request metrics
  • Correlated logs

Step 6: Containerize the app

To run on Google Cloud Run, our app needs to be self-contained. A Docker container ensures that the “voice” we just gave our app travels with it, whether it’s running on your laptop or a massive server in a Google data center.

  1. Create a Dockerfile:
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
  2. Build and test locally:
docker build -t observability-python .
docker run -p 8080:8080 observability-python

Step 7: Deploy to Cloud Run

It’s time to go live. We will push our container to Google Cloud Run, which provides a serverless environment that scales automatically. Once deployed, our app will start sending real-world signals to Google’s observability suite.

  1. Configure the service account. In the Google Cloud Console, ensure the service account running your Cloud Run service (usually the Default Compute Service Account) has these three roles:
  • Logging Log Writer (roles/logging.logWriter)
  • Cloud Trace Agent (roles/cloudtrace.agent)
  • Monitoring Metric Writer (roles/monitoring.metricWriter)
  2. Now, run the deployment command. Google will build your container and host it on a secure URL:
gcloud run deploy cloud-run-python-demo \
  --source . \
  --region us-central1 \
  --allow-unauthenticated

Wait for deployment, then open the service URL and hit /hello a few times.

Step 8: Validating the observability pipeline

Now that the service is live, we need to verify that our “signals” are actually arriving. Generate some traffic by hitting your service URL (e.g., https://[your-service-url]/hello) 5–10 times.

**Analyzing traces**

To see the lifecycle of your requests, navigate to Cloud Trace > Trace Explorer. You should see a list of requests for /hello. Click a trace to see its spans. Because of our OpenTelemetry instrumentation, you’ll see exactly how long each request took to process within the FastAPI layer.

**Correlating logs**

Navigate to Logging > Logs Explorer. Instead of searching through all logs, use the following query to find your app’s specific output:

resource.type="cloud_run_revision"
resource.labels.service_name="cloud-run-python-demo"
textPayload:"hello endpoint called"

Because we used LoggingInstrumentor(), each log entry is automatically tagged with a trace_id. In the UI, you can click “View Trace” on a log entry to jump directly to the specific trace that produced that log.
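The mechanism behind that correlation can be sketched with stdlib logging alone: a filter stamps every record with the current trace ID so the formatter can print it (the ID below is fake; in the app, LoggingInstrumentor derives it from the active span):

```python
import logging

CURRENT_TRACE_ID = "4bf92f3577b34da6a3ce929d0e0e4736"  # fake ID for illustration

class TraceIdFilter(logging.Filter):
    # Attach the active trace ID to every log record
    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = CURRENT_TRACE_ID
        return True

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
)
logger = logging.getLogger("correlated")
logger.addFilter(TraceIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("hello endpoint called")
# prints: INFO trace_id=4bf92f3577b34da6a3ce929d0e0e4736 hello endpoint called
```

A log line carrying the trace ID is what lets the Logs Explorer UI link it back to the exact trace that produced it.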

**Monitoring metrics**

Navigate to Monitoring > Metrics Explorer to see the “big picture” of your app’s performance.

  1. Select a Metric: Search for run.googleapis.com/container/request_count.
  2. Filter: Add a filter for service_name = "cloud-run-python-demo".
  3. Group By: Select response_code_class to see a breakdown of 2xx vs. 4xx/5xx errors.

By following this journey, you’ve moved from a blind deployment to a fully transparent architecture.

| Signal  | Tool             | Purpose                                  |
| ------- | ---------------- | ---------------------------------------- |
| Logs    | Cloud Logging    | Debugging specific code failures.        |
| Traces  | Cloud Trace      | Finding bottlenecks and latency issues.  |
| Metrics | Cloud Monitoring | Alerting and high-level health tracking. |

With these signals in place, the service can be monitored, debugged, and reasoned about in production using standard Google Cloud observability tools. Observability doesn’t prevent failures, but it ensures that when they happen, you can see them, understand them, and act with confidence.