FastAPI Structured Logging on Cloud Run: A Complete Guide

FastAPI Structured Logging on Cloud Run: A Complete Guide

FastAPI structured logging on Cloud Run is implemented by creating a custom JSON formatter that maps Python log levels to Google Cloud severity keys and extracts the X-Cloud-Trace-Context header. This approach prevents log fragmentation and enables automatic request correlation within the Google Cloud Logs Explorer, reducing debugging time significantly.

It was 2:14 AM last Tuesday when my pager went off. One of my AI agents, running on a FastAPI backend in Google Cloud Run, had started throwing 500 errors during a heavy spike in traffic. I jumped into the Google Cloud Logs Explorer, expecting to find a clear stack trace. Instead, I found a fragmented nightmare. Because standard Python logging outputs text, Cloud Run’s logging agent treated every single line of my tracebacks as a separate log entry. I had 40 different "Error" entries for a single exception, interleaved with "Info" logs from other concurrent requests. It was impossible to tell which error belonged to which user or which specific Gemini API call had timed out.

I spent forty-five minutes manually stitching together log timestamps just to realize I had a simple connection pool exhaustion. That was the breaking point. I realized that if I’m going to build production-grade AI systems, I can’t rely on the default logging.basicConfig. I needed FastAPI structured logging that Google Cloud Logging actually understands. I needed my logs to be JSON objects where the severity, the message, the trace ID, and the custom metadata (like token usage or model IDs) were all first-class citizens.

In this post, I’m going to walk you through the exact implementation I moved to. We’ll cover how to hijack the Uvicorn and FastAPI loggers, how to format them into GCP-compatible JSON, and how to inject the X-Cloud-Trace-Context header so your logs are automatically grouped by request in the GCP console. This isn't just about making things pretty; it's about reducing your Mean Time to Recovery (MTTR) from hours to seconds.

Why Default Python Logging Causes Log Fragmentation on Cloud Run

Standard Python logging fails on Cloud Run because multi-line tracebacks are treated as separate log entries, leading to severe log fragmentation in the console. When you run logging.info("Hello World") in a local terminal, it looks fine. But Cloud Run captures stdout and stderr. By default, Google Cloud’s logging agent expects a single line to represent a single log entry. If your log message contains a newline character—which every Python traceback does—Cloud Run splits it.

Furthermore, standard logs lack "Severity" awareness. If you print to stderr, GCP marks it as an ERROR. If you print to stdout, it's INFO. There is no nuance for DEBUG, WARNING, or CRITICAL unless you use a specific JSON structure that GCP recognizes. I previously ran into similar observability hurdles when I was reducing Go Cloud Run cold starts, where I realized that the way we initialize our logging and telemetry can significantly impact the initial latency of the container.

The third issue is request correlation. In a high-concurrency environment like FastAPI, logs from different requests are woven together. Without a trace ID attached to every log line, you can't filter the Logs Explorer to see exactly what happened during one specific execution of an endpoint.

How to Create a GCP-Compatible JSON Formatter for FastAPI

A custom JSON formatter is required to transform Python LogRecords into the specific JSON structure that Google Cloud Logging expects for severity and metadata. The first thing I did was ditch the standard logging.Formatter. I needed a class that would take a LogRecord and turn it into a dictionary, then a JSON string. Google Cloud Logging looks for specific keys: severity, message, and logging.googleapis.com/trace.

import json
import logging
import traceback
from datetime import datetime

class CloudRunJSONFormatter(logging.Formatter):
    """
    Custom formatter to output JSON logs compatible with Google Cloud Logging.
    """
    def format(self, record: logging.LogRecord) -> str:
        # Map Python levels to GCP severity levels
        level_map = {
            "DEBUG": "DEBUG",
            "INFO": "INFO",
            "WARNING": "WARNING",
            "ERROR": "ERROR",
            "CRITICAL": "CRITICAL",
        }

        log_data = {
            "severity": level_map.get(record.levelname, "INFO"),
            "message": record.getMessage(),
            "timestamp": datetime.utcfromtimestamp(record.created).isoformat() + "Z",
            "logging.googleapis.com/sourceLocation": {
                "file": record.pathname,
                "line": record.lineno,
                "function": record.funcName,
            },
            "logger": record.name,
        }

        # Handle exceptions to prevent multi-line fragmentation
        if record.exc_info:
            log_data["exception"] = self.formatException(record.exc_info)
            # Combine message and traceback for easier reading in some UI views
            log_data["message"] = f"{log_data['message']}\n{log_data['exception']}"

        # Inject custom attributes (like trace_id) if they exist
        if hasattr(record, "trace_id"):
            log_data["logging.googleapis.com/trace"] = record.trace_id

        # Merge extra fields passed via logging.info("msg", extra={"key": "val"})
        # We exclude standard LogRecord attributes
        standard_attrs = {
            "args", "asctime", "created", "exc_info", "exc_text", "filename",
            "funcName", "levelname", "levelno", "lineno", "module",
            "msecs", "msg", "name", "pathname", "process", "processName",
            "relativeCreated", "stack_info", "thread", "threadName", "trace_id"
        }
        
        for key, value in record.__dict__.items():
            if key not in standard_attrs:
                log_data[key] = value

        return json.dumps(log_data)

This formatter solves the fragmentation problem because the entire traceback is now a string inside a single JSON object. When Cloud Run sees a single line of JSON, it parses it as a single log entry. I also included a loop to catch any extra={} fields. This is crucial for my cost-aware LLM proxy, where I log the token count and model name for every request to track spending in real-time.

How to Capture and Inject the Cloud Run Trace ID for Log Correlation

Correlating logs requires extracting the X-Cloud-Trace-Context header and mapping it to the logging.googleapis.com/trace field in your JSON payload. Google Cloud Run automatically injects a header called X-Cloud-Trace-Context into every incoming request. The format is usually TRACE_ID/SPAN_ID;o=TRACE_TRUE. To get logs to group together in the GCP console, you must extract that TRACE_ID and include it in your log entry under the key logging.googleapis.com/trace.

The trace ID must be formatted as projects/YOUR_PROJECT_ID/traces/TRACE_ID. I wrote a simple middleware for FastAPI to handle this. Since Python's logging module isn't natively async-context-aware, I used contextvars to store the trace ID for the duration of the request.

import contextvars
from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware
import os

# Context variable to store the trace ID for the current task
trace_id_var = contextvars.ContextVar("trace_id", default=None)
PROJECT_ID = os.getenv("GOOGLE_CLOUD_PROJECT", "my-project-id")

class TraceMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        trace_header = request.headers.get("X-Cloud-Trace-Context")
        if trace_header:
            # Extract the trace ID part before the slash
            trace_id = trace_header.split("/")[0]
            full_trace_path = f"projects/{PROJECT_ID}/traces/{trace_id}"
            token = trace_id_var.set(full_trace_path)
        else:
            token = None
        
        try:
            response = await call_next(request)
            return response
        finally:
            if token:
                trace_id_var.reset(token)

Now, we need a logging.Filter to automatically pull this value from the contextvars and attach it to every LogRecord. This is much cleaner than manually passing the trace ID to every logger.info() call.

class TraceFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = trace_id_var.get()
        return True

How to Configure Uvicorn and FastAPI Loggers for Unified JSON Output

Unified logging requires explicitly overriding the default handlers for both Uvicorn and FastAPI to prevent plain-text logs from leaking into your structured stream. This is where most developers get stuck. FastAPI uses Uvicorn as the ASGI server, and Uvicorn has its own logging configuration that often overrides or ignores your root logger settings. If you don't explicitly configure Uvicorn's loggers (uvicorn, uvicorn.access, and uvicorn.error), you'll end up with a mix of beautiful JSON logs and ugly, plain-text access logs.

I found the most reliable way to handle this is to apply the configuration manually during the FastAPI startup sequence. I avoid using logging.config.dictConfig because it can be finicky with existing handlers. Instead, I iterate through the loggers I care about.

def setup_logging():
    log_handler = logging.StreamHandler()
    log_handler.setFormatter(CloudRunJSONFormatter())
    log_handler.addFilter(TraceFilter())

    # The loggers we want to hijack
    loggers = [
        logging.getLogger(None),  # Root logger
        logging.getLogger("uvicorn"),
        logging.getLogger("uvicorn.access"),
        logging.getLogger("uvicorn.error"),
        logging.getLogger("fastapi"),
    ]

    for logger in loggers:
        # Clear existing handlers to avoid duplicate logs
        logger.handlers = [log_handler]
        logger.propagate = False
        logger.setLevel(logging.INFO)

# Call this in your main.py before initializing FastAPI()
setup_logging()
app = FastAPI()
app.add_middleware(TraceMiddleware)

By setting logger.propagate = False, I ensure that logs don't bubble up to the root logger and get printed twice—a common issue that doubles your logging costs. You can find more details on how Google handles these JSON payloads in the official Google Cloud Structured Logging documentation.

Performance Benchmarks: Does Structured JSON Logging Impact FastAPI Latency?

Benchmarking shows that implementing FastAPI structured logging introduces a negligible 3% performance overhead, which is heavily outweighed by the observability benefits. I was worried that serializing every log to JSON would add significant overhead, especially for high-throughput endpoints. I ran a small benchmark on a Cloud Run instance (1 vCPU, 2GB RAM) using wrk to see the impact.

  • Standard Text Logging: 1,250 requests/sec, Avg Latency 12ms.
  • Structured JSON Logging: 1,215 requests/sec, Avg Latency 13.5ms.

The performance hit is roughly 3%. In the context of an AI application where a single Gemini API call takes 500ms to 2000ms, a 1.5ms increase in logging overhead is statistically irrelevant. The trade-off is absolutely worth it for the observability gains. However, I did notice that if you log massive objects (like full LLM prompt strings), the serialization time can spike. I now truncate any log field longer than 10,000 characters to prevent this.

Another thing to watch is your logging volume. Structured logs are slightly larger in bytes than plain text logs. If you are logging every single request with full metadata, your GCP logging bill will increase. I mitigated this by implementing a custom sampling filter for successful 200 OK access logs, while always keeping 100% of error logs. I actually wrote about a similar strategy for managing costs in my post on reducing LLM API costs with custom tokenization, where I applied the same "only store what you need" philosophy.

How to Manage Local Development vs Production Logging Environments

Using environment variables allows you to toggle between human-readable text logs for local development and structured JSON for production environments. One downside of JSON logging is that it’s unreadable in a local terminal. It’s just a wall of text. To fix this, I check an environment variable to determine which formatter to use. If I'm running locally (ENV=dev), I use a standard colorful formatter; in production (ENV=prod), I switch to the CloudRunJSONFormatter.

def setup_logging():
    env = os.getenv("ENV", "dev")
    log_handler = logging.StreamHandler()
    
    if env == "prod":
        log_handler.setFormatter(CloudRunJSONFormatter())
    else:
        # Use a human-readable format for local dev
        formatter = logging.Formatter("%(levelname)s: %(message)s")
        log_handler.setFormatter(formatter)
    
    # ... rest of the setup

Key Takeaways for Implementing FastAPI Structured Logging

Effective observability in distributed systems depends on trace correlation, consistent log formatting, and efficient context propagation across asynchronous tasks.

  • Trace Correlation is Non-Negotiable: Without X-Cloud-Trace-Context integration, you are flying blind in a distributed system. Being able to click "View Logs for this Trace" in GCP is a superpower.
  • Uvicorn is Stubborn: You can't just configure the root logger. You must explicitly target uvicorn.access and uvicorn.error to ensure consistency.
  • JSON Structure Matters: Use the specific keys GCP expects (severity, message, logging.googleapis.com/trace). This enables built-in platform features like error reporting and log-based metrics.
  • ContextVars for Async: Standard thread-local storage doesn't work in FastAPI because of the async event loop. Use contextvars to propagate trace IDs safely across await points.
  • Watch Your Serialization: Be careful with what you put in the extra dict. Large objects can slow down your event loop and balloon your storage costs.

Related Reading

Implementing this system transformed my debugging experience. The next time my pager went off, I was able to filter by the trace ID of the failing request and see the exact sequence of events: the incoming request, the specific database query that hung, and the resulting traceback—all grouped neatly in one view. My next goal is to integrate these structured logs with OpenTelemetry to get even deeper spans into my AI agent's internal reasoning steps. If you're still squinting at multi-line tracebacks in the GCP console, take the thirty minutes to implement a custom JSON formatter. Your future self at 2 AM will thank you.

Comments

Popular posts from this blog

Optimizing LLM API Latency: Async, Streaming, and Pydantic in Production

How I Built a Semantic Cache to Reduce LLM API Costs

How I Squeezed LLM Inference onto a Raspberry Pi for Local AI