Python Monorepo Architecture for Scalable FastAPI Microservices
A Python monorepo architecture is a development strategy where multiple services and shared libraries are housed in a single repository to ensure dependency synchronization. This approach uses modern tools like uv to manage a unified lockfile, which prevents version drift and allows for atomic commits across microservices. Implementing this architecture streamlines CI/CD pipelines and improves developer velocity by centralizing shared Pydantic models and utility code.
Three months ago, I hit a wall that every developer dreads. I was managing four separate GitHub repositories for my AI-powered automation platform: a FastAPI gateway, a background worker service, a suite of Gemini-powered agents, and a "common" utility library. I pushed what I thought was a trivial update to the common library—a change in how we structured the JSON payload for our internal LLM routing logic. I updated the gateway, ran the tests, and deployed. Everything looked green.
Two hours later, my PagerDuty went off. The background workers, which I hadn't updated because I thought the change was backward compatible, were failing silently. They were consuming messages from Pub/Sub but couldn't parse the new payload format. Because the worker repo was pinned to an older version of the common library, the CI/CD pipeline in that repo didn't catch the breaking change. By the time I realized what happened, 14,000 tasks had failed, and I had wasted nearly $450 in redundant LLM tokens trying to re-process corrupted state. This wasn't just a deployment failure; it was a fundamental failure of my multi-repo architecture.
I realized that as my team grew and our AI agents became more complex, the overhead of managing dependencies across repos was killing our velocity. I spent more time syncing versions and waiting for CI pipelines to run sequentially than I did writing code. I needed a single source of truth. I needed a Python monorepo that didn't feel like a hack. This is the story of how I rebuilt my entire stack into a high-performance monorepo using uv and a component-based architecture.
Why Multi-Repo Architecture Fails for Complex AI Stacks
Multi-repo setups often lead to silent failures when shared libraries are updated without immediate integration testing across all dependent services. In a microservices environment, the promise is independent scalability and deployment. But in reality, my services were tightly coupled by the data schemas they shared. When I was building self-correcting AI agents with Gemini, I found myself copy-pasting Pydantic models between the agent service and the monitoring service. Every time I improved the prompt logic or the error-handling schema, I had to open three different Pull Requests.
The "Common Library" approach was also a disaster. To test a change in a service, I had to:
- Commit the change to the common repo.
- Tag a new version (e.g., v1.2.4).
- Wait for the private PyPI registry to update.
- Update the requirements.txt in the service repo.
- Push the service repo and hope the integration tests passed.
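The failure described above, where a producer and a consumer disagree about a payload format, is the kind of thing a contract test catches once both sides live in the same repository. Here is a minimal sketch of the idea. It uses stdlib dataclasses as a stand-in for Pydantic so it runs without dependencies, and RoutingPayload, build_payload, and parse_payload are hypothetical names, not code from my platform.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class RoutingPayload:
    """Hypothetical shared schema; in a monorepo this would live under libs/."""
    task_id: str
    model: str
    prompt: str


def build_payload(task_id: str, model: str, prompt: str) -> str:
    """Producer side (e.g. the gateway): serialize using the shared schema."""
    return json.dumps(asdict(RoutingPayload(task_id, model, prompt)))


def parse_payload(raw: str) -> RoutingPayload:
    """Consumer side (e.g. the worker): reject payloads that do not match."""
    data = json.loads(raw)
    expected = {f.name for f in fields(RoutingPayload)}
    if set(data) != expected:
        raise ValueError(f"payload keys {sorted(data)} != schema {sorted(expected)}")
    return RoutingPayload(**data)
```

Because both functions import the same class, a change to the schema either updates both sides in one commit or fails the test suite before anything is deployed.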
Why UV and Workspaces Are Best for Python Monorepo Architecture
The uv package manager provides native workspace support that resolves dependencies across the entire project tree while maintaining a single lockfile. In early 2026, the Python packaging landscape finally matured. I looked at Poetry and PDM, but they struggled with large monorepos containing dozens of services. I ultimately settled on uv. If you aren't using it yet, uv is a Rust-based Python package manager that is orders of magnitude faster than pip. More importantly, its native support for "workspaces" is what makes a Python monorepo architecture viable.
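The core problem with Python monorepos historically has been conflicting dependencies, often lumped in with the "Diamond Dependency" problem: Service A needs pandas 2.0, and Service B needs pandas 1.5. uv allows you to define a workspace where every service has its own pyproject.toml, but they all share a single lockfile. Each service still declares only the dependencies it needs, while uv resolves one consistent set of versions for the whole workspace, so a shared library can never be pinned to different versions in different services. The trade-off is that genuinely conflicting requirements, like the pandas example, surface as a resolution failure at uv lock time and have to be reconciled (or the odd service moved out of the workspace) instead of silently diverging.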
Here is how I structured my root pyproject.toml:
[project]
name = "techfrontier-monorepo"
version = "0.1.0"
description = "Unified AI Services"
dependencies = []
[tool.uv.workspace]
members = [
"services/*",
"libs/*",
"tools/*",
]
[tool.uv.sources]
common-utils = { workspace = true }
ai-schemas = { workspace = true }
This configuration tells uv that anything inside services/, libs/, or tools/ is part of the same project. When I run uv lock, it resolves the entire tree. If I update a shared library in libs/common-utils, uv immediately flags if that update breaks the dependency constraints of the gateway service.
How to Organize a Python Monorepo Directory for Services and Libraries
A successful monorepo distinguishes between deployable services and shared libraries to maintain a clean dependency graph. One mistake I made early on was treating every folder like a deployable service. I learned that you need a clear distinction between Services (deployable units like FastAPI apps or workers) and Libraries (shared code used by services). I adopted a structure inspired by the Polylith architecture, but adapted for Python's ecosystem.
.
├── pyproject.toml
├── uv.lock
├── libs/
│   ├── ai-schemas/          # Shared Pydantic models
│   │   ├── pyproject.toml
│   │   └── ai_schemas/
│   └── common-utils/        # Logging, DB connections
│       ├── pyproject.toml
│       └── common_utils/
├── services/
│   ├── api-gateway/         # FastAPI entry point
│   │   ├── pyproject.toml
│   │   ├── Dockerfile
│   │   └── src/
│   └── agent-worker/        # Background processing
│       ├── pyproject.toml
│       ├── Dockerfile
│       └── src/
└── tools/
    └── ci-scripts/
This separation is crucial for my dynamic LLM model routing logic. The routing logic itself lives in libs/common-utils. Both the api-gateway (which handles real-time requests) and the agent-worker (which handles batch jobs) import this library. When I optimize the routing cost weights, I change it in one place. Because of the monorepo structure, my IDE (VS Code with Pyright) immediately shows me if the new routing parameters break the function calls in the worker service.
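The service/library split only stays clean if libraries never import service code. One way to enforce that mechanically is a small import check run in CI. The sketch below is an assumption about how such a check could look, not something from my repo: the function names are hypothetical, and so are the service module names (api_gateway, agent_worker) used in the usage note.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports stay in-package
            names.add(node.module.split(".")[0])
    return names


def boundary_violations(source: str, forbidden: set[str]) -> set[str]:
    """Return the forbidden modules that this source imports."""
    return imported_modules(source) & forbidden
```

Running boundary_violations over every file under libs/ with forbidden={"api_gateway", "agent_worker"} would flag any library that reaches back into a service.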
How to Optimize Docker Build Times in a Python Monorepo
Docker build performance in monorepos depends on using multi-stage builds and caching the dependency resolution layer separately from the source code. The biggest technical hurdle I faced was Docker build times. In a multi-repo setup, the Docker context is small. In a monorepo, if you aren't careful, the Docker client sends the entire 500MB repository to the build engine as build context every time you change one line of code. This made my CI/CD pipelines jump from 5 minutes to 20 minutes.
To solve this, I used Docker's --build-context and uv's ability to install from a specific path. But the real trick was multi-stage builds that selectively copy only the necessary workspace members. Here is the optimized Dockerfile I wrote for the api-gateway:
# Stage 1: Resolver
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/bin/
WORKDIR /app

# Copy only the configuration files first to cache layers
COPY pyproject.toml uv.lock ./
COPY libs/common-utils/pyproject.toml libs/common-utils/
COPY libs/ai-schemas/pyproject.toml libs/ai-schemas/
COPY services/api-gateway/pyproject.toml services/api-gateway/

# Install third-party dependencies without the actual source code.
# --package targets this service's dependency tree (package name assumed to be
# "api-gateway"); --no-install-workspace skips every workspace member, not just
# the root project, because their source has not been copied yet.
RUN uv sync --frozen --no-install-workspace --no-dev --package api-gateway

# Stage 2: Final Image
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

# Now copy only the code needed for this service
COPY libs/common-utils ./libs/common-utils
COPY libs/ai-schemas ./libs/ai-schemas
COPY services/api-gateway ./services/api-gateway

# The shared libraries are not installed into the venv, so put them on the import path
ENV PYTHONPATH="/app/libs/common-utils:/app/libs/ai-schemas"

WORKDIR /app/services/api-gateway
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8080"]
By copying the pyproject.toml files first and running uv sync --no-install-workspace, I created a cached layer that only invalidates if my dependencies change. If I only change the code in src.main:app, the build takes 15 seconds. This is a massive win for developer productivity. You can read more about the technical specifics of workspace synchronization in the official uv workspace documentation.
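Deciding which workspace members a service's Dockerfile has to COPY is a transitive-dependency question. As a hedged sketch of that reasoning, the helper below computes the closure from each member's declared requirements. The function names and the shape of the deps mapping are assumptions for illustration; in practice the mapping would be read from each member's pyproject.toml.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does (lowercase, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the package name from a requirement string like 'pydantic>=2.6'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return normalize(match.group(1))


def workspace_closure(package: str, deps: dict[str, list[str]]) -> set[str]:
    """Return `package` plus every workspace member it depends on, transitively.

    `deps` maps each workspace member to its declared requirement strings;
    requirements that are not workspace members (third-party) are ignored.
    """
    members = {normalize(name): reqs for name, reqs in deps.items()}
    seen: set[str] = set()
    stack = [normalize(package)]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for req in members[current]:
            name = requirement_name(req)
            if name in members:
                stack.append(name)
    return seen
```

For the layout in this post, the closure of api-gateway would be the gateway plus whichever libs it pulls in, and that is the set of directories the Dockerfile needs.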
How to Reduce CI/CD Costs Using Selective Change Detection
Selective CI/CD pipelines use git diff logic to only run tests and builds for the specific components affected by a code change. When you have 10 services in one repo, you don't want to run the tests for all 10 services when you only modified one. My GitHub Actions bill started to spike because I was running 40 minutes of tests on every PR. I had to implement "Change Detection."
I set up git diff-based path filtering (via the dorny/paths-filter action) to identify which directories changed. If a change occurs in libs/common-utils, I trigger tests for everything that depends on it. If a change only occurs in services/api-gateway, I only run that service's suite. Here is the logic I used in my GitHub Actions YAML:
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      # Expose one output per filter so downstream jobs can gate on it
      gateway: ${{ steps.filter.outputs.gateway }}
      worker: ${{ steps.filter.outputs.worker }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            gateway:
              - 'services/api-gateway/**'
              - 'libs/**'
            worker:
              - 'services/agent-worker/**'
              - 'libs/**'
  test-gateway:
    needs: detect-changes
    if: ${{ needs.detect-changes.outputs.gateway == 'true' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumption: uv is installed on the runner via Astral's setup action
      - uses: astral-sh/setup-uv@v5
      - run: uv run pytest services/api-gateway
This configuration reduced our CI minutes by 65%. It also meant that developers got feedback on their specific changes much faster, which reduced the context-switching tax that kills engineering teams.
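The filter logic in the workflow boils down to prefix matching on changed paths. Here is a minimal Python sketch of the same idea, assuming the list of changed files comes from something like git diff --name-only. The affected_services function and the watch mapping are hypothetical, not part of paths-filter.

```python
def affected_services(changed: list[str], service_deps: dict[str, set[str]]) -> set[str]:
    """Return services whose own directory or any watched library changed.

    `service_deps` maps a service name to the path prefixes it depends on,
    mirroring the per-service filter lists in the workflow.
    """
    affected = set()
    for service, prefixes in service_deps.items():
        if any(path.startswith(prefix) for path in changed for prefix in prefixes):
            affected.add(service)
    return affected


# Example mapping that mirrors the two filters in the workflow
watch = {
    "gateway": {"services/api-gateway/", "libs/"},
    "worker": {"services/agent-worker/", "libs/"},
}
```

A finer-grained mapping could list only the libraries each service actually imports instead of all of libs/, at the cost of keeping that list in sync with the code.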
Managing Performance Regressions and Environment Variables in Monorepos
Local development in a large monorepo requires specific IDE configurations to prevent language server lag and environment variable conflicts. It wasn't all smooth sailing. One major issue I encountered was the local development environment. When you have a monorepo, your IDE's language server (like Pyright or Pylance) has to index the entire tree. Initially, this made my VS Code lag significantly. I had to configure pyrightconfig.json to ignore certain folders and focus only on the active workspace.
Another "gotcha" was environment variables. In separate repos, .env files are straightforward. In a monorepo, you need a strategy to prevent services/api-gateway/.env from conflicting with services/agent-worker/.env. I eventually standardized on using direnv to load environment variables automatically as I move between directories in my terminal.
I also learned that dependency versioning must be strict. In a monorepo, it is tempting to just use the latest version of everything. But when I tried to upgrade pydantic from v2.6 to v2.8, I realized that while the gateway was fine, one of my legacy AI agents was using a deprecated validation decorator. The monorepo didn't prevent the breakage, but it made the breakage visible immediately during the uv lock phase, rather than at runtime in production.
Key Takeaways for Implementing a Python Monorepo Architecture
Successful Python monorepo architecture relies on modern tooling, strict dependency versioning, and automated change detection to maintain developer velocity.
- Tooling is everything: Don't try to build a Python monorepo with standard pip and requirements.txt. Use a workspace-aware tool like uv or Pants. The speed of uv (written in Rust) is the primary reason this architecture is now viable for Python.
- Shared Lockfiles are a double-edged sword: They prevent version drift between services, but they also force you to resolve dependencies for the entire project at once. This can lead to "dependency hell" if one service requires an ancient version of a library.
- Optimize Docker Contexts: Use multi-stage builds and only copy the pyproject.toml files needed for the specific service's dependency tree. This keeps build times low.
- Automate Change Detection: Use tools like paths-filter in your CI/CD to avoid running unnecessary tests and builds. This is the difference between a 5-minute CI and a 30-minute CI.
- Visibility is the biggest benefit: The ability to "Go to Definition" in your IDE and jump from a service's API call directly into the shared library's source code is a massive productivity boost.
Related Reading
- Building Self-Correcting AI Agents with Gemini and Python - This post details the logic inside the agent-worker service mentioned in this monorepo deep-dive.
- Dynamic LLM Model Routing for API Cost Optimization - Learn about the shared library logic that I centralized within my monorepo to save on API costs.
Moving to a Python monorepo architecture was a grueling two-week migration, but it has completely changed how I build software. I no longer fear breaking changes in shared utilities because the entire system is validated as a single unit. My next goal is to integrate these monorepo builds with Google Cloud Run's multi-region deployments to reduce latency for our global AI agents. I'll be documenting that process—and the inevitable networking headaches—in my next post.