Zero-Downtime Deployments on AWS: A Practical Playbook

Every engineering team eventually arrives at the same question: how do we ship on a Friday afternoon without someone holding their breath? The answer isn't cultural — it's architectural. Zero-downtime deployments are an engineering problem with well-understood solutions.

The three patterns

Rolling updates replace instances one at a time, maintaining capacity throughout. They work well for stateless services, but while the rollout is in progress old and new versions handle requests side by side, which is a problem if your API introduces breaking changes.
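
To make the version-mixing point concrete, here is a toy simulation rather than a real orchestrator. The `rolling_update` helper and the three-instance fleet are assumptions made up for illustration; the sketch only shows that every intermediate state of a one-at-a-time rollout serves two versions at once.

```python
def rolling_update(fleet, new_version):
    """Yield the fleet state after each single-instance replacement."""
    fleet = list(fleet)
    for i in range(len(fleet)):
        # Replace one instance; every other instance keeps serving traffic.
        fleet[i] = new_version
        yield list(fleet)


states = list(rolling_update(["v1", "v1", "v1"], "v2"))

# States in which more than one version is serving requests at the same time.
mixed = [state for state in states if len(set(state)) > 1]

for state in states:
    print(state)
```

Fleet size never drops in this model, but two of the three intermediate states are mixed, and that is the window in which a breaking API change would hurt.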

Blue-green deployments maintain two identical environments. Traffic switches atomically from blue (current) to green (new). Rollback is instant: flip the load balancer back. The cost is that you run double the infrastructure for the duration of every deployment.
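
A minimal sketch of why rollback is cheap in this pattern: cutover and rollback are the same operation. `BlueGreenRouter` is a hypothetical stand-in for a load balancer, not an AWS API.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Toy load balancer that points at exactly one environment at a time."""

    environments: dict  # environment name -> deployed version
    live: str = "blue"

    def route(self) -> str:
        # Every request goes to whichever environment is live right now.
        return self.environments[self.live]

    def switch(self) -> None:
        # Cutover and rollback are the same move: flip the pointer.
        self.live = "green" if self.live == "blue" else "blue"


router = BlueGreenRouter({"blue": "v1", "green": "v2"})
before = router.route()
router.switch()            # cut over to green
after_cutover = router.route()
router.switch()            # roll back to blue
after_rollback = router.route()
```

Both environments have to exist for the pointer flip to work, which is where the doubled infrastructure cost comes from.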

Canary releases send a small percentage of traffic to the new version first, watching error rates and latency before gradually shifting more. This is the safest pattern for high-traffic services, but also the most complex to instrument.
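
The promotion logic behind a canary can be sketched as a small decision function. Everything specific here is an assumption for illustration: the `STEPS` schedule, the 1% error threshold, and the 500 ms latency limit are placeholders, not recommended values.

```python
# Hypothetical traffic schedule, as percentages sent to the new version.
STEPS = [5, 25, 50, 100]


def next_weight(current, error_rate, p99_ms, max_error_rate=0.01, max_p99_ms=500):
    """Return the next canary traffic percentage, or 0 to roll back."""
    # Any breach of the health thresholds sends all traffic back to the old version.
    if error_rate > max_error_rate or p99_ms > max_p99_ms:
        return 0
    # Otherwise advance to the first step above the current weight.
    for step in STEPS:
        if step > current:
            return step
    return 100  # already fully shifted


print(next_weight(5, error_rate=0.001, p99_ms=200))   # healthy canary: advance
print(next_weight(25, error_rate=0.05, p99_ms=200))   # error spike: roll back
```

The instrumentation cost mentioned above sits in the inputs. The function is only as good as the per-version error and latency metrics fed into it.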

When we use each

For most client applications, we default to blue-green on ECS or Kubernetes, triggered by a GitHub Actions workflow. On ECS, the pipeline looks like this:

1. Build and push Docker image to ECR
2. Run integration tests against the new image
3. Register new ECS task definition
4. Update the ECS service with deployment-circuit-breaker enabled
5. CodeDeploy manages the traffic shift with health check gates
6. Automatic rollback if error rate exceeds 1% in the first 5 minutes
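
The final rollback rule in the list above can be written down as a small check. This is a sketch under stated assumptions: `Sample` and `should_roll_back` are hypothetical names, and in a real pipeline the rule would more likely live in a monitoring alarm than in application code. Only the 1% threshold and the 5-minute window come from the pipeline description.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One metrics datapoint collected after traffic shifts to the new version."""

    seconds_since_shift: float
    requests: int
    errors: int


def should_roll_back(samples, window_seconds=300, max_error_rate=0.01):
    """True if the aggregate error rate inside the window exceeds the threshold."""
    in_window = [s for s in samples if 0 <= s.seconds_since_shift <= window_seconds]
    total = sum(s.requests for s in in_window)
    errors = sum(s.errors for s in in_window)
    if total == 0:
        return False  # no traffic observed yet, so nothing to judge
    return errors / total > max_error_rate


healthy = [Sample(60, 1000, 5), Sample(120, 1000, 8)]     # 13 errors in 2000 requests
failing = [Sample(60, 1000, 5), Sample(120, 1000, 30)]    # 35 errors in 2000 requests
print(should_roll_back(healthy), should_roll_back(failing))
```

Aggregating over the whole window, rather than judging each datapoint, keeps a single noisy minute at low traffic from triggering a rollback on its own.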

For services where we can't afford any version mixing — like payment processors or state machines — we use blue-green with a pre-flight database migration step that runs before traffic shifts.

The database problem

The hardest part of zero-downtime deployments isn't the application layer; it's the database. You can't atomically deploy both a new application version and a breaking schema change, so for some window one version of the code runs against a schema it wasn't written for. The solution is the expand-contract pattern: add the new column first, without removing anything, so that both old and new code keep working; deploy the new application; then, once no old instances remain, run the cleanup migration. It's more work, but it's the only way to avoid a maintenance window.
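
Here is a minimal sketch of the three expand-contract phases, using an in-memory SQLite database so it runs anywhere. The `users` table, the rename from `name` to `full_name`, and the two insert helpers are assumptions invented for the example. The contract step rebuilds the table because that works on older SQLite versions; most production databases would use a plain `ALTER TABLE ... DROP COLUMN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# 1. Expand: add the new column as nullable, so old code that never
#    mentions it keeps working unchanged.
conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def old_code_insert(conn, name):
    # The old application version only knows about the old column.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_code_insert(conn, name):
    # The new version writes both columns while old instances may still run.
    conn.execute("INSERT INTO users (name, full_name) VALUES (?, ?)", (name, name))


# 2. Deploy: both versions serve traffic for a while.
old_code_insert(conn, "Grace Hopper")
new_code_insert(conn, "Alan Turing")

# Backfill rows that were written without the new column.
conn.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

# 3. Contract: once no old instances remain, drop the old column
#    (done here by rebuilding the table).
conn.executescript(
    """
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
    INSERT INTO users_new (id, full_name) SELECT id, full_name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    """
)

rows = conn.execute("SELECT id, full_name FROM users ORDER BY id").fetchall()
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns, rows)
```

The point of the ordering is that no single step breaks a running version. The old code tolerates the extra column, the new code tolerates the old one, and the destructive change waits until nothing depends on it.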

