Multi-Region Replication
Streaming replication, logical replication, and the trade-offs of active-passive vs active-active across regions.
Real-World Analogy
Two branches of a bank: one is the main branch (primary), one is a backup branch (replica) that keeps its records synchronized with the main branch in near real-time. If the main branch burns down, the backup branch already has all the account data and can open for business immediately — no restore from tapes needed.
Streaming Replication (Physical)
PostgreSQL streaming replication sends WAL records from primary to standby in real time. The standby replays them continuously, staying within seconds of the primary.
How it works:
Primary: Write transaction → WAL record → Send to replica
Replica: Receive WAL → Replay → State matches primary (with lag) Setup on primary (postgresql.conf):
wal_level = replica
max_wal_senders = 5 # allow up to 5 standbys
wal_keep_size = 1GB # keep this much WAL for slow standbys
synchronous_standby_names = '' # async replication (see sync section below) Create replication user:
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'reppassword'; pg_hba.conf on primary:
host replication replicator 10.0.2.0/24 scram-sha-256 Set up standby:
# On standby server: take base backup from primary
pg_basebackup \
-h primary.db.internal \
-U replicator \
-D /var/lib/postgresql/data \
-P -Xs -R # -R creates standby.signal + postgresql.auto.conf
# The -R flag creates:
# standby.signal → tells PostgreSQL to start as standby
# postgresql.auto.conf with:
# primary_conninfo = 'host=primary.db.internal user=replicator ...' Start standby:
systemctl start postgresql
# Check replication status on primary
psql -c "SELECT client_addr, state, sent_lsn, replay_lsn,
(sent_lsn - replay_lsn) AS replication_lag
FROM pg_stat_replication;" Replication Lag and What It Means for RPO
Async replication means the standby is always slightly behind:
-- On primary: check how far behind each standby is
SELECT
client_addr,
state,
pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS lag_size,
extract(epoch FROM (now() - reply_time)) AS lag_seconds
FROM pg_stat_replication; Under normal load, lag is milliseconds to a few seconds. Under heavy write load or network issues, lag can grow to minutes.
Your RPO = replication lag at the moment of disaster.
If lag is 30 seconds when the primary fails, you lose 30 seconds of data after promoting the replica. Configure monitoring and alert on lag above your RPO threshold:
# Alert if replication lag > 60 seconds
SELECT CASE
WHEN extract(epoch FROM (now() - reply_time)) > 60
THEN 'ALERT: Replication lag exceeds RPO'
ELSE 'OK'
END FROM pg_stat_replication; Synchronous Replication
For zero-data-loss (RPO = 0), configure synchronous replication. The primary waits for at least one standby to confirm it has received and written the WAL before acknowledging the commit.
# postgresql.conf on primary
synchronous_standby_names = 'FIRST 1 (standby1, standby2)'
# FIRST 1: wait for confirmation from 1 of the listed standbys
# (first one to respond) The trade-off:
Async replication:
Commit latency: +0ms (fire and forget)
RPO: seconds to minutes of potential data loss
Write throughput: unaffected by standby performance
Synchronous replication:
Commit latency: +latency to nearest standby (e.g., +2ms same AZ, +50ms cross-region)
RPO: zero (standby confirmed write before primary ack'd commit)
Write throughput: limited by standby's write speed + network RTT Cross-region sync replication is expensive in latency. A common pattern: sync replication to an in-region standby (fast, low latency), async replication to a cross-region DR standby (no latency impact, some lag).
# Sync to in-region standby, async to cross-region
synchronous_standby_names = 'FIRST 1 (standby-az2)'
# standby-dr-region gets async replication (not listed in synchronous_standby_names) Logical Replication
Physical replication copies WAL byte-for-byte — requires identical PostgreSQL versions and OS. Logical replication decodes WAL into logical changes (INSERT/UPDATE/DELETE) and replays them on the subscriber.
Use cases:
- Replicate to a different PostgreSQL major version (upgrade path)
- Replicate specific tables, not the full database
- Replicate to a different schema or transform data during replication
- Zero-downtime major version upgrades
-- On publisher (source)
ALTER SYSTEM SET wal_level = 'logical';
-- Restart PostgreSQL
-- Create publication
CREATE PUBLICATION orders_pub FOR TABLE orders, order_items;
-- On subscriber (destination)
CREATE SUBSCRIPTION orders_sub
CONNECTION 'host=primary.db.internal dbname=mydb user=replicator password=...'
PUBLICATION orders_pub;
-- Monitor replication
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription; Logical replication limitations:
- DDL (schema changes) are not replicated — must apply manually on both sides
- Sequences not replicated — subscriber starts at its own position
- Large objects not replicated
- Requires primary keys or replica identity on all replicated tables
Promoting a Standby
When the primary fails, promote the standby to accept writes:
# Method 1: pg_ctl promote
pg_ctl promote -D /var/lib/postgresql/data
# Method 2: touch trigger file (if configured)
touch /tmp/postgresql.trigger
# Method 3: pg_promote() function (Postgres 12+, from inside psql)
SELECT pg_promote();
# Verify promotion
psql -c "SELECT pg_is_in_recovery();"
# f = primary (no longer in recovery) After promotion — update connection strings:
# Update application environment to point to new primary
aws ssm put-parameter \
--name /myapp/prod/DATABASE_URL \
--value "postgresql://app:pass@replica.db.internal:5432/mydb" \
--type SecureString \
--overwrite
# Restart or signal app servers to pick up new config
kubectl rollout restart deployment/api Patroni: Automated Failover
Manual failover is slow and error-prone. Patroni automates it using etcd, Consul, or ZooKeeper as a distributed consensus store.
# patroni.yml
scope: postgres-cluster
name: node1
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.1.10:8008
etcd:
hosts: 10.0.0.10:2379,10.0.0.11:2379,10.0.0.12:2379
bootstrap:
dcs:
ttl: 30 # primary lease duration (seconds)
loop_wait: 10 # check interval
retry_timeout: 10
maximum_lag_on_failover: 1048576 # only failover if lag < 1MB
postgresql:
listen: 0.0.0.0:5432
connect_address: 10.0.1.10:5432
data_dir: /var/lib/postgresql/data
parameters:
wal_level: replica
max_wal_senders: 5 Patroni monitors primary health and automatically promotes the most up-to-date replica when the primary fails, typically within 30–60 seconds.
Cross-Region Architecture
Region: us-east-1 (primary)
Primary DB (read/write)
↓ sync replication
Standby-AZ2 (hot standby, same region)
↓ async replication
↓
Region: us-west-2 (DR)
Read Replica (accepts read queries, ready for promotion)
WAL-G backups to S3 (cross-region backup) Read traffic to cross-region replica reduces query latency for west-coast users and keeps the DR replica warm (it’s already serving production traffic, so promotion is less disruptive).
Failover procedure for region failure:
# 1. Confirm primary region is unavailable
aws ec2 describe-instances --region us-east-1 --query 'Reservations[*].Instances[*].State'
# 2. Promote DR replica
ssh postgres@dr-db.us-west-2.internal
pg_ctl promote -D /var/lib/postgresql/data
# 3. Update global DNS (Route53 health check should do this automatically)
aws route53 change-resource-record-sets --hosted-zone-id Z123 --change-batch '{
"Changes": [{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "db.myapp.com",
"Type": "CNAME",
"TTL": 60,
"ResourceRecords": [{"Value": "dr-db.us-west-2.internal"}]
}
}]
}'
# 4. Update app config if not using DNS
aws ssm put-parameter --region us-west-2 \
--name /myapp/prod/DATABASE_URL \
--value "postgresql://app:pass@dr-db.us-west-2.internal:5432/mydb" \
--overwrite
# 5. Deploy/restart apps in DR region Replication Monitoring Checklist
□ Replication lag monitored and alerted (threshold = RPO target)
□ pg_stat_replication checked in weekly ops review
□ WAL sender and receiver counts alerted (should be > 0)
□ Standby disk space monitored (replica needs same space as primary)
□ pg_basebackup test: can you take a fresh backup from primary?
□ Promotion tested quarterly (see restore drills chapter)
□ DNS/load balancer failover tested (not just DB promotion)
□ Application reconnection after failover tested (connection pool behavior)