
Lesson 24: Upgrade Strategies


After this lesson, you will be able to:

  • Plan and execute PostgreSQL major version upgrades.
  • Upgrade Patroni with zero downtime.
  • Use pg_upgrade for in-place upgrades.
  • Implement logical replication for upgrades.
  • Roll back failed upgrades safely.

1. Upgrade Planning

1.1. Pre-upgrade checklist

TEXT
☐ Review PostgreSQL release notes
☐ Check extension compatibility
☐ Test upgrade in staging environment
☐ Backup all data (full + WAL archive)
☐ Document current versions
☐ Schedule maintenance window
☐ Notify stakeholders
☐ Prepare rollback plan
☐ Verify disk space (need ~2x current data size; see the check below)
☐ Check for deprecated features in new version
☐ Update monitoring/alerting
☐ Prepare downtime communication
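
The disk-space item is easy to script. A minimal sketch, assuming the v17 data directory lives at /var/lib/postgresql/17/data and applying the checklist's 2x rule of thumb (pg_upgrade --link needs far less):

BASH
# Compare free space on the data volume with ~2x the total database size
DATA_DIR=/var/lib/postgresql/17/data
NEEDED_KB=$(( $(psql -At -c "SELECT (sum(pg_database_size(oid)) / 1024)::bigint FROM pg_database;") * 2 ))
FREE_KB=$(df -Pk "$DATA_DIR" | awk 'NR==2 {print $4}')
echo "Need ~${NEEDED_KB} kB, have ${FREE_KB} kB free"
[ "$FREE_KB" -gt "$NEEDED_KB" ] && echo "Disk space OK" || echo "Insufficient disk space"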

1.2. Version compatibility matrix

From → To | Method              | Downtime | Risk
17 → 18   | pg_upgrade          | Minutes  | Low
15 → 18   | pg_upgrade          | Minutes  | Medium
12 → 18   | Logical replication | None     | Medium
9.6 → 18  | Dump/restore        | Hours    | High

1.3. Document current state

BASH
# PostgreSQL version
psql -c "SELECT version();"

# Installed extensions
psql -c "\dx"

# Database sizes
psql -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;"

# Patroni version
patronictl version

# etcd version
etcdctl version

2. PostgreSQL Minor Version Upgrade

2.1. Minor upgrade process (e.g., 18.0 → 18.1)

BASH
# Minor upgrades are easy - just update packages

# On each node (one at a time):

# 1. Update packages
sudo apt-get update
sudo apt-get install --only-upgrade postgresql-18

# 2. Restart Patroni (will restart PostgreSQL)
sudo systemctl restart patroni

# 3. Verify new version
psql -c "SELECT version();"

# Patroni handles failover automatically during restart

2.2. Rolling minor upgrade

BASH
# Upgrade replicas first
for node in node2 node3; do
  echo "Upgrading $node..."
  ssh $node "sudo apt-get update && sudo apt-get install -y --only-upgrade postgresql-18"
  ssh $node "sudo systemctl restart patroni"
  sleep 30  # Wait for replica to catch up
done

# Switchover to upgraded replica
patronictl switchover postgres-cluster --leader node1 --candidate node2

# Upgrade old leader (now replica)
ssh node1 "sudo apt-get update && sudo apt-get install -y --only-upgrade postgresql-18"
ssh node1 "sudo systemctl restart patroni"

3. PostgreSQL Major Version Upgrade with pg_upgrade

3.1. Architecture

TEXT
Before (PostgreSQL 17):
  node1 (17, Leader)
  node2 (17, Replica)
  node3 (17, Replica)

During upgrade:
  node1 (17, Leader) ← Still serving traffic
  node2 (18, NEW) ← Upgrading
  node3 (17, Replica)

After upgrade:
  node1 (18, Leader) ← Upgraded
  node2 (18, Replica)
  node3 (18, Replica)

Note: streaming replication only works between servers of the same major version, so a node upgraded to v18 cannot stream from the v17 leader. Expect an upgraded node to stay out of the cluster until the leader is also on v18 (or until it is re-initialized from a v18 leader).

3.2. Install new PostgreSQL version

BASH
# Install PostgreSQL 18 alongside 17
sudo apt-get install -y postgresql-18 postgresql-18-contrib

# Both versions now installed:
# /usr/lib/postgresql/17/
# /usr/lib/postgresql/18/

3.3. Prepare for upgrade (node2 - first replica)

Note: pg_upgrade needs a cleanly shut-down data directory and is not designed to run against a standby, and the upgraded cluster receives a new system identifier that no longer matches the one Patroni recorded in the DCS. In practice, a Patroni major upgrade is usually done by upgrading the leader's data directory, clearing the old cluster state from the DCS (patronictl remove), and re-initializing the replicas from the upgraded leader (patronictl reinit). The per-node steps below illustrate the pg_upgrade mechanics; adapt the order to that pattern for production.

BASH
# 1. Stop Patroni on node2
sudo systemctl stop patroni

# 2. Create new data directory for v18
sudo mkdir -p /var/lib/postgresql/18/data
sudo chown postgres:postgres /var/lib/postgresql/18/data

# 3. Initialize new cluster (encoding and locale must match the old cluster;
#    pg_upgrade --check verifies this)
sudo -u postgres /usr/lib/postgresql/18/bin/initdb \
  -D /var/lib/postgresql/18/data \
  --encoding=UTF8 \
  --data-checksums

# 4. Run pg_upgrade
sudo -u postgres /usr/lib/postgresql/18/bin/pg_upgrade \
  --old-datadir=/var/lib/postgresql/17/data \
  --new-datadir=/var/lib/postgresql/18/data \
  --old-bindir=/usr/lib/postgresql/17/bin \
  --new-bindir=/usr/lib/postgresql/18/bin \
  --check  # Dry run first!

# If check passes, run actual upgrade:
sudo -u postgres /usr/lib/postgresql/18/bin/pg_upgrade \
  --old-datadir=/var/lib/postgresql/17/data \
  --new-datadir=/var/lib/postgresql/18/data \
  --old-bindir=/usr/lib/postgresql/17/bin \
  --new-bindir=/usr/lib/postgresql/18/bin \
  --link  # Use hard links (faster)

# Expected output:
# Performing Consistency Checks
# -----------------------------
# ...
# Upgrade Complete
# ----------------

3.4. Update Patroni configuration for v18

YAML
# /etc/patroni/patroni.yml
postgresql:
  bin_dir: /usr/lib/postgresql/18/bin  # Changed from 17
  data_dir: /var/lib/postgresql/18/data  # Changed from 17
  # ... rest of config
BASH
# Start Patroni with new version
sudo systemctl start patroni

# Verify node2 is now running v18
psql -h node2 -U postgres -c "SELECT version();"

3.5. Upgrade remaining nodes

BASH
# Repeat process for node3
ssh node3 "sudo systemctl stop patroni"
ssh node3 "# ... same pg_upgrade steps ..."
ssh node3 "sudo systemctl start patroni"

# Finally, upgrade node1 (current leader)
# Switchover to node2 first
patronictl switchover postgres-cluster --leader node1 --candidate node2

# Now upgrade node1
ssh node1 "sudo systemctl stop patroni"
ssh node1 "# ... same pg_upgrade steps ..."
ssh node1 "sudo systemctl start patroni"

# All nodes now on v18!
patronictl list

3.6. Post-upgrade tasks

BASH
# Run any scripts generated by pg_upgrade (since PostgreSQL 14, pg_upgrade no
# longer creates analyze_new_cluster.sh - it just prints the vacuumdb command
# used below)
sudo -u postgres ./reindex_hash.sh  # Only generated when upgrading from < 10

# Update extensions
psql -c "ALTER EXTENSION pg_stat_statements UPDATE;"

# Vacuum analyze all databases
vacuumdb --all --analyze-in-stages

# Remove old cluster (after verifying everything works!)
# sudo -u postgres ./delete_old_cluster.sh
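
ALTER EXTENSION ... UPDATE has to be run per database. A rough sketch to sweep every installed extension in every connectable database; it assumes the postgres OS user can connect locally to each database:

BASH
# Refresh every installed extension in every database after the major upgrade
# (extensions already at the latest version are left unchanged)
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datallowconn;"); do
  for ext in $(psql -d "$db" -At -c "SELECT extname FROM pg_extension;"); do
    echo "Updating $ext in $db"
    psql -d "$db" -c "ALTER EXTENSION \"$ext\" UPDATE;"
  done
done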

4. Zero-Downtime Upgrade with Logical Replication

4.1. Architecture

TEXT
Production (v17):
  node1 (v17, Leader) ← Serving traffic
    ↓ Logical replication
New cluster (v18):
  node4 (v18, Leader) ← Receiving changes
  node5 (v18, Replica)
  
After cutover:
  Application → node4 (v18) ← New primary

4.2. Setup new v18 cluster

BASH
# Install PostgreSQL 18 on new servers
# Setup Patroni cluster (node4, node5, node6)
# See previous lessons for installation

# Verify new cluster
patronictl -c /etc/patroni/patroni-v18.yml list

4.3. Create publication on v17 (source)

SQL
-- On node1 (v17 leader)
CREATE PUBLICATION pg17_to_pg18 FOR ALL TABLES;

-- Or specific tables:
-- CREATE PUBLICATION pg17_to_pg18 FOR TABLE users, orders, products;

-- Verify
SELECT * FROM pg_publication;
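
Before relying on FOR ALL TABLES, it is worth checking for tables without a primary key: with the default replica identity, UPDATEs and DELETEs on such tables will fail once they are published. A quick check against the v17 source, using the same host and database names as the examples above:

BASH
# List published tables that have neither a primary key nor an explicit
# replica identity - UPDATE/DELETE on these will error under logical replication
psql -h node1 -U postgres -d myapp -c "
  SELECT n.nspname AS schema, c.relname AS table_name
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE c.relkind = 'r'
    AND c.relreplident = 'd'
    AND NOT EXISTS (SELECT 1 FROM pg_constraint p
                    WHERE p.conrelid = c.oid AND p.contype = 'p')
    AND n.nspname NOT IN ('pg_catalog', 'information_schema');"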

4.4. Create subscription on v18 (target)

SQL
-- On node4 (v18 leader)
CREATE SUBSCRIPTION pg18_from_pg17
CONNECTION 'host=node1 port=5432 dbname=myapp user=replicator password=rep_pass'
PUBLICATION pg17_to_pg18
WITH (copy_data = true, create_slot = true);

-- Monitor initial sync
SELECT * FROM pg_stat_subscription;

-- Wait for initial data copy to complete
-- subname     | pg18_from_pg17
-- received_lsn | 0/3000000
-- ...
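
Beyond pg_stat_subscription, the per-table sync state in pg_subscription_rel tells you when the initial copy has actually finished. A small check on the v18 side, reusing the connection details from above:

BASH
# Tables still being copied/synchronized (0 means the initial copy is complete)
psql -h node4 -U postgres -d myapp -Atc \
  "SELECT count(*) FROM pg_subscription_rel WHERE srsubstate <> 'r';"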

4.5. Monitor replication lag

SQL
-- On v17 (source): WAL not yet confirmed by the subscriber
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_name LIKE '%pg18%';

-- On v18 (target): last WAL received and last position reported back to the source
SELECT subname,
       received_lsn,
       latest_end_lsn,
       latest_end_time
FROM pg_stat_subscription;

4.6. Cutover procedure

BASH
# 1. Stop writes to v17 (put app in maintenance mode)
# Or set database to read-only:
psql -h node1 -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = on;"
psql -h node1 -U postgres -c "SELECT pg_reload_conf();"

# 2. Wait for replication to catch up: on the v17 source, check that the
#    subscriber has confirmed everything up to the current WAL position
psql -h node1 -U postgres -c "SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) FROM pg_replication_slots WHERE slot_name = 'pg18_from_pg17';"
# Should be 0 bytes

# 3. Disable subscription on v18
psql -h node4 -U postgres -c "ALTER SUBSCRIPTION pg18_from_pg17 DISABLE;"

# 4. Drop subscription (optional, after confirming everything works)
# psql -h node4 -U postgres -c "DROP SUBSCRIPTION pg18_from_pg17;"

# 5. Update application connection strings to point to node4

# 6. Make sure writes are enabled on v18 (only needed if it was set read-only as a precaution)
psql -h node4 -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = off;"
psql -h node4 -U postgres -c "SELECT pg_reload_conf();"

# 7. Verify application works on v18

# 8. Keep v17 cluster running for rollback (1-2 weeks)
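
One thing the procedure above does not cover: logical replication copies table data but not sequence values, so serial/identity counters on the v18 side will be stale. Copy them across while writes are stopped (after step 2 and before pointing the application at v18). A sketch assuming both clusters are reachable from the same host and object names match:

BASH
# Generate setval() calls from the v17 sequence positions and apply them on v18
psql -h node1 -U postgres -d myapp -Atc \
  "SELECT format('SELECT setval(%L, %s, true);',
                 quote_ident(schemaname) || '.' || quote_ident(sequencename),
                 last_value)
   FROM pg_sequences
   WHERE last_value IS NOT NULL;" \
  | psql -h node4 -U postgres -d myapp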

5. Patroni Version Upgrade

5.1. Check compatibility

BASH
# Check current Patroni version
patronictl version

# Check PostgreSQL compatibility
# Make sure your Patroni release lists PostgreSQL 18 as supported
# See: https://github.com/zalando/patroni/releases

5.2. Upgrade Patroni (Python package)

BASH
# On each node:

# 1. Upgrade via pip
sudo pip3 install --upgrade patroni[etcd]

# Or specific version:
# sudo pip3 install patroni[etcd]==3.2.2

# 2. Verify new version
patronictl version

# 3. Restart Patroni service
sudo systemctl restart patroni

# No downtime - Patroni handles failover automatically

5.3. Rolling Patroni upgrade

BASH
# Upgrade replicas first
for node in node2 node3; do
  echo "Upgrading Patroni on $node..."
  ssh $node "sudo pip3 install --upgrade patroni[etcd]"
  ssh $node "sudo systemctl restart patroni"
  sleep 10
done

# Switchover to upgraded replica
patronictl switchover postgres-cluster --leader node1 --candidate node2

# Upgrade old leader
ssh node1 "sudo pip3 install --upgrade patroni[etcd]"
ssh node1 "sudo systemctl restart patroni"

6. etcd Upgrade

6.1. etcd minor upgrade

BASH
# On each etcd node:
sudo systemctl stop etcd
sudo apt-get update
sudo apt-get install --only-upgrade etcd
sudo systemctl start etcd

# Verify cluster health
etcdctl endpoint health

6.2. etcd major upgrade (e.g., 3.4 → 3.5)

BASH
# Follow official etcd upgrade guide
# https://etcd.io/docs/latest/upgrades/

# Key steps:
# 1. Backup etcd data
etcdctl snapshot save backup.db

# 2. Upgrade one member at a time
# 3. Verify cluster health after each upgrade
# 4. Update Patroni to use new etcd API version
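
A rough sketch of the one-member-at-a-time loop, assuming three members named etcd1-etcd3, packages installed via apt, and plain-HTTP client endpoints on port 2379 (adjust for your install method and TLS settings):

BASH
ENDPOINTS=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379

for node in etcd1 etcd2 etcd3; do
  echo "Upgrading etcd on $node..."
  ssh $node "sudo systemctl stop etcd && sudo apt-get update && sudo apt-get install -y --only-upgrade etcd && sudo systemctl start etcd"
  # Do not continue until every endpoint reports healthy again
  until etcdctl --endpoints="$ENDPOINTS" endpoint health; do
    sleep 5
  done
done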

7. Rollback Strategies

7.1. Rollback pg_upgrade

BASH
# Before deleting the old cluster, you can roll back

# NOTE: if pg_upgrade was run with --link, the old and new data directories
# share files. Do NOT roll back this way once the v18 cluster has accepted
# writes. In link mode pg_upgrade also renames the old cluster's
# global/pg_control to pg_control.old - rename it back before starting.

# 1. Stop Patroni
sudo systemctl stop patroni

# 2. Restore the old configuration in /etc/patroni/patroni.yml:
#    postgresql:
#      bin_dir: /usr/lib/postgresql/17/bin    # Back to 17
#      data_dir: /var/lib/postgresql/17/data

# 3. Start Patroni
sudo systemctl start patroni

# Old cluster resumes operation

7.2. Rollback logical replication

BASH
# If cutover to v18 fails, roll back to v17

# 1. Stop application

# 2. Set v17 to read-write
psql -h node1 -U postgres -c "ALTER SYSTEM SET default_transaction_read_only = off;"
psql -h node1 -U postgres -c "SELECT pg_reload_conf();"

# 3. Update application connection strings to v17

# 4. Resume normal operations

# Note: Any writes to v18 during cutover will be LOST!
# Consider setting up reverse replication v18 → v17 if needed
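
If you want a safety net against lost writes, the reverse replication mentioned above can be set up immediately after cutover, before significant write traffic hits v18. A sketch reusing the same database, user, and host names as the forward direction (all of them assumptions from this lesson's examples); keep in mind the v17 side must accept these writes, so the read-only default set at cutover may need to be limited to application roles:

BASH
# On v18 (now primary): publish all tables back towards v17
psql -h node4 -U postgres -d myapp -c \
  "CREATE PUBLICATION pg18_to_pg17 FOR ALL TABLES;"

# On v17: subscribe without copying data - tables already match at cutover,
# so only changes made on v18 after the cutover flow back
psql -h node1 -U postgres -d myapp -c \
  "CREATE SUBSCRIPTION pg17_from_pg18
   CONNECTION 'host=node4 port=5432 dbname=myapp user=replicator password=rep_pass'
   PUBLICATION pg18_to_pg17
   WITH (copy_data = false, create_slot = true);"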

7.3. Rollback Patroni upgrade

BASH
# Downgrade Patroni if upgrade causes issues

sudo pip3 install patroni[etcd]==3.1.2  # Previous version
sudo systemctl restart patroni

8. Testing Upgrades

8.1. Staging environment test

BASH
# 1. Clone production to staging
pg_basebackup -h prod-leader -D /var/lib/postgresql/staging -X stream

# 2. Perform upgrade in staging
# ... follow upgrade procedures ...

# 3. Run application tests
# ... smoke tests, integration tests ...

# 4. Benchmark performance
pgbench -i -s 100 myapp
pgbench -c 10 -j 2 -t 1000 myapp

# 5. Document issues and timings

8.2. Upgrade rehearsal

BASH
# Practice upgrade multiple times
# Time each step
# Identify bottlenecks
# Refine procedures

# Example timing log:
# Step 1: Stop Patroni - 5 seconds
# Step 2: pg_upgrade --check - 30 seconds
# Step 3: pg_upgrade - 10 minutes
# Step 4: Start Patroni - 15 seconds
# Step 5: Replication catchup - 2 minutes
# Total: ~13 minutes
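
Timing by hand is error-prone; a tiny helper keeps the rehearsal log consistent. A sketch (the step commands are just examples from earlier sections, and upgrade_timings.log is an arbitrary file name):

BASH
run_step() {
  local label="$1"; shift
  local start=$(date +%s)
  "$@"
  echo "$label: $(( $(date +%s) - start ))s" | tee -a upgrade_timings.log
}

run_step "Stop Patroni"  sudo systemctl stop patroni
run_step "Start Patroni" sudo systemctl start patroni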

9. Best Practices

✅ DO

  1. Test in staging first: Multiple times.
  2. Backup everything: Full backup + WAL archive.
  3. Use pg_upgrade --check: Catch issues early.
  4. Document procedures: Step-by-step runbook.
  5. Schedule maintenance window: Off-peak hours.
  6. Monitor closely: During and after upgrade.
  7. Keep old version: Don't delete for 1-2 weeks.
  8. Use logical replication: For zero-downtime upgrades.
  9. Upgrade extensions: After PostgreSQL upgrade.
  10. Vacuum analyze: After major upgrade.

❌ DON'T

  1. Don't skip backups: Critical safety net.
  2. Don't upgrade all at once: Use rolling upgrades.
  3. Don't delete old cluster immediately: Keep for rollback.
  4. Don't ignore release notes: Breaking changes.
  5. Don't skip testing: Staging is essential.
  6. Don't upgrade during peak hours: Plan maintenance window.
  7. Don't forget about extensions: May need updates.

10. Lab Exercises

Lab 1: Minor version upgrade

Tasks:

  1. Check current PostgreSQL version.
  2. Update packages on replica.
  3. Restart Patroni on replica.
  4. Switchover to upgraded replica.
  5. Upgrade old leader.

Lab 2: Major version upgrade with pg_upgrade

Tasks:

  1. Install PostgreSQL 18 on all nodes.
  2. Run pg_upgrade --check on replica.
  3. Perform pg_upgrade on replica.
  4. Update Patroni configuration.
  5. Complete rolling upgrade.

Lab 3: Zero-downtime upgrade with logical replication

Tasks:

  1. Setup new v18 cluster.
  2. Create publication on v17.
  3. Create subscription on v18.
  4. Monitor replication lag.
  5. Perform cutover.
  6. Verify application functionality.

Lab 4: Rollback procedure

Tasks:

  1. Simulate failed upgrade.
  2. Stop Patroni on all nodes.
  3. Restore old configuration.
  4. Restart old cluster.
  5. Verify rollback successful.

11. Summary

Upgrade Methods Comparison

Method              | Downtime | Complexity | Risk   | Use Case
pg_upgrade          | Minutes  | Low        | Low    | Adjacent major versions
Logical replication | None     | High       | Medium | Zero-downtime required
Dump/restore        | Hours    | Low        | Low    | Ancient versions
pg_upgrade --link   | Seconds  | Medium     | Medium | Same-server upgrade

Typical Timeline

TEXT
Week 1-2: Planning and preparation
  - Review release notes
  - Test in staging
  - Document procedures

Week 3: Dress rehearsal
  - Full upgrade test in staging
  - Time each step
  - Identify issues

Week 4: Production upgrade
  - Execute during maintenance window
  - Monitor closely
  - Be ready to roll back

Week 5-6: Stabilization
  - Monitor for issues
  - Performance tuning
  - Keep old cluster for safety

Week 7+: Cleanup
  - Delete old cluster
  - Update documentation
  - Post-mortem review

Upgrade Checklist

TEXT
Pre-upgrade:
☐ Full backup completed
☐ Staging test successful
☐ Release notes reviewed
☐ Maintenance window scheduled
☐ Rollback plan documented
☐ Stakeholders notified

During upgrade:
☐ Backups verified
☐ pg_upgrade --check passed
☐ Upgrade completed
☐ Replication working
☐ Application connectivity verified

Post-upgrade:
☐ Extensions updated
☐ Vacuum analyze completed
☐ Performance validated
☐ Monitoring updated
☐ Documentation updated

Next Steps

Lesson 25 will cover Real-world Case Studies:

  • Production architecture examples
  • Scaling to 1000+ queries/second
  • Cost optimization techniques
  • Lessons learned from failures
  • Industry-specific implementations
