The progress update prior to this can be found in the previous post made here
My post mid-term tasks mainly involved researching on PostgreSQL upgrade strategies for the managed CloudSQL PostgreSQL instances used in Galoy's infrastructure. The further post outlines my findings for performing seamless upgrades
Google Cloud Platform (GCP) offers two primary methods for upgrading PostgreSQL instances on CloudSQL:
- Using the in-place
pg_upgrade
tool provided by GCP - Utilizing the Database Migration Service (DMS) to perform upgrades
Below is a small report on finding using both the tools:
The in-place upgrade using pg_upgrade
is a straightforward method provided by GCP for upgrading PostgreSQL instances. With the infrastructure employed by Galoy you only need to edit the terraform config and change the database_version
from POSTGRES_14
to POSTGRES_15
and doing a tofu apply
upgrades the instances in-place.
While this method is very seamless and carries all the configuration in the upgraded instance, the downtime recorded was sometimes over 15 minutes, which exceeded the criteria of minimal downtime of 10 minutes, so research on DMS was done.
Most of the post mid-term work focused on the Database Migration Service method, which offers more control over the upgrade process and keeps downtime under 10 minutes (including user migrations), but requires additional steps to migrate users and permissions as it doesn't automatically copy them during the upgrade. To address the user and permission migration issue with DMS, I created three scripts:
- terraform-db-swap.sh: Transfers users to the new instance
- terraform-state-rm.sh: Updates the Terraform state for the newer instance
- postgres-perms-update.sh: Resolves all permissions-related issues
Based on the research and testing, the Database Migration Service (DMS) method is recommended for Galoy's infrastructure due to its ability to meet the minimal downtime requirement of under 10 minutes, despite the need for additional steps to handle user and permission migration.
For a complete guide on the DMS upgrade process, including user migration and permission updates, please refer to the documentation here.
- fix: Use netcat-traditional instead of netcat
- feat: Add daily backup configuration
- chore: Postgres replica depend on master
- chore: Bump cyrilgdn/postgresql to 1.22.0
- draft: Modules and strategies for postgres upgrade
- feat: Conditionally enable dms specific flags
- test: Source db setup for migration
- fix: Use netcat-bsd instead of traditional
- feat: Use migration user instead of database admin for migration
- docs: postgres update doc
- feat: Grant datamigration perms to inception ci
- Grant get permission for migration profile and disable replication for CI
- feat: Grant datamigration.operations.get to CI
- fix: destroy step with targetting the admin user
- chore: Remove duplicate code
- fix: Ignore the first tf destroy error code
- chore: typo
- fix: -auto-approve flag and extension
- fix: Resolve concurrent table update issue
- test: Add tests for different postgresql scenarios
- feat: use default admin for destination instance
- chore: typo in the source flag
- feat: Optimise postgres update strategy
- fix: Ci typo error
- feat: Make the db migration process more streamlined
- feat: Update the pg migration guide and postgres permissions script
- fix: postgres perms script
- chore: fix scripts and make the guide better
This report concludes my internship with Galoy through the Summer of Bitcoin. I am grateful for this opportunity and managed to learn a lot about DevOps tools and infrastructure in general. Kartik was helpful throughout the program, and I had a lot of fun being his mentee. Overall, the whole program was awesome, and I thoroughly enjoyed it!