Verify that the Geo cluster is healthy pre-upgrade. For a no-downtime upgrade, remove deploy nodes from load balancers, stop Sidekiq, and run the looping-pipeline to confirm the tests pass.
Open the HAProxy stats dashboard for each site to monitor health checks
Upgrade
Retrieve a beverage of choice within a drinkable temperature range
Join Zoom meeting and wait for arrival. Hit the record button
Manually trigger the looping test pipeline to start running smoke tests before upgrading the primary site and before upgrading the secondary site (if failures happen during the primary site update)
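As a rough illustration of the health-check monitoring in the checklist above, the HAProxy stats page can also be exported as CSV and scanned for servers that are not UP. This is only a sketch: the proxy and server names in the sample are hypothetical, the sample keeps just three of the many real CSV columns, and fetching the CSV from each site's stats endpoint is left out.

```python
import csv
import io
from typing import List, Tuple


def unhealthy_servers(stats_csv: str) -> List[Tuple[str, str, str]]:
    """Return (proxy, server, status) for server rows whose status is not UP."""
    text = stats_csv.lstrip()
    # HAProxy prefixes the CSV header with "# "; strip it so the first
    # column is named "pxname" rather than "# pxname".
    if text.startswith("# "):
        text = text[2:]

    bad = []
    for row in csv.DictReader(io.StringIO(text)):
        server = row["svname"]
        status = row["status"]
        # FRONTEND and BACKEND rows are aggregates, not individual servers.
        if server in ("FRONTEND", "BACKEND"):
            continue
        if not status.startswith("UP"):
            bad.append((row["pxname"], server, status))
    return bad


# Hypothetical sample data, reduced to the columns used above.
sample = (
    "# pxname,svname,status\n"
    "gitlab_web,FRONTEND,OPEN\n"
    "gitlab_web,rails-1,UP\n"
    "gitlab_web,rails-2,MAINT\n"
    "gitlab_web,BACKEND,UP\n"
)
print(unhealthy_servers(sample))
```

During a no-downtime upgrade, a node that has been deliberately removed from rotation is expected to show up in this list, so the output needs to be read against the upgrade step in progress.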
Jennifer Louie marked the checklist item Create a zoom meeting and schedule a recorded meeting as incomplete
Jennifer Louie marked the checklist item Check if PostgreSQL is already the latest shipped version. If not, ensure PostgreSQL upgrade instructions are followed. as incomplete
Jennifer Louie marked the checklist item Verify that the Geo cluster is healthy pre-upgrade. For a no-downtime upgrade, remove deploy nodes from load balancers/stop sidekiq and run looping-pipeline to confirm test pass. as incomplete
marked the checklist item Perform upgrade steps described in latest documentation. as incomplete
Jennifer Louie marked the checklist item Manually trigger the looping test pipeline to start running smoke tests before upgrading the primary site and before upgrading the secondary site (if failures happen during the primary site update) as incomplete
Jennifer Louie marked the checklist item Check if PostgreSQL is already the latest shipped version. If not, ensure PostgreSQL upgrade instructions are followed. as completed
Jennifer Louie marked the checklist item Verify that the Geo cluster is healthy pre-upgrade. For a no-downtime upgrade, remove deploy nodes from load balancers/stop sidekiq and run looping-pipeline to confirm test pass. as completed
Jennifer Louie marked the checklist item Retrieve a beverage of choice within a drinkable temperature range as completed
Jennifer Louie marked the checklist item Join Zoom meeting and wait for arrival. Hit the record button as completed
Jennifer Louie marked the checklist item Manually trigger the looping test pipeline to start running smoke tests before upgrading the primary site and before upgrading the secondary site (if failures happen during the primary site update) as completed
marked the checklist item Perform upgrade steps described in latest documentation. as completed
Jennifer Louie marked the checklist item During the upgrade process, monitor HAProxy stats dashboard and the looping test pipeline for any failures as completed
Jennifer Louie marked the checklist item Record any issues encountered during the upgrade as completed
Jennifer Louie marked the checklist item Verify cluster health post upgrade as completed
Jennifer Louie marked the checklist item Verify PostgreSQL version is correct as completed
Jennifer Louie changed title from Upgrade Geo multi-server installation from 12.10.12 to latest 13.0 version to PARTIAL SUCCESS: Upgrade Geo multi-server installation from 12.10.12 to latest 13.0 version
Jennifer Louie marked the checklist item Record the upgrade outcome as SUCCESS (upgrade with zero downtime), FAILED, PARTIAL SUCCESS (upgrade but with downtime or unconfirmed downtime) as completed
This upgrade was done as part of an investigation into downtime during upgrades when following zero-downtime instructions: #225684 (closed).
It is labeled "PARTIAL SUCCESS" because we did observe downtime. However, the monitoring we did during the upgrade will help us address the downtime issues.
Also note: the full looping-pipeline was not run during the upgrade, but it was run before and after the upgrade. To make monitoring the production logs easier, we limited the tests run during the upgrade to: geo nodes API, create merge request, geo http push, push via http, geo database delete replication.
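For reference, the outcome labels used on this issue (SUCCESS, FAILED, PARTIAL SUCCESS) can be read as a small decision rule. The sketch below is only one interpretation of the checklist wording; the function and parameter names are made up for illustration, and None stands for downtime that could not be confirmed either way.

```python
from typing import Optional


def upgrade_outcome(upgraded: bool, downtime_observed: Optional[bool]) -> str:
    """Map an upgrade result to the label recorded in the issue title."""
    if not upgraded:
        return "FAILED"
    # Only a confirmed absence of downtime counts as a full success.
    if downtime_observed is False:
        return "SUCCESS"
    # Downtime was observed (True) or could not be confirmed (None).
    return "PARTIAL SUCCESS"


# This issue: the upgrade completed, but downtime was observed.
print(upgrade_outcome(upgraded=True, downtime_observed=True))
```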
Jennifer Louie marked the checklist item Create new issue for the next upgrade demo (the next versions) and assign to @nhxnguyen and @fzimmer as completed
I noticed I did not reconfigure after checking for gitlab_rails['auto_migrate'] = false on the primary non-deploy nodes in Step 2. However, I had not changed any config settings since they were set during the previous upgrade, so I think it's OK.
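A quick way to double-check that setting on a node is to scan the contents of gitlab.rb for an uncommented assignment. The following is a minimal sketch, assuming the file text has already been read (on Omnibus installs it is normally /etc/gitlab/gitlab.rb); the helper name is hypothetical, and it only detects that the line is present, not whether gitlab-ctl reconfigure has been run since the file last changed.

```python
import re

# Matches an uncommented line such as: gitlab_rails['auto_migrate'] = false
AUTO_MIGRATE_OFF = re.compile(
    r"^\s*gitlab_rails\['auto_migrate'\]\s*=\s*false\b",
    re.MULTILINE,
)


def auto_migrate_disabled(gitlab_rb_text: str) -> bool:
    """Return True if the config text disables automatic migrations."""
    return bool(AUTO_MIGRATE_OFF.search(gitlab_rb_text))


# Hypothetical config excerpt; the commented-out line is ignored.
example_config = """
external_url 'https://gitlab.example.com'
# gitlab_rails['auto_migrate'] = true
gitlab_rails['auto_migrate'] = false
"""
print(auto_migrate_disabled(example_config))
```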
Jennifer Louie marked the checklist item Update Geo validation tests docs page (doc/administration/geo/replication/geo_validation_tests.md) as completed