2017/04/21 9.1 RC1 deployment problems
We ran into a number of problems with deploying today:
- Migrations on blessed worker failed due to a database statement timeout: https://gitlab.com/gitlab-org/gitlab-ce/snippets/1657595
- The migration `20170124193205_add_two_factor_columns_to_users.rb` was left in an incomplete state: the columns existed but the index was invalid.
- We disabled the statement timeout altogether via `ALTER DATABASE gitlabhq_production SET statement_timeout = 0`.
- We deleted the invalid index, re-created it by hand, and inserted `20170124193205` into `schema_migrations`.
- Running the deploy again did not re-run the migrations and instead deployed to the whole cluster without checking the migration status.
- Error 500s abounded because some migrations had not been run yet.
- We re-ran the migrations by hand, but a number of them were long-running because they added default values to both the `projects` and `users` tables.
- pgbouncer died in the middle of these migrations (reloaded config?) and caused more Error 500s.
- One post-migration, which removes temporary files, took too long (https://gitlab.com/gitlab-org/gitlab-ce/issues/30866).
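The re-run deploy above went out to the whole cluster without checking migration status. Below is a minimal sketch of a pre-deploy guard. It assumes the applied versions have already been read from `schema_migrations` (e.g. with `SELECT version FROM schema_migrations`) and that the migration filenames were listed from `db/migrate`. `migration_version`, `pending_migrations` and `check_migrations!` are hypothetical helpers, not part of our existing deploy tooling.

```ruby
require "set"

# HYPOTHETICAL pre-deploy guard: refuse to roll out while any migration on
# disk has no matching row in schema_migrations.

# Extract the version timestamp from a Rails migration filename, e.g.
# "20170124193205_add_two_factor_columns_to_users.rb" -> "20170124193205".
# Returns nil for files that do not look like migrations.
def migration_version(filename)
  match = File.basename(filename).match(/\A(\d+)_\w+\.rb\z/)
  match ? match[1] : nil
end

# Versions present on disk but not recorded as applied, sorted oldest first.
# applied_versions is assumed to come from `SELECT version FROM schema_migrations`.
def pending_migrations(migration_files, applied_versions)
  applied = applied_versions.map(&:to_s).to_set
  migration_files
    .map { |file| migration_version(file) }
    .compact
    .reject { |version| applied.include?(version) }
    .sort
end

# Raise (aborting the rollout) if anything is pending; return true otherwise.
def check_migrations!(migration_files, applied_versions)
  pending = pending_migrations(migration_files, applied_versions)
  return true if pending.empty?

  raise "Aborting deploy: #{pending.size} pending migration(s): #{pending.join(', ')}"
end
```

A version inserted into `schema_migrations` by hand (as with `20170124193205` today) counts as applied, so this check alone cannot tell that a migration only partially completed, for example one that left an invalid index behind.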
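For the long-running migrations that added default values to `projects` and `users`, one common mitigation is to backfill existing rows in small id ranges, so that no single statement runs long enough to hit a statement timeout. The hypothetical sketch below shows only the batching. The table and column names are placeholders, and this is not GitLab's own migration helper. The SQL is built by string interpolation, so it must only ever receive trusted identifiers and values.

```ruby
# HYPOTHETICAL batching sketch: split min_id..max_id into inclusive
# [first, last] ranges of at most batch_size ids each.
def each_id_batch(min_id, max_id, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?
  return enum_for(:each_id_batch, min_id, max_id, batch_size) unless block_given?

  start = min_id
  while start <= max_id
    stop = [start + batch_size - 1, max_id].min
    yield start, stop
    start = stop + 1
  end
end

# Build one short UPDATE per batch instead of a single table-wide statement.
# table, column and value are placeholders and are interpolated verbatim
# (trusted input only). Rows that already have a value are skipped.
def backfill_statements(table, column, value, min_id, max_id, batch_size)
  each_id_batch(min_id, max_id, batch_size).map do |first, last|
    "UPDATE #{table} SET #{column} = #{value} " \
      "WHERE id BETWEEN #{first} AND #{last} AND #{column} IS NULL"
  end
end
```

Each statement then touches at most `batch_size` rows. The trade-off is more round trips and a longer total migration time in exchange for no single long-running statement.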
/cc: @felipe_artur, @yorickpeterse, @twk3, @rspeicher, @ayufan, @godfat