2017/04/21 9.1 RC1 deployment problems

We ran into a number of problems while deploying today:

  1. Migrations on the blessed worker failed due to a database statement timeout: https://gitlab.com/gitlab-org/gitlab-ce/snippets/1657595
  2. The migration 20170124193205_add_two_factor_columns_to_users.rb was left in an incomplete state: the columns existed, but the index was invalid.
  3. We disabled the statement timeout altogether via ALTER DATABASE gitlabhq_production SET statement_timeout = 0
  4. We deleted the invalid index, re-created it by hand, and inserted 20170124193205 into schema_migrations.
  5. Running the deploy again did not re-run the migrations and instead deployed to the whole cluster without checking the migration status.
  6. Error 500s abounded since there were missing migrations.
  7. We re-ran the migrations by hand, but a number of long-running migrations added default values to both the projects and users tables.
  8. pgbouncer died in the middle of these migrations (reloaded config?) and caused more Error 500s.
  9. There was one post-migration to remove temporary files that took too long (https://gitlab.com/gitlab-org/gitlab-ce/issues/30866).
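
Item 5 is the step that turned a failed migration into a site-wide outage, so here is a minimal sketch of the kind of check that could gate a cluster-wide rollout. It is not how the deploy tooling actually works. The function names (`versions_on_disk`, `pending_migrations`, `safe_to_deploy`) are hypothetical, and the sketch assumes the caller has already listed the migration filenames and read the applied versions from schema_migrations.

```python
import re

# Rails-style migration filenames start with a 14-digit timestamp version,
# e.g. 20170124193205_add_two_factor_columns_to_users.rb
VERSION_RE = re.compile(r"^(\d{14})_.+\.rb$")


def versions_on_disk(filenames):
    """Extract migration versions from migration filenames (hypothetical helper)."""
    versions = set()
    for name in filenames:
        match = VERSION_RE.match(name)
        if match:
            versions.add(match.group(1))
    return versions


def pending_migrations(filenames, applied_versions):
    """Return versions present on disk but absent from schema_migrations, sorted."""
    return sorted(versions_on_disk(filenames) - set(applied_versions))


def safe_to_deploy(filenames, applied_versions):
    """Hypothetical gate: roll out to the whole cluster only when nothing is pending."""
    return not pending_migrations(filenames, applied_versions)
```

With a gate like this, a second deploy run would stop as soon as `pending_migrations` returned a non-empty list, instead of continuing to the rest of the cluster.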

/cc: @felipe_artur, @yorickpeterse, @twk3, @rspeicher, @ayufan, @godfat