Idempotency: Replicated Geo Database Nodes

The following discussion from !8 (merged) should be addressed:

  • @twk3 started a discussion: (+4 comments)

    For the scope of this MR, are we expecting to be able to call run-orchestration multiple times? Currently, on a second run (after Geo is set up), I get the following:

    TASK [start replication] ***********************************************************************************************
    fatal: [dj-geo-database-1]: FAILED! => {"changed": true, "cmd": "gitlab-ctl replicate-geo-database --slot-name=geo --host=10.138.15.224 --no-wait", "delta": "0:00:00.849523", "end": "2020-06-10 18:04:37.315724", "msg": "non-zero return code", "rc": 1, "start": "2020-06-10 18:04:36.466201", "stdout": "\u001b[31mFound data inside the gitlabhq_production database! If you are sure you are in the secondary server, override with --force\u001b[0m", "stdout_lines": ["\u001b[31mFound data inside the gitlabhq_production database! If you are sure you are in the secondary server, override with --force\u001b[0m"]}

    Are we able to skip this if there is data in /var/opt/gitlab/postgresql/data?

There are a few approaches here:

  • check for the replication slot on the primary site's primary database
  • run a local check, if possible, on the local database machine

Unfortunately, both of these require calls to the command module, which is not itself idempotent.

The better solution is likely to start work on a custom Ansible module or filter that lets us gate on something simple, like when not geo_replication.enabled
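
As a minimal sketch of the filter approach, the Python filter plugin below turns the output of a read-only database check into a boolean that the replication task could gate on. It assumes the check is PostgreSQL's `SELECT pg_is_in_recovery()`, run on the local database node with tuples-only output, so that stdout is `t` on a streaming replica and `f` otherwise. The file name, the filter name `geo_replication_enabled`, and the registered variable `recovery_check` are hypothetical and do not exist in the current playbooks.

```python
# filter_plugins/geo_replication.py  (hypothetical file name)
#
# Assumed usage in a playbook (illustrative only):
#   1. a read-only command task registers its result as `recovery_check`
#      and sets `changed_when: false`, so it never reports a change
#   2. the "start replication" task adds
#      `when: not (recovery_check.stdout | geo_replication_enabled)`


def geo_replication_enabled(psql_stdout):
    """Return True if the psql output says the node is already a replica.

    Expects the tuples-only output of `SELECT pg_is_in_recovery()`,
    which is `t` on a standby and `f` on a primary. Empty or unexpected
    output is treated as "not replicating".
    """
    # Drop blank lines and surrounding whitespace, then inspect the last line.
    lines = [line.strip() for line in str(psql_stdout).splitlines() if line.strip()]
    return bool(lines) and lines[-1].lower() in ("t", "true")


class FilterModule(object):
    """Expose the function above to Ansible as a Jinja2 filter."""

    def filters(self):
        return {"geo_replication_enabled": geo_replication_enabled}
```

With something like this in place, a second run of run-orchestration would skip the replication step on a node that is already replicating instead of failing on the "Found data inside the gitlabhq_production database" guard. The same filter shape would work for the slot check on the primary, with a different query and parsing rule.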