Skip to content

chore(datastore): improve database LB implementation and add CI tests

João Pereira requested to merge dlb-hosts-2 into master

What does this MR do?

Related to #1276 (closed).

  • Adds multi error support to datastore.NewDBLoadBalancer (followup).

  • Simplifies the datastore.DBLoadBalancer initialization by adding a functional option for using a fixed hosts list, instead of passing a list of DSNs for replicas. This functional option can later be used interchangeably with a new option for service discovery. Instead of creating a new set of functional options, I've generalized the existing OpenOption to Option so that it encapsulates functional options for both Open and NewDBLoadBalancer functions.

  • Adds support for enabling load balancing with a fixed hosts list. Right now this only means that connection pools are being initialized for replicas whenever the option is enabled. Only the primary pool is being used.

  • Adds a CI job to setup a postgres cluster using bitnami/postgresql and run all API integration tests with the registry configured with load balancing fixed hosts. I decided to enable all API related tags for this job instead of having separate jobs (one for conformance tests, one for GitLab API, etc). I think it's premature to spend too much time around this as significant changes will follow. Meanwhile the job runtime is still manageable (~17m, ~6m more than api:conformance, and the same as gcs, so the overall pipeline duration remains unchanged).

Local Testing

I've been testing this locally using GDK, which includes support for PostgreSQL replication and Consul (not needed for this MR).

  1. Setup the PostgreSQL cluster following the basic setup guide here (minimum required for this MR), or the advanced setup guide (with pgBouncer and Consul) here. Depending on the setup you'll end up with either 1 or 2 replicas;

  2. Run gdk psql to get into a psql console and then create the registry database as usual with CREATE DATABASE registry;

  3. Configure the registry:

    log:
      level: debug
      formatter: text
    database:
      enabled: true
      host: /<full path to gdk root>/postgresql/
      port: 5432
      dbname: registry
      sslmode: disable
      loadbalancing:
        enabled: true
        hosts:
          - /<full path to gdk root>/postgresql-replica/
          - /<full path to gdk root>/postgresql-replica-2/
  4. Tail PostgreSQL logs in a separate window:

    gdk tail postgresql*
  5. Run the registry. You should see something like this:

    INFO[0000] Connect                                       database=registry duration_ms=3 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84101 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Connect                                       database=registry duration_ms=5 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql-replica/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84102 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Connect                                       database=registry duration_ms=16 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql-replica-2/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84103 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Connect                                       database=registry duration_ms=2 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84104 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Query                                         args="[]" commandTag="CREATE TABLE" database=registry duration_ms=0 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84104 sql="create table if not exists \"schema_migrations\" (\"id\" text not null primary key, \"applied_at\" timestamp with time zone) ;" version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Connect                                       database=registry duration_ms=4 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84105 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Query                                         args="[map[16:1 17:1 20:1 21:1 23:1 26:1 28:1 29:1 700:1 701:1 1082:1 1114:1 1184:1]]" commandTag="SELECT 162" database=registry duration_ms=5 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84105 sql="SELECT * FROM \"schema_migrations\" ORDER BY \"id\" ASC" version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Connect                                       database=registry duration_ms=1 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84106 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Query                                         args="[]" commandTag="CREATE TABLE" database=registry duration_ms=0 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84106 sql="create table if not exists \"schema_migrations\" (\"id\" text not null primary key, \"applied_at\" timestamp with time zone) ;" version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Connect                                       database=registry duration_ms=1 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84107 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m
    INFO[0000] Query                                         args="[map[16:1 17:1 20:1 21:1 23:1 26:1 28:1 29:1 700:1 701:1 1082:1 1114:1 1184:1]]" commandTag="SELECT 162" database=registry duration_ms=0 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84107 sql="SELECT * FROM \"schema_migrations\" ORDER BY \"id\" ASC" version=v4.5.0-gitlab-20-gbac9b1549.m

    Note that database log entries contain a host key/value pair that tells us the host that each operation is targeting. We can see that the registry connects to all 3 hosts (1 primary and 2 replicas in my case), but all queries from there onwards target the primary only.

  6. In the PostgreSQL logs you should see something like this:

    2024-06-27_13:56:57.69718 postgresql            : 2024-06-27 14:56:57.697 WEST [16895] LOG:  statement: -- ping
    2024-06-27_13:56:57.71803 postgresql-replica    : 2024-06-27 14:56:57.716 WEST [16896] LOG:  statement: -- ping
    2024-06-27_13:56:57.74822 postgresql-replica-2  : 2024-06-27 14:56:57.748 WEST [16901] LOG:  statement: -- ping
    2024-06-27_13:56:57.75166 postgresql            : 2024-06-27 14:56:57.751 WEST [16920] LOG:  statement: create table if not exists "schema_migrations" ("id" text not null primary key, "applied_at" timestamp with time zone) ;
    2024-06-27_13:56:57.75564 postgresql            : 2024-06-27 14:56:57.755 WEST [16922] LOG:  statement: SELECT * FROM "schema_migrations" ORDER BY "id" ASC
    2024-06-27_13:56:57.77475 postgresql            : 2024-06-27 14:56:57.773 WEST [16927] LOG:  statement: create table if not exists "schema_migrations" ("id" text not null primary key, "applied_at" timestamp with time zone) ;
    2024-06-27_13:56:57.78049 postgresql            : 2024-06-27 14:56:57.779 WEST [16931] LOG:  statement: SELECT * FROM "schema_migrations" ORDER BY "id" ASC

Author checklist

  • Feature flags
    • Added feature flag:
    • This feature does not require a feature flag
  • I added unit tests or they are not required
  • I added documentation (or it's not required)
  • I followed code review guidelines
  • I followed Go Style guidelines
  • For database changes including schema migrations:
    • Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here.
    • If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison.
      • I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
    • Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
  • Ensured this change is safe to deploy to individual stages in the same environment (cny -> prod). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.

Reviewer checklist

  • Ensure the commit and MR tittle are still accurate.
  • If the change contains a breaking change, apply the breaking change label.
  • If the change is considered high risk, apply the label high-risk-change
  • Identify if the change can be rolled back safely. (note: all other reasons for not being able to rollback will be sufficiently captured by major version changes).

If the MR introduces database schema migrations:

  • Ensure the commit and MR tittle start with fix:, feat:, or perf: so that the change appears on the Changelog
If the changes cannot be rolled back follow these steps:
  • If not, apply the label cannot-rollback.
  • Add a section to the MR description that includes the following details:
    • The reasoning behind why a release containing the presented MR can not be rolled back (e.g. schema migrations or changes to the FS structure)
    • Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.)
    • Ensure this MR does not add code that depends on these changes that cannot be rolled back.
Edited by João Pereira

Merge request reports