chore(datastore): improve database LB implementation and add CI tests
What does this MR do?
Related to #1276 (closed).
-
Adds multi error support to
datastore.NewDBLoadBalancer
(followup). -
Simplifies the
datastore.DBLoadBalancer
initialization by adding a functional option for using a fixedhosts
list, instead of passing a list of DSNs for replicas. This functional option can later be used interchangeably with a new option for service discovery. Instead of creating a new set of functional options, I've generalized the existingOpenOption
toOption
so that it encapsulates functional options for bothOpen
andNewDBLoadBalancer
functions. -
Adds support for enabling load balancing with a fixed
hosts
list. Right now this only means that connection pools are being initialized for replicas whenever the option is enabled. Only the primary pool is being used. -
Adds a CI job to setup a postgres cluster using
bitnami/postgresql
and run all API integration tests with the registry configured with load balancing fixedhosts
. I decided to enable all API related tags for this job instead of having separate jobs (one for conformance tests, one for GitLab API, etc). I think it's premature to spend too much time around this as significant changes will follow. Meanwhile the job runtime is still manageable (~17m, ~6m more thanapi:conformance
, and the same asgcs
, so the overall pipeline duration remains unchanged).
Local Testing
I've been testing this locally using GDK, which includes support for PostgreSQL replication and Consul (not needed for this MR).
-
Setup the PostgreSQL cluster following the basic setup guide here (minimum required for this MR), or the advanced setup guide (with pgBouncer and Consul) here. Depending on the setup you'll end up with either 1 or 2 replicas;
-
Run
gdk psql
to get into apsql
console and then create the registry database as usual withCREATE DATABASE registry;
-
Configure the registry:
log: level: debug formatter: text database: enabled: true host: /<full path to gdk root>/postgresql/ port: 5432 dbname: registry sslmode: disable loadbalancing: enabled: true hosts: - /<full path to gdk root>/postgresql-replica/ - /<full path to gdk root>/postgresql-replica-2/
-
Tail PostgreSQL logs in a separate window:
gdk tail postgresql*
-
Run the registry. You should see something like this:
INFO[0000] Connect database=registry duration_ms=3 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84101 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Connect database=registry duration_ms=5 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql-replica/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84102 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Connect database=registry duration_ms=16 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql-replica-2/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84103 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Connect database=registry duration_ms=2 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84104 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Query args="[]" commandTag="CREATE TABLE" database=registry duration_ms=0 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84104 sql="create table if not exists \"schema_migrations\" (\"id\" text not null primary key, \"applied_at\" timestamp with time zone) ;" version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Connect database=registry duration_ms=4 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84105 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Query args="[map[16:1 17:1 20:1 21:1 23:1 26:1 28:1 29:1 700:1 701:1 1082:1 1114:1 1184:1]]" commandTag="SELECT 162" database=registry duration_ms=5 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84105 sql="SELECT * FROM \"schema_migrations\" ORDER BY \"id\" ASC" version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Connect database=registry duration_ms=1 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84106 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Query args="[]" commandTag="CREATE TABLE" database=registry duration_ms=0 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84106 sql="create table if not exists \"schema_migrations\" (\"id\" text not null primary key, \"applied_at\" timestamp with time zone) ;" version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Connect database=registry duration_ms=1 go_version=go1.21.5 host=/Users/jpereira/Developer/gitlab.com/gitlab-org/gdk/postgresql/ instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84107 port=5432 version=v4.5.0-gitlab-20-gbac9b1549.m INFO[0000] Query args="[map[16:1 17:1 20:1 21:1 23:1 26:1 28:1 29:1 700:1 701:1 1082:1 1114:1 1184:1]]" commandTag="SELECT 162" database=registry duration_ms=0 go_version=go1.21.5 instance_id=0d128114-c0ed-44be-8bec-e3a40f2b1fb1 pid=84107 sql="SELECT * FROM \"schema_migrations\" ORDER BY \"id\" ASC" version=v4.5.0-gitlab-20-gbac9b1549.m
Note that database log entries contain a
host
key/value pair that tells us the host that each operation is targeting. We can see that the registry connects to all 3 hosts (1 primary and 2 replicas in my case), but all queries from there onwards target the primary only. -
In the PostgreSQL logs you should see something like this:
2024-06-27_13:56:57.69718 postgresql : 2024-06-27 14:56:57.697 WEST [16895] LOG: statement: -- ping 2024-06-27_13:56:57.71803 postgresql-replica : 2024-06-27 14:56:57.716 WEST [16896] LOG: statement: -- ping 2024-06-27_13:56:57.74822 postgresql-replica-2 : 2024-06-27 14:56:57.748 WEST [16901] LOG: statement: -- ping 2024-06-27_13:56:57.75166 postgresql : 2024-06-27 14:56:57.751 WEST [16920] LOG: statement: create table if not exists "schema_migrations" ("id" text not null primary key, "applied_at" timestamp with time zone) ; 2024-06-27_13:56:57.75564 postgresql : 2024-06-27 14:56:57.755 WEST [16922] LOG: statement: SELECT * FROM "schema_migrations" ORDER BY "id" ASC 2024-06-27_13:56:57.77475 postgresql : 2024-06-27 14:56:57.773 WEST [16927] LOG: statement: create table if not exists "schema_migrations" ("id" text not null primary key, "applied_at" timestamp with time zone) ; 2024-06-27_13:56:57.78049 postgresql : 2024-06-27 14:56:57.779 WEST [16931] LOG: statement: SELECT * FROM "schema_migrations" ORDER BY "id" ASC
Author checklist
-
Feature flags
-
Added feature flag: -
This feature does not require a feature flag
-
-
I added unit tests or they are not required -
I added documentation (or it's not required) -
I followed code review guidelines -
I followed Go Style guidelines -
For database changes including schema migrations: -
Manually run up and down migrations in a postgres.ai production database clone and post a screenshot of the result here. -
If adding new queries, extract a query plan from postgres.ai and post the link here. If changing existing queries, also extract a query plan for the current version for comparison. -
I do not have access to postgres.ai and have made a comment on this MR asking for these to be run on my behalf.
-
-
Do not include code that depends on the schema migrations in the same commit. Split the MR into two or more.
-
-
Ensured this change is safe to deploy to individual stages in the same environment ( cny
->prod
). State-related changes can be troublesome due to having parts of the fleet processing (possibly related) requests in different ways.
Reviewer checklist
-
Ensure the commit and MR tittle are still accurate. -
If the change contains a breaking change, apply the breaking change label. -
If the change is considered high risk, apply the label high-risk-change -
Identify if the change can be rolled back safely. (note: all other reasons for not being able to rollback will be sufficiently captured by major version changes).
If the MR introduces database schema migrations:
-
Ensure the commit and MR tittle start with fix:
,feat:
, orperf:
so that the change appears on the Changelog
If the changes cannot be rolled back follow these steps:
-
If not, apply the label cannot-rollback. -
Add a section to the MR description that includes the following details: -
The reasoning behind why a release containing the presented MR can not be rolled back (e.g. schema migrations or changes to the FS structure) -
Detailed steps to revert/disable a feature introduced by the same change where a migration cannot be rolled back. (note: ideally MRs containing schema migrations should not contain feature changes.) -
Ensure this MR does not add code that depends on these changes that cannot be rolled back.
-