Use primary DB for authenticating artifact downloads

What does this MR do and why?

A CI job downloads job artifacts from the /api/v4/jobs/:id/artifacts endpoint. Previously, the endpoint could use any replica to authenticate the current user via a CI job token. However, that token is only valid if the job record exists in the database, and there is no guarantee that a replica has an up-to-date copy of that job. As a result, users could see intermittent 401 Unauthorized errors caused by replication lag.

To avoid this, use the primary database when authenticating the build. This commit adds a ci_job_artifacts_use_primary_to_authenticate feature flag to roll this out.
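The race is easier to see in a toy model. The sketch below is hypothetical and is not GitLab code: Database, Cluster, and authenticate_artifacts_download are illustrative names, the replica only sees a write after catch_up is called, and the use_primary keyword stands in for the state of the ci_job_artifacts_use_primary_to_authenticate feature flag.

```ruby
# Hypothetical model of the replication-lag race; not GitLab code.

# A trivial table mapping CI job tokens to job IDs.
class Database
  def initialize
    @jobs_by_token = {}
  end

  def insert(token, job_id)
    @jobs_by_token[token] = job_id
  end

  def find_job_id(token)
    @jobs_by_token[token]
  end
end

# One primary plus one replica that only sees writes after catch_up runs.
class Cluster
  attr_reader :primary, :replica

  def initialize
    @primary = Database.new
    @replica = Database.new
    @unreplayed = [] # writes the replica has not applied yet (replication lag)
  end

  # Writes always go to the primary and are queued for the replica.
  def insert_job(token, job_id)
    @primary.insert(token, job_id)
    @unreplayed << [token, job_id]
  end

  # Simulates the replica finally replaying the queued writes.
  def catch_up
    @unreplayed.each { |token, job_id| @replica.insert(token, job_id) }
    @unreplayed.clear
  end
end

# use_primary stands in for the feature flag state (assumption).
# Returns an HTTP-style status code for the artifacts download.
def authenticate_artifacts_download(cluster, token, use_primary:)
  db = use_primary ? cluster.primary : cluster.replica
  db.find_job_id(token) ? 200 : 401
end

cluster = Cluster.new
cluster.insert_job("job-token-1", 42)
puts authenticate_artifacts_download(cluster, "job-token-1", use_primary: false) # prints 401
puts authenticate_artifacts_download(cluster, "job-token-1", use_primary: true)  # prints 200
cluster.catch_up
puts authenticate_artifacts_download(cluster, "job-token-1", use_primary: false) # prints 200
```

The replica lookup fails only while the write is unreplayed, which is why the 401 errors were intermittent; reading from the primary removes the window entirely at the cost of extra primary load.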

Note that the runner API attempts to select an up-to-date replica for the job that produced the artifacts, but it has no reliable way to determine the ID of the job that originated the download request. In addition, user authentication happens before replica selection.
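To illustrate why replica selection does not help here, below is a simplified, hypothetical model of per-job replica selection. It assumes the selector records a primary write location per job ID and later picks a replica that has replayed at least that far. This is in the spirit of database load-balancing "sticking", but Replica, ReplicaSelector, stick, and replica_for are illustrative names, not GitLab's API, and integer LSNs stand in for PostgreSQL WAL locations.

```ruby
# Hypothetical model of per-job replica selection; not GitLab's API.
Replica = Struct.new(:name, :replayed_lsn)

class ReplicaSelector
  def initialize(replicas)
    @replicas = replicas
    @required_lsn = {} # job ID => primary write location recorded for that job
  end

  # Remember how far a replica must have replayed to see this job's data.
  def stick(job_id, primary_lsn)
    @required_lsn[job_id] = primary_lsn
  end

  # Known job ID: return a caught-up replica, or :primary if none is.
  # Unknown job ID: nothing to compare against, so any replica is
  # returned with no guarantee that it is up to date.
  def replica_for(job_id)
    required = @required_lsn[job_id]
    return @replicas.first.name if required.nil?

    caught_up = @replicas.find { |replica| replica.replayed_lsn >= required }
    caught_up ? caught_up.name : :primary
  end
end

selector = ReplicaSelector.new([Replica.new("replica-a", 100), Replica.new("replica-b", 150)])
selector.stick(7, 140) # the artifacts-producing job, known from the :id in the URL
selector.stick(8, 200) # a job whose data is newer than every replica

puts selector.replica_for(7)  # prints replica-b
puts selector.replica_for(8)  # prints primary
puts selector.replica_for(99) # prints replica-a (unknown job ID, possibly stale)
```

In this model, the job whose token is being authenticated plays the role of job 99: its ID is not known until authentication succeeds, so the lookup lands on an arbitrary replica, which is the gap the feature flag closes by going to the primary.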

Relates to #466138 (closed)

Changelog: fixed

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

  1. Create a PostgreSQL replica (https://medium.com/@umairhassan27/setting-up-postgresql-replication-on-slave-server-a-step-by-step-guide-1ff36bb9a47f). You can do this on your GDK or on an Omnibus instance.
  2. I took a shortcut and used some of the features in GitLab Geo (https://docs.gitlab.com/ee/administration/geo/setup/database.html) for Omnibus GitLab. In my instance, I added the following to /etc/gitlab/gitlab.rb:
      postgresql['listen_address'] = '127.0.0.1'
      postgresql['port'] = 5432
      postgresql['sql_replication_password'] = '950233c0dfc2f39c64cf30457c3b7f1e'
      postgresql['md5_auth_cidr_addresses'] = ['127.0.0.1/32', '192.168.2.1/32']

  3. Run gitlab-ctl reconfigure. This creates a gitlab_replicator account whose password is "password" (sql_replication_password takes the MD5 hash of the password, not the plain-text value).
  4. I created a base backup with pg_basebackup in the dbreplica directory under my home directory:
      /opt/gitlab/embedded/bin/pg_basebackup -h localhost -D dbreplica -U gitlab_replicator -v -P --wal-method=stream
  5. Since DB load balancing requires all hosts to use the same port (5432), I created a dummy Ethernet device with the IP address 192.168.2.1:
      sudo ip link add eth_dummy type dummy
      sudo ip address add 192.168.2.1/24 dev eth_dummy
  6. Once that completed, I edited dbreplica/postgresql.conf to point the replica at the primary, listen on the dummy IP address, and add a 30-second replication delay:
      primary_conninfo = 'host=127.0.0.1 port=5432 user=gitlab_replicator password=password'
      recovery_min_apply_delay = '30s'
      listen_addresses = '192.168.2.1'
      hot_standby = on
  7. Then I ran touch dbreplica/standby.signal so that PostgreSQL starts the data directory in standby mode.
  8. Run postgres -D dbreplica to start the replica.
  9. With the replica up, I added this to /etc/gitlab/gitlab.rb:
      gitlab_rails['db_load_balancing'] = { 'hosts' => ['192.168.2.1'] }
  10. Run gitlab-ctl reconfigure and gitlab-ctl restart puma.
  11. Confirm that GitLab Rails detects the host:
      # grep "Host is online" /var/log/gitlab/gitlab-rails/database_load_balancing.log
      {"severity":"INFO","time":"2024-06-09T06:37:06.379Z","correlation_id":"13aab9ee8aaba21a9e925fa693851355","event":"host_online","message":"Host is online after replica status check","db_host":"192.168.2.1","db_port":null}
  12. On the GitLab server, create a CI pipeline with two jobs: one that creates artifacts and another that downloads them:
      image: ruby:latest

      stages:
        - test
        - deploy

      test:
        stage: test
        script:
          - echo "hello" > test.txt
        cache:
          paths:
            - test.txt
        artifacts:
          paths:
            - test.txt

      # deploy runs in a later stage, so by default it downloads the
      # artifacts created by test before its script runs.
      deploy:
        stage: deploy
        script:
          - echo "Test deploy"

  13. There's a good chance the deploy job will fail with a 401 Unauthorized error. If it doesn't fail, retry it.
  14. Enable the feature flag in the gitlab-rails console: Feature.enable(:ci_job_artifacts_use_primary_to_authenticate)
  15. Retry the deploy job several times and verify that it passes.
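The grep check against database_load_balancing.log can also be scripted, for example when repeating the setup. The helper below is a hypothetical sketch (online_db_hosts is an illustrative name) that assumes each log line is a JSON object with the event and db_host fields shown in the sample output above; non-JSON lines are skipped.

```ruby
require "json"

# Hypothetical helper: collect the db_host values that the load balancer
# reported with event "host_online". Lines that are not JSON objects are skipped.
def online_db_hosts(lines)
  lines.filter_map do |line|
    entry = JSON.parse(line)
    next unless entry.is_a?(Hash)

    entry["db_host"] if entry["event"] == "host_online"
  rescue JSON::ParserError
    nil
  end.uniq
end

sample = '{"severity":"INFO","time":"2024-06-09T06:37:06.379Z","correlation_id":"13aab9ee8aaba21a9e925fa693851355","event":"host_online","message":"Host is online after replica status check","db_host":"192.168.2.1","db_port":null}'
p online_db_hosts([sample, "not json"]) # prints ["192.168.2.1"]
```

Against the real log, something like online_db_hosts(File.foreach("/var/log/gitlab/gitlab-rails/database_load_balancing.log")) should work, since File.foreach without a block returns an enumerator that supports filter_map (Ruby 2.7 or later).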
Edited by Stan Hu
