Run service discovery on load balancing configuration

Simon Tomlinson requested to merge stomlinson/service-discovery-prefork into master

What does this MR do?

This MR forces an initial run of PostgreSQL service discovery as soon as load balancing is configured, which fixes the problem described in #323726 (closed).

Before this MR, Puma preloaded the application and set up load balancing with an empty host list before forking. Service discovery only ran later, in the on_worker_start callback, so any database access that happened before that callback ran, whether in the master process before forking or in a worker after forking, was sent to the database primary.

https://gitlab.com/gitlab-org/gitlab/blob/master/config/initializers/load_balancing.rb#L17-27 shows this initialization process.

This MR simply runs service discovery once when load balancing is configured so that we retrieve an initial list of hosts immediately.
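The ordering problem can be modeled with a small, self-contained Ruby sketch. All names in it (FakeResolver, LoadBalancer, ServiceDiscovery, configure_load_balancing) and the replica addresses are hypothetical and much simpler than GitLab's real load balancing code; the sketch only illustrates why a host list that stays empty until a post-fork hook runs sends reads to the primary, and how one synchronous discovery pass at configuration time avoids that.

```ruby
# Hypothetical model of the ordering problem; these are NOT GitLab's classes.

# Stand-in for the DNS lookup that service discovery performs.
class FakeResolver
  def initialize(addresses)
    @addresses = addresses
  end

  def resolve
    @addresses
  end
end

class LoadBalancer
  attr_reader :hosts

  def initialize
    @hosts = [] # empty until the first service discovery pass
  end

  def hosts=(new_hosts)
    @hosts = new_hosts.dup.freeze
  end

  # Reads go to the first known replica, or to the primary if none is known.
  def read_target
    @hosts.empty? ? :primary : @hosts.first
  end
end

class ServiceDiscovery
  def initialize(load_balancer, resolver)
    @load_balancer = load_balancer
    @resolver = resolver
  end

  # A single discovery pass: resolve the record and replace the host list.
  def perform_service_discovery
    @load_balancer.hosts = @resolver.resolve
  end
end

# run_initial_discovery: false models the old behaviour, where discovery only
# happened later in a per-worker hook; true models a synchronous pass at
# configuration time, before Puma forks.
def configure_load_balancing(resolver, run_initial_discovery:)
  load_balancer = LoadBalancer.new
  discovery = ServiceDiscovery.new(load_balancer, resolver)
  discovery.perform_service_discovery if run_initial_discovery
  [load_balancer, discovery]
end

resolver = FakeResolver.new(["10.0.0.2", "10.0.0.3"])

old_lb, old_discovery = configure_load_balancing(resolver, run_initial_discovery: false)
puts old_lb.read_target.inspect # prints :primary (no hosts known yet)
old_discovery.perform_service_discovery # what the post-fork hook would do
puts old_lb.read_target.inspect # prints "10.0.0.2"

new_lb, _ = configure_load_balancing(resolver, run_initial_discovery: true)
puts new_lb.read_target.inspect # prints "10.0.0.2" right after configuration
```

In the old ordering, every read issued between configuration and the hook falls through to the primary; running the pass up front closes that window for both the preloading master and freshly forked workers.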

How to set up and validate locally (strongly suggested)

There are a few steps required to run service discovery locally. This is the easiest way I've found.

  1. Install dnsmasq to run a local DNS nameserver.
    • brew install dnsmasq (or your package manager of choice)
  2. Start dnsmasq on port 53 (the default).
    • sudo brew services start dnsmasq (sudo is needed because port 53 is privileged)
  3. Verify that dnsmasq is working.
    • dig @localhost localhost +short should return 127.0.0.1
  4. Stop your local GDK PostgreSQL.
    • gdk stop postgresql
  5. Run PostgreSQL directly so that it listens on a TCP port.
    • cd your-gdk-dir/postgresql/data && pg_ctl start -D .
  6. Add the following production entry to your database.yml (this problem is only reproducible with RAILS_ENV=production):
production:
  main:
    adapter: postgresql
    encoding: unicode
    database: gitlabhq_development
    host: localhost
    port: 5432
    pool: 10
    prepared_statements: false
    variables:
      statement_timeout: 120s
    load_balancing:
      discover:
        nameserver: localhost
        port: 53
        record: localhost
        record_type: A
        interval: 60
        disconnect_timeout: 120
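Before starting Puma, it can help to confirm that the discover settings above actually resolve, the same way step 3 does with dig. The Ruby sketch below uses only the standard library (yaml and resolv). The helper names discover_settings and resolve_a_records are hypothetical, not part of GitLab, and the sketch assumes plain YAML (no ERB) and an A record, matching record_type: A above.

```ruby
require "resolv"
require "yaml"

# Hypothetical helper: extract <env>.<db>.load_balancing.discover from
# database.yml text. Raises KeyError if any level is missing.
def discover_settings(yaml_text, env: "production", db: "main")
  config = YAML.safe_load(yaml_text)
  config.fetch(env).fetch(db).fetch("load_balancing").fetch("discover")
end

# Hypothetical helper: query only the configured nameserver and port for the
# configured record's A records, like `dig @nameserver record +short`.
def resolve_a_records(settings)
  nameserver = settings.fetch("nameserver")
  # Resolv::DNS documents nameserver_port entries as IP addresses,
  # so map "localhost" to the IPv4 loopback address.
  ip = nameserver == "localhost" ? "127.0.0.1" : nameserver
  port = Integer(settings.fetch("port", 53))

  Resolv::DNS.open(nameserver_port: [[ip, port]]) do |dns|
    dns.getresources(settings.fetch("record"), Resolv::DNS::Resource::IN::A)
       .map { |resource| resource.address.to_s }
  end
end
```

Calling resolve_a_records(discover_settings(File.read("config/database.yml"))) from the GitLab directory should return ["127.0.0.1"] when dnsmasq is set up as in steps 1 to 3; an empty array suggests the nameserver is not answering for the record.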

To reproduce the problem without this branch checked out:

  1. Remove the old log so that only new messages appear: rm log/database_load_balancing.log
  2. Stop the running Rails server: gdk stop rails-web
  3. Start Puma directly: env RAILS_ENV=production bundle exec puma
  4. Wait until you see the messages that the workers have booted, then shut down the server with ^C.
  5. Run cat log/database_load_balancing.log and look for messages with event: "no_secondaries_available".

To verify the fix, check out this branch and repeat these steps. This time log/database_load_balancing.log is not recreated, because no messages are written to it.
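The check in step 5 can also be scripted. This Ruby sketch assumes that log/database_load_balancing.log holds one JSON object per line with an event key; the helper name count_events is hypothetical. A missing file is treated as zero events, which matches the expected result with this branch checked out.

```ruby
require "json"

# Hypothetical helper: count log lines whose "event" field equals `event`.
# Lines that are not valid JSON objects are skipped; a missing file counts as 0.
def count_events(path, event: "no_secondaries_available")
  return 0 unless File.exist?(path)

  File.foreach(path).count do |line|
    begin
      entry = JSON.parse(line)
      entry.is_a?(Hash) && entry["event"] == event
    rescue JSON::ParserError
      false
    end
  end
end
```

Without this branch, count_events("log/database_load_balancing.log") should return a positive number after step 4; with it, the result should be 0.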

Edited by Simon Tomlinson