Skip to content

The `use_model_load_balancing` results in a wrong sticking context used

What does this MR do and why?

Fixes bug introduced by !73631 (merged) in use_model_load_balancing.

When doing use_model_load_balancing a wrong sticking context is used. This solves a subtle bug with sticking when use_model_load_balancing is used.

The problem was that for Web/Sidekiq:

  1. If prior request would have use_model_load_balancing=false it would see load balancers to be main/main. This would result in sticking context to include only main.

  2. If next request would evaluate use_model_load_balancing=true it would see load balancers for main/ci, but would only for wal locations see main=lag. As a result we would not check replication lag against ci.

  3. This result in a situation that if ci is way behind main, the 2. would not understand the state of ci. Thus would consider ci to be up-to date and stick to main.

This commit fixes that:

  1. If ci: is configured we always store the ci: in replication lag. This makes us the wal_locations (and RackMiddleware sticking context) to always include main+ci in all cases.

  2. This results would behave (as for number of requests): we are reading primary / replica location from all databases.

  3. Since we always have ci even if use_model_load_balancing=false we can properly evaluate state of replicas.

Related to:

Test

For a case when ci-replica has significantly more lag than main:

development:
  main:
    host: postgres
    load_balancing:
      hosts:
      - postgres
      - postgres
  ci:
    host: postgres
    load_balancing:
      hosts:
      - postgres-replica
      - postgres-replica

def current_wals
  ::Gitlab::Database::LoadBalancing.each_load_balancer.to_h do |lb|
    [lb.name, lb.primary_write_location]
  end
end

def use_replica?(wals)
  ::Gitlab::Database::LoadBalancing.each_load_balancer.all? do |lb|
    if location = wals[lb.name]
      lb.select_up_to_date_host(location)
    else
      true
    end
  end
end

def run_test
  Feature.disable(:use_model_load_balancing)
  Gitlab::SafeRequestStore.begin!
  Gitlab::SafeRequestStore.clear!
  wals = current_wals
  User.last.update(note: rand())

  Feature.enable(:use_model_load_balancing)
  Gitlab::SafeRequestStore.begin!
  Gitlab::SafeRequestStore.clear!
  pp(wals: wals, latest_wals: current_wals, use_replica: use_replica?(wals))
end
# Without this MR
[64] pry(main)> run_test
=> {:wals=>{:main=>"1/8700B170"}, :latest_wals=>{:main=>"1/8700B3C0", :ci=>"1/8700B3C0"}, :use_replica=>true}
# With this MR
[11] pry(main)> run_test
=> {:wals=>{:main=>"1/87008E78", :ci=>"1/87008E78"}, :latest_wals=>{:main=>"1/8700B0D8", :ci=>"1/8700B0D8"}, :use_replica=>false}

It is expected that see use_replica=false in all cases.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Kamil Trzciński

Merge request reports