Follow-up: Compare write-ahead replication log for `sidekiq server replica`
The following discussions from !55881 (merged) should be addressed:
- @ayufan started a discussion: (+9 comments)
Additionally, how do we treat replicas? Do we consider all of them to have the same replication log location?
In any case, I think we want to clearly indicate which source of information we used for data consistency.
If we didn't perform any writes before scheduling the job, is it possible that the current replica has a different write-ahead replication log location from the `sidekiq server replica`?
If this is the case, we should then probably have something like this:
- Get the last write-ahead replication log location on the client side when we schedule the job, if no write was performed.
    # ee/lib/gitlab/database/load_balancing/host.rb
    def database_replica_location
      row = query_and_release(<<-SQL.squish)
        SELECT pg_last_wal_replay_lsn()::text AS location
      SQL
      row['location'] if row.any?
    rescue *CONNECTION_ERRORS
      nil
    end
- How expensive is this operation?
- What if we schedule a lot of jobs? Should we somehow cache this and the primary write location?
- Is `Gitlab::Database::LoadBalancing::Session` a good candidate to store those locations?
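
On the caching questions above, one option is to memoize the looked-up location for a short time, so that scheduling many jobs in quick succession costs a single query. A minimal sketch, assuming a hypothetical `LocationCache` class (not an existing GitLab class) and a caller-supplied block that performs the actual lookup; whether `Session` is the right owner for it remains the open question:

```ruby
# Hypothetical helper for illustration; not part of GitLab.
# Memoizes a replication log location for a short time so that repeated
# job scheduling can reuse one lookup.
class LocationCache
  def initialize(ttl_seconds: 1.0, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl_seconds = ttl_seconds
    @clock = clock
    @value = nil
    @fetched_at = nil
  end

  # Returns the cached location, or runs the block to refresh it when the
  # cache is empty or older than the TTL. A nil result is never treated as
  # cached, so a failed lookup is retried on the next call.
  def fetch
    now = @clock.call
    if @value.nil? || (now - @fetched_at) > @ttl_seconds
      @value = yield
      @fetched_at = now
    end
    @value
  end
end
```

A cached value can be older than what the client actually read, so the TTL trades accuracy for fewer queries.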
- In `SidekiqClientMiddleware`, we should pass the `database_replica_location` to the job, the same way we do for `primary_write_location`:
    # ee/lib/gitlab/database/load_balancing/sidekiq_client_middleware.rb
    def mark_data_consistency_location(worker_class, job)
      ...
      if Session.current.performed_write?
        job['database_write_location'] = load_balancer.primary_write_location
      else
        job['database_replica_location'] = load_balancer.host.database_replica_location
      end
    end
- In `SidekiqServerMiddleware`, we should check whether the current Sidekiq replica is up to date:
    # ee/lib/gitlab/database/load_balancing/sidekiq_server_middleware.rb
    def requires_primary?(worker_class, job)
      ...
      location = job['database_replica_location'] || job['database_write_location']
      if replica_caught_up?(location)
        false
      elsif worker_class.get_data_consistency == :delayed && job['retry_count'].to_i == 0
        raise JobNotUpToDate
      else
        true
      end
    end
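
The `replica_caught_up?(location)` check above comes down to comparing two log locations. PostgreSQL prints a `pg_lsn` as two hexadecimal numbers separated by a slash (for example `16/B374D848`), which together form a 64-bit byte position. A minimal sketch of that comparison in plain Ruby, assuming hypothetical helpers `lsn_to_i` and `caught_up?`; the real check may well be done in SQL on the replica instead (for example with `pg_wal_lsn_diff`):

```ruby
# Hypothetical helpers for illustration; not the GitLab implementation.

# Converts a textual LSN such as "16/B374D848" into an integer byte position.
def lsn_to_i(lsn)
  high, low = lsn.split('/', 2)
  raise ArgumentError, "invalid LSN: #{lsn.inspect}" unless high && low

  (Integer(high, 16) << 32) | Integer(low, 16)
end

# True when the replica has replayed at least up to the required location.
# Assumption: a missing required location means there is nothing to wait
# for, and a missing replica location means we cannot confirm it caught up.
def caught_up?(replica_lsn, required_lsn)
  return true if required_lsn.nil?
  return false if replica_lsn.nil?

  lsn_to_i(replica_lsn) >= lsn_to_i(required_lsn)
end
```

Converting to integers matters because a plain string comparison gets some cases wrong: `"9/FFFFFFFF"` sorts after `"10/0"` as text, although it is the earlier location.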