Follow-up from "Fix NPE when scheduling Sidekiq job and no DB replicas available"
The following discussion from !106184 (merged) should be addressed:
-
@nmilojevic1 started a discussion: (+2 comments) Could we simplify this further by always reading
primary_wal_location
?This sounds like it would simplify things a lot. Reading from the replica when scheduling a job, in case no write was done for that request, was just a mare optimization.
We wanted to further reduce pressure to the primary. Based on the data, we are querying the wal location from the primary 89.5% of the time
https://log.gprd.gitlab.net/goto/c6372af0-756e-11ed-9f43-e3784d7fe3ca
SELECT pg_current_wal_insert_lsn()::text AS location
is not an expensive query. So maybe it makes sense to just simply always query theprimary_write_location
. This will remove all problematic additional logic from SidekiqClientMiddleware, and will remove the need of keeping wal_location_source.@DylanGriffith, would like to hear your opinion about this.
There are several other issues with the current implementation that should be looked at:
- It was suggested that all decisions around how the WAL is provided should move into the
LoadBalancer
itself - It was suggested to remove the
USE_NEW_LOAD_BALANCER_QUERY
environment toggle that switches the query used to obtain the WAL - It was suggested to track
wal_location_source
on a per LB instance level since it can currently be incorrect (in the case where due to the lack of any replicas being available, we fall back to the primary WAL, but if there hadn't been any writes we think the source was a replica because this is determined based on session data, not what we actually did) - It was suggested to have separate LB session instances for each LB instance since there may have been writes to one database, but not the other