Allow Sidekiq jobs to use read-only database replicas
Currently, Sidekiq always uses the primary, even though it does not always need to. This means that all of Sidekiq's database traffic hits the primary, whereas only some web database traffic does. From https://dashboards.gitlab.net/d/000000144/postgresql-overview, we can see that none of the replicas is used as much as the primary, although all of them are in the same ballpark.
Overall, our metrics suggest that we spend more database time in web transactions (green line) than Sidekiq jobs (orange line), but Sidekiq is still a significant percentage:
We currently have no way to tell whether a given worker requires
read-only or read-write access to data. If we started annotating
workers, we could use replicas most of the time for operations that
need neither write access nor fully up-to-date data, such as:
- all notifications
- all webhooks
- all ...
This would allow us to move a number of SELECT statements off the primary.
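As a rough illustration of the annotation idea, here is a minimal plain-Ruby sketch. Everything in it is hypothetical: the `DatabaseAccessAnnotation` module, the `database_access` class method, the `ConnectionRouter` stand-in, and the two example workers are invented for this sketch and are not existing Sidekiq or GitLab APIs. It only shows how a per-worker declaration could drive the choice between primary and replica, with the primary as the safe default for unannotated workers.

```ruby
# Hypothetical sketch: none of these names are real Sidekiq or GitLab APIs.
module DatabaseAccessAnnotation
  VALID_MODES = %i[read_write read_only].freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # With an argument: declare the access this worker needs.
    # Without an argument: return the declared mode, defaulting to
    # :read_write so unannotated workers keep using the primary.
    def database_access(mode = nil)
      return @database_access || :read_write if mode.nil?

      unless VALID_MODES.include?(mode)
        raise ArgumentError, "unknown mode: #{mode.inspect}"
      end

      @database_access = mode
    end
  end
end

# Stand-in for whatever layer picks a database host for a job.
class ConnectionRouter
  def self.target_for(worker_class)
    annotated = worker_class.respond_to?(:database_access)

    if annotated && worker_class.database_access == :read_only
      :replica
    else
      :primary
    end
  end
end

# Illustrative worker that only reads data and tolerates slight staleness.
class WebHookWorker
  include DatabaseAccessAnnotation
  database_access :read_only

  def perform(hook_id); end
end

# Illustrative worker without an annotation: stays on the primary.
class MergeWorker
  include DatabaseAccessAnnotation

  def perform(merge_request_id); end
end
```

With these definitions, `ConnectionRouter.target_for(WebHookWorker)` returns `:replica` and `ConnectionRouter.target_for(MergeWorker)` returns `:primary`. Defaulting to read-write means workers can be annotated incrementally. A real implementation would also have to handle replication lag, which this sketch ignores.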
group::scalability in GitLab.org / GitLab is already putting a lot of effort into annotating workers, so we could perhaps follow the same pattern here.