Handle outdated replicas in the DB load balancer

Requested to merge `load-balancing-handle-outdated-replicas` into `master` on Nov 22, 2017.

This MR extends the database load balancer so it can handle replicas that lag too far behind the primary. See the commit message for more details. In short:

If a replica lags behind by more than 60 seconds and by more than 8 MB of data (both are defaults), we stop reading from that secondary. We check this roughly every 30 seconds. There is no central state; instead, everything is kept in memory. A replica is used again once it is back in sync.
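The rule above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code in this MR (GitLab itself is written in Ruby). `ReplicaHost`, `probe_lag`, `pick_replica`, and the constant names are hypothetical, and how lag is actually measured is left to the injected probe. The sketch shows the three properties described: a replica is dropped only when both the time limit and the data limit are exceeded, the result is cached in memory and refreshed at most every 30 seconds, and a replica comes back automatically once a later check finds it within the limits.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical names; the defaults mirror the numbers in the description.
MAX_LAG_SECONDS = 60.0            # how far (in time) a replica may trail the primary
MAX_LAG_BYTES = 8 * 1024 * 1024   # how far (in data) a replica may trail: 8 MB
CHECK_INTERVAL_SECONDS = 30.0     # minimum time between two checks of one replica


@dataclass
class ReplicaHost:
    """Per-process, in-memory view of one replica; nothing is shared centrally."""

    name: str
    # Hypothetical probe returning (lag_seconds, lag_bytes). A real setup would
    # query PostgreSQL replication status here.
    probe_lag: Callable[[], Tuple[float, int]]
    clock: Callable[[], float] = time.monotonic
    online: bool = True
    last_checked: Optional[float] = None

    @staticmethod
    def is_outdated(lag_seconds: float, lag_bytes: int) -> bool:
        # Only stop reading when BOTH limits are exceeded.
        return lag_seconds > MAX_LAG_SECONDS and lag_bytes > MAX_LAG_BYTES

    def usable(self) -> bool:
        """Return whether reads may go to this replica, re-checking if due."""
        now = self.clock()
        due = (self.last_checked is None
               or now - self.last_checked >= CHECK_INTERVAL_SECONDS)
        if due:
            lag_seconds, lag_bytes = self.probe_lag()
            # Outdated replicas go offline; they come back automatically
            # once a later check finds them within the limits again.
            self.online = not self.is_outdated(lag_seconds, lag_bytes)
            self.last_checked = now
        return self.online


def pick_replica(hosts: list) -> Optional[ReplicaHost]:
    """Return the first usable replica, or None to fall back to the primary."""
    for host in hosts:
        if host.usable():
            return host
    return None
```

Because the state lives on each object, every process keeps its own copy and may probe a replica at nearly the same moment as another process. That is the trade-off accepted by not storing this state in Redis.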

TODO

- ~~Store the "online" state and timestamp in Redis so multiple processes won't perform the same work~~
  - Skipping this for now, as doing this work in memory is much easier.
- ~~Prevent a thundering herd when all replicas are offline by gradually redirecting traffic~~
  - Skipping this since I don't consider it very useful. If the primary can't handle all traffic, gradually redirecting traffic won't help, as the primary will eventually succumb anyway. It would only give a false sense of security that the primary won't go down.
  - https://gitlab.com/gitlab-com/infrastructure/issues/2480 might be a much better solution when combined with this MR.
  - This also requires some form of central coordination, which adds a lot of complexity.
- Randomly adjust the check interval per host per request to reduce the likelihood of all processes checking at once
- Test using a real replica
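One possible reading of the jitter item above, sketched in Python under assumptions: each process keeps one schedule per host and draws a fresh random interval around a base value every time it runs a check, so processes that started together drift apart instead of probing at once. `JitteredSchedule`, the 30-second base, and the plus-or-minus 50% spread are hypothetical choices for illustration, not taken from this MR.

```python
import random
import time
from typing import Callable, Optional

BASE_INTERVAL_SECONDS = 30.0  # hypothetical base interval
JITTER_FRACTION = 0.5         # hypothetical: spread checks over +/-50% of the base


class JitteredSchedule:
    """Decides when a process should re-check one replica, with random jitter."""

    def __init__(self, base: float = BASE_INTERVAL_SECONDS,
                 jitter: float = JITTER_FRACTION,
                 rng: Optional[random.Random] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not 0.0 <= jitter < 1.0:
            raise ValueError("jitter must be in [0, 1)")
        self.base = base
        self.jitter = jitter
        self.rng = rng if rng is not None else random.Random()
        self.clock = clock
        self.next_check_at: Optional[float] = None

    def next_interval(self) -> float:
        # Uniformly pick an interval in [base * (1 - jitter), base * (1 + jitter)].
        low = self.base * (1.0 - self.jitter)
        high = self.base * (1.0 + self.jitter)
        return self.rng.uniform(low, high)

    def check_due(self) -> bool:
        """Return True (and schedule the next check) if a check should run now."""
        now = self.clock()
        if self.next_check_at is None or now >= self.next_check_at:
            # Re-randomize on every check so schedules keep drifting apart.
            self.next_check_at = now + self.next_interval()
            return True
        return False
```

Uniform jitter is the simplest option here; the average check rate stays at the base interval while individual checks are spread out.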

Does this MR meet the acceptance criteria?

- Changelog entry added, if necessary
- Documentation created/updated
- Tests added for this feature/bug
- Review
  - Has been reviewed by Backend
  - Has been reviewed by Database
- Conforms to the merge request performance guides
- Conforms to the style guides
- Squashed related commits together

What are the relevant issue numbers?

https://gitlab.com/gitlab-org/gitlab-ee/issues/2197

Edited Nov 30, 2017 by Yorick Peterse
