Evaluate Odyssey as a replacement for pgbouncer(s)
In light of the recent issues related to pgbouncer's single-threaded core saturation (like in https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7632), some measures have been taken to mitigate or prevent excessive load:
- Creating more replicas (and hence distributing the load across more servers and, consequently, pgbouncers). See: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7674
- Adding more than one pgbouncer per PostgreSQL host and exposing them as separate services, so as to effectively have multi-process load balancing via DNS. See: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7651
Neither measure is ideal, even though they bought us some time. Ideally, we should use a multi-threaded connection pooler. Enter Odyssey. It is heavily used at high-profile sites like Yandex and, although it is still at version 1.0rc1, its main author reports that it is stable. We should test it thoroughly and evaluate whether it can replace the multiple pgbouncers per host.
Proposed plan:
- Wait until the GA release; right now the latest version is an RC. See: https://github.com/yandex/odyssey/releases/latest
- Determine a relevant workload and mechanism to benchmark GitLab. This might become a separate issue of its own. We really need a reproducible way to measure performance under a workload that resembles production (unless something like this already exists).
- Measure Odyssey performance and compare it with a saturated pgbouncer.
- Stress-test Odyssey and look for potential crashes and/or memory leaks.
- Create a final report summarizing the findings.
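
To make the "measure and compare" step concrete, here is a minimal sketch of how two benchmark runs could be summarized and compared. It assumes each run yields a list of per-transaction latencies in milliseconds plus the wall-clock duration of the run, collected by whatever load generator we end up choosing. The names `summarize`, `compare` and `Summary` are hypothetical and only for illustration; the metrics (throughput, mean, p50, p99) are a suggestion, not an agreed set.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    count: int      # number of completed transactions
    tps: float      # transactions per second over the whole run
    mean_ms: float  # mean latency in milliseconds
    p50_ms: float   # median latency (nearest-rank)
    p99_ms: float   # 99th percentile latency (nearest-rank)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms, duration_s):
    """Reduce one benchmark run to a handful of comparable numbers."""
    ordered = sorted(latencies_ms)
    return Summary(
        count=len(ordered),
        tps=len(ordered) / duration_s,
        mean_ms=sum(ordered) / len(ordered),
        p50_ms=percentile(ordered, 50),
        p99_ms=percentile(ordered, 99),
    )


def compare(baseline, candidate):
    """Relative change of candidate vs. baseline (0.10 means +10%)."""
    return {
        "tps": candidate.tps / baseline.tps - 1,
        "p99_ms": candidate.p99_ms / baseline.p99_ms - 1,
    }
```

Usage would be along the lines of `compare(summarize(pgbouncer_latencies, 60), summarize(odyssey_latencies, 60))`, with the same workload and duration for both runs so the numbers are comparable.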
cc @Finotto