We are taking on this capacity planning effort again. The main reason for that is the DB sharding conversations: we are discussing how to shard our main DB cluster in production, which could take some time to turn into agreements, an implementation, and a production rollout. Meanwhile, we'd like to be sure that our DB will not hit a wall any time soon and impact GitLab.com.
Some relevant information we'd want to take into account (some of it comes from the above issue):
We know there is still some vertical growth that we could go for on this cluster, particularly on our leader node (read/write), which is involved in many of the operations that eventually reach our replicas too. More performant CPUs, more memory... there is still some headroom.
Questions we wanted to answer at that time (3 months ago):
How much more load could we add to the primary database without degrading performance?
We are reaching 60k connections per second. How much could we grow, safely?
Should we increase disk or memory to optimize performance? Any suggestions on the shared_buffers setup?
Estimate what percentage of the load is related to the merge_requests table.
Measure the hot set of data that we currently keep in memory. How often are we writing to disk? (pg_buffercache and the LRU/usage counts; see the sketch after this list.)
Evaluate architectures with a higher number of vCPUs.
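A minimal sketch of how the hot set in memory could be measured with pg_buffercache (this assumes the extension is installed on the primary, an 8 KiB block size, and a database named gitlabhq_production; it is an illustration, not a production-ready query):

```bash
# Hypothetical sketch: summarize shared_buffers contents per relation,
# including the usage count (the "LRU count" mentioned above) and dirty buffers.
# Assumes pg_buffercache is installed and 8 KiB blocks.
psql -d gitlabhq_production -c "
  SELECT c.relname,
         count(*) * 8 / 1024           AS buffered_mib,
         round(avg(b.usagecount), 2)   AS avg_usagecount,
         sum((b.isdirty)::int)         AS dirty_buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
  WHERE b.reldatabase = (SELECT oid FROM pg_database
                         WHERE datname = current_database())
  GROUP BY c.relname
  ORDER BY buffered_mib DESC
  LIMIT 20;"
```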
There has been a somewhat related benchmarking effort, led by Ongres, here. Its goal was to ascertain whether our DB would remain as healthy once migrated to v11.7 (change carried out here).
This could be a rough idea/initial approach to this work:
Rationale: as we increase the query load (req/s) on this leader server, the response times will start increasing progressively, but the service times will remain similar for a good while, until they eventually increase dramatically.
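One simple way to frame this rationale, assuming an open single-queue model (only a rough approximation of the real pooled, multi-core system): response time is service time plus queue wait, so it grows sharply as utilization approaches 1, while the service time itself stays flat until saturation.

```math
R \;=\; S + W \;\approx\; \frac{S}{1 - \rho}, \qquad \rho = \lambda \, S
```

Here R is response time, S is service time, W is time spent waiting in the queue, λ is the request rate (req/s), and ρ is utilization.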
I know there is a lot of work already done by Ongres in the benchmarking issue that we can reuse here.
@albertoramos I think we should consider adding pgbouncers to the topology we test, between JMeter and the PostgreSQL database. It will reflect the production environment more realistically.
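For what it's worth, a minimal sketch of what that pgbouncer layer in the test topology could look like (the host, ports, pool sizes, and auth settings below are placeholders, not our production values):

```bash
# Illustrative only: a minimal pgbouncer.ini sitting between the load generator
# and PostgreSQL; all values are placeholders, not production settings.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
gitlabhq_production = host=10.0.0.10 port=5432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 100
max_client_conn = 5000
EOF
```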
The other main concern we just spoke about is getting the right data (as close to production as possible) for this capacity assessment, which will take some time (compliance/security concerns).
Alberto Ramos changed title from Capacity planning of our main production DB cluster to Capacity assessment for our main production DB cluster
@sytses thank you for the ping. Yes we should align on the tooling.
@albertoramos for context, Quality has evaluated and used multiple performance tools (JMeter, Artillery, K6). We are currently using K6 for our performance tests and also for validating our reference architecture. In addition, we are building a load testing feature into GitLab that will provide K6 capabilities out of the box. Could we please evaluate using K6 in the infrastructure workstream here?
As an addition to @Finotto's analysis https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10340, I've quickly checked what the options for vertical scaling are on GCP. Currently, the PostgreSQL machines are n1-highmem-96. Below, I analyze only the primary node, since it's the most difficult to scale.
The data is very well cached: the effectiveness of the buffer pool (which can easily be increased in size, because it's currently just 110 GiB) is >99%. The main limiting factor (as analyzed in detail in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10340) is CPU.
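For reference, a quick way to double-check that cache effectiveness figure from the database side (cumulative stats since the last reset; the database name is an assumption):

```bash
# Buffer cache hit ratio for the current database, from pg_stat_database.
psql -d gitlabhq_production -c "
  SELECT datname,
         round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
  FROM pg_stat_database
  WHERE datname = current_database();"
```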
There are options for vertical scaling on GCP. For the n1-* family (which is based on Intel Skylake/Broadwell/Haswell), the most powerful instance is already in use, with 96 vCPUs. But GCP also provides the n2d-* and m2-* families.
To check whether we should expect a gain for Postgres workloads if we switch to AMD EPYC Rome and 224 vCPUs, I've performed a couple of small benchmarks with the standard pgbench utility, comparing n1 with 96 vCPUs (like the current master) to n2d with 224 vCPUs. (Interestingly, all of them are available in "preemptible" mode, with prices like $1-2 per hour, a very convenient option for benchmarks.)
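For reproducibility, such a throwaway preemptible instance can be created with something along these lines (the machine type, zone, image, and disk size are illustrative, and the instance name is hypothetical):

```bash
# Sketch: spin up a preemptible n2d instance for short pgbench runs.
gcloud compute instances create nik-pgbench-n2d-tmp \
  --machine-type=n2d-standard-224 \
  --zone=us-central1-a \
  --image-family=ubuntu-1804-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=500GB \
  --preemptible
```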
The results (SELECT-only workload, which is CPU-bound; details are below):
n1 with 96 vCPUs: 0.63M TPS
n2d with 224 vCPUs: 1.04M TPS.
This means that, indeed, we could get more CPU power through vertical scaling if/when needed, and handle a much higher workload on n2d compared to the current n1.
m2 instances are even more powerful: they advertise up to 416 vCPUs (new Intel) and 11,776 GiB of RAM (announcement on the GCP blog: https://cloud.google.com/blog/products/compute/introducing-compute-and-memory-optimized-vms-for-google-compute-engine). Unfortunately, it looks like they are not directly available in the GCP console in any region, so I couldn't test them. But based on the blog post, this is a lot of CPU power, and current Postgres versions scale quite well to large numbers of CPU cores (unlike a few years ago), so these options could help handle the increasing workload and buy some more time if needed. With 416 vCPUs, we could expect to handle 3-4x more TPS than now without degradation. Of course, if this path seems important, it makes sense to check it thoroughly.
Below are the tech details of the benchmarks performed.
Scale and config adjustments:
```
100000000 of 100000000 tuples (100%) done (elapsed 104.98 s, remaining 0.00 s)

alter system set max_connections = 3000;
alter system set shared_buffers = '150GB';
alter system set effective_cache_size = '200GB';
alter system set max_wal_size = '100GB';
alter system set checkpoint_timeout = '100min';
alter system set work_mem = '100MB';
```
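The results below report a scaling factor of 1000 (100M rows in pgbench_accounts), so the initialization step was presumably something like:

```bash
# Sketch: initialize the pgbench tables at scale factor 1000 (100M account rows),
# matching the "100000000 of 100000000 tuples" line above.
pgbench -i -s 1000
```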
Detailed results for both the read-only and the default (read/write) pgbench workloads:
```
# n2d with 224 vCPUs
$ pgbench -n -P30 -T120 -r -S -j224 -c224
progress: 30.0 s, 871030.5 tps, lat 0.253 ms stddev 0.383
progress: 60.0 s, 1096054.3 tps, lat 0.204 ms stddev 0.113
progress: 90.0 s, 1104329.0 tps, lat 0.203 ms stddev 0.077
progress: 120.0 s, 1105450.1 tps, lat 0.203 ms stddev 0.075
transaction type: <builtin: select only>
scaling factor: 1000
query mode: simple
number of clients: 224
number of threads: 224
duration: 120 s
number of transactions actually processed: 125306168
latency average = 0.214 ms
latency stddev = 0.193 ms
tps = 1041235.609955 (including connections establishing)
tps = 1044645.775367 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set aid random(1, 100000 * :scale)
         0.229  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

$ pgbench -n -P30 -T120 -r -j224 -c224
progress: 30.0 s, 12720.9 tps, lat 17.568 ms stddev 13.966
progress: 60.0 s, 16389.5 tps, lat 13.668 ms stddev 12.058
progress: 90.0 s, 20807.6 tps, lat 10.764 ms stddev 10.643
progress: 120.0 s, 26994.3 tps, lat 8.295 ms stddev 8.056
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 224
number of threads: 224
duration: 120 s
number of transactions actually processed: 2307595
latency average = 11.645 ms
latency stddev = 11.296 ms
tps = 19219.693787 (including connections establishing)
tps = 19223.221581 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.004  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.149  BEGIN;
         0.511  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.200  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.418  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         1.322  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.218  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         8.833  END;

# n1 with 96 vCPUs
postgres@nik-pgbench-n1-tmp:~$ pgbench -n -P30 -T120 -r -j224 -c224 -S
progress: 30.0 s, 525707.1 tps, lat 0.425 ms stddev 0.176
progress: 60.0 s, 540858.6 tps, lat 0.414 ms stddev 0.112
progress: 90.0 s, 545726.0 tps, lat 0.410 ms stddev 0.111
progress: 120.0 s, 543147.8 tps, lat 0.412 ms stddev 0.111
transaction type: <builtin: select only>
scaling factor: 1000
query mode: simple
number of clients: 224
number of threads: 224
duration: 120 s
number of transactions actually processed: 64663454
latency average = 0.415 ms
latency stddev = 0.130 ms
tps = 538645.227077 (including connections establishing)
tps = 538939.256459 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.420  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

postgres@nik-pgbench-n1-tmp:~$ pgbench -n -P30 -T120 -r -j224 -c224
progress: 30.0 s, 38775.7 tps, lat 5.553 ms stddev 7.192
progress: 60.0 s, 50323.5 tps, lat 4.451 ms stddev 4.137
progress: 90.0 s, 50674.2 tps, lat 4.420 ms stddev 4.547
progress: 120.0 s, 51195.9 tps, lat 4.375 ms stddev 3.713
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 224
number of threads: 224
duration: 120 s
number of transactions actually processed: 5729300
latency average = 4.647 ms
latency stddev = 4.941 ms
tps = 47721.160314 (including connections establishing)
tps = 48029.317259 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.108  BEGIN;
         0.270  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.194  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.253  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.549  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.178  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         3.092  END;

postgres@nik-pgbench-n1-tmp:~$ pgbench -n -P30 -T120 -r -j96 -c96 -S
progress: 30.0 s, 578306.3 tps, lat 0.164 ms stddev 0.074
progress: 60.0 s, 630212.3 tps, lat 0.152 ms stddev 0.052
progress: 90.0 s, 653320.6 tps, lat 0.147 ms stddev 0.051
progress: 120.0 s, 657760.7 tps, lat 0.146 ms stddev 0.051
transaction type: <builtin: select only>
scaling factor: 1000
query mode: simple
number of clients: 96
number of threads: 96
duration: 120 s
number of transactions actually processed: 75588114
latency average = 0.152 ms
latency stddev = 0.058 ms
tps = 629179.944865 (including connections establishing)
tps = 630381.217198 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.154  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

postgres@nik-pgbench-n1-tmp:~$ pgbench -n -P30 -T120 -r -j96 -c96
progress: 30.0 s, 36004.5 tps, lat 2.653 ms stddev 2.218
progress: 60.0 s, 35011.6 tps, lat 2.742 ms stddev 2.324
progress: 90.0 s, 36708.7 tps, lat 2.615 ms stddev 2.314
progress: 120.0 s, 36217.3 tps, lat 2.648 ms stddev 2.389
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1000
query mode: simple
number of clients: 96
number of threads: 96
duration: 120 s
number of transactions actually processed: 4318364
latency average = 2.665 ms
latency stddev = 2.322 ms
tps = 35975.841646 (including connections establishing)
tps = 36009.209318 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.002  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.001  \set delta random(-5000, 5000)
         0.073  BEGIN;
         0.199  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.140  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         0.156  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         0.217  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.118  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.758  END;
```
@NikolayS The N2's are problematic because in us-east1 they are only available in zones c and d, not us-east1-b. Our patroni servers are spread through those three zones in a round-robin fashion (01 => c, 02 => d, 03 => b, 04 => c, and so on); if we wanted to migrate to N2s we'd need to change our terraform config to avoid b (the simplest option being to put all the patroni nodes in one zone, which carries the obvious risks for availability). Rebuilds would be required at a minimum for the ones in b. To be clear, it's not impossible or even terribly difficult, it's just not as simple as shutting down a node, changing the machine type, and starting up again.
Sadly, n2d and m2 instances are not available in any zone in us-east1.
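A quick way to verify that zone-by-zone availability (the machine type name here is just an example):

```bash
# List the us-east1 zones where a given machine type is offered.
gcloud compute machine-types list --filter="name=n2d-standard-224 AND zone:us-east1"
```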