Postgres 12: observed performance regressions
The target database currently uses the setup we designed for production:
- PG12
- PG12 Setup
- OS Ubuntu 18
- Hardware AMD EPYC Rome (currently 128 vCPUs)
The source cluster is similar to the current production cluster.
We had the following results:
jfinotto@jmeter-01-inf-db-benchmarking.c.gitlab-db-benchmarking.internal:~/db-migration/benchmark/bin$ ./run-bench.sh -h pgbouncer.service.consul -d gitlabhq_production_pg12ute_source -U gitlab-superuser -p 6432 -e prd -t postgres-benchmark-bytime-final.jmx -j 30 -T 60 -r result0406_source_time_testf005.csv
Creating summariser <summary>
Created the tree successfully using plan/postgres-benchmark-bytime-final.jmx
Starting standalone test @ Tue Apr 06 16:53:05 UTC 2021 (1617727985939)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary + 4141 in 00:00:24 = 176.1/s Avg: 133 Min: 0 Max: 17734 Err: 0 (0.00%) Active: 30 Started: 30 Finished: 0
summary + 5121 in 00:00:30 = 170.7/s Avg: 176 Min: 0 Max: 23040 Err: 0 (0.00%) Active: 30 Started: 30 Finished: 0
summary = 9262 in 00:00:54 = 173.0/s Avg: 156 Min: 0 Max: 23040 Err: 0 (0.00%)
summary + 1266 in 00:00:16 = 77.2/s Avg: 336 Min: 0 Max: 28910 Err: 1 (0.08%) Active: 0 Started: 30 Finished: 30
summary = 10528 in 00:01:10 = 150.5/s Avg: 178 Min: 0 Max: 28910 Err: 1 (0.01%)
Tidying up ... @ Tue Apr 06 16:54:16 UTC 2021 (1617728056411)
... end of run
jfinotto@jmeter-01-inf-db-benchmarking.c.gitlab-db-benchmarking.internal:~/db-migration/benchmark/bin$ ./run-bench.sh -h pgbouncer.service.consul -d gitlabhq_production_pg12ute_target -U gitlab-superuser -p 6432 -e prd -t postgres-benchmark-bytime-final.jmx -j 30 -T 60 -r result0406_target_time_testf005.csv
Creating summariser <summary>
Created the tree successfully using plan/postgres-benchmark-bytime-final.jmx
Starting standalone test @ Tue Apr 06 16:54:37 UTC 2021 (1617728077815)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary + 2628 in 00:00:22 = 121.2/s Avg: 150 Min: 0 Max: 18850 Err: 0 (0.00%) Active: 30 Started: 30 Finished: 0
summary + 1463 in 00:00:31 = 47.8/s Avg: 283 Min: 0 Max: 37592 Err: 0 (0.00%) Active: 30 Started: 30 Finished: 0
summary = 4091 in 00:00:52 = 78.2/s Avg: 197 Min: 0 Max: 37592 Err: 0 (0.00%)
summary + 191 in 00:01:05 = 3.0/s Avg: 3160 Min: 0 Max: 95057 Err: 0 (0.00%) Active: 17 Started: 30 Finished: 13
summary = 4282 in 00:01:57 = 36.6/s Avg: 330 Min: 0 Max: 95057 Err: 0 (0.00%)
summary + 7 in 00:01:28 = 0.1/s Avg: 149608 Min: 93354 Max: 201242 Err: 0 (0.00%) Active: 10 Started: 30 Finished: 20
summary = 4289 in 00:03:25 = 20.9/s Avg: 573 Min: 0 Max: 201242 Err: 0 (0.00%)
summary + 5 in 00:00:27 = 0.2/s Avg: 191537 Min: 163211 Max: 231412 Err: 0 (0.00%) Active: 5 Started: 30 Finished: 25
summary = 4294 in 00:03:52 = 18.5/s Avg: 796 Min: 0 Max: 231412 Err: 0 (0.00%)
summary + 3 in 00:01:05 = 0.0/s Avg: 276280 Min: 265850 Max: 286316 Err: 0 (0.00%) Active: 2 Started: 30 Finished: 28
summary = 4297 in 00:04:57 = 14.5/s Avg: 988 Min: 0 Max: 286316 Err: 0 (0.00%)
summary + 1 in 00:02:49 = 0.0/s Avg: 454381 Min: 454381 Max: 454381 Err: 0 (0.00%) Active: 0 Started: 30 Finished: 30
summary = 4298 in 00:07:47 = 9.2/s Avg: 1093 Min: 0 Max: 454381 Err: 0 (0.00%)
Tidying up ... @ Tue Apr 06 17:02:24 UTC 2021 (1617728544968)
... end of run
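On the target, the test designed to run for 1 minute took almost 8 minutes (00:07:47, versus 00:01:10 on the source), and overall throughput dropped from 150.5/s to 9.2/s. This points to a possible performance regression.
Another test was executed with statement_timeout set to 15 seconds (as it is in production).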
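To make the source/target comparison repeatable, the final cumulative `summary =` line of each run can be parsed instead of being read by eye. The following is a minimal sketch, not part of the benchmark repo: the helper name `final_summary` and the regular expression are assumptions based only on the summariser lines pasted above, and the two sample lines are copied from those logs.

```python
import re

# Cumulative lines printed by the JMeter summariser look like:
#   summary = 10528 in 00:01:10 = 150.5/s Avg: 178 Min: 0 Max: 28910 Err: 1 (0.01%)
# Incremental "summary +" lines are deliberately not matched.
SUMMARY_RE = re.compile(
    r"summary =\s+(\d+)\s+in\s+(\d+):(\d+):(\d+)\s+=\s+([\d.]+)/s"
)


def final_summary(log_text):
    """Return (samples, elapsed_seconds, rate_per_second) from the last 'summary =' line.

    Hypothetical helper for illustration only.
    """
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        raise ValueError("no cumulative 'summary =' line found")
    samples, hours, minutes, seconds, rate = matches[-1]
    elapsed = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return int(samples), elapsed, float(rate)


# Final cumulative lines copied from the two runs without statement_timeout.
source_log = "summary = 10528 in 00:01:10 = 150.5/s Avg: 178 Min: 0 Max: 28910 Err: 1 (0.01%)"
target_log = "summary = 4298 in 00:07:47 = 9.2/s Avg: 1093 Min: 0 Max: 454381 Err: 0 (0.00%)"

src = final_summary(source_log)
tgt = final_summary(target_log)
print(f"elapsed: source {src[1]}s, target {tgt[1]}s")
print(f"throughput ratio (source/target): {src[2] / tgt[2]:.1f}x")
```

With the lines above this reports 70 s versus 467 s elapsed and a throughput ratio of roughly 16x. The ratio uses JMeter's own averaged rate, which on the target includes the long tail of a few very slow queries.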
jfinotto@jmeter-01-inf-db-benchmarking.c.gitlab-db-benchmarking.internal:~/db-migration/benchmark/bin$ ./run-bench.sh -h pgbouncer.service.consul -d gitlabhq_production_pg12ute_target -U gitlab-superuser -p 6432 -e prd -t postgres-benchmark-bytime-final.jmx -j 30 -T 60 -r result0406_target_time_testf004.csv
Creating summariser <summary>
Created the tree successfully using plan/postgres-benchmark-bytime-final.jmx
Starting standalone test @ Tue Apr 06 16:44:50 UTC 2021 (1617727490488)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary + 1407 in 00:00:09 = 157.4/s Avg: 112 Min: 0 Max: 6204 Err: 0 (0.00%) Active: 30 Started: 30 Finished: 0
summary + 2858 in 00:00:30 = 94.6/s Avg: 304 Min: 0 Max: 15607 Err: 29 (1.01%) Active: 30 Started: 30 Finished: 0
summary = 4265 in 00:00:39 = 108.9/s Avg: 240 Min: 0 Max: 15607 Err: 29 (0.68%)
summary + 1350 in 00:00:30 = 44.7/s Avg: 614 Min: 0 Max: 15852 Err: 30 (2.22%) Active: 7 Started: 30 Finished: 23
summary = 5615 in 00:01:09 = 81.0/s Avg: 330 Min: 0 Max: 15852 Err: 59 (1.05%)
summary + 6 in 00:00:06 = 1.1/s Avg: 15337 Min: 15001 Max: 15836 Err: 6 (100.00%) Active: 0 Started: 30 Finished: 30
summary = 5621 in 00:01:15 = 75.0/s Avg: 346 Min: 0 Max: 15852 Err: 65 (1.16%)
Tidying up ... @ Tue Apr 06 16:46:06 UTC 2021 (1617727566048)
... end of run
jfinotto@jmeter-01-inf-db-benchmarking.c.gitlab-db-benchmarking.internal:~/db-migration/benchmark/bin$ ./run-bench.sh -h pgbouncer.service.consul -d gitlabhq_production_pg12ute_source -U gitlab-superuser -p 6432 -e prd -t postgres-benchmark-bytime-final.jmx -j 30 -T 60 -r result0406_source_time_testf004.csv
Creating summariser <summary>
Created the tree successfully using plan/postgres-benchmark-bytime-final.jmx
Starting standalone test @ Tue Apr 06 16:46:47 UTC 2021 (1617727607006)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary + 2379 in 00:00:12 = 191.3/s Avg: 110 Min: 0 Max: 9099 Err: 0 (0.00%) Active: 30 Started: 30 Finished: 0
summary + 5134 in 00:00:30 = 171.5/s Avg: 166 Min: 0 Max: 15729 Err: 17 (0.33%) Active: 30 Started: 30 Finished: 0
summary = 7513 in 00:00:42 = 177.3/s Avg: 148 Min: 0 Max: 15729 Err: 17 (0.23%)
summary + 3274 in 00:00:29 = 112.6/s Avg: 230 Min: 0 Max: 15354 Err: 18 (0.55%) Active: 0 Started: 30 Finished: 30
summary = 10787 in 00:01:11 = 151.0/s Avg: 173 Min: 0 Max: 15729 Err: 35 (0.32%)
Tidying up ... @ Tue Apr 06 16:47:59 UTC 2021 (1617727679074)
... end of run
The error rate is higher on the target (1.16% versus 0.32% on the source), with the errors caused by timeout disconnections.
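As a quick check of that comparison, the error rates can be recomputed from the error and sample counts in the final `summary =` lines of the two statement_timeout runs. This is only an arithmetic sketch; the variable names are illustrative and the counts are copied from the logs above.

```python
# Counts taken from the final cumulative lines of the statement_timeout runs.
target_errors, target_samples = 65, 5621
source_errors, source_samples = 35, 10787

target_rate = target_errors / target_samples
source_rate = source_errors / source_samples
ratio = target_rate / source_rate

print(f"target error rate: {target_rate:.2%}")
print(f"source error rate: {source_rate:.2%}")
print(f"target/source: {ratio:.1f}x")
```

The target fails about 3.6 times as often per request, while also completing roughly half as many requests in a similar wall-clock time.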
I will add more details and the reports.
The repository containing the queries and workloads used for these tests is:
https://gitlab.com/gitlab-com/gl-infra/db-migration/-/tree/master/benchmark
Edited by Gerardo Lopez-Fernandez