Fine-tune limits, requests, replicas, and Puma settings for git/websockets traffic in Kubernetes
This issue is for discussing how we will set these for both websocket and git traffic when we move production workloads to the Kubernetes cluster.
Resource configuration
For workhorse we are setting the default:
    resources:
      requests:
        cpu: 100m
        memory: 100M
For puma we are setting:
    resources:
      limits:
        cpu: 1.5
        memory: 2G
      requests:
        cpu: 300m
        memory: 1.5G
    minReplicas: 2
    maxReplicas: 10
And for puma maxmemory/threads:
    puma:
      workerMaxMemory: 1342 # in MB units
      threads:
        min: 1
        max: 4
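As a rough sanity check on the values above, the sketch below compares workerMaxMemory with the container memory request and limit. It assumes that workerMaxMemory is a per-worker threshold in decimal megabytes and that the Kubernetes `M`/`G` suffixes are powers of ten. The helper names are made up for illustration, and the Puma master process and any other overhead are ignored.

```python
# Sketch only: compare Puma's per-worker memory cap with the pod's memory
# request and limit. Helper names are hypothetical, not part of any chart.

# Kubernetes quantity suffixes: k/M/G are decimal, Ki/Mi/Gi are binary.
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def memory_bytes(quantity: str) -> float:
    """Convert a Kubernetes memory quantity such as '1.5G' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # plain number: already bytes


def headroom_mb(quantity: str, worker_max_memory_mb: int, workers: int = 1) -> float:
    """Memory left (in MB) after `workers` workers each reach the cap.

    Assumption: workerMaxMemory is in decimal MB and applies per worker.
    """
    used = workers * worker_max_memory_mb * 10**6
    return (memory_bytes(quantity) - used) / 10**6


WORKER_MAX_MEMORY_MB = 1342  # puma.workerMaxMemory from the config above

print("headroom vs request (1.5G):", headroom_mb("1.5G", WORKER_MAX_MEMORY_MB), "MB")
print("headroom vs limit (2G):", headroom_mb("2G", WORKER_MAX_MEMORY_MB), "MB")
print("two workers vs limit (2G):", headroom_mb("2G", WORKER_MAX_MEMORY_MB, workers=2), "MB")
```

A negative result means that, under these assumptions, the workers could reach the container limit before the per-worker cap is hit.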
Request rates in production
Rails
Log query: https://log.gprd.gitlab.net/goto/338024a5e3564d4fc1c86c84dccd9e9b
- Currently, in production there are 16 custom-16-20486 VMs servicing git-ssh, git-https, and websocket traffic.
- Each VM is configured for 16 puma workers with up to 4 threads.
- Peak traffic for git https is at ~12:00 UTC, where we see up to 2,400 requests / minute on a single VM, or 40 RPS for git requests to rails, the majority of which are info refs: https://log.gprd.gitlab.net/goto/b25ce0227641f488fe0732c49caf478a . Divided by 16 workers, that means each worker is processing 2.5 req/sec (~3 req/sec).
- For latency: https://log.gprd.gitlab.net/goto/19a7d4d88654bd46c6af9b6c9c2813f4
  - 99th percentile: ~0.75s
  - 50th percentile: ~0.07s
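The per-worker figure follows from simple division. A minimal sketch of the arithmetic, using only the numbers quoted above (the fleet-wide figure assumes every VM peaks at the same rate at the same time, which the logs here do not confirm):

```python
# Sketch only: reproduce the request-rate arithmetic from the bullets above.

PEAK_REQUESTS_PER_MINUTE = 2400  # git https, single VM, ~12:00 UTC
WORKERS_PER_VM = 16
MAX_THREADS_PER_WORKER = 4       # "up to 4 threads"
VM_COUNT = 16


def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return requests_per_minute / 60


rps_per_vm = per_second(PEAK_REQUESTS_PER_MINUTE)
rps_per_worker = rps_per_vm / WORKERS_PER_VM
rps_per_thread = rps_per_worker / MAX_THREADS_PER_WORKER

# Assumption: all VMs peak simultaneously, so this is an upper bound.
fleet_rps_upper_bound = rps_per_vm * VM_COUNT

print(f"RPS per VM:      {rps_per_vm}")
print(f"RPS per worker:  {rps_per_worker}")
print(f"RPS per thread:  {rps_per_thread}")
print(f"Fleet RPS (max): {fleet_rps_upper_bound}")
```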
Workhorse
Log query: https://log.gprd.gitlab.net/goto/bb7f4db5d80a08841f1b2e942e920908
- For workhorse latency: https://log.gprd.gitlab.net/goto/db0b77a82e5a91a9c98e78f361025239
  - 99th percentile: ~1.5s
  - 50th percentile: ~0.10s
Puma
- We occasionally see queued connections to puma, which indicates that at times no worker is available. This likely increases our 99th percentile latency.
- Other metrics to look at are the pool capacity and idle threads
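To put the queueing in context, a back-of-the-envelope estimate using Little's law (requests in flight = arrival rate x time in system) can be compared with the thread capacity of one VM. This is a sketch, not a measurement: it reuses the per-VM peak rate and the rails latency percentiles from above, and it assumes the mean latency lies somewhere between the 50th and 99th percentiles.

```python
# Sketch only: Little's law applied to the per-VM figures quoted in this issue.

WORKERS_PER_VM = 16
MAX_THREADS_PER_WORKER = 4
PEAK_RPS_PER_VM = 40.0  # 2,400 requests / minute


def busy_threads(rps: float, latency_seconds: float) -> float:
    """Average number of requests in flight for a given rate and latency."""
    return rps * latency_seconds


thread_capacity = WORKERS_PER_VM * MAX_THREADS_PER_WORKER

# Assumption: the true mean latency falls between these two percentiles.
low = busy_threads(PEAK_RPS_PER_VM, 0.07)   # rails p50 latency
high = busy_threads(PEAK_RPS_PER_VM, 0.75)  # rails p99 latency

print(f"thread capacity per VM: {thread_capacity}")
print(f"busy threads at p50 latency: {low:.1f}")
print(f"busy threads at p99 latency: {high:.1f}")
print(f"utilization range: {low / thread_capacity:.0%} to {high / thread_capacity:.0%}")
```

These are averages, so short bursts can still exhaust the pool even when the estimate sits well below capacity, which is why pool capacity and idle threads are worth watching directly.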
Staging results
1 VM, 16 workers
$ bombardier --header="Host: staging.gitlab.com" -l -d600s --http2 -r60 https://staging.gitlab.com/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack
Bombarding https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack for 10m0s using 125 connection(s)
[==============================================================================================================] 10m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 60.00 24.57 299.18
Latency 240.24ms 57.58ms 1.80s
Latency Distribution
50% 209.22ms
75% 291.00ms
90% 304.03ms
95% 317.41ms
99% 400.81ms
HTTP codes:
1xx - 0, 2xx - 35541, 3xx - 0, 4xx - 0, 5xx - 456
others - 6
Errors:
Get https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack: http2: Transport: peer server initiated graceful shutdown after some of Request.Body was written; define Request.GetBody to avoid this error - 6
stream error: stream ID 10655; INTERNAL_ERROR - 1
stream error: stream ID 10651; INTERNAL_ERROR - 1
stream error: stream ID 10653; INTERNAL_ERROR - 1
Throughput: 13.84MB/s
- Workhorse latency: https://nonprod-log.gitlab.net/goto/6d4de876326cdba2f0f83f13b206eae0
  - 99th percentile: 0.2s
  - 50th percentile: 0.11s
8 pods x 2 workers per pod
2020-08-11: 14:25-14:35
$ bombardier --header="Host: staging.gitlab.com" -l -d600s --http2 -r60 https://staging.gitlab.com/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack
Bombarding https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack for 10m0s using 125 connection(s)
[==============================================================================================================] 10m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 59.99 20.55 288.59
Latency 195.31ms 31.07ms 1.40s
Latency Distribution
50% 187.89ms
75% 196.82ms
90% 213.87ms
95% 246.52ms
99% 322.87ms
HTTP codes:
1xx - 0, 2xx - 35997, 3xx - 0, 4xx - 0, 5xx - 0
others - 6
Errors:
Get https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack: http2: Transport: peer server initiated graceful shutdown after some of Request.Body was written; define Request.GetBody to avoid this error - 6
Throughput: 14.02MB/s
- Workhorse latency: https://nonprod-log.gitlab.net/goto/be18b2d88309cfdbe5fda988d8f1c717
  - 99th percentile: ~0.2s
  - 50th percentile: ~0.09s
- CPU utilization
- Memory
  - 1.5 - 2GB
16 pods x 1 worker per pod
$ date -u; bombardier --header="Host: staging.gitlab.com" -l -d600s --http2 -r60 https://staging.gitlab.com/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack; date -u
Bombarding https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack for 10m0s using 125 connection(s)
[==============================================================================================================] 10m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 59.99 23.20 282.96
Latency 199.24ms 35.55ms 1.12s
Latency Distribution
50% 187.21ms
75% 203.01ms
90% 230.40ms
95% 276.60ms
99% 348.42ms
HTTP codes:
1xx - 0, 2xx - 35997, 3xx - 0, 4xx - 0, 5xx - 0
others - 6
Errors:
Get https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack: http2: Transport: peer server initiated graceful shutdown after some of Request.Body was written; define Request.GetBody to avoid this error - 6
Throughput: 14.02MB/s
- Workhorse latency: https://nonprod-log.gitlab.net/goto/c15ab8943f060c53ed02bd75380dc0b5
  - 99th percentile: ~0.2s
  - 50th percentile: ~0.1s
- CPU utilization
  - ~0.2 cores per container
- Memory
  - ~1.5
4 pods x 4 workers per pod
$ date -u; bombardier --header="Host: staging.gitlab.com" -l -d600s --http2 -r60 https://staging.gitlab.com/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack; date -u
Thu Aug 13 14:57:50 UTC 2020
Bombarding https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack for 10m0s using 125 connection(s)
[==============================================================================================================] 10m0s
Done!
Statistics Avg Stdev Max
Reqs/sec 59.99 21.80 268.34
Latency 203.34ms 47.21ms 1.33s
Latency Distribution
50% 188.76ms
75% 204.16ms
90% 239.92ms
95% 292.96ms
99% 403.89ms
HTTP codes:
1xx - 0, 2xx - 35997, 3xx - 0, 4xx - 0, 5xx - 0
others - 6
Errors:
Get https://staging.gitlab.com:443/gitlab-org/gitlab-ee.git/info/refs?service=git-upload-pack: http2: Transport: peer server initiated graceful shutdown after some of Request.Body was written; define Request.GetBody to avoid this error - 6
Throughput: 14.02MB/s
Thu Aug 13 15:07:51 UTC 2020
- Workhorse latency: https://nonprod-log.gitlab.net/goto/8cbefbb1f168d435f3ceba057cf3ba9e
  - 99th percentile: ~0.2s
  - 50th percentile: ~0.1s
- CPU utilization
  - ~0.8 cores per container
- Memory
  - ~2.5
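To compare the four staging runs side by side, the bombardier numbers above can be put into a small table. The sketch below copies the 2xx/5xx/others counts and the client-side 50%/99% latencies from the pasted output. The `Run` class and its field names are only for illustration, and "errors" here counts both 5xx responses and the "others" transport errors.

```python
# Sketch only: tabulate the bombardier results pasted in this issue.
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    layout: str
    ok_2xx: int
    err_5xx: int
    others: int
    p50_ms: float
    p99_ms: float

    @property
    def total(self) -> int:
        return self.ok_2xx + self.err_5xx + self.others

    @property
    def error_rate(self) -> float:
        """Share of requests that ended in a 5xx or a transport error."""
        return (self.err_5xx + self.others) / self.total


RUNS = [
    Run("1 VM, 16 workers", 35541, 456, 6, 209.22, 400.81),
    Run("8 pods x 2 workers", 35997, 0, 6, 187.89, 322.87),
    Run("16 pods x 1 worker", 35997, 0, 6, 187.21, 348.42),
    Run("4 pods x 4 workers", 35997, 0, 6, 188.76, 403.89),
]

# Print the runs ordered by client-side 99th percentile latency.
for run in sorted(RUNS, key=lambda r: r.p99_ms):
    print(
        f"{run.layout:<20} p50={run.p50_ms:7.2f}ms "
        f"p99={run.p99_ms:7.2f}ms errors={run.error_rate:.2%}"
    )
```

Each configuration was measured in a single 10-minute run at 60 RPS, so small differences between the pod layouts may be within noise.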
Edited by John Jarvis


