Investigate ways to increase CI tunnel performance
From https://staging.gitlab.com/twatson-test-group/agent-configuration-test/-/jobs/39357308:
$ time kubectl get pods --all-namespaces
NAMESPACE            NAME                                         READY   STATUS    RESTARTS        AGE
kube-system          coredns-78fcd69978-mn8z7                     1/1     Running   5 (4m11s ago)   5d6h
kube-system          coredns-78fcd69978-q7mhh                     1/1     Running   5 (4m11s ago)   5d6h
kube-system          etcd-kind-control-plane                      1/1     Running   5 (4m11s ago)   5d6h
kube-system          kindnet-2pxgr                                1/1     Running   5 (4m11s ago)   5d6h
kube-system          kube-apiserver-kind-control-plane            1/1     Running   5 (4m11s ago)   5d6h
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   5 (4m11s ago)   5d6h
kube-system          kube-proxy-8ptm2                             1/1     Running   5 (4m11s ago)   5d6h
kube-system          kube-scheduler-kind-control-plane            1/1     Running   5 (4m11s ago)   5d6h
local-path-storage   local-path-provisioner-85494db59d-4wk78      1/1     Running   8 (4m11s ago)   5d6h
real 0m18.339s
user 0m0.116s
sys 0m0.028s
Of course some of the above is just kas<->agent latency (US<->Australia) in this particular test, but still. Locally I'm getting:
kubectl get pods --all-namespaces 0.10s user 0.13s system 16% cpu 1.350 total
Ideas:
- Follow up for https://github.com/kubernetes/kubernetes/pull/103900 to cache discovery info in memory. That would be THE best thing for latency for containers with a read-only temporary directory. Follow up: https://github.com/kubernetes/kubernetes/pull/105723
- Reduce Redis polling interval to poll more often for new agent tunnels. !512 (merged)
- More robust request routing (#168 - closed)
- Scale number of agent-kas connections based on ... (#247 - closed)
- Run ExpiringHash GC asynchronously (!703 - merged)
- Run ExpiringHash refresh asynchronously - Optimize reverse tunnel routing (!723 - merged)
- Only refresh entries that would expire otherwise (!722 - merged)
- Publish pub-sub events when tunnel connects (gitlab-org/gitlab#320732 - closed) and Consume pub-sub events to reduce tunnel lookup ... (gitlab-org/gitlab#323131 - closed) to reduce/eliminate Redis lookup latency. Done: Make kas prefer routing through self vs other i... (!788 - merged) instead.
- Prefer self when kas-kas routing a Kubernetes r... (#251 - closed)
- Batch tunnel/agent registration Redis writes. We've made them concurrent instead - Optimize reverse tunnel routing (!723 - merged)
- Don't unmarshal whole ExpiringValue objects in GC, only unmarshal the timestamp. Make timestamps an integer (unix timestamp in seconds) to avoid nanoseconds (waste of space). - Optimize expiring hash (!733 - merged)
- Use []byte/bytes instead of anypb.Any for value in ExpiringValue. Any adds type string, which is of non-negligible size compared to the actual payload. - Optimize expiring hash (!733 - merged)
- ... more ideas?
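To illustrate the two ExpiringValue ideas above (integer timestamps, raw bytes for the value, and a GC that reads only the timestamp), here is a minimal sketch. It is not the kas implementation: kas uses protobuf, while this sketch swaps in a hand-rolled fixed-width header so it stays self-contained with the standard library only. The names encodeEntry, expiryOf and gc, the 8-byte header layout, and the in-memory map standing in for the Redis hash are all assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// headerSize is the width of the fixed expiry prefix: one int64 of unix seconds.
const headerSize = 8

// encodeEntry uses a hypothetical wire format: an 8-byte big-endian expiry
// (unix seconds) followed by the opaque payload bytes. No type string is stored.
func encodeEntry(expiresAt int64, payload []byte) []byte {
	buf := make([]byte, headerSize+len(payload))
	binary.BigEndian.PutUint64(buf[:headerSize], uint64(expiresAt))
	copy(buf[headerSize:], payload)
	return buf
}

// expiryOf reads only the timestamp and never decodes the payload.
func expiryOf(raw []byte) (int64, error) {
	if len(raw) < headerSize {
		return 0, errors.New("entry shorter than header")
	}
	return int64(binary.BigEndian.Uint64(raw[:headerSize])), nil
}

// gc deletes expired entries from an in-memory stand-in for the Redis hash and
// returns how many were removed. Malformed entries are dropped as well.
func gc(hash map[string][]byte, now int64) int {
	removed := 0
	for key, raw := range hash {
		exp, err := expiryOf(raw)
		if err != nil || exp <= now {
			delete(hash, key)
			removed++
		}
	}
	return removed
}

func main() {
	hash := map[string][]byte{
		"tunnel-1": encodeEntry(100, []byte("kas-a")),
		"tunnel-2": encodeEntry(200, []byte("kas-b")),
	}
	fmt.Println("removed:", gc(hash, 150), "left:", len(hash))
}
```

With this layout the per-entry overhead is a constant 8 bytes, and the GC pass costs one fixed-width read per entry regardless of payload size.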
p.s. discovery is already concurrent.
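The "only refresh entries that would expire otherwise" idea from the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code from !722: the function name keysToRefresh, the map of key to expiry (unix seconds), and the fixed refresh period are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// keysToRefresh returns the keys whose expiry falls at or before the next
// scheduled refresh. Everything else survives until the following round, so
// rewriting it now would only add Redis traffic.
func keysToRefresh(expiry map[string]int64, now, refreshEverySec int64) []string {
	nextRefresh := now + refreshEverySec
	var keys []string
	for key, expiresAt := range expiry {
		if expiresAt <= nextRefresh {
			keys = append(keys, key)
		}
	}
	sort.Strings(keys) // deterministic order for the example
	return keys
}

func main() {
	expiry := map[string]int64{
		"agent-1": 1030, // expires before the next refresh at 1060
		"agent-2": 1300, // still valid at the next refresh
	}
	fmt.Println(keysToRefresh(expiry, 1000, 60))
}
```

The trade-off in this sketch is that the entry TTL has to be comfortably longer than the refresh period; otherwise every entry qualifies on every round and nothing is saved.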
Edited by Mikhail Mazurskiy