Understand scalability of frontend-side Kubernetes resource fetch
Problem
Currently, Initial iteration of Kubernetes Dashboard in En... (&9859 - closed) proposes to have frontend clients fetch Kubernetes resources directly. This relies on the user_access keyword, where KAS authenticates/authorizes the requester and proxies the Kubernetes API requests to agentk. For example, you see the following requests from kubectl get all -v=8 -n flux-system:
I0223 15:22:19.647328 41395 round_trippers.go:432] GET https://<ip>/api/v1/namespaces/flux-system/pods?limit=500
I0223 15:22:20.277198 41395 round_trippers.go:432] GET https://<ip>/api/v1/namespaces/flux-system/replicationcontrollers?limit=500
I0223 15:22:20.798156 41395 round_trippers.go:432] GET https://<ip>/api/v1/namespaces/flux-system/services?limit=500
I0223 15:22:21.420801 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/daemonsets?limit=500
I0223 15:22:21.914102 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/deployments?limit=500
I0223 15:22:22.541027 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/replicasets?limit=500
I0223 15:22:23.194333 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/statefulsets?limit=500
I0223 15:22:23.734142 41395 round_trippers.go:432] GET https://<ip>/apis/autoscaling/v2/namespaces/flux-system/horizontalpodautoscalers?limit=500
I0223 15:22:24.233590 41395 round_trippers.go:432] GET https://<ip>/apis/batch/v1/namespaces/flux-system/cronjobs?limit=500
I0223 15:22:24.729936 41395 round_trippers.go:432] GET https://<ip>/apis/batch/v1/namespaces/flux-system/jobs?limit=500
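To make the request pattern concrete, below is a minimal sketch (not the actual GitLab frontend code) of how a frontend client could issue one of these list requests through the KAS user_access proxy. The KAS_PROXY_BASE URL, the bearer-token header, and the listResources helper are illustrative assumptions.

```typescript
// Minimal sketch: list one resource kind for an environment's namespace
// through the Kubernetes API proxy exposed by KAS (user_access feature).
// NOTE: `KAS_PROXY_BASE` and the Authorization scheme are assumptions for
// illustration; the real GitLab frontend wires this up differently.
const KAS_PROXY_BASE = 'https://kas.example.com/k8s-proxy'; // hypothetical

async function listResources(
  apiPath: string,   // e.g. 'api/v1' or 'apis/apps/v1'
  namespace: string, // e.g. 'flux-system'
  plural: string,    // e.g. 'pods', 'deployments'
  token: string,     // user access token forwarded to KAS
): Promise<unknown> {
  const url = `${KAS_PROXY_BASE}/${apiPath}/namespaces/${namespace}/${plural}?limit=500`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`List ${plural} failed: ${response.status}`);
  }
  return response.json();
}

// One such request is needed per resource kind, so ~100 kinds means
// ~100 requests per refresh, per user, per environment.
```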
In Render Kubernetes resources in Environment inde... (#390769 - closed), we're trying to fetch all of the resource kinds. Available resource kinds vary per Kubernetes setup. In my case, roughly 100 kinds are available, including the default resource kinds plus CRDs in the cluster (a discovery sketch follows the list below):
API resource list
shinya@shinya-B550-VISION-D:~/workspace/thin-gdk$ k api-resources
NAME SHORTNAMES APIVERSION NAMESPACED KIND
bindings v1 true Binding
componentstatuses cs v1 false ComponentStatus
configmaps cm v1 true ConfigMap
endpoints ep v1 true Endpoints
events ev v1 true Event
limitranges limits v1 true LimitRange
namespaces ns v1 false Namespace
nodes no v1 false Node
persistentvolumeclaims pvc v1 true PersistentVolumeClaim
persistentvolumes pv v1 false PersistentVolume
pods po v1 true Pod
podtemplates v1 true PodTemplate
replicationcontrollers rc v1 true ReplicationController
resourcequotas quota v1 true ResourceQuota
secrets v1 true Secret
serviceaccounts sa v1 true ServiceAccount
services svc v1 true Service
mutatingwebhookconfigurations admissionregistration.k8s.io/v1 false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io/v1 false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io/v1 false CustomResourceDefinition
apiservices apiregistration.k8s.io/v1 false APIService
controllerrevisions apps/v1 true ControllerRevision
daemonsets ds apps/v1 true DaemonSet
deployments deploy apps/v1 true Deployment
replicasets rs apps/v1 true ReplicaSet
statefulsets sts apps/v1 true StatefulSet
tokenreviews authentication.k8s.io/v1 false TokenReview
localsubjectaccessreviews authorization.k8s.io/v1 true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io/v1 false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io/v1 false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io/v1 false SubjectAccessReview
allowlistedworkloads auto.gke.io/v1 false AllowlistedWorkload
horizontalpodautoscalers hpa autoscaling/v2 true HorizontalPodAutoscaler
multidimpodautoscalers mpa autoscaling.gke.io/v1beta1 true MultidimPodAutoscaler
verticalpodautoscalers vpa autoscaling.k8s.io/v1 true VerticalPodAutoscaler
cronjobs cj batch/v1 true CronJob
jobs batch/v1 true Job
certificatesigningrequests csr certificates.k8s.io/v1 false CertificateSigningRequest
ciliumendpoints cep,ciliumep cilium.io/v2 true CiliumEndpoint
ciliumendpointslices ces cilium.io/v2alpha1 false CiliumEndpointSlice
ciliumexternalworkloads cew cilium.io/v2 false CiliumExternalWorkload
ciliumidentities ciliumid cilium.io/v2 false CiliumIdentity
ciliumlocalredirectpolicies clrp cilium.io/v2 true CiliumLocalRedirectPolicy
ciliumnodes cn,ciliumn cilium.io/v2 false CiliumNode
backendconfigs bc cloud.google.com/v1 true BackendConfig
containerwatcherstatuses containerthreatdetection.googleapis.com/v1 true ContainerWatcherStatus
leases coordination.k8s.io/v1 true Lease
endpointslices discovery.k8s.io/v1 true EndpointSlice
events ev events.k8s.io/v1 true Event
flowschemas flowcontrol.apiserver.k8s.io/v1beta2 false FlowSchema
prioritylevelconfigurations flowcontrol.apiserver.k8s.io/v1beta2 false PriorityLevelConfiguration
helmreleases hr helm.toolkit.fluxcd.io/v2beta1 true HelmRelease
memberships hub.gke.io/v1 false Membership
capacityrequests capreq internal.autoscaling.gke.io/v1alpha1 true CapacityRequest
kustomizations ks kustomize.toolkit.fluxcd.io/v1beta2 true Kustomization
nodes metrics.k8s.io/v1beta1 false NodeMetrics
pods metrics.k8s.io/v1beta1 true PodMetrics
dataplanev2encryption dpv2e networking.gke.io/v1alpha1 false DataplaneV2Encryption
egressnatpolicies networking.gke.io/v1 false EgressNATPolicy
frontendconfigs networking.gke.io/v1beta1 true FrontendConfig
managedcertificates mcrt networking.gke.io/v1 true ManagedCertificate
networkloggings nl networking.gke.io/v1alpha1 false NetworkLogging
redirectservices rds networking.gke.io/v1alpha1 true RedirectService
remotenodes rn networking.gke.io/v1alpha1 false RemoteNode
serviceattachments networking.gke.io/v1 true ServiceAttachment
servicenetworkendpointgroups svcneg networking.gke.io/v1beta1 true ServiceNetworkEndpointGroup
ingressclasses networking.k8s.io/v1 false IngressClass
ingresses ing networking.k8s.io/v1 true Ingress
networkpolicies netpol networking.k8s.io/v1 true NetworkPolicy
runtimeclasses node.k8s.io/v1 false RuntimeClass
updateinfos updinf nodemanagement.gke.io/v1alpha1 true UpdateInfo
alerts notification.toolkit.fluxcd.io/v1beta2 true Alert
providers notification.toolkit.fluxcd.io/v1beta2 true Provider
receivers notification.toolkit.fluxcd.io/v1beta2 true Receiver
poddisruptionbudgets pdb policy/v1 true PodDisruptionBudget
podsecuritypolicies psp policy/v1beta1 false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io/v1 false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io/v1 false ClusterRole
rolebindings rbac.authorization.k8s.io/v1 true RoleBinding
roles rbac.authorization.k8s.io/v1 true Role
priorityclasses pc scheduling.k8s.io/v1 false PriorityClass
volumesnapshotclasses snapshot.storage.k8s.io/v1 false VolumeSnapshotClass
volumesnapshotcontents snapshot.storage.k8s.io/v1 false VolumeSnapshotContent
volumesnapshots snapshot.storage.k8s.io/v1 true VolumeSnapshot
buckets source.toolkit.fluxcd.io/v1beta2 true Bucket
gitrepositories gitrepo source.toolkit.fluxcd.io/v1beta2 true GitRepository
helmcharts hc source.toolkit.fluxcd.io/v1beta2 true HelmChart
helmrepositories helmrepo source.toolkit.fluxcd.io/v1beta2 true HelmRepository
ocirepositories ocirepo source.toolkit.fluxcd.io/v1beta2 true OCIRepository
csidrivers storage.k8s.io/v1 false CSIDriver
csinodes storage.k8s.io/v1 false CSINode
csistoragecapacities storage.k8s.io/v1 true CSIStorageCapacity
storageclasses sc storage.k8s.io/v1 false StorageClass
volumeattachments storage.k8s.io/v1 false VolumeAttachment
shinya@shinya-B550-VISION-D:~/workspace/thin-gdk$ k api-resources | wc -l
95
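For reference, below is a minimal sketch of how a client could discover these resource kinds via the Kubernetes discovery API, which is what kubectl api-resources does under the hood. It reuses the hypothetical KAS_PROXY_BASE and bearer-token assumptions from the earlier sketch; the kind-key format is also illustrative.

```typescript
// Minimal sketch: enumerate namespaced, listable resource kinds.
// Each kind found here would translate into one list/watch per user.
// NOTE: `KAS_PROXY_BASE` and token handling are assumptions for illustration.
const KAS_PROXY_BASE = 'https://kas.example.com/k8s-proxy'; // hypothetical

async function discoverNamespacedKinds(token: string): Promise<string[]> {
  const headers = { Authorization: `Bearer ${token}` };
  const kinds: string[] = [];

  // Core group: GET /api/v1 returns an APIResourceList.
  const core = await (await fetch(`${KAS_PROXY_BASE}/api/v1`, { headers })).json();
  for (const r of core.resources) {
    if (r.namespaced && r.verbs.includes('list')) kinds.push(`v1/${r.name}`);
  }

  // Named groups: GET /apis returns an APIGroupList; each preferred
  // group version has its own APIResourceList.
  const groups = await (await fetch(`${KAS_PROXY_BASE}/apis`, { headers })).json();
  for (const group of groups.groups) {
    const gv = group.preferredVersion.groupVersion; // e.g. 'apps/v1'
    const list = await (await fetch(`${KAS_PROXY_BASE}/apis/${gv}`, { headers })).json();
    for (const r of list.resources) {
      if (r.namespaced && r.verbs.includes('list')) kinds.push(`${gv}/${r.name}`);
    }
  }
  return kinds;
}
```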
Provided that we implement the current plan as-is, we can roughly estimate the scalability of this feature:
- 100 kinds of API resources in the cluster. => 100 requests are executed per frontend client and environment.
- Resource refresh interval (Frontend polling interval) is 10 sec. => 600 requests per min (= 100 * 6).
- 10 different users visit the same environment index page and open frontend-production and backend-production environments. => 12000 requests per min (= 600 * 10 * 2)
We should verify that this is acceptable for the user's Kubernetes API server. One concern is that hitting the API rate limit (429 Too Many Requests) would disturb other important operations such as GitOps and CI access.
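To make the scaling relationship explicit, here is a small sketch of the same arithmetic as a formula (the function and variable names are illustrative, not from any GitLab code):

```typescript
// Rough polling load estimate, generalizing the numbers above.
// requestsPerMinute = kinds * (60 / pollIntervalSeconds) * users * environments
function estimatePollingRequestsPerMinute(
  kinds: number,               // resource kinds fetched per refresh (~100)
  pollIntervalSeconds: number, // frontend polling interval (10s)
  users: number,               // users with the page open (10)
  environments: number,        // environments opened per user (2)
): number {
  return kinds * (60 / pollIntervalSeconds) * users * environments;
}

// 100 kinds * 6 refreshes/min * 10 users * 2 environments = 12000 requests/min
console.log(estimatePollingRequestsPerMinute(100, 10, 10, 2)); // 12000
```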
Open discussions
Past discussions
- Question: Which resource kind is necessary for rendering dashboard?
- Question: Should GitLab-Environment fetch resources instead of individual users fetching resources?
Summary
- Frontend uses the Watch API instead of polling (see the sketch after this list). For more details, see https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes and an example using the KAS User Access feature in #393002 (comment 1291133940).
- Bump the 100 concurrent-request limit on agentk, or remove it completely.
- Use Feature Flags in case this causes trouble in a user's cluster.
- Document that users may need to scale up their clusters as resource watches are used more actively.
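As referenced above, here is a minimal sketch of how the frontend could open a watch instead of polling. It assumes the same hypothetical KAS_PROXY_BASE and token handling as the earlier sketch; the real implementation uses the Kubernetes client wiring in the GitLab frontend.

```typescript
// Minimal sketch: watch a resource kind instead of polling it.
// A watch holds one long-running connection per (user, resource kind) and
// streams newline-delimited JSON events (ADDED/MODIFIED/DELETED).
// NOTE: `KAS_PROXY_BASE` and token handling are assumptions for illustration.
const KAS_PROXY_BASE = 'https://kas.example.com/k8s-proxy'; // hypothetical

async function watchResources(
  apiPath: string,   // e.g. 'apis/apps/v1'
  namespace: string, // e.g. 'flux-system'
  plural: string,    // e.g. 'deployments'
  token: string,
  onEvent: (event: { type: string; object: unknown }) => void,
): Promise<void> {
  const url = `${KAS_PROXY_BASE}/${apiPath}/namespaces/${namespace}/${plural}?watch=true`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  // Each chunk may contain zero or more newline-delimited watch events.
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let newline: number;
    while ((newline = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) onEvent(JSON.parse(line));
    }
  }
}
```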
Roughly estimating:
- X users are watching a frontend production environment = X * 15 watches
- Y users are watching a frontend resource in the Kubernetes Dashboard = Y * 1 watch
- Z operations are running (CI job/GitOps) = Z * 10 API requests (assuming roughly 10 requests to complete a task)
- The total number of connections at a given time is (X * 15) + (Y * 1) + (Z * 10). Provided that max-requests-inflight is 400, (X * 15) + (Y * 1) + (Z * 10) must stay below 400.
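As an illustrative check (the numbers are mine, not from the issue): with X = 10 users on the environment page, Y = 20 users on the dashboard, and Z = 15 concurrent CI/GitOps operations, the estimate is 10 * 15 + 20 * 1 + 15 * 10 = 320, which still fits under the default max-requests-inflight of 400; doubling the CI/GitOps load to Z = 30 (470) would not.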
Findings
- In Kubernetes Dashboard, one watch process is opened to the selected resource kind, per user.
- In Environment view, 6~15 concurrent watch connections are opened to multiple resource kinds (e.g. Deployment, DaemonSet, StatefulSet, ReplicaSet, Job and CronJob), per user. We might add more resource kinds in the future for Resource health and Dependency graph.
- API Priority and Fairness
  - According to this article:
    - max-requests-inflight (default: 400): max number of non-mutating requests in flight at a given time.
    - max-mutating-requests-inflight (default: 200): max number of mutating requests in flight at a given time.
- If a customer has 1000 users all opening the same page, it's on them to have a powerful enough cluster for all the users, configured to handle enough req/sec to satisfy the demand. The Kubernetes API server already has a cache on top of etcd (etcd probably has its own caching too, and then there is the OS fs block cache on the etcd host), so we don't need a layer of caching on top of that. #393002 (comment 1290742751)
- We have an (arbitrary) limit of max 100 concurrent requests handled by an agentk Pod (but a user can run any number of Pods), so we have to fit into that. Take into account that connections would be consumed by each user that has the page open. So if you have 10 users looking at the page, using 6 connections each, that's 60 connections. Then all their CI jobs that run concurrently share the remaining 40, which is not that many. It's not necessarily a problem for short-running requests - worst case they can all fit into a single connection sequentially, if it's available, which is bad for latency. It is a potential problem for long-running watches/logs/etc though - they take up a whole connection each for the duration of the call. #393002 (comment 1292651994)