Understand scalability of frontend-side Kubernetes resource fetch
Problem
Currently, Initial iteration of Kubernetes Dashboard in En... (&9859 - closed) proposes to have frontend clients fetch Kubernetes resources directly. This relies on the user_access keyword, where KAS authenticates/authorizes the requester and proxies the Kubernetes API requests to agentk. For example, you see the following requests from kubectl get all -v=8 -n flux-system:
I0223 15:22:19.647328 41395 round_trippers.go:432] GET https://<ip>/api/v1/namespaces/flux-system/pods?limit=500
I0223 15:22:20.277198 41395 round_trippers.go:432] GET https://<ip>/api/v1/namespaces/flux-system/replicationcontrollers?limit=500
I0223 15:22:20.798156 41395 round_trippers.go:432] GET https://<ip>/api/v1/namespaces/flux-system/services?limit=500
I0223 15:22:21.420801 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/daemonsets?limit=500
I0223 15:22:21.914102 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/deployments?limit=500
I0223 15:22:22.541027 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/replicasets?limit=500
I0223 15:22:23.194333 41395 round_trippers.go:432] GET https://<ip>/apis/apps/v1/namespaces/flux-system/statefulsets?limit=500
I0223 15:22:23.734142 41395 round_trippers.go:432] GET https://<ip>/apis/autoscaling/v2/namespaces/flux-system/horizontalpodautoscalers?limit=500
I0223 15:22:24.233590 41395 round_trippers.go:432] GET https://<ip>/apis/batch/v1/namespaces/flux-system/cronjobs?limit=500
I0223 15:22:24.729936 41395 round_trippers.go:432] GET https://<ip>/apis/batch/v1/namespaces/flux-system/jobs?limit=500
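To make the request pattern concrete, below is a minimal sketch (not the actual GitLab frontend code) of how a frontend client could issue one of these list requests through the KAS user_access proxy. The KAS_PROXY_BASE URL, the bearer-token header, and the listResources helper are illustrative assumptions.

```typescript
// Minimal sketch: list one resource kind for an environment's namespace
// through the Kubernetes API proxy exposed by KAS (user_access feature).
// NOTE: `KAS_PROXY_BASE` and the Authorization scheme are assumptions for
// illustration; the real GitLab frontend wires this up differently.
const KAS_PROXY_BASE = 'https://kas.example.com/k8s-proxy'; // hypothetical

async function listResources(
  apiPath: string,   // e.g. 'api/v1' or 'apis/apps/v1'
  namespace: string, // e.g. 'flux-system'
  plural: string,    // e.g. 'pods', 'deployments'
  token: string,     // user access token forwarded to KAS
): Promise<unknown> {
  const url = `${KAS_PROXY_BASE}/${apiPath}/namespaces/${namespace}/${plural}?limit=500`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`List ${plural} failed: ${response.status}`);
  }
  return response.json();
}

// One such request is needed per resource kind, so ~100 kinds means
// ~100 requests per refresh, per user, per environment.
```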
In Render Kubernetes resources in Environment inde... (#390769 - closed), we're trying to fetch all of the resource kinds. Available resource kinds vary per Kubernetes setup. In my case, roughly 100 kinds are available, including the default resource kinds plus CRDs in the cluster (a discovery sketch follows the list below):
API resource list
shinya@shinya-B550-VISION-D:~/workspace/thin-gdk$ k api-resources
NAME SHORTNAMES APIVERSION NAMESPACED KIND
bindings v1 true Binding
componentstatuses cs v1 false ComponentStatus
configmaps cm v1 true ConfigMap
endpoints ep v1 true Endpoints
events ev v1 true Event
limitranges limits v1 true LimitRange
namespaces ns v1 false Namespace
nodes no v1 false Node
persistentvolumeclaims pvc v1 true PersistentVolumeClaim
persistentvolumes pv v1 false PersistentVolume
pods po v1 true Pod
podtemplates v1 true PodTemplate
replicationcontrollers rc v1 true ReplicationController
resourcequotas quota v1 true ResourceQuota
secrets v1 true Secret
serviceaccounts sa v1 true ServiceAccount
services svc v1 true Service
mutatingwebhookconfigurations admissionregistration.k8s.io/v1 false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io/v1 false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io/v1 false CustomResourceDefinition
apiservices apiregistration.k8s.io/v1 false APIService
controllerrevisions apps/v1 true ControllerRevision
daemonsets ds apps/v1 true DaemonSet
deployments deploy apps/v1 true Deployment
replicasets rs apps/v1 true ReplicaSet
statefulsets sts apps/v1 true StatefulSet
tokenreviews authentication.k8s.io/v1 false TokenReview
localsubjectaccessreviews authorization.k8s.io/v1 true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io/v1 false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io/v1 false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io/v1 false SubjectAccessReview
allowlistedworkloads auto.gke.io/v1 false AllowlistedWorkload
horizontalpodautoscalers hpa autoscaling/v2 true HorizontalPodAutoscaler
multidimpodautoscalers mpa autoscaling.gke.io/v1beta1 true MultidimPodAutoscaler
verticalpodautoscalers vpa autoscaling.k8s.io/v1 true VerticalPodAutoscaler
cronjobs cj batch/v1 true CronJob
jobs batch/v1 true Job
certificatesigningrequests csr certificates.k8s.io/v1 false CertificateSigningRequest
ciliumendpoints cep,ciliumep cilium.io/v2 true CiliumEndpoint
ciliumendpointslices ces cilium.io/v2alpha1 false CiliumEndpointSlice
ciliumexternalworkloads cew cilium.io/v2 false CiliumExternalWorkload
ciliumidentities ciliumid cilium.io/v2 false CiliumIdentity
ciliumlocalredirectpolicies clrp cilium.io/v2 true CiliumLocalRedirectPolicy
ciliumnodes cn,ciliumn cilium.io/v2 false CiliumNode
backendconfigs bc cloud.google.com/v1 true BackendConfig
containerwatcherstatuses containerthreatdetection.googleapis.com/v1 true ContainerWatcherStatus
leases coordination.k8s.io/v1 true Lease
endpointslices discovery.k8s.io/v1 true EndpointSlice
events ev events.k8s.io/v1 true Event
flowschemas flowcontrol.apiserver.k8s.io/v1beta2 false FlowSchema
prioritylevelconfigurations flowcontrol.apiserver.k8s.io/v1beta2 false PriorityLevelConfiguration
helmreleases hr helm.toolkit.fluxcd.io/v2beta1 true HelmRelease
memberships hub.gke.io/v1 false Membership
capacityrequests capreq internal.autoscaling.gke.io/v1alpha1 true CapacityRequest
kustomizations ks kustomize.toolkit.fluxcd.io/v1beta2 true Kustomization
nodes metrics.k8s.io/v1beta1 false NodeMetrics
pods metrics.k8s.io/v1beta1 true PodMetrics
dataplanev2encryption dpv2e networking.gke.io/v1alpha1 false DataplaneV2Encryption
egressnatpolicies networking.gke.io/v1 false EgressNATPolicy
frontendconfigs networking.gke.io/v1beta1 true FrontendConfig
managedcertificates mcrt networking.gke.io/v1 true ManagedCertificate
networkloggings nl networking.gke.io/v1alpha1 false NetworkLogging
redirectservices rds networking.gke.io/v1alpha1 true RedirectService
remotenodes rn networking.gke.io/v1alpha1 false RemoteNode
serviceattachments networking.gke.io/v1 true ServiceAttachment
servicenetworkendpointgroups svcneg networking.gke.io/v1beta1 true ServiceNetworkEndpointGroup
ingressclasses networking.k8s.io/v1 false IngressClass
ingresses ing networking.k8s.io/v1 true Ingress
networkpolicies netpol networking.k8s.io/v1 true NetworkPolicy
runtimeclasses node.k8s.io/v1 false RuntimeClass
updateinfos updinf nodemanagement.gke.io/v1alpha1 true UpdateInfo
alerts notification.toolkit.fluxcd.io/v1beta2 true Alert
providers notification.toolkit.fluxcd.io/v1beta2 true Provider
receivers notification.toolkit.fluxcd.io/v1beta2 true Receiver
poddisruptionbudgets pdb policy/v1 true PodDisruptionBudget
podsecuritypolicies psp policy/v1beta1 false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io/v1 false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io/v1 false ClusterRole
rolebindings rbac.authorization.k8s.io/v1 true RoleBinding
roles rbac.authorization.k8s.io/v1 true Role
priorityclasses pc scheduling.k8s.io/v1 false PriorityClass
volumesnapshotclasses snapshot.storage.k8s.io/v1 false VolumeSnapshotClass
volumesnapshotcontents snapshot.storage.k8s.io/v1 false VolumeSnapshotContent
volumesnapshots snapshot.storage.k8s.io/v1 true VolumeSnapshot
buckets source.toolkit.fluxcd.io/v1beta2 true Bucket
gitrepositories gitrepo source.toolkit.fluxcd.io/v1beta2 true GitRepository
helmcharts hc source.toolkit.fluxcd.io/v1beta2 true HelmChart
helmrepositories helmrepo source.toolkit.fluxcd.io/v1beta2 true HelmRepository
ocirepositories ocirepo source.toolkit.fluxcd.io/v1beta2 true OCIRepository
csidrivers storage.k8s.io/v1 false CSIDriver
csinodes storage.k8s.io/v1 false CSINode
csistoragecapacities storage.k8s.io/v1 true CSIStorageCapacity
storageclasses sc storage.k8s.io/v1 false StorageClass
volumeattachments storage.k8s.io/v1 false VolumeAttachment
shinya@shinya-B550-VISION-D:~/workspace/thin-gdk$ k api-resources | wc -l
95
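For reference, below is a minimal sketch of how a client could discover these resource kinds via the Kubernetes discovery API, which is what kubectl api-resources does under the hood. It reuses the hypothetical KAS_PROXY_BASE and bearer-token assumptions from the earlier sketch; the kind-key format is also illustrative.

```typescript
// Minimal sketch: enumerate namespaced, listable resource kinds.
// Each kind found here would translate into one list/watch per user.
// NOTE: `KAS_PROXY_BASE` and token handling are assumptions for illustration.
const KAS_PROXY_BASE = 'https://kas.example.com/k8s-proxy'; // hypothetical

async function discoverNamespacedKinds(token: string): Promise<string[]> {
  const headers = { Authorization: `Bearer ${token}` };
  const kinds: string[] = [];

  // Core group: GET /api/v1 returns an APIResourceList.
  const core = await (await fetch(`${KAS_PROXY_BASE}/api/v1`, { headers })).json();
  for (const r of core.resources) {
    if (r.namespaced && r.verbs.includes('list')) kinds.push(`v1/${r.name}`);
  }

  // Named groups: GET /apis returns an APIGroupList; each preferred
  // group version has its own APIResourceList.
  const groups = await (await fetch(`${KAS_PROXY_BASE}/apis`, { headers })).json();
  for (const group of groups.groups) {
    const gv = group.preferredVersion.groupVersion; // e.g. 'apps/v1'
    const list = await (await fetch(`${KAS_PROXY_BASE}/apis/${gv}`, { headers })).json();
    for (const r of list.resources) {
      if (r.namespaced && r.verbs.includes('list')) kinds.push(`${gv}/${r.name}`);
    }
  }
  return kinds;
}
```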
Provided that we implement the current plan as-is, we can roughly estimate the scalability of this feature:
- 100 kinds of API resources in the cluster. => 100 requests are executed per frontend client and environment.
- Resource refresh interval (Frontend polling interval) is 10 sec. => 600 requests per min (= 100 * 6).
- 10 different users visit the same environment index page and open frontend-production and backend-production environments. => 12000 requests per min (= 600 * 10 * 2)
We should verify that this is acceptable for the user's Kubernetes API server. One concern is that hitting the API rate limit (429 Too Many Requests) would disturb other important operations such as GitOps and CI access.
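To make the scaling relationship explicit, here is a small sketch of the same arithmetic as a formula (the function and variable names are illustrative, not from any GitLab code):

```typescript
// Rough polling load estimate, generalizing the numbers above.
// requestsPerMinute = kinds * (60 / pollIntervalSeconds) * users * environments
function estimatePollingRequestsPerMinute(
  kinds: number,               // resource kinds fetched per refresh (~100)
  pollIntervalSeconds: number, // frontend polling interval (10s)
  users: number,               // users with the page open (10)
  environments: number,        // environments opened per user (2)
): number {
  return kinds * (60 / pollIntervalSeconds) * users * environments;
}

// 100 kinds * 6 refreshes/min * 10 users * 2 environments = 12000 requests/min
console.log(estimatePollingRequestsPerMinute(100, 10, 10, 2)); // 12000
```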
Open discussions
Past discussions
- Question: Which resource kind is necessary for rendering dashboard?
- Question: Should GitLab-Environment fetch resources instead of individual users fetching resources?
Summary
- Frontend uses the Watch API instead of polling (see the sketch after this list). For more details, see https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes and an example using the KAS User Access feature in #393002 (comment 1291133940).
- Bump the 100 concurrent-request limit on agentk, or remove it completely.
- Use Feature Flags in case this causes trouble in a user's cluster.
- Document that users may need to scale up their clusters as resource watches are used more actively.
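As referenced above, here is a minimal sketch of how the frontend could open a watch instead of polling. It assumes the same hypothetical KAS_PROXY_BASE and token handling as the earlier sketch; the real implementation uses the Kubernetes client wiring in the GitLab frontend.

```typescript
// Minimal sketch: watch a resource kind instead of polling it.
// A watch holds one long-running connection per (user, resource kind) and
// streams newline-delimited JSON events (ADDED/MODIFIED/DELETED).
// NOTE: `KAS_PROXY_BASE` and token handling are assumptions for illustration.
const KAS_PROXY_BASE = 'https://kas.example.com/k8s-proxy'; // hypothetical

async function watchResources(
  apiPath: string,   // e.g. 'apis/apps/v1'
  namespace: string, // e.g. 'flux-system'
  plural: string,    // e.g. 'deployments'
  token: string,
  onEvent: (event: { type: string; object: unknown }) => void,
): Promise<void> {
  const url = `${KAS_PROXY_BASE}/${apiPath}/namespaces/${namespace}/${plural}?watch=true`;
  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  // Each chunk may contain zero or more newline-delimited watch events.
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    let newline: number;
    while ((newline = buffer.indexOf('\n')) >= 0) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) onEvent(JSON.parse(line));
    }
  }
}
```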
Roughly estimating:
- X users are watching a frontend production environment = X * 15 watches
- Y users are watching a frontend resource in the Kubernetes Dashboard = Y * 1 watch
- Z operations are running (CI job/GitOps) = Z * 10 API requests (assuming roughly 10 requests to complete a task)
- The total number of connections at a given time is (X * 15) + (Y * 1) + (Z * 10). Provided that max-requests-inflight is 400, (X * 15) + (Y * 1) + (Z * 10) must stay below 400.
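As an illustrative check (the numbers are mine, not from the issue): with X = 10 users on the environment page, Y = 20 users on the dashboard, and Z = 15 concurrent CI/GitOps operations, the estimate is 10 * 15 + 20 * 1 + 15 * 10 = 320, which still fits under the default max-requests-inflight of 400; doubling the CI/GitOps load to Z = 30 (470) would not.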
Findings
- In Kubernetes Dashboard, one watch process is opened to the selected resource kind, per user.
- In Environment view, 6~15 concurrent watch connections are opened to multiple resource kinds (e.g. Deployment, DaemonSet, StatefulSet, ReplicaSet, Job and CronJob), per user. We might add more resource kinds in the future for Resource health and Dependency graph.
- API Priority and Fairness
  - According to this article:
    - max-requests-inflight (default: 400): max number of non-mutating requests in flight at a given time.
    - max-mutating-requests-inflight (default: 200): max number of mutating requests in flight at a given time.
- If a customer has 1000 users all opening the same page, it's on them to have a powerful enough cluster for all the users, configured to handle enough req/sec to satisfy the demand. The Kubernetes API server already has a cache on top of etcd (etcd probably has its own caching too, and then there is the OS fs block cache on the etcd host), so we don't need a layer of caching on top of that. #393002 (comment 1290742751)
- We have an (arbitrary) limit of max 100 concurrent requests handled by an agentk Pod (but a user can run any number of Pods), so we have to fit into that. Take into account that connections would be consumed by each user that has the page open. So if you have 10 users looking at the page, using 6 connections each, that's 60 connections. Then all their CI jobs that run concurrently share the remaining 40, which is not that many. It's not necessarily a problem for short-running requests - worst case they can all fit into a single connection sequentially, if it's available, which is bad for latency. It is a potential problem for long-running watches/logs/etc though - they take up a whole connection each for the duration of the call. #393002 (comment 1292651994)