Implement a correct solution for resources of cluster's pods

Currently the cluster is created setting the resources request and limit only on the patroni container, but a correct solution would be to split the resources limit and request among all containers of the cluster's pods. We should research the correct solution to this problem, making sure that:

  • Pod total resources limit and request are shared among all containers, and the sum of the per-container resources limit and request equals the one specified in the profile.
  • Pod resources limit and request should be balanced in a way that respects the role of each container. We should make sure each container has the right amount of memory and CPU to operate correctly and at the best performance possible with that profile configuration.
  • Resources limit and request are correctly specified in the configuration of a profile. Currently we set a single value for CPU and memory and use it for both the resource limit and the request; we should check whether this is a correct approach.

Proposal

This is a proposal to enhance SGInstanceProfile in order to set the resource requirements for containers and init containers, plus a similar section to include with #820 (closed), or to implement with this issue if #820 (closed) gets implemented before this one:

apiVersion: stackgres.io/v1
kind: SGInstanceProfile
spec:
  cpu: 16000
  memory: 64Gi
  containers:
    pgbouncer:
      cpu: 1000
      memory: 64Mi
    envoy:
      cpu: 1000
      memory: 64Mi
    prometheus-postgres-exporter:
      cpu: 1000
      memory: 8Mi
    postgres-util:
      cpu: 1000
      memory: 8Mi
    fluent-bit:
      cpu: 1000
      memory: 8Mi
    fluentd:
      cpu: 4000
      memory: 2Gi
    cluster-controller: # Could be applied also to distributedlogs-controller
      cpu: 1000
      memory: 512Mi # Fix #1566 to improve this
  initContainers:
    setup-arbitrary-user:
      cpu: 1000
      memory: 8Mi
    setup-data-paths:
      cpu: 1000
      memory: 8Mi
    relocate-binaries:
      cpu: 1000
      memory: 8Mi
    setup-scripts:
      cpu: 1000
      memory: 8Mi
    pgbouncer-auth-file:
      cpu: 1000
      memory: 8Mi
    cluster-reconciliation-cycle: # Could be applied also to distributedlogs-reconciliation-cycle
      cpu: 1000
      memory: 512Mi # Fix #1566 to improve this
    major-version-upgrade:
      cpu: 16000
      memory: 64Gi
    reset-patroni:
      cpu: 1000
      memory: 8Mi
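
As a quick check of the first requirement above (the per-container values plus the main patroni container must add up to the profile totals), here is a minimal Python sketch; it assumes, as this proposal implies but does not state explicitly, that the patroni container receives whatever is left of .spec.cpu and .spec.memory:

# Minimal sketch: verify that the per-container values plus the patroni
# remainder add up to the profile totals (.spec.cpu / .spec.memory).
# Assumption (not stated explicitly in the proposal): patroni gets the remainder.

PROFILE_CPU_M = 16000          # .spec.cpu expressed in millicpu
PROFILE_MEMORY_MI = 64 * 1024  # .spec.memory (64Gi) expressed in Mi

# Values from the .spec.containers example above (cpu in millicpu, memory in Mi)
containers = {
    "pgbouncer": (1000, 64),
    "envoy": (1000, 64),
    "prometheus-postgres-exporter": (1000, 8),
    "postgres-util": (1000, 8),
    "fluent-bit": (1000, 8),
    "fluentd": (4000, 2048),
    "cluster-controller": (1000, 512),
}

used_cpu = sum(cpu for cpu, _ in containers.values())
used_mem = sum(mem for _, mem in containers.values())

# What would remain for the patroni (main) container so that the Pod total
# still equals the profile totals.
patroni_cpu = PROFILE_CPU_M - used_cpu
patroni_mem = PROFILE_MEMORY_MI - used_mem

print(f"patroni: cpu={patroni_cpu}m memory={patroni_mem}Mi")
# With the example values: patroni: cpu=6000m memory=62824Mi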

If the user does not specify a container sub-section in the .spec.containers or .spec.initContainers section, here are the proposed formulas to calculate the values of such sub-sections on creation of the SGInstanceProfile:

Those values may be lowered. Also, in those cases the requests may be set lower than the limits when applying these resource requirements to the specified containers, so that the Pod falls into the Burstable QoS class and can use the shared pool of CPUs when the CPU manager policy is set to static.
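
The formulas below are written using a toMillicpu / toMi notation. A minimal Python sketch of what those conversion helpers could look like (the function names and the accepted suffixes are assumptions for illustration, not an existing API):

# Hypothetical conversion helpers assumed by the formulas below.
# They only cover the unit suffixes used in this proposal.

def to_millicpu(cpu: str) -> int:
    """Convert a CPU quantity ("16", "16000m") to millicpu."""
    cpu = str(cpu)
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(float(cpu) * 1000)

def to_mi(memory: str) -> int:
    """Convert a memory quantity ("64Gi", "512Mi") to Mi."""
    memory = str(memory)
    if memory.endswith("Gi"):
        return int(memory[:-2]) * 1024
    if memory.endswith("Mi"):
        return int(memory[:-2])
    raise ValueError(f"unsupported unit in {memory!r}")

# Example with a 16 CPU / 64Gi profile
assert to_millicpu("16") == 16000
assert to_millicpu("16000m") == 16000
assert to_mi("64Gi") == 65536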

  • For pgbouncer:

    • cpu as millicpu: min(1000, floor(toMillicpu(".spec.cpu") / 16))
    • memory as Mi: 64

With 64Mi PgBouncer should be able to handle 4096 connections. See https://www.pgbouncer.org/features.html:

"Low memory requirements (2 kB per connection by default). This is because PgBouncer does not need to see full packets at once."
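
A back-of-the-envelope check of that claim, counting only the 2 kB of per-connection state quoted above (packet buffers and internal pools are not included, so real usage will be higher):

# 4096 connections at ~2 kB each (figure from the PgBouncer docs quoted above)
connections = 4096
per_connection_kb = 2
base_memory_mi = (connections * per_connection_kb) / 1024  # = 8.0 Mi
print(f"~{base_memory_mi:.0f}Mi of per-connection state, well within the 64Mi limit")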

  • For envoy:

    • cpu as millicpu: min(4000, floor(toMillicpu(".spec.cpu") / 4))
    • memory as Mi: 64

Scaling connections with pgbench from 1 connection to 100 only adds 1Mi of used memory to the 20Mi initially used, with another increment of 11Mi at 1000 connections. If we consider 12Mi for each 1000 connections, with 64Mi envoy should be able to handle up to 3500 connections.
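
The same estimate as a quick calculation (the 20Mi baseline and ~12Mi per 1000 connections are the measurements reported above):

# Rough connection capacity for envoy with a 64Mi limit, based on the
# measurements above: ~20Mi baseline plus ~12Mi per 1000 connections.
limit_mi = 64
baseline_mi = 20
mi_per_1000_connections = 12

connections = (limit_mi - baseline_mi) / mi_per_1000_connections * 1000
print(f"~{connections:.0f} connections")  # ~3667, hence the conservative "up to 3500"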

  • For prometheus-postgres-exporter:

    • cpu as millicpu: min(1000, floor(toMillicpu(".spec.cpu") / 16))
    • memory as Mi: 8
  • For postgres-util:

    • cpu as millicpu: min(1000, floor(toMillicpu(".spec.cpu") / 16))
    • memory as Mi: 8
  • For fluent-bit:

    • cpu as millicpu: min(1000, floor(toMillicpu(".spec.cpu") / 16))
    • memory as Mi: 8
  • For fluentd:

    • cpu as millicpu: min(4000, floor(toMillicpu(".spec.cpu") / 4))
    • memory as Mi: 2048 (2Gi)

Memory usage is quite high. Starting from 512Mi it easily increases to 612Mi with a cluster of 3 instances and low log usage, so 2Gi seems a reasonably safe value.

For cluster-controller / distributedlogs-controller / cluster-reconciliation-cycle / distributedlogs-reconciliation-cycle:

  • cpu as millicpu: min(1000, floor(toMillicpu(".spec.cpu") / 4))
  • memory as Mi: 512

See #1566 (closed).

For major-version-upgrade:

  • cpu as millicpu: toMillicpu(".spec.cpu")
  • memory as Mi: toMi(".spec.memory")

Major version upgrade will run the pg_upgrade command, which may or may not require all of that CPU and memory. In any case it is better to give all the available resources to this container since it runs alone, and we may find a faster alternative that requires more memory and CPU in the future.

  • For setup-arbitrary-user / setup-data-paths / relocate-binaries / setup-scripts / pgbouncer-auth-file / reset-patroni:

    • cpu as millicpu: toMillicpu(".spec.cpu")
    • memory as Mi: toMi(".spec.memory")

Init containers may or may not require all of that CPU and memory. In any case it is better to give all the available resources to any init container since they run alone.
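
Putting the formulas above together, here is a minimal sketch of how the defaulting of missing sub-sections could look. It takes already converted millicpu / Mi values; the structure of the returned sections is an assumption for illustration, not the final schema:

# Sketch of the proposed defaulting for missing container sub-sections.
# cpu values are millicpu, memory values are Mi.

def default_containers(cpu_m: int, mem_mi: int) -> dict:
    small_cpu = min(1000, cpu_m // 16)      # pgbouncer, exporter, postgres-util, fluent-bit
    large_cpu = min(4000, cpu_m // 4)       # envoy, fluentd
    controller_cpu = min(1000, cpu_m // 4)  # cluster-controller and reconciliation cycle
    return {
        "containers": {
            "pgbouncer": {"cpu": small_cpu, "memory": 64},
            "envoy": {"cpu": large_cpu, "memory": 64},
            "prometheus-postgres-exporter": {"cpu": small_cpu, "memory": 8},
            "postgres-util": {"cpu": small_cpu, "memory": 8},
            "fluent-bit": {"cpu": small_cpu, "memory": 8},
            "fluentd": {"cpu": large_cpu, "memory": 2048},
            "cluster-controller": {"cpu": controller_cpu, "memory": 512},
        },
        "initContainers": {
            # Init containers run alone, so they get the full profile resources...
            **{name: {"cpu": cpu_m, "memory": mem_mi}
               for name in ("setup-arbitrary-user", "setup-data-paths",
                            "relocate-binaries", "setup-scripts",
                            "pgbouncer-auth-file", "major-version-upgrade",
                            "reset-patroni")},
            # ...except the reconciliation cycle, which follows the
            # cluster-controller formula.
            "cluster-reconciliation-cycle": {"cpu": controller_cpu, "memory": 512},
        },
    }

# Example: a 16 CPU / 64Gi profile
defaults = default_containers(16000, 64 * 1024)

For a 16 CPU / 64Gi profile this yields 1000m for the small sidecars, 4000m for envoy and fluentd, and the full profile resources for the init containers that run alone, as described above.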

Reference: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#how-pods-with-resource-requests-are-scheduled

Acceptance criteria:

  • Implement the feature
  • Implement the change in the REST API
  • Create tests
  • Documentation