Integrate with KEDA (Kubernetes Event-driven Autoscaling) and Vertical Pod Autoscaler for horizontal and vertical autoscaling

KEDA automatically scales resources horizontally (up and down) based on events fired from configured triggers, and provides a great deal of flexibility in defining both the triggers and the scaling configuration.

Vertical Pod Autoscaler automatically scales Pods vertically (up and down) by adjusting their CPU and memory resources based on usage statistics.

From a general perspective, scaling a Postgres database up or down is a decision that shouldn't be taken lightly, as it may have availability and performance implications. Horizontal scaling of replicas can be performed, if configured properly, in a non-disruptive manner (or at least in a way where the performance hit is understood). Again, if properly configured, it can be automated, and this is where KEDA may play an important role.

Replicas are by default initialized from the primary node, which may suffer some I/O impact during replica creation. For zero impact, replicas can also be initialized from backups plus continuous archiving. The user may already specify this through the replicate-from-storage property.
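
As an illustrative sketch, replica initialization from backup storage might be declared as follows (field names are assumed to follow the existing replicate-from-storage API; the exact names and values are hypothetical):

```yaml
kind: SGCluster
spec:
  replicateFrom:
    storage:
      # Assumed field names: the replica bootstraps from the backups and
      # WAL archive stored in this object storage, avoiding I/O on the primary.
      sgObjectStorage: my-object-storage   # hypothetical SGObjectStorage name
      path: /backups/my-cluster            # hypothetical backup path
```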

KEDA supports scaling of custom resources (like we'd need for our SGCluster), requiring:

  1. That the `/scale` subresource is implemented. This was already planned (#1187 (closed)) and would now become a prerequisite.
  2. That a KEDA ScaledObject is created in the same namespace as the SGCluster.
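
As a sketch, once the SGCluster exposes the `/scale` subresource, the ScaledObject could look like the following (the Prometheus trigger, query, names, and thresholds are illustrative assumptions, not part of this proposal):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-cluster-scaler   # hypothetical name
  namespace: my-namespace   # must be the same namespace as the SGCluster
spec:
  scaleTargetRef:
    apiVersion: stackgres.io/v1
    kind: SGCluster
    name: my-cluster        # hypothetical SGCluster name
  minReplicaCount: 2
  maxReplicaCount: 5
  triggers:
  - type: prometheus        # illustrative trigger; KEDA supports many others
    metadata:
      serverAddress: http://prometheus.monitoring:9090   # hypothetical address
      query: sum(rate(pg_stat_database_xact_commit[2m])) # hypothetical metric
      threshold: "1000"
```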

Vertical scaling of the primary (and of replicas) may happen in a disruptive or non-disruptive manner. Disruptive means that the Pod will be evicted and re-created with increased or decreased CPU and/or memory resources based on actual usage. The disadvantage is that, if the primary is evicted, the database will be unavailable until it is respawned or a failover happens. Non-disruptive means that the node will allow CPU and memory to be increased without stopping the database, using a feature called "In-Place Resources Update". Unfortunately, Vertical Pod Autoscaler (as of today) only supports the disruptive method, but there are plans to implement the In-Place Resources Update. When this important feature gets implemented in the Vertical Pod Autoscaler, StackGres vertical autoscaling will be able to use the non-disruptive technique to autoscale Pods.
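
For reference, the disruptive mode described above corresponds to a standard VerticalPodAutoscaler definition such as this sketch (the target, container name, and resource bounds are illustrative; the StatefulSet target is an assumption about how the cluster Pods are managed):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-cluster-vpa      # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet       # assumption: cluster Pods are managed by a StatefulSet
    name: my-cluster
  updatePolicy:
    updateMode: Auto        # disruptive: Pods are evicted and re-created on resize
  resourcePolicy:
    containerPolicies:
    - containerName: patroni
      minAllowed:
        cpu: 500m
        memory: 512Mi
      maxAllowed:
        cpu: "4"
        memory: 8Gi
```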

An initial implementation proposal could be as follows:

```yaml
kind: SGCluster
spec:
  autoscaling:
    mode: <string> # Allows enabling or disabling horizontal and/or vertical autoscaling.
                   # Possible values are:
                   # * all (default): both horizontal and vertical autoscaling will be enabled
                   # * horizontal: only horizontal autoscaling will be enabled
                   # * vertical: only vertical autoscaling will be enabled
                   # * none: all autoscaling will be disabled
    minInstances: <integer> # The minimum number of total instances that the SGCluster will have at any time while autoscaling is enabled.
    maxInstances: <integer> # The maximum number of total instances that the SGCluster will have at any time while autoscaling is enabled.
    minAllowed:
      patroni:
        cpu: <string> # The minimum allowed CPU for the patroni container
        memory: <string> # The minimum allowed memory for the patroni container
      pgbouncer:
        cpu: <string> # The minimum allowed CPU for the pgbouncer container
        memory: <string> # The minimum allowed memory for the pgbouncer container
      envoy:
        cpu: <string> # The minimum allowed CPU for the envoy container
        memory: <string> # The minimum allowed memory for the envoy container
    maxAllowed:
      patroni:
        cpu: <string> # The maximum allowed CPU for the patroni container
        memory: <string> # The maximum allowed memory for the patroni container
      pgbouncer:
        cpu: <string> # The maximum allowed CPU for the pgbouncer container
        memory: <string> # The maximum allowed memory for the pgbouncer container
      envoy:
        cpu: <string> # The maximum allowed CPU for the envoy container
        memory: <string> # The maximum allowed memory for the envoy container
  replication:
    groups:
    - instances: <integer> # Number of instances for this replication group.
                           # The total number of instances of a cluster is always `.spec.instances`. The sum of the instances in the replication groups must be
                           #   less than the total number of instances.
                           # When `.spec.autoscaling` is set, indicates the minimum number of instances for this replication group.
                           # The total minimum number of instances of a cluster is always `.spec.autoscaling.minInstances`.
                           # The sum of the minimum instances in all the replication groups must be
                           #   less than the total minimum number of instances.
      maxInstances: <integer> # Maximum number of instances for this replication group. Can only be used if `.spec.autoscaling` is set.
                              # The total maximum number of instances of a cluster is always `.spec.autoscaling.maxInstances`.
                              # The sum of the maximum instances in all the replication groups must be
                              #   less than the total maximum number of instances.
```
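
To make the proposal concrete, a filled-in example using only the fields defined above might look like this (all values are illustrative):

```yaml
kind: SGCluster
spec:
  autoscaling:
    mode: all           # enable both horizontal and vertical autoscaling
    minInstances: 3
    maxInstances: 6
    minAllowed:
      patroni:
        cpu: 500m
        memory: 1Gi
    maxAllowed:
      patroni:
        cpu: "4"
        memory: 8Gi
  replication:
    groups:
    - instances: 2      # minimum instances for this group while autoscaling is enabled
      maxInstances: 4   # maximum instances for this group
```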

Since Citus does not leverage read replicas for read-only queries, and it's currently the only existing sharding implementation, I'd not propagate this configuration up to SGShardedCluster.

Edited by Matteo Melli