Consider using Server Side Apply
# Server Side Apply Considerations
Server-Side Apply (SSA) has been enabled by default since Kubernetes 1.16, and provides some advantages that StackGres could take advantage of.
Here I try to explain its implications for StackGres when it is enabled, and how we can benefit from it.
## What is it
It is a mechanism of Kubernetes to keep track of who modifies what, intended to prevent conflicts by assigning an owner to every applied field.
For a more in-depth explanation of what Server Side Apply is, please check the Kubernetes documentation.
## What we do at the moment
At the moment of this writing, we don't support SSA. If one of the operator-generated resources has some SSA configuration, the operator just gets rid of it.
## What we could gain from using SSA
Currently, every time the operator detects a resource out of sync, in most cases StackGres replaces the existing resource with the in-sync version.
This strategy comes with some problems. Specifically, there can be changes that come from Kubernetes itself which the operator detects as out of sync. The operator handles these scenarios, but they can still cause some problems.
### Skipping resource comparison
Consider the case of the primary service endpoint.
Given an SGCluster named simple, the primary service endpoint would look like this:
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    acquireTime: "2021-07-14T17:10:36.630841+00:00"
    leader: simple-0
    optime: "25728552"
    renewTime: "2021-07-14T17:21:37.138409+00:00"
    transitions: "0"
    ttl: "30"
  creationTimestamp: "2021-07-14T17:10:33Z"
  labels:
    app: StackGresCluster
    cluster: "true"
    cluster-name: simple
    cluster-uid: 9e255311-31ac-4cc9-9e27-e96c4e618a30
  name: simple
  namespace: default
  ownerReferences:
  - apiVersion: stackgres.io/v1
    controller: true
    kind: SGCluster
    name: simple
    uid: 9e255311-31ac-4cc9-9e27-e96c4e618a30
  resourceVersion: "5988"
  selfLink: /api/v1/namespaces/default/endpoints/simple
  uid: 01ac8f78-4c6d-47cb-8425-2d9e6ecf1f20
subsets:
- addresses:
  - hostname: simple-0
    ip: 10.244.0.17
    nodeName: kind-control-plane
    targetRef:
      kind: Pod
      name: simple-0
      namespace: default
      resourceVersion: "3733"
      uid: 2f006c5a-fab6-4b62-ab70-c0518ca483df
  ports:
  - name: pgport
    port: 7432
    protocol: TCP
  - name: pgreplication
    port: 7433
    protocol: TCP
```
In this case, the StackGres operator only manages the following fields directly:
- metadata.ownerReferences
- metadata.labels
The rest of the fields are managed by Patroni or Kubernetes.
In order to keep the endpoint in sync, the operator needs to perform the following steps:
- Generate the required endpoint
- Compare it with the one that is already created
- If there is a difference with the fields that are managed by StackGres, patch the resource
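The steps above can be sketched with plain maps. This is a simplified model, not the operator's actual code; `extractManaged` is a hypothetical helper that keeps only the operator-managed paths (metadata.labels and metadata.ownerReferences in the endpoint example):

```java
import java.util.List;
import java.util.Map;

public class SyncSketch {

  // Hypothetical helper: keeps only the fields the operator manages
  // (metadata.labels and metadata.ownerReferences in the endpoint example);
  // everything else (Patroni annotations, subsets, ...) is ignored.
  @SuppressWarnings("unchecked")
  static Map<String, Object> extractManaged(Map<String, Object> resource) {
    Map<String, Object> metadata = (Map<String, Object>) resource.get("metadata");
    return Map.of(
        "labels", metadata.getOrDefault("labels", Map.of()),
        "ownerReferences", metadata.getOrDefault("ownerReferences", List.of()));
  }

  // Steps 2 and 3: compare the generated resource with the deployed one,
  // and only patch when the managed subset differs.
  static boolean needsPatch(Map<String, Object> required, Map<String, Object> deployed) {
    return !extractManaged(required).equals(extractManaged(deployed));
  }
}
```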
If SSA were implemented, the operator could skip the second step and just send patches all the time.
The patch would be equivalent to this command:
```shell
cat << EOF | kubectl apply --server-side=true --field-manager="StackGres" -f -
apiVersion: v1
kind: Endpoints
metadata:
  name: simple
  namespace: default
  labels:
    app: StackGresCluster
    cluster: "true"
    cluster-name: simple
    cluster-uid: 9e255311-31ac-4cc9-9e27-e96c4e618a30
  ownerReferences:
  - apiVersion: stackgres.io/v1
    controller: true
    kind: SGCluster
    name: simple
    uid: 9e255311-31ac-4cc9-9e27-e96c4e618a30
EOF
```
Since it uses SSA (the --server-side=true flag), it would not interfere with the rest of the fields, and the patch would only take effect if there is an actual difference.
### Avoid accidental field deletion
Continuing with the previous case, consider a scenario in which the StackGres operator needs to add an annotation. Currently, the operator would handle the patch like the following command:
```shell
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    newAnnotation: test
  name: simple
  namespace: default
  labels:
    app: StackGresCluster
    cluster: "true"
    cluster-name: simple
    cluster-uid: 9e255311-31ac-4cc9-9e27-e96c4e618a30
  ownerReferences:
  - apiVersion: stackgres.io/v1
    controller: true
    kind: SGCluster
    name: simple
    uid: 9e255311-31ac-4cc9-9e27-e96c4e618a30
EOF
```
The above command has the side effect of deleting the rest of the fields that are not managed directly by the operator, like the subsets or the Patroni annotations. In this case Patroni and Kubernetes would re-create their data, but there is a brief moment in which the resource doesn't have it.
By using SSA, the other fields would not be altered. Hence, the problem is avoided.
### Inform users about who owns the fields of the generated resources
Using the --show-managed-fields flag, a user can see who owns which field of a given resource.
If the flag is used with the primary service endpoint, it gives the following response:
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    acquireTime: "2021-07-14T19:30:04.977370+00:00"
    leader: simple-0
    optime: "25728496"
    renewTime: "2021-07-14T20:29:55.601968+00:00"
    transitions: "0"
    ttl: "30"
  creationTimestamp: "2021-07-14T19:29:48Z"
  labels:
    app: StackGresCluster
    cluster: "true"
    cluster-name: simple
    cluster-uid: c4554cdd-c351-4313-80df-ead7bd679dd6
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:app: {}
          f:cluster: {}
          f:cluster-name: {}
          f:cluster-uid: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"c4554cdd-c351-4313-80df-ead7bd679dd6"}:
            .: {}
            f:apiVersion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
    manager: okhttp
    operation: Update
    time: "2021-07-14T19:29:48Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:acquireTime: {}
          f:leader: {}
          f:optime: {}
          f:renewTime: {}
          f:transitions: {}
          f:ttl: {}
      f:subsets: {}
    manager: Patroni
    operation: Update
    time: "2021-07-14T20:29:55Z"
  name: simple
  namespace: default
  ownerReferences:
  - apiVersion: stackgres.io/v1
    controller: true
    kind: SGCluster
    name: simple
    uid: c4554cdd-c351-4313-80df-ead7bd679dd6
  resourceVersion: "37948"
  selfLink: /api/v1/namespaces/default/endpoints/simple
  uid: 4221accf-c2e2-450b-a12f-b03a8a5be9b2
subsets:
- addresses:
  - hostname: simple-0
    ip: 10.244.0.20
    nodeName: kind-control-plane
    targetRef:
      kind: Pod
      name: simple-0
      namespace: default
      resourceVersion: "27817"
      uid: b6381dcd-d16d-45c0-b4d3-e92a42fa0987
  ports:
  - name: pgport
    port: 7432
    protocol: TCP
  - name: pgreplication
    port: 7433
    protocol: TCP
```
This outcome can be confusing, because on one side it says that Patroni is the owner of the annotations acquireTime, leader, optime, renewTime, transitions and ttl, as well as the subsets field; on the other side, okhttp is the owner of the ownerReferences field and the labels app, cluster, cluster-name and cluster-uid.
The actual owner of the okhttp fields is the StackGres operator, but since no field manager is specified, Kubernetes derives the manager name from the User-Agent of the HTTP request; okhttp is just the underlying client that fabric8 uses to interact with the Kubernetes API.
It would be clearer for the user if the owner of the StackGres fields were actually StackGres. Also, if other applications used okhttp as a client to interact with the Kubernetes API, Kubernetes could identify those applications as the same manager as the StackGres fields.
By using the --field-manager flag, it is possible to tell Kubernetes who the actual owner of those fields is, which in this case is StackGres.
### Tolerate user changes
By using SSA, the StackGres operator would focus only on the fields it manages; this allows users to make changes to the generated resources without the operator being aware of them.
This can be useful in some scenarios like the following:
Suppose that in a Kubernetes cluster there is a mutating webhook in place that adds an annotation with key department and value security to every Secret in the cluster, regardless of who created those Secrets.
Such a webhook would make our database credentials Secret look like this (notice the department annotation):
```yaml
apiVersion: v1
data:
  authenticator-password: MDM5ZC1hODQ1LTQzMzYtYjU2
  replication-password: YmJkNi02NmNkLTRiNTMtOTk4
  restapi-password: YjY1Yy0wMjliLTQ1ZDQtOWVl
  superuser-password: YmY4OC0xMDgwLTQ5NDAtYWFh
kind: Secret
metadata:
  annotations:
    department: security
  creationTimestamp: "2021-07-14T19:29:49Z"
  labels:
    app: StackGresCluster
    cluster-name: simple
    cluster-uid: c4554cdd-c351-4313-80df-ead7bd679dd6
  name: simple
  namespace: default
  ownerReferences:
  - apiVersion: stackgres.io/v1
    controller: true
    kind: SGCluster
    name: simple
    uid: c4554cdd-c351-4313-80df-ead7bd679dd6
  resourceVersion: "27682"
  selfLink: /api/v1/namespaces/default/secrets/simple
  uid: 7726f761-e746-4107-a0d3-801f66cfddee
type: Opaque
```
In a case like this, the operator would detect the Secret as out of sync and proceed to delete the annotation. Then the webhook would add the annotation again, making the resource appear out of sync once more, resulting in an endless loop.
By using SSA, the operator would not care about such differences as long as the fields that are managed by StackGres remain unaltered.
This behavior would also allow users to play with the generated StackGres resources on their own, which can be useful to solve unexpected issues in some environments.
## What happens if the Server Side Apply feature gate is disabled?
Since SSA support has been in beta since 1.16, it is enabled by default. Nonetheless, users can still disable this feature gate.
If the SSA feature gate is disabled, every request that carries the --server-side flag would fail.
At this moment this would not be a problem, but if we start to rely on SSA it could become one.
Nonetheless, we have alternatives to achieve a behavior similar to the one SSA would provide for us.
Currently, every time the operator is going to patch a resource, it performs an operation similar to this:
```java
client.configMaps()
    .inNamespace(namespace)
    .withName(name)
    .patch(configmap);
```
This results in an actual replacement of the resource. This is because fabric8 internally gets the resource to patch (can be seen here), then calculates its difference with the given resource (as can be seen here). Also, it assumes that the patch type is json; looking into the fabric8 code, a warning about this behavior can be found here.
In order to avoid this, we can change the way StackGres makes patches to something like this:
```java
client.configMaps()
    .inNamespace(namespace)
    .withName(name)
    .patch(new PatchContext.Builder()
        .withPatchType(PatchType.JSON_MERGE)
        .build(), resource);
```
By specifying JSON_MERGE as the patch type, we prevent fabric8 from getting the resource to patch and calculating its differences; instead, it sends the given object as the patch and lets Kubernetes perform the merge.
More information about this can be found [here](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/#use-a-strategic-merge-patch-to-update-a-deployment).
Using this strategy, however, would only give us the accidental-field-deletion protection and the ability to skip the resource comparison.
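For reference, the merge Kubernetes performs for a JSON merge patch follows RFC 7386 semantics: a null value deletes a key, nested objects are merged recursively, and anything else (including arrays) replaces the existing value. A minimal sketch over plain maps, illustrative only and not the API server's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class JsonMergePatch {

  // RFC 7386 merge: null deletes a key, nested objects merge recursively,
  // everything else (including arrays) replaces the target value.
  @SuppressWarnings("unchecked")
  static Map<String, Object> merge(Map<String, Object> target, Map<String, Object> patch) {
    Map<String, Object> result = new HashMap<>(target);
    for (Map.Entry<String, Object> entry : patch.entrySet()) {
      Object patchValue = entry.getValue();
      Object targetValue = result.get(entry.getKey());
      if (patchValue == null) {
        result.remove(entry.getKey());
      } else if (patchValue instanceof Map && targetValue instanceof Map) {
        result.put(entry.getKey(),
            merge((Map<String, Object>) targetValue, (Map<String, Object>) patchValue));
      } else {
        result.put(entry.getKey(), patchValue);
      }
    }
    return result;
  }
}
```

Note that arrays are replaced wholesale, which is one reason a merge patch still cannot express the per-field ownership that SSA provides.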
## Is Server Side Apply bullet proof?
No. Users can intentionally or unintentionally override the SSA configuration, hence the operator must be prepared for these scenarios.
Suppose that StackGres creates a resource in the following manner:
```shell
cat << EOF | kubectl apply --server-side=true --field-manager="StackGres" -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-ssa
data:
  test: "1"
EOF
```
As can be seen, SSA is enabled for the ConfigMap and we have specified StackGres as the field manager.
Now if we get the resource using the command:

```shell
kubectl get cm test-ssa -o yaml
```

The result would be similar to this:
```yaml
apiVersion: v1
data:
  test: "1"
kind: ConfigMap
metadata:
  creationTimestamp: "2021-07-13T14:58:06Z"
  name: test-ssa
  namespace: default
  resourceVersion: "507"
  selfLink: /api/v1/namespaces/default/configmaps/test-ssa
  uid: b9854fef-b520-41b4-9b53-c7b5cc49e15e
```
There is nothing in the previous response that hints at who owns the fields. If we want to know who owns the fields, we need to use the --show-managed-fields flag, in which case the response is like the following:
```yaml
apiVersion: v1
data:
  test: "1"
kind: ConfigMap
metadata:
  creationTimestamp: "2021-07-13T14:58:06Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:test: {}
    manager: StackGres
    operation: Apply
    time: "2021-07-13T14:58:06Z"
  name: test-ssa
  namespace: default
  resourceVersion: "507"
  selfLink: /api/v1/namespaces/default/configmaps/test-ssa
  uid: b9854fef-b520-41b4-9b53-c7b5cc49e15e
```
If a user tries to update the resource with the following command:
```shell
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-ssa
data:
  test: "2"
EOF
```
It might be assumed that the user would get an error, but instead the change is allowed.
If we look at the managed fields configuration, we find out that the ownership of the test field has been transferred to kubectl-client-side-apply:
```yaml
apiVersion: v1
data:
  test: "2"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"test":"2"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"test-ssa","namespace":"default"}}
  creationTimestamp: "2021-07-13T14:58:06Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:test: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
    manager: kubectl-client-side-apply
    operation: Update
    time: "2021-07-13T15:09:27Z"
  name: test-ssa
  namespace: default
  resourceVersion: "1685"
  selfLink: /api/v1/namespaces/default/configmaps/test-ssa
  uid: b9854fef-b520-41b4-9b53-c7b5cc49e15e
```
If the operator tries to change the resource back, the same way it did before, it gets a conflict error:
```shell
cat << EOF | kubectl apply --server-side=true --field-manager="StackGres" -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-ssa
data:
  test: "1"
EOF
```

```
error: Apply failed with 1 conflict: conflict with "kubectl-client-side-apply" using v1: .data.test
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
  command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
  manifest to remove references to the fields that should keep their
  current managers.
* You may co-own fields by updating your manifest to match the existing
  value; in this case, you'll become the manager if the other manager(s)
  stop managing the field (remove it from their configuration).
See http://k8s.io/docs/reference/using-api/server-side-apply/#conflicts
```
As the error message suggests, to avoid this error the operator must use the --force-conflicts flag, likely on every request.
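On the fabric8 side, assuming a client version that provides PatchType.SERVER_SIDE_APPLY and the field-manager and force options on PatchContext (this is a sketch, not verified against the StackGres codebase), the forced server-side apply could look like this:

```java
// Sketch only: assumes a fabric8 version with server-side apply support.
// Forcing conflicts makes StackGres take ownership of the fields it applies,
// equivalent to kubectl apply --server-side --field-manager=StackGres --force-conflicts.
client.configMaps()
    .inNamespace(namespace)
    .withName(name)
    .patch(new PatchContext.Builder()
        .withPatchType(PatchType.SERVER_SIDE_APPLY)
        .withFieldManager("StackGres")
        .withForce(true)
        .build(), resource);
```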