# TPS of a cluster drops after an annotation is configured

## Summary

TPS drops between ~15% and ~50% when an annotation is configured.

## Current Behaviour
Given an SGCluster with the following configuration:
```yaml
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: test
  namespace: availability
spec:
  instances: 3
  postgresVersion: 'latest'
  pods:
    persistentVolume:
      size: '1Gi'
  initialData:
    scripts:
    - name: create-pgbench-user
      scriptFrom:
        secretKeyRef:
          name: pgbench-credentials
          key: pgbench-create-user-sql
    - name: create-pgbench-database
      script: |
        CREATE DATABASE pgbench OWNER pguser;
```
Changing the cluster metadata configuration negatively impacts the benchmark results obtained with pgbench. The configuration change only adds an annotation to all generated resources:
```yaml
apiVersion: stackgres.io/v1
kind: SGCluster
...
spec:
  metadata:
    annotations:
      allResources:
        test: test
...
```
The command used is the following:

```shell
pgbench -U pguser pgbench -T 20 -C
```

which translates to: execute as many transactions as possible during 20 seconds with a single client and a single thread, establishing a new connection for each transaction (`-C`).
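As a quick sanity check on the numbers that follow: with a single client, the connection-inclusive tps and the average latency are two views of the same measurement, so tps ≈ 1000 / latency_ms should hold. A small awk sketch, using the first run's SGCluster 1 values:

```shell
# Sanity check (illustrative, values copied from the first run below):
# with one client, tps including connection time ~= 1000 / avg latency in ms.
awk 'BEGIN {
  latency_ms = 11.143      # reported "latency average"
  tps = 89.745445          # reported "tps (including connections establishing)"
  printf "expected tps ~ %.1f, reported tps = %.1f\n", 1000 / latency_ms, tps
}'
# prints: expected tps ~ 89.7, reported tps = 89.7
```

The two figures agree, so the latency and tps columns in the tables below are internally consistent.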
The results are consistent across several runs:
- First run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 1782 | 924 |
latency average | 11.143 ms | 15.473 ms |
tps (including connections establishing) | 89.745445 | 64.629396 |
tps (excluding connections establishing) | 126.200344 | 98.781321 |
- Second run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 2379 | 806 |
latency average | 8.422 ms | 8.562 ms |
tps (including connections establishing) | 118.734658 | 116.791060 |
tps (excluding connections establishing) | 193.638509 | 201.606477 |
- Third run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 2033 | 601 |
latency average | 9.841 ms | 12.408 ms |
tps (including connections establishing) | 101.618137 | 80.594898 |
tps (excluding connections establishing) | 167.491078 | 123.698517 |
- Fourth run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 2068 | 652 |
latency average | 9.675 ms | 14.530 ms |
tps (including connections establishing) | 103.359918 | 68.825433 |
tps (excluding connections establishing) | 167.388492 | 120.272571 |
- Fifth run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 2455 | 1469 |
latency average | 8.147 ms | 11.974 ms |
tps (including connections establishing) | 122.746237 | 83.511619 |
tps (excluding connections establishing) | 195.815268 | 149.100572 |
- Sixth run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 2075 | 725 |
latency average | 9.639 ms | 12.922 ms |
tps (including connections establishing) | 103.745334 | 77.386032 |
tps (excluding connections establishing) | 168.598206 | 133.474639 |
- Seventh run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 1826 | 989 |
latency average | 10.958 ms | 14.269 ms |
tps (including connections establishing) | 91.258764 | 70.084443 |
tps (excluding connections establishing) | 151.176461 | 118.084640 |
- Eighth run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 1852 | 1484 |
latency average | 10.804 ms | 13.478 ms |
tps (including connections establishing) | 92.562449 | 74.196573 |
tps (excluding connections establishing) | 155.523867 | 148.267451 |
- Ninth run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 1786 | 903 |
latency average | 11.198 ms | 14.507 ms |
tps (including connections establishing) | 89.299405 | 68.934205 |
tps (excluding connections establishing) | 149.084738 | 119.468282 |
- Tenth run
Metrics | SGCluster 1 | SGCluster 2 (with the annotation) |
---|---|---|
number of transactions actually processed | 2330 | 1718 |
latency average | 8.585 ms | 11.645 ms |
tps (including connections establishing) | 116.479194 | 85.870517 |
tps (excluding connections establishing) | 187.519315 | 165.910493 |
What this data shows is that, aggregated over the ten runs, ~50% fewer transactions are processed when the annotation is configured (10271 vs. 20586).
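The ~50% figure can be reproduced from the tables above by summing the per-run transaction counts; a small shell sketch:

```shell
# Aggregate the per-run "number of transactions actually processed" values
# reported above and print the relative drop of the annotated cluster.
awk 'BEGIN {
  split("1782 2379 2033 2068 2455 2075 1826 1852 1786 2330", c1)  # SGCluster 1
  split("924 806 601 652 1469 725 989 1484 903 1718", c2)         # SGCluster 2
  for (i = 1; i <= 10; i++) { t1 += c1[i]; t2 += c2[i] }
  printf "cluster1=%d cluster2=%d drop=%.1f%%\n", t1, t2, 100 * (t1 - t2) / t1
}'
# prints: cluster1=20586 cluster2=10271 drop=50.1%
```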
## Steps to reproduce

I created a script to reproduce the benchmark:
```shell
#!/bin/sh

is_cluster_ready() {
  INSTANCES_READY="$(kubectl get pods -n availability | tail -n +2 | awk '{ print $2 }' | grep -c 6/6)"
  if [ "$INSTANCES_READY" = 3 ]
  then
    return 0
  else
    return 1
  fi
}

is_cluster_deleted() {
  PODS="$(kubectl get pods -n availability | wc -l)"
  if [ "$PODS" = 0 ]
  then
    return 0
  else
    return 1
  fi
}

is_pgbench_ready() {
  kubectl get pods -n pgbench pgbench | tail -n +2 | awk '{ print $3 }' | grep Running > /dev/null
}

init_pgbench() {
  kubectl exec -it pgbench -c pgbench -n pgbench -- pgbench -i -U pguser pgbench
}

run_pgbench() {
  OUTPUT="$(kubectl exec -it pgbench -c pgbench -n pgbench -- pgbench -U pguser pgbench -T 20 -C)"
  echo "$OUTPUT"
}

kubectl create namespace availability

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: pgbench-credentials
  namespace: availability
stringData:
  pgbench-create-user-sql: "CREATE USER pguser WITH PASSWORD 'pguser';"
EOF

cat << EOF | kubectl apply -f -
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: test
  namespace: availability
spec:
  instances: 3
  postgresVersion: 'latest'
  pods:
    persistentVolume:
      size: '1Gi'
  initialData:
    scripts:
    - name: create-pgbench-user
      scriptFrom:
        secretKeyRef:
          name: pgbench-credentials
          key: pgbench-create-user-sql
    - name: create-pgbench-database
      script: |
        CREATE DATABASE pgbench OWNER pguser;
EOF

kubectl create namespace pgbench

cat << EOF > pgpass
test-primary.availability.svc:5432:pgbench:pguser:pguser
test-replicas.availability.svc:5432:pgbench:pguser:pguser
EOF

cat << EOF > pg_service.conf
[pgbench]
host=test-primary.availability.svc
port=5432
dbname=pgbench
[pgbenchreplica]
host=test-replicas.availability.svc
port=5432
dbname=pgbench
EOF

kubectl create secret generic pgbench --from-file=.pgpass=pgpass --from-file=.pg_service.conf=pg_service.conf -n pgbench

while ! is_cluster_ready
do
  sleep 2
done

# pgbench.yaml holds the pgbench pod manifest (not included here)
kubectl apply -f pgbench.yaml

while ! is_pgbench_ready
do
  sleep 2
done

init_pgbench
run_pgbench > clean_write.log

kubectl delete sgcluster -n availability "test"

while ! is_cluster_deleted
do
  sleep 2
done

cat << EOF | kubectl apply -f -
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: test
  namespace: availability
spec:
  instances: 3
  postgresVersion: 'latest'
  pods:
    persistentVolume:
      size: '1Gi'
  initialData:
    scripts:
    - name: create-pgbench-user
      scriptFrom:
        secretKeyRef:
          name: pgbench-credentials
          key: pgbench-create-user-sql
    - name: create-pgbench-database
      script: |
        CREATE DATABASE pgbench OWNER pguser;
EOF

while ! is_cluster_ready
do
  sleep 2
done

init_pgbench

# Run the benchmark in the background and apply the annotation change while it runs
run_pgbench > alter_write.log &
PG_BENCH_PID=$!

cat << EOF | kubectl apply -f -
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: test
  namespace: availability
spec:
  metadata:
    annotations:
      allResources:
        test: test
  instances: 3
  postgresVersion: 'latest'
  pods:
    persistentVolume:
      size: '1Gi'
  initialData:
    scripts:
    - name: create-pgbench-user
      scriptFrom:
        secretKeyRef:
          name: pgbench-credentials
          key: pgbench-create-user-sql
    - name: create-pgbench-database
      script: |
        CREATE DATABASE pgbench OWNER pguser;
EOF

wait $PG_BENCH_PID

echo ""
echo "PgBench results cluster unaltered"
echo "---------------------------------"
cat clean_write.log
echo ""
echo "PgBench results with annotation alterations"
echo "-------------------------------------------"
cat alter_write.log
```
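To compare the two result files the script produces, a small helper can pull out just the tps lines; `extract_tps` below is a hypothetical addition (not part of the script above), shown against a fabricated sample log in pgbench's output format:

```shell
# Hypothetical post-processing helper: extract the tps values from a pgbench
# log so the two runs can be compared at a glance.
extract_tps() {
  grep '^tps' "$1" | awk '{ print $3 }'
}

# Example against a fabricated two-line log in pgbench's output format:
printf 'tps = 89.745445 (including connections establishing)\ntps = 126.200344 (excluding connections establishing)\n' > sample.log
extract_tps sample.log
# prints:
# 89.745445
# 126.200344
```

In practice it would be run against `clean_write.log` and `alter_write.log`.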
## Expected Behaviour

Changing only the annotation configuration should not have any performance impact on the cluster.
## Environment

- StackGres version: 1.0.0-beta2
- Kubernetes version: 1.16
- Cloud provider or hardware configuration: kind 0.9.0 with the following configuration:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
```
Edited by Xavier Sierra