Message encryption in Siphon
A high-level overview of the Siphon tool can be found here: https://gitlab.com/gitlab-org/architecture/gitlab-data-analytics/design-doc/-/blob/master/designs/logical_replication_mvp.md#logical-replication-sync-tool---siphon
We're going to read data from PostgreSQL databases (red data) as the producer and deliver data packages to other GitLab-managed systems through a queueing mechanism.
A data package is essentially an array of record changes (full database rows) from a single table. Each package is Protobuf-serialized and can range from 0 to 50 MB, depending on the batching configuration.
In the MVP design doc we mentioned that these packages could be encrypted for extra safety, so that only clients (consumers) holding the decryption keys can read them.
The actors (producers, consumers and the queueing system) are deployed in the same infrastructure.
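To make the idea concrete, here is a minimal sketch of per-package encryption, assuming AES-256-GCM (one candidate answer to the AES question, not a decided design) and Go as the implementation language. The function names `encryptPackage` and `decryptPackage` are hypothetical, and the payload is a stand-in for a real Protobuf-serialized package. GCM is authenticated encryption, so a consumer using the wrong key, or reading a package that was altered in the queue, gets an error instead of garbage.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// encryptPackage (hypothetical name) seals a serialized data package with
// AES-GCM. The random nonce is prepended to the output so the consumer can
// recover it; GCM also appends a 16-byte authentication tag.
func encryptPackage(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	// Seal appends ciphertext||tag to nonce, giving nonce||ciphertext||tag.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptPackage reverses encryptPackage. It returns an error if the key is
// wrong or the sealed bytes were modified.
func decryptPackage(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("sealed package too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	pkg := []byte("protobuf-serialized record changes") // stand-in payload
	sealed, err := encryptPackage(key, pkg)
	if err != nil {
		panic(err)
	}
	plain, err := decryptPackage(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(plain))
}
```

A single 50 MB package is far below GCM's per-message limit (roughly 64 GiB), so chunking is not strictly required for the sizes mentioned above. With random 96-bit nonces, NIST SP 800-38D caps a single key at 2^32 encryptions, which is one concrete input for a rotation schedule.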
Benefits of encryption:
- If the queueing system is compromised, our data is not exposed, since an attacker only sees encrypted packages.
- A consumer cannot decrypt the packages without the keys.
Drawbacks:
- Some CPU and latency overhead when encrypting and decrypting data.
- Extra work to implement encryption and key rotation.
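Much of the key rotation work comes down to consumers knowing which key sealed a given package. A minimal sketch, assuming a hypothetical one-byte key ID prefixed to each package and a hypothetical `Keyring` type (neither is part of Siphon): the producer always seals with the current key, while consumers keep older keys until every package sealed with them has been consumed. The key ID is also passed to GCM as additional authenticated data, so changing it in transit makes decryption fail.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Keyring is a hypothetical type holding every key a consumer may still
// need, indexed by a one-byte key ID, plus the ID the producer seals with.
type Keyring struct {
	Keys    map[byte][]byte
	Current byte
}

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts with the current key and emits keyID||nonce||ciphertext||tag.
func (k *Keyring) Seal(plaintext []byte) ([]byte, error) {
	key, ok := k.Keys[k.Current]
	if !ok {
		return nil, errors.New("current key missing from keyring")
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	keyID := []byte{k.Current}
	header := append([]byte{k.Current}, nonce...)
	// The key ID is authenticated as additional data, so swapping it fails Open.
	return gcm.Seal(header, nonce, plaintext, keyID), nil
}

// Open looks up the key named by the first byte and decrypts the rest.
func (k *Keyring) Open(sealed []byte) ([]byte, error) {
	if len(sealed) == 0 {
		return nil, errors.New("empty package")
	}
	key, ok := k.Keys[sealed[0]]
	if !ok {
		return nil, fmt.Errorf("unknown key ID %d", sealed[0])
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	rest := sealed[1:]
	if len(rest) < gcm.NonceSize() {
		return nil, errors.New("sealed package too short")
	}
	nonce, ciphertext := rest[:gcm.NonceSize()], rest[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, sealed[:1])
}

// randomKey returns a fresh 256-bit key, panicking if the OS RNG fails.
func randomKey() []byte {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	return key
}

func main() {
	ring := &Keyring{Keys: map[byte][]byte{1: randomKey()}, Current: 1}
	oldPkg, err := ring.Seal([]byte("sealed before rotation"))
	if err != nil {
		panic(err)
	}
	// Rotate: add key 2 and switch the producer to it, keeping key 1 so
	// packages still sitting in the queue remain readable.
	ring.Keys[2] = randomKey()
	ring.Current = 2
	plain, err := ring.Open(oldPkg)
	fmt.Println(string(plain), err)
	// Once old packages are drained, retire key 1; they become unreadable.
	delete(ring.Keys, 1)
	_, err = ring.Open(oldPkg)
	fmt.Println(err)
}
```

In a real deployment the producer and consumers would be separate processes loading their keyrings from some shared secret store rather than sharing one in-memory struct; that distribution step is left out of this sketch.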
Questions:
- Does it actually make sense to encrypt the data?
- Can you recommend an encryption standard for encrypting large byte blobs? AES?
- What do you think about the encryption key rotation strategy described here?