Auto DevOps: improve reliability of postgres database
Problem to solve
The postgres database deployed by Auto DevOps may not be ready for production usage.
We are not considering using this for our own internal dogfooding because of some limitations (see https://gitlab.com/gitlab-com/version-gitlab-com/issues/111#note_145183827):
I want to point out that AutoDevOps currently uses an older, single Pod deployment of PostgreSQL. If we're worried about load at all, we should not be using this. This does not have access to WAL, active-passive, et al that I believe we should have on all production systems.
...
Using a GCP CloudSQL database will provide reliable state, maintenance, and resilience without the extended engineering effort of incorporating that in AutoDevOps.
Target audience
Further details
Proposal
In addition to addressing the limitations listed above, I think the database would need the following to be considered production grade:
- Regular backups enabled by default, with clear documentation about how to recover from these backups
- Alerting by default, e.g. when the disk is almost full
- Some kind of replication (maybe optional)
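To make the "disk is almost full" alert above a bit more concrete, the check could be as simple as comparing volume usage against a threshold. This is only a minimal Python sketch and not how Auto DevOps works today; the function names, the 90% default threshold, and the idea of pointing it at the PostgreSQL data directory are all assumptions for illustration.

```python
import shutil


def usage_ratio(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the volume in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def disk_almost_full(used_bytes: int, total_bytes: int, threshold: float = 0.9) -> bool:
    """True when usage is at or above the alert threshold.

    The 0.9 default is a hypothetical value, not an Auto DevOps setting.
    """
    return usage_ratio(used_bytes, total_bytes) >= threshold


def check_path(path: str, threshold: float = 0.9) -> bool:
    """Check a mounted volume, e.g. wherever the PostgreSQL data lives."""
    usage = shutil.disk_usage(path)
    return disk_almost_full(usage.used, usage.total, threshold)
```

In practice this would more likely be an alert rule in whatever monitoring is already attached to the cluster; the point is only that the default needs a threshold someone has chosen and a volume that is actually watched.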
One alternative approach would be for Auto DevOps to actually provision managed databases for you rather than running them as pods in the cluster (maybe just in production and maybe it's an optional setting). Managed databases come with all of these features already out of the box. This could be done using https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/ or implemented in GitLab somehow.
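The "maybe just in production and maybe it's an optional setting" part could be sketched roughly as below. This is a hypothetical illustration in Python, not an existing Auto DevOps feature: `DatabaseSettings`, `managed_database_enabled`, and the `"managed"` / `"in_cluster"` labels are made-up names.

```python
from dataclasses import dataclass


@dataclass
class DatabaseSettings:
    # Hypothetical opt-in flag; not an existing Auto DevOps variable.
    managed_database_enabled: bool = False


def choose_database_backend(environment: str, settings: DatabaseSettings) -> str:
    """Return "managed" or "in_cluster" for the given environment.

    Assumption: a managed database is only provisioned for production,
    and only when the project has opted in.
    """
    if settings.managed_database_enabled and environment == "production":
        return "managed"
    return "in_cluster"
```

With this shape, review apps and staging would keep the current in-cluster pod, and existing projects would see no change unless they opt in.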
Permissions and Security
Documentation
What does success look like, and how can we measure that?
GitLab's infra team would happily use the default database provided by Auto DevOps rather than having to manually provision one.