Monitor and alert on corruption-related error messages in Postgres logs

(related discussions: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/11875)

Error messages in Postgres logs carry error codes (SQLSTATE) listed in this table: https://www.postgresql.org/docs/current/errcodes-appendix.html

It is worth having a dedicated alert for all of these:

Class XX — Internal Error
XX000 internal_error
XX001 data_corrupted
XX002 index_corrupted

Reasoning:

  • When corruption happens, it should be investigated as early as possible.
  • Without such alerts, there is a risk that a serious issue goes unnoticed. For example, during the PG 12->14 upgrade for gprd-ci, we had an incident (https://gitlab.com/gitlab-com/gl-infra/production/-/issues/15925) that was noticed only during manual inspection of the logs, so it could easily have been overlooked.
    • On the other hand, if we put a "silence" on such an alert, it would go unnoticed as well. How should we approach this problem?
  • I think even a single error with one of these codes should be analyzed.
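
To make the proposal concrete, here is a minimal sketch of the detection logic, not a finished implementation. It assumes text-format logs where log_line_prefix includes %e, so that the SQLSTATE appears as a standalone token on each line. The function names and the threshold parameter are hypothetical; the default threshold of 1 reflects the "even a single error" point above.

```python
import re
from collections import Counter
from typing import Iterable

# Class XX codes from the Postgres errcodes appendix (the three listed above).
CORRUPTION_SQLSTATES = {
    "XX000": "internal_error",
    "XX001": "data_corrupted",
    "XX002": "index_corrupted",
}

# Assumption: log_line_prefix contains %e, so the SQLSTATE is a separate token.
SQLSTATE_RE = re.compile(r"\b(XX00[012])\b")


def count_corruption_errors(lines: Iterable[str]) -> Counter:
    """Count log lines per Class XX SQLSTATE code (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        match = SQLSTATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def should_alert(counts: Counter, threshold: int = 1) -> bool:
    """Alert as soon as the total number of matching lines reaches the threshold."""
    return sum(counts.values()) >= threshold
```

In practice the same matching rule would more likely live in the log pipeline or alerting rules than in a standalone script; the sketch only pins down which lines count and when to fire.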

cc @kwanyangu @rhenchen.gitlab @alexander-sosna @bshah11