WIP: Geo: Exit LogCursor if health checks fail for too long

Michael Kozono requested to merge mk/log-cursor-daemon-health-check-geo into master

What does this MR do?

Exits geo-logcursor when a serious failure (any health check failure) persists for too long.

The problem is described in gitlab-org/build/CNG!220 (comment 203815540):

  1. We stopped geo-postgresql which is a serious problem for geo-logcursor and which we expect to cause it to exit.
  2. The new --stdout-logging option allowed a repeating error to show up in the logs, which showed us what geo-logcursor was doing. 🎉
  3. All StandardErrors are caught here https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/gitlab/geo/log_cursor/lease.rb#L41.
  4. And nothing changes in the main loop https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/gitlab/geo/log_cursor/daemon.rb#L22-35.
  5. Therefore, infinite loop.
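
To illustrate the failure mode and the intended behavior, here is a minimal standalone sketch, not the code in this MR. It assumes a loop that rescues every StandardError (as in steps 3 and 4) and adds a gate that stops the loop once health checks have failed continuously for longer than a limit. `HealthGate`, `run_loop`, `max_unhealthy_seconds`, and the return symbols are hypothetical names made up for this example; none of them come from the GitLab codebase.

```ruby
# Hypothetical sketch: every name here is illustrative, not GitLab's API.

# Tracks how long health checks have been failing without interruption.
class HealthGate
  def initialize(max_unhealthy_seconds:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @max_unhealthy_seconds = max_unhealthy_seconds
    @clock = clock
    @unhealthy_since = nil
  end

  # Record the result of one health check. A healthy result resets the timer;
  # the first unhealthy result after a healthy one starts it.
  def record(healthy)
    if healthy
      @unhealthy_since = nil
    else
      @unhealthy_since ||= @clock.call
    end
  end

  # True once checks have failed continuously for longer than the limit.
  def exit_due?
    return false if @unhealthy_since.nil?

    @clock.call - @unhealthy_since > @max_unhealthy_seconds
  end
end

# Runs `work` repeatedly. Errors raised by `work` are swallowed, which on its
# own would loop forever; the gate is what breaks the loop.
def run_loop(gate, health_check:, work:, max_iterations: Float::INFINITY)
  iterations = 0

  while iterations < max_iterations
    iterations += 1

    begin
      work.call
    rescue StandardError => e
      warn "iteration failed: #{e.message}" # log and keep going
    end

    # A health check that raises counts as unhealthy.
    healthy = begin
      health_check.call
    rescue StandardError
      false
    end

    gate.record(healthy)
    return :exited_unhealthy if gate.exit_due?
  end

  :stopped
end
```

In a real daemon the caller would presumably turn `:exited_unhealthy` into a non-zero process exit so that a supervisor can restart it; that part is left out of the sketch. The clock is injectable only so the behavior can be exercised without sleeping.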

Closes https://gitlab.com/gitlab-org/gitlab-ee/issues/14627

Related issue: https://gitlab.com/charts/gitlab/issues/1211

Does this MR meet the acceptance criteria?

Conformity

Performance and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Michael Kozono
