Geo check rake: NTP timeout can cause database connection loss

Summary

When access to pool.ntp.org is blocked, as in air-gapped or strictly secured environments, the NTP check in the Rake task gitlab:geo:check can block long enough for the database connection to time out in the background.

When the run then moves on to the next check, the database connection object is no longer usable and the remaining checks produce incorrect results:

$ sudo gitlab-rake gitlab:geo:check 
Checking Geo ...

GitLab Geo is available ... 
GitLab Geo is enabled ... yes 

Machine clock is synchronized ... Exception: Timeout::Error  <<-- Delay here is problematic to further tasks

GitLab configured to store new projects in hashed storage? ... no 
Try fixing it: 
Please enable the setting 
`Use hashed storage paths for newly created and renamed projects` 
in GitLab's Admin panel to avoid security issues and ensure data integrity. 
For more information see: 
doc/administration/repository_storage_types.md 
All projects are in hashed storage? ... Exception: PG::UnableToSend: no connection to the server

Checking Geo ... Finished

The connection itself times out because PgBouncer is in use and configured to do so, but that is the recommended setup in our reference architectures.

Steps to reproduce

  1. Use a firewall to block all access to pool.ntp.org.
  2. Run gitlab-rake gitlab:geo:check.

Example Project

Not applicable, this is a self-managed feature (GitLab Geo) issue.

What is the current bug behavior?

  1. The NTP timeout is too long, so open database connections are disconnected while the check waits.
  2. The NTP server hostname is not configurable.

What is the expected correct behavior?

  1. Checks that run after the NTP check must not be affected by delays it causes.
  2. The NTP check should load a custom NTP hostname from configuration, for use in secured environments.
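
The two expectations above could be sketched roughly as follows. This is only an illustration, not the actual check implementation: the `NTP_HOST` environment variable, the 2-second default, and the helper names `ntp_host` and `bounded_clock_check` are all assumptions made for this example. The real NTP query is passed in as a callable so the sketch runs without network access.

```ruby
require 'timeout'

# Hypothetical defaults, chosen for illustration only.
DEFAULT_NTP_HOST = 'pool.ntp.org'
DEFAULT_NTP_TIMEOUT = 2 # seconds

# Resolve the NTP hostname from a (hypothetical) NTP_HOST setting,
# falling back to the public pool when it is unset or blank.
def ntp_host(env = ENV)
  value = env['NTP_HOST'].to_s.strip
  value.empty? ? DEFAULT_NTP_HOST : value
end

# Run the NTP query with a short upper bound so that a blocked
# network cannot stall the checks that follow.
# `query` is any callable that takes the hostname.
def bounded_clock_check(host:, query:, timeout: DEFAULT_NTP_TIMEOUT)
  Timeout.timeout(timeout) { query.call(host) }
  :ok
rescue Timeout::Error
  :timed_out
end

# Usage: simulate a firewalled NTP server with a query that never returns in time.
blocked_query = ->(_host) { sleep 5 }
result = bounded_clock_check(host: ntp_host, query: blocked_query, timeout: 0.1)
puts "Machine clock check: #{result}"
```

A short bound only limits how long later checks are delayed; it does not by itself restore a database connection that has already been dropped.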

Relevant logs and/or screenshots

Included in description above. See also customer ticket where this was reported (internal link): https://gitlab.zendesk.com/agent/tickets/281640

Output of checks

This was originally reported on GitLab 14.x but the bug has existed for a long time in the system checks implementation.

Possible fixes

Some suggestions here would be to:

  1. Load a configured NTP hostname from GitLab configuration when initializing the NTP object here, if available: https://gitlab.com/gitlab-org/gitlab/blob/80e198829e65f1b29f7435e62f52d6c5d885b1bf/ee/lib/system_check/geo/clocks_synchronization_check.rb#L9
  2. Reorder the list of checks to run database checks early and external environment checks after: https://gitlab.com/gitlab-org/gitlab/blob/04a99806a21e715da323ff25b168b1959782483c/ee/lib/system_check/rake_task/geo_task.rb#L21-30
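
The reordering in suggestion 2 might look something like the sketch below. It is a minimal illustration under assumptions: the `Check` struct and its `external` flag are hypothetical and do not exist in the linked `geo_task.rb`, which lists check classes directly. The idea is a stable partition that keeps each group's relative order while moving checks that depend on the external environment to the end.

```ruby
# Hypothetical stand-in for a system check; `external` marks checks
# that reach outside the instance (for example, an NTP query).
Check = Struct.new(:name, :external)

# Put database-backed checks first and external checks last,
# preserving the original order within each group.
def database_checks_first(checks)
  internal, external = checks.partition { |check| !check.external }
  internal + external
end

checks = [
  Check.new('Machine clock is synchronized', true),
  Check.new('GitLab configured to store new projects in hashed storage?', false),
  Check.new('All projects are in hashed storage?', false)
]

database_checks_first(checks).each { |check| puts check.name }
```

With this ordering, a slow NTP lookup can still delay the end of the run, but the database checks have already completed by then.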