Geo check Rake task: NTP timeout can cause database connection loss
Summary
When access to pool.ntp.org is blocked, as in air-gapped or strictly secured environments, the NTP check in the Rake task gitlab:geo:check can block long enough for the database connection to time out in the background.
When the check then proceeds to the next task, the database connection object is no longer usable, and the remaining checks produce incorrect results:
$ sudo gitlab-rake gitlab:geo:check
Checking Geo ...
GitLab Geo is available ...
GitLab Geo is enabled ... yes
…
Machine clock is synchronized ... Exception: Timeout::Error <<-- Delay here is problematic to further tasks
…
GitLab configured to store new projects in hashed storage? ... no
Try fixing it:
Please enable the setting
`Use hashed storage paths for newly created and renamed projects`
in GitLab's Admin panel to avoid security issues and ensure data integrity.
For more information see:
doc/administration/repository_storage_types.md
All projects are in hashed storage? ... Exception: PG::UnableToSend: no connection to the server
Checking Geo ... Finished
The connection itself times out because of how PgBouncer is used and configured, but that is the recommended setup in our reference architectures.
Steps to reproduce
- Use a firewall to block all access to pool.ntp.org.
- Run gitlab-rake gitlab:geo:check
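The blocked-NTP condition can also be simulated without touching firewall rules. The following minimal Ruby sketch (standard library only) sends one SNTP client request to a local UDP socket that never answers, which behaves like a firewall silently dropping packets, and waits with an explicit, short deadline. The helper `query_ntp` and the 0.5 second timeout are hypothetical; they are not taken from GitLab's clock check.

```ruby
require "socket"

# Minimal 48-byte SNTP client request: LI=0, VN=3, Mode=3 (client) => 0x1B.
NTP_REQUEST = ([0x1B] + [0] * 47).pack("C48")

# Hypothetical helper: sends one SNTP request and waits at most `timeout`
# seconds for a reply. Returns the raw reply bytes, or nil on timeout.
def query_ntp(host, port, timeout)
  socket = UDPSocket.new
  socket.send(NTP_REQUEST, 0, host, port)
  ready = IO.select([socket], nil, nil, timeout)
  return nil unless ready

  reply, _sender = socket.recvfrom(48)
  reply
ensure
  socket&.close
end

# Simulate a firewall that silently drops NTP traffic: bind a UDP socket on
# localhost and never read from or answer it.
black_hole = UDPSocket.new
black_hole.bind("127.0.0.1", 0)
port = black_hole.local_address.ip_port

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
reply = query_ntp("127.0.0.1", port, 0.5)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
black_hole.close

puts reply.nil? ? "no reply after #{elapsed.round(1)}s" : "got reply"
```

With no answer, `query_ntp` gives up after roughly the configured deadline instead of blocking indefinitely; the longer that deadline, the longer the Rake task sits idle while PgBouncer may close the database connection.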
Example Project
Not applicable; this issue concerns a self-managed feature (GitLab Geo).
What is the current bug behavior?
- The NTP check's timeout is too long; while it waits, open database connections are disconnected.
- NTP server hostname is not configurable
What is the expected correct behavior?
- Checks that run after the NTP check must not be affected by delays in it.
- The NTP check should load a custom NTP hostname from configuration, so that it can be used in secured environments.
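As a sketch of the second expectation, the NTP host, port and timeout could be read from configuration, falling back to the current host (pool.ntp.org) and the standard NTP port 123. The environment variable names `NTP_HOST`, `NTP_PORT` and `NTP_TIMEOUT`, the `NtpSettings` struct and the 5 second default timeout are all hypothetical; whether such values should come from the environment or from GitLab's own configuration files is left open here.

```ruby
# Hypothetical settings object for the clock synchronization check.
NtpSettings = Struct.new(:host, :port, :timeout, keyword_init: true)

DEFAULT_NTP_HOST = "pool.ntp.org"
DEFAULT_NTP_PORT = 123   # standard NTP port
DEFAULT_NTP_TIMEOUT = 5  # seconds; hypothetical default

# Reads overrides from `env` (ENV by default, or any Hash-like object),
# falling back to the defaults above. Integer()/Float() raise on malformed
# values instead of silently producing 0.
def ntp_settings(env = ENV)
  NtpSettings.new(
    host: env.fetch("NTP_HOST", DEFAULT_NTP_HOST),
    port: Integer(env.fetch("NTP_PORT", DEFAULT_NTP_PORT)),
    timeout: Float(env.fetch("NTP_TIMEOUT", DEFAULT_NTP_TIMEOUT))
  )
end

settings = ntp_settings({ "NTP_HOST" => "ntp.internal.example" })
puts settings.host    # prints ntp.internal.example
puts settings.port    # prints 123
puts settings.timeout # prints 5.0
```

Passing the environment in as a parameter keeps the lookup easy to test without mutating the real `ENV`.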
Relevant logs and/or screenshots
Included in the description above. See also the customer ticket where this was reported (internal link): https://gitlab.zendesk.com/agent/tickets/281640
Output of checks
This was originally reported on GitLab 14.x, but the bug has existed in the system checks implementation for a long time.
Possible fixes
Some suggestions here would be to:
- When initializing the NTP object, load a custom NTP hostname from the GitLab configuration if one is set: https://gitlab.com/gitlab-org/gitlab/blob/80e198829e65f1b29f7435e62f52d6c5d885b1bf/ee/lib/system_check/geo/clocks_synchronization_check.rb#L9
- Reorder the list of checks to run database checks early and external environment checks after: https://gitlab.com/gitlab-org/gitlab/blob/04a99806a21e715da323ff25b168b1959782483c/ee/lib/system_check/rake_task/geo_task.rb#L21-30
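The reordering suggestion can be sketched as a stable partition: checks that only talk to the database keep their relative order and run first, and checks that depend on external services run last. `Check` and its `external` flag are a hypothetical, simplified model, not the structure of the real check classes in geo_task.rb; the check names are taken from the output above.

```ruby
# Hypothetical, simplified model of a system check. `external` marks checks
# that depend on services outside GitLab, such as an NTP server.
Check = Struct.new(:name, :external)

# Stable partition: internal (database) checks keep their relative order and
# run first; external checks run last, so a slow or blocked external call
# cannot leave a stale database connection for the checks that follow it.
def order_checks(checks)
  internal, external = checks.partition { |check| !check.external }
  internal + external
end

checks = [
  Check.new("GitLab Geo is enabled", false),
  Check.new("Machine clock is synchronized", true),
  Check.new("GitLab configured to store new projects in hashed storage?", false),
  Check.new("All projects are in hashed storage?", false)
]

order_checks(checks).each { |check| puts check.name }
```

Reordering keeps the NTP delay away from later database checks but does not shorten it, so it complements rather than replaces a configurable NTP host and a bounded timeout.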