Geo rake check: we cannot assume pool.ntp.org is accessible
Summary
Related to #358924 and #5795. Based on the analysis in a customer ticket (internal link), the NTP check is run against the default NTP server, pool.ntp.org, as we can see in the net-ntp gem source.
Customers in air-gapped environments, or in many on-premises datacentres, may be able to resolve pool.ntp.org but won't be able to make arbitrary outgoing connections to the internet.
This has come up again in another customer ticket (internal link), and we need a workaround so customers can redirect the query to a valid local NTP time source. As the server appears to be hard-coded upstream, this is going to be a lot messier than putting something in /etc/gitlab/gitlab.rb.
Workaround
- Define pool.ntp.org in /etc/hosts.
- It'll be necessary to define IP addresses, but to partly mitigate this, it should be possible to define multiple entries, in the same way that the NTP client would use multiple time sources.
- If one of the NTP servers is shut down, the check is likely to fail intermittently.
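As a rough illustration of the hosts entries described above, the following sketch writes two entries for pool.ntp.org to a temporary hosts-format file and reads them back with Ruby's standard `Resolv::Hosts` parser. The 10.0.0.x addresses are hypothetical placeholders for internal NTP servers, and the system resolver used by the actual check may pick between multiple entries differently from this parser.

```ruby
require "resolv"
require "tempfile"

# Hypothetical addresses of internal NTP servers; replace with real ones.
HOSTS_ENTRIES = <<~HOSTS
  10.0.0.10 pool.ntp.org
  10.0.0.11 pool.ntp.org
HOSTS

# Return every address that a hosts-format file maps to the given name.
# A temporary file is used so the real /etc/hosts is never touched.
def hosts_addresses(hosts_content, name = "pool.ntp.org")
  Tempfile.create("hosts") do |file|
    file.write(hosts_content)
    file.flush
    Resolv::Hosts.new(file.path).getaddresses(name).map(&:to_s)
  end
end

# Print the addresses found for pool.ntp.org, one per line.
puts hosts_addresses(HOSTS_ENTRIES)
```

The same two lines, with real addresses, are what would be appended to /etc/hosts on the GitLab node.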
Steps to reproduce
Configure /etc/hosts to point pool.ntp.org to a local IP address that doesn't serve NTP:
# sudo gitlab-rake gitlab:geo:check
Checking Geo ...
GitLab Geo secondary database is correctly configured ... not a secondary node
Database replication enabled? ... not a secondary node
Database replication working? ... not a secondary node
GitLab Geo HTTP(S) connectivity ... not a secondary node
GitLab Geo is available ...
GitLab Geo is enabled ... yes
This machine's Geo node name matches a database record ... no
[snip]
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... Exception: No route to host - recvfrom(2)
[output continues]
... or to an address that doesn't respond:
# sudo gitlab-rake gitlab:geo:check
Checking Geo ...
GitLab Geo secondary database is correctly configured ... not a secondary node
Database replication enabled? ... not a secondary node
Database replication working? ... not a secondary node
GitLab Geo HTTP(S) connectivity ... not a secondary node
GitLab Geo is available ...
GitLab Geo is enabled ... yes
This machine's Geo node name matches a database record ... no
[snip]
HTTP/HTTPS repository cloning is enabled ... yes
Machine clock is synchronized ... Exception: Timeout::Error
[output continues]
What is the current bug behavior?
The `Machine clock is synchronized` check fails.
What is the expected correct behavior?
The `Machine clock is synchronized` check can be made to work in a way that doesn't involve changing /etc/hosts.
We added this check because of the importance of time syncing to Geo.
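A configurable time source would avoid the /etc/hosts edit. The sketch below is one possible shape for that, not the existing check: it uses only the Ruby standard library rather than net-ntp, and the `NTP_HOST`, `NTP_PORT` and `NTP_TIMEOUT` variable names are hypothetical. It falls back to pool.ntp.org when nothing is set, and the offset it computes ignores network round-trip time.

```ruby
require "socket"
require "timeout"

DEFAULT_NTP_HOST = "pool.ntp.org" # the default named in this issue
NTP_EPOCH_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

# Read the time source from the environment.
# The variable names are assumptions made for this sketch.
def ntp_settings(env = ENV)
  {
    host: env.fetch("NTP_HOST", DEFAULT_NTP_HOST),
    port: Integer(env.fetch("NTP_PORT", "123")),
    timeout: Float(env.fetch("NTP_TIMEOUT", "5"))
  }
end

# Build a 48-byte SNTP client request: LI=0, VN=3, Mode=3 gives 0x1B.
def sntp_request
  ([0x1B] + [0] * 47).pack("C*")
end

# Decode the transmit timestamp (bytes 40..47) of an SNTP response.
def server_time(response)
  seconds = response.byteslice(40, 4).unpack1("N")
  fraction = response.byteslice(44, 4).unpack1("N")
  Time.at(seconds - NTP_EPOCH_OFFSET + fraction / 2.0**32)
end

# Query the configured server and return (server time - local time) in seconds.
# A silent server raises Timeout::Error; an unreachable one may raise a
# socket error such as Errno::EHOSTUNREACH.
def clock_offset(settings = ntp_settings)
  socket = UDPSocket.new
  socket.connect(settings[:host], settings[:port])
  socket.send(sntp_request, 0)
  response = Timeout.timeout(settings[:timeout]) { socket.recvfrom(48).first }
  server_time(response) - Time.now
ensure
  socket&.close
end
```

With something like this in place, an air-gapped site could export `NTP_HOST` with the name of a local time source before running the check, and the clock comparison would be `clock_offset.abs` against whatever threshold the check uses.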