ntpd takes 10 minutes to retry if DNS is down

As reported at: http://bugs.debian.org/924192

Here’s a reproducible experiment that seems to demonstrate the problem.

Open two ssh sessions to the machine running ntpsec on which you want to demonstrate the problem.

On the first session start up something like this:

    ( set -x ; while :; do
      date ;
      ntpq -p ;
      ntpstat ;
      ntpq -c kern | grep '^pll' ;
      cat /var/lib/ntpsec/ntp.drift ;
      echo ; sleep 15; done
    )> /tmp/ntpq.out 2>&1

On the second session start up something like this:

    ( set -x ; 
      date ; 
      service ntpsec stop ; 
      sleep 15 ; 
      mv /etc/resolv.conf  /etc/resolv.conf.SAVE ; 
      sleep 15 ; date ; 
      service ntpsec start ; 
      sleep 120 ; 
      date ; 
      mv /etc/resolv.conf.SAVE /etc/resolv.conf ; 
      sleep 700 ; 
      date ; set +x 
    ) > /tmp/control.out 2>&1

When the dust has settled do:

    journalctl -b | grep ntp > /tmp/journal.out

Edit away the parts of journal.out that occur before and after the experiment…

And the attached files are what you get.

What seems to be happening is that if DNS is not immediately available when ntpsec starts, it waits about 10 minutes before trying again. Ten minutes is too long. Ideally, as long as either minsane or minclock is not satisfied and there are pool statements that have not been resolved, ntpsec would retry every outstanding pool statement with an exponential backoff: after the first failed attempt, wait 10 seconds and retry; if that fails, wait 20 seconds; if that fails, wait 40 seconds; and so on.
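
To make the proposed schedule concrete, here is a rough shell sketch of the timing only. It is not ntpsec code. The function name `backoff_delay` is hypothetical, and the 600-second cap is an assumption that is not part of the request above; it is there only so the delay does not grow without bound.

```shell
# Hypothetical helper (not part of ntpsec): print the delay in seconds
# to wait after failed attempt number $1 (0-based).
# $2 = initial delay (default 10 s), $3 = cap (default 600 s, an assumption).
backoff_delay() {
    attempt=$1
    base=${2:-10}
    cap=${3:-600}
    delay=$base
    i=0
    # Double the delay once per earlier failure, stopping at the cap.
    while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
        delay=$((delay * 2))
        i=$((i + 1))
    done
    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
}

# Show the waits that would follow the first eight failed attempts.
for n in 0 1 2 3 4 5 6 7; do
    printf 'after failure %d: wait %ds\n' "$n" "$(backoff_delay "$n")"
done
```

With these defaults the waits start at 10, 20 and 40 seconds, as in the example above, and then stay at the cap once doubling would exceed it.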

control.out

journal.out

ntpq.out

Edited Mar 14, 2019 by Richard Laager