Timeslave question

Fri Feb 15 13:14:31 AEST 1991

In article <1991Feb14.174728.16734 at nntp-server.caltech.edu>, ktl at wag.caltech.edu (Kian-Tat Lim) writes:
> We are using timeslave to track an NTP primary.  Assuming that the
> 'err' column in the debugging output is in units of milliseconds, we
> seem to be off by as much as 1/4 second at times, both fast and slow,
> though we're usually under 150 ms.  I have two questions:

It's all milliseconds.  250 msec seems rather high.  Is the system
suffering kernel printf's?

> 1) Why doesn't timeslave do a better job?  The NTP host appears to
> adjust its time by steps of 10 ms, as well as adjusting its clock
> rate.  timeslave seems to have a hard time following the 10 ms jumps.
> Should I be fiddling with some of the timeslave options to get better
> tracking?  (It doesn't look like there's any control over the internal
> filtering algorithm, unfortunately.)  I have also observed oscillatory
> behavior, in which timeslave will overshoot the zero error point
> (often by as much as it was off before) over and over.

Timeslave assumes its target is perfect.  It was originally written to sync
the SGI network to another company's network, which had a satellite
receiver, across a 9600 b/s Cypress link.  The link was heavily loaded,
especially by netnews.  While loaded, network delays could be asymmetric by
as much as 4 seconds.

If the measurements look good enough for long enough, a jump will look like
network problems and be discarded.  If the difference persists, timeslave
will start opening the filter.

"Filter" is doubtless too fancy a word for how timeslave tries to partition
the measured error among long term drift, short term jumps, and errors in
the measurements.  At each adjustment, it changes the clock by one 32nd of
the accumulated differences between the expected and measured errors of the
last 32 measurements (using 32 buckets to get the exact sum of the last 32
measurements) plus the current estimate of the drift.  The drift is
estimated by summing the last 8 hours of adjustments to the local clock.
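
A rough sketch of that arithmetic in Python, for illustration only.  This is
not the timeslave source: the class and method names, the sign convention
(measured minus expected), the use of seconds, and the way the 8-hour sum is
turned into a per-interval drift term are all assumptions made for the
example.

```python
from collections import deque

NBUCKETS = 32               # number of measurements averaged over
DRIFT_WINDOW = 8 * 3600.0   # seconds of adjustment history used for drift


class SlewSketch:
    """Hypothetical illustration of the adjustment rule described above."""

    def __init__(self):
        # One bucket per measurement; the oldest falls out automatically.
        self.buckets = deque(maxlen=NBUCKETS)
        # (timestamp, seconds applied) for each past adjustment.
        self.adjustments = deque()

    def drift_rate(self, now):
        """Sum of the last 8 hours of adjustments, as seconds per second."""
        while self.adjustments and now - self.adjustments[0][0] > DRIFT_WINDOW:
            self.adjustments.popleft()
        # Assumption: divide by the full window, which underestimates the
        # drift until 8 hours of history exist.
        return sum(a for _, a in self.adjustments) / DRIFT_WINDOW

    def step(self, now, measured_err, expected_err, interval):
        """Return the clock change (seconds) for this adjustment."""
        self.buckets.append(measured_err - expected_err)
        # Re-summing the buckets gives the exact sum of the last 32
        # differences, with no running-total rounding error.
        correction = sum(self.buckets) / NBUCKETS
        adjustment = correction + self.drift_rate(now) * interval
        self.adjustments.append((now, adjustment))
        return adjustment
```

With a single 32 msec surprise, the first call changes the clock by only
1 msec, which is why a sudden 10 msec step in the target takes many
measurements to follow.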

> 2) Is there a bug in the timetrim computation?  It seems strange that
> negative drifts (and -xx/yy sec/hr claims) end up with positive
> timetrim values.

The timetrim values are simply the total of all adjustments divided by
total elapsed time, and the same for the last 24 hours.  The difference
between the short and long term values suggests that strange things happened
in the past.  Kernel printf's commonly mess up time on IRIS's because they
disable all interrupts.  Disk errors, disk full, and late collision messages
are common culprits.  I've heard reports of difficulties with *ntp on
trashed networks from exactly this sort of problem.
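
To make the two figures concrete, here is a small Python sketch of the
computation as described.  The function name, the list-of-tuples input, and
the sec/hr output units are assumptions for the example, not timeslave's
actual code or data layout.

```python
def trim_rates(adjustments, now, recent_window=24 * 3600.0):
    """Return (long_term, short_term) adjustment rates in sec/hr.

    adjustments: list of (timestamp, seconds_applied), oldest first.
    now: current time in the same units as the timestamps (seconds).
    """
    start = adjustments[0][0]

    # Long term: everything ever applied, over the whole elapsed time.
    total = sum(a for _, a in adjustments)
    long_rate = total / (now - start)

    # Short term: only the adjustments made within the last 24 hours.
    recent = sum(a for t, a in adjustments if now - t <= recent_window)
    short_rate = recent / min(now - start, recent_window)

    return long_rate * 3600.0, short_rate * 3600.0


# Hypothetical history: a bad first day (-10 msec every hour), then a
# well-behaved second day (+2 msec every hour).
history = [(h * 3600.0, -0.010 if h < 24 else 0.002) for h in range(48)]
long_term, short_term = trim_rates(history, now=48 * 3600.0)
```

In this made-up history the long term rate is negative while the short term
rate is positive, so a disturbance in the past is enough to make the two
values disagree, even in sign.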

There were bugs in the timetrim computations, fixed I think in 3.3.2.
As I recall, it used ints and suffered overflow or underflow.
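
As an illustration of how that kind of bug can flip a sign, the Python
sketch below mimics a 32-bit int accumulator.  The units (nanoseconds) and
the function names are hypothetical; I am not claiming this is what the old
code did, only showing the general failure mode.  Signed overflow is
formally undefined in C, but on typical two's-complement hardware it wraps
like this.

```python
def to_int32(value):
    """Wrap an integer the way a 32-bit two's-complement int would."""
    return (value + 2**31) % 2**32 - 2**31


def accumulate(adjustments, wrap):
    """Sum integer adjustments, optionally in a simulated 32-bit int."""
    total = 0
    for a in adjustments:
        total += a
        if wrap:
            total = to_int32(total)
    return total


# Hypothetical: 300 adjustments of -10 msec each, stored as nanoseconds.
steps = [-10_000_000] * 300
true_total = accumulate(steps, wrap=False)     # -3 seconds
wrapped_total = accumulate(steps, wrap=True)   # what a 32-bit int holds
```

Here the true total is -3,000,000,000 ns, but the simulated 32-bit total
comes out positive.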

> Maybe I should just give up and install xntp, but I was trying to get
> decent time without the headaches of using yet another
> vendor-unsupported package.

If you want really good time, that might be a good idea.  Increasing the
measurement rate to '-r 15' would probably make timeslave much more
accurate at modest cost.

I've seen nearby *ntp machines claiming to be 3 msec from UTC that were,
according to my measurements using ICMP timestamps, close to a second away.
I've inferred that the accuracy reported by the *ntp daemon is that
which would be achieved if only the daemon could adjust the computer's
clock as it wished.  In other words, ntp appears to report as its accuracy
the difference between the measured error and the predicted error.  Is this
a correct inference?
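
For anyone who wants to repeat that kind of check, the arithmetic on the
three ICMP timestamps (originate, receive, transmit, each in milliseconds
since midnight UT) plus the local arrival time can be sketched in Python as
below.  The function names are made up for the example, and the estimate
assumes the two directions of the path take equal time.

```python
DAY_MS = 24 * 60 * 60 * 1000  # ICMP timestamps wrap at midnight UT


def wrap_ms(delta):
    """Fold a millisecond difference into [-DAY_MS/2, DAY_MS/2)."""
    return (delta + DAY_MS // 2) % DAY_MS - DAY_MS // 2


def icmp_offset_ms(originate, receive, transmit, arrival):
    """Estimate (remote clock - local clock) and the round-trip time.

    originate: local time the request was sent
    receive:   remote time the request arrived
    transmit:  remote time the reply was sent
    arrival:   local time the reply came back
    """
    outbound = wrap_ms(receive - originate)  # outbound delay + offset
    inbound = wrap_ms(arrival - transmit)    # return delay - offset
    offset = (outbound - inbound) / 2        # exact only if delays match
    round_trip = outbound + inbound          # excludes remote processing
    return offset, round_trip


# Remote clock 500 ms ahead, 20 ms each way, 1 ms of remote processing.
symmetric = icmp_offset_ms(1000, 1520, 1521, 1041)
# Same clocks, but the outbound leg takes 4020 ms instead of 20 ms.
lopsided = icmp_offset_ms(1000, 5520, 5521, 5041)
```

The symmetric case recovers the 500 ms offset.  The lopsided case reports
2500 ms, because half of any asymmetry in the path delays lands directly in
the offset estimate.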

Vernon Schryver,   vjs at sgi.com