accurate runtime accounting (was Load Average graph pattern)

Robert K. Stodola stodola at orion.fccc.edu
Wed Jun 19 03:59:21 AEST 1991


In article <14398 at dog.ee.lbl.gov> torek at elf.ee.lbl.gov (Chris Torek) writes:
>In article <14081 at dog.ee.lbl.gov> I noted that Unix CPU accounting is
>generally fairly poor, and wrote:
>>>The solution is simple but requires relatively precise clocks. ...
>
>In article <1991Jun12.130441.20640 at fccc.edu> Stodola says:
>>One of my associates and I did a study of this a number of years ago
>>(actually it was with a PDP-11/70 running IAS).  We found that there
>>was substantial clock synchronized usage on the system.  The solution
>>we found didn't require very precise clocks at all.  Simply one whose
>>rate was relatively prime to the system clock.
>
>This works well in a number of situations, but I believe it will miss
>short-lived processes on modern (fast) machines.  Unix boxes generally
>run their scheduling clock in the range 50..500 Hz.  Some of these have
>CPUs that run 40 million instructions per second; some things take only
>a few thousand instructions, and it seems intuitively obvious% that
>they might `slip through the cracks'.  [%This is research-ese for `We
>did not try it out but we wrote a paper on it anyway.' :-) ]
>
>In other words, I think `PDP-11/70' may be an important constraint
>above.  A relatively prime profiling clock is likely to work well on
>   [More deleted]

I guess I should have explained the context of the project.  The
purpose was to obtain accurate usage information on a per-user basis
and to provide good load-average statistics.  If the context switcher
itself doesn't keep this information using a very accurate clock (i.e.
a non-interrupting, read-only clock with megahertz resolution), you can
never measure this exactly [actually, we kicked it around at lunch and
tossed out some silly ideas for having another machine on the bus
counting instructions, but the conversation quickly deteriorated from
there].
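
The bookkeeping itself is trivial once you have such a clock.  Here is
a sketch (mine, written just now, not the IAS code; the counter read is
faked with a global) of a context switcher charging the outgoing
process with the microseconds elapsed since the previous switch:

    /* Sketch only: charge CPU at every context switch by reading a
     * free-running, non-interrupting microsecond counter.  The
     * counter here is a faked global; on real hardware it would be
     * a single read of a read-only register. */
    #include <stdio.h>

    static unsigned long counter_usec;          /* fake hardware counter */
    static unsigned long read_usec_clock(void) { return counter_usec; }

    struct proc {
        const char   *name;
        unsigned long usec_used;                /* accumulated CPU time */
    };

    static unsigned long last_switch;

    /* Called as `out` stops running: everything since the previous
     * switch belongs to `out`; the incoming process is charged from
     * this instant onward. */
    static void charge_switch(struct proc *out)
    {
        unsigned long now = read_usec_clock();
        out->usec_used += now - last_switch;
        last_switch = now;
    }

    int main(void)
    {
        struct proc a = { "a", 0 }, b = { "b", 0 };
        counter_usec = 100; charge_switch(&a);  /* a ran for 100 usec */
        counter_usec = 130; charge_switch(&b);  /* b ran for 30 usec  */
        counter_usec = 175; charge_switch(&a);  /* a ran for 45 usec  */
        printf("a used %lu usec, b used %lu usec\n",
               a.usec_used, b.usec_used);
        return 0;
    }

No statistics involved, and the cost is one counter read per switch.
The hard part, of course, is having the counter in the first place.
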
In this context, the speed of the clock is less important than its lack
of synchronization with the system clock.  That thousand-instruction
sequence (taking 1/40000th of a second on the machine you have
postulated) has a one-in-500 chance of being interrupted by, say, an
80Hz clock.  So when you see it, you score it with an 80th of a second.
Since you miss it the other 499 times, you get it right on average.
That is, for every 10000 times the code runs, you see it 20 times and
score it with 1/80th of a second each time (20 * 1/80 = 10000 *
1000/40000000).  Speeding up the clock merely improves the variance for
a given number of samples, but doesn't affect your ability to see a
short sequence in a statistical sense.  Obviously, if you need to know
EXACTLY how many cycles were used in a PARTICULAR clock tick, or
EXACTLY how many cycles a PARTICULAR process used in a PARTICULAR tick,
this method doesn't do it.  The importance of the statistical method of
measurement is that you avoid the rhythms imposed by the system clock
entirely.
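
If you don't trust the arithmetic, it simulates in a few lines.  The
sketch below (again mine, not from the paper) runs the hypothetical
1000-instruction burst 10000 times at random phases relative to an 80Hz
sampling clock, and charges it a whole tick whenever a tick lands
inside a burst:

    /* Sketch: does an 80Hz sampling clock, unsynchronized with the
     * workload, charge a 25-microsecond burst correctly on average?
     * True usage: 10000 runs * 1000 instrs / 40 MIPS = 0.25 sec. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const double period = 1.0 / 80.0;     /* sampling interval, sec */
        const double burst  = 1000.0 / 40e6;  /* 1000 instrs at 40 MIPS */
        const int    runs   = 10000;
        double charged = 0.0;
        int i, hits = 0;

        srand(1);
        for (i = 0; i < runs; i++) {
            /* start of the burst, relative to the last clock tick */
            double phase = period * (rand() / (RAND_MAX + 1.0));
            if (phase + burst >= period) {    /* a tick lands in it */
                hits++;
                charged += period;            /* scored one whole tick */
            }
        }
        printf("true usage:    %f sec\n", runs * burst);
        printf("charged usage: %f sec (%d hits, ~%d expected)\n",
               charged, hits, runs / 500);
        return 0;
    }

With only ~20 expected hits, the run-to-run scatter is large, which is
exactly the variance point above: a faster sampling clock (or a longer
accounting interval) tightens the estimate, but even 80Hz already gets
the mean right.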

[BTW - we both tried it and wrote the paper :-) ]
-- 

stodola at fccc.edu -- Robert K. Stodola (occasionally) speaks for himself.


