rwhod, creat slowness

Bill Sommerfeld wesommer at mit-eddie.UUCP
Fri Feb 14 08:33:09 AEST 1986


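The rwho daemon is well known for creating n^2 scaling problems on large
nets.  Every host broadcasts its status, and every host running rwhod
rewrites one file per sender, so n hosts do on the order of n^2 file
writes per broadcast interval.  That burns a lot of CPU just to keep
effectively identical copies of one directory on every system on the
local net.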

In article <789 at brl-smoke.ARPA> speck at vlsi.caltech.edu (Don Speck) writes:
><flame>
>    About a month ago I discovered that 80% of all disk I/O
>done on our Suns was the single, simple line (in rwhod.c):
>
>	whod = creat(path, 0666);
>
>where path = "/usr/spool/rwho/rwhod.%s" (%s = hostname).
>
>    How could this innocent-looking line be such a hog?
>
>1)  Each machine executed it 18 times per minute (we have
>    18 rwhod's running on one net)

There is a simple solution to this, which requires a small change to
rwhod: make it accept a -n option (for "no write to disk"), and change
the main loop so that it skips the creat() and write() when -n is set.
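
As a rough sketch of that change (not the actual rwhod source): the
names no_write, parse_flags() and store_packet() below are made up for
illustration, and the spool directory is passed in as a parameter so the
fragment can be tried outside /usr/spool/rwho.  The file name pattern is
the one quoted above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag: nonzero when rwhod was started with -n. */
int no_write = 0;

/* Scan the command line for -n (illustrative; not the real rwhod parser). */
void parse_flags(int argc, char **argv)
{
    int i;

    for (i = 1; i < argc; i++)
        if (strcmp(argv[i], "-n") == 0)
            no_write = 1;
}

/*
 * Store one received status packet under dir, as the receive loop would.
 * Returns 0 if the packet was written or deliberately skipped, -1 on error.
 */
int store_packet(const char *dir, const char *host,
                 const void *pkt, size_t len)
{
    char path[1024];
    int whod, n;
    ssize_t written;

    if (no_write)
        return 0;               /* -n: no creat(), no write() */

    n = snprintf(path, sizeof path, "%s/rwhod.%s", dir, host);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;              /* path would not fit in the buffer */

    whod = creat(path, 0666);
    if (whod < 0)
        return -1;
    written = write(whod, pkt, len);
    close(whod);
    return written == (ssize_t)len ? 0 : -1;
}
```

With the flag set, the daemon still receives and sends packets as
before; only the local disk traffic goes away.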

>2)  All those directories had to be looked up each time
>3)  On Suns, /usr/spool is a symlink to /private/usr/spool,
>    adding another 3 directories to be looked up
>4)  On Suns, /usr and /usr/spool sit on a Network FileSystem.
>    Sun's NFS has no caching in the clients; each lookup
>    requires a server transaction over the network
>5)  14 Suns used the /usr network filesystem
>

You can then set things up so that /usr/spool/rwho on every machine
points to the server's /usr/spool/rwho, and change /etc/rc on all
machines but the server to run rwhod -n.  Remote I/O is then needed
only when someone runs rwho or ruptime to find out what's going on.

>Why is creat(), probably one of the top 10 system calls, so
>slow on 4.2bsd systems?  Why is ftruncate just as slow - and
>still takes 30ms even if the file is already the correct size?
>Apparently these system calls do *synchronous* I/O, ignoring
>the buffer cache (even on plain VAX 4.2bsd, without any NFS
>clouding the issue).
>
They do synchronous I/O so that the filesystem is not corrupted in
uncontrolled ways when a system crashes: the inode and directory updates
reach the disk in a known order.  This simplifies fsck's job.


				Bill Sommerfeld
				MIT Project Athena

				wesommer at athena.mit.edu
				mit-eddie!mit-athena!wesommer



More information about the Comp.unix.wizards mailing list