SGI csh glob vs. NeXT NFS (SGI 3.3 client, NeXT 1.0 server)

Wed Sep 26 13:57:00 AEST 1990

In article <1990Sep26.003327.9076 at cs.umn.edu>, slevy at geom.umn.edu (Stuart Levy) writes:
> We're finding that csh (and tcsh) file pattern globbing fail on our SGI 3.3
> and 3.3.1 machines when applied to an NFS-mounted NeXT (1.0) file system.
> "echo *" yields "echo: No match", attempts at file-completion just beep, etc.
> [. . .] 
> Cases where this *doesn't* happen, i.e. globbing works properly:
>   - SGI csh globbing works fine on local file systems,
> 	and on fs's NFS-mounted from Sun servers.
>   - SGI ls, sh and ftpd globbing work fine on all file systems.
>   - Compiling the Berkeley glob.c from uunet, which uses readdir(),
> 	yields something that works on all file systems.

Strange -- does this glob.c include <sys/dir.h> or <dirent.h>?  If it uses
the BSD-compatible <sys/dir.h>, it should fail just like csh fails.  SGI 3.3
libc contains both AT&T-style directory(3C) routines, with entry points named
as in the man page (readdir, etc.); and 4.3BSD-flavor directory(3B) routines
named by prefixing the documented names with "BSD" (SGI's <sys/dir.h> renames
unprefixed calls with some magic #defines).  Only the BSD-compatible readdir
should fail as described.
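
For anyone unfamiliar with the header trick, here is a rough sketch of how
a compatibility header can redirect unprefixed calls to prefixed entry
points.  The names sketch_readdir and BSDsketch_readdir are made up for the
example and return strings only so the redirection is visible; I am not
reproducing the contents of SGI's <sys/dir.h>.

```c
#include <assert.h>
#include <string.h>

/* Two stand-in entry points living in the same library. */
const char *sketch_readdir(void)    { return "AT&T-style"; }
const char *BSDsketch_readdir(void) { return "4.3BSD-style"; }

/*
 * What a BSD-compatibility header could do: from this point on, any use
 * of the unprefixed name is rewritten by the preprocessor to the
 * BSD-prefixed entry point.
 */
#define sketch_readdir BSDsketch_readdir

/* A caller compiled "after including the header" gets the BSD routine. */
const char *sketch_caller(void)
{
    return sketch_readdir();
}
```

A source file that never sees the #define keeps calling the unprefixed
routine, which is why the choice of header matters for glob.c.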

>   - Suns mounting the same NeXT fs can glob on it too.
> 
> Poking at the NFS protocol messages, it looks like SGI's csh is reading
> directories in 512-byte chunks, while the library readdir() reads in 8K units.

Csh calls readdir(3B), which (following 4.3BSD) asks for DIRBLKSIZ, or 512,
bytes of entries per system call.  We should probably increase this to 4k
(not 8k -- you're looking at NFS's read-transfer/buffer-cache size) to be
consistent with libc readdir, as well as for performance.  4.3-reno uses a
DIRBLKSIZ of 1024; I can't tell what SunOS uses, but it must be > 512 if
a Sun client works with a NeXT server.

> Apparently 512 bytes is too short for the NeXT; its NFS server responds with
> EINVAL (invalid argument).  The SGI kernel doesn't report the error to csh,
> just gives a 5-byte (!) result from the getdents() system call which csh
> interprets as an empty directory.

Not so: the kernel does return an error, but not EINVAL.  SGI clients map
all NFS status codes not defined by the NFS protocol to EIO when converting
from status to errno (NFS enumerates a subset of BSD/SunOS error numbers as
well-defined status codes).  Note that EIO's value is 5 -- is this the five
referred to in "a 5-byte (!) result"?
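
The status-to-errno conversion can be sketched roughly as follows.  This is
not SGI's source: sketch_nfs_status_to_errno is a hypothetical name, and the
numeric status values are the ones I recall from the NFS version 2 protocol
specification, so check them against the spec before relying on them.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical client-side conversion from an NFS version 2 status code
 * to a local errno.  Codes the protocol defines map to the matching
 * errno; anything else becomes EIO.
 */
int sketch_nfs_status_to_errno(int status)
{
    assert(status >= 0);            /* status is unsigned on the wire */

    switch (status) {
    case 0:  return 0;              /* NFS_OK */
    case 1:  return EPERM;          /* NFSERR_PERM */
    case 2:  return ENOENT;         /* NFSERR_NOENT */
    case 5:  return EIO;            /* NFSERR_IO */
    case 6:  return ENXIO;          /* NFSERR_NXIO */
    case 13: return EACCES;         /* NFSERR_ACCES */
    case 17: return EEXIST;         /* NFSERR_EXIST */
    case 19: return ENODEV;         /* NFSERR_NODEV */
    case 20: return ENOTDIR;        /* NFSERR_NOTDIR */
    case 21: return EISDIR;         /* NFSERR_ISDIR */
    case 27: return EFBIG;          /* NFSERR_FBIG */
    case 28: return ENOSPC;         /* NFSERR_NOSPC */
    case 30: return EROFS;          /* NFSERR_ROFS */
    case 63: return ENAMETOOLONG;   /* NFSERR_NAMETOOLONG */
    case 66: return ENOTEMPTY;      /* NFSERR_NOTEMPTY */
    case 69: return EDQUOT;         /* NFSERR_DQUOT */
    case 70: return ESTALE;         /* NFSERR_STALE */
    default: return EIO;            /* undefined by the protocol */
    }
}
```

A server that puts 22 (EINVAL on BSD-derived systems) on the wire falls into
the default case, so the client process sees EIO.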

The bug or feature, take your pick, is in csh: in sh.glob.c (and in fact in
the other two places where directories are read), no error checking is done.
Csh silently treats a directory read error as EOF.
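
As an illustration of the missing check, here is a minimal sketch of a
directory scan that tells a read error apart from end-of-directory.  It is
not csh's code: it assumes the POSIX <dirent.h> interface, where a NULL
return with errno left at zero means EOF, and count_dir_entries is a name
invented for this example.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Count the entries in a directory.  Returns the number of entries read,
 * or -1 with errno set if the directory cannot be opened or a read fails
 * partway through.
 */
long count_dir_entries(const char *path)
{
    DIR *dirp;
    long n = 0;
    int saved;

    assert(path != NULL);

    dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    for (;;) {
        errno = 0;                  /* so EOF and error can be told apart */
        if (readdir(dirp) == NULL)
            break;
        n++;
    }

    saved = errno;                  /* nonzero only if readdir failed */
    closedir(dirp);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return n;
}
```

A globbing loop written this way could print the error instead of
reporting "No match".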

But the original bug is that NeXT's vnode-based BSD filesystem (ufs), like Sun's
NFS reference ports up until NFSSRC4.0, rejects attempts to read fewer than
DIRBLKSIZ (system-specific, apparently 1024 on a NeXT) bytes of directory
entries, returning EINVAL.  Such broken ufs_readdir implementations also
reject attempts to read at an offset that is not a multiple of DIRBLKSIZ.
Sun fixed the bug in NFSSRC4.0, allowing "sub-atomic" read sizes and
truncating the offset to a DIRBLKSIZ boundary.  NeXT should have this fix
by now.
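
The difference between the two behaviors can be sketched in a few lines.
The function names and SKETCH_DIRBLKSIZ are hypothetical, the 1024 is the
apparent NeXT value mentioned above, and none of this is taken from the
actual ufs_readdir source; it only shows the argument checks.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_DIRBLKSIZ 1024L      /* assumed; must be a power of two */

/*
 * Strict behavior: reject a request smaller than one directory block, or
 * one that does not start on a block boundary.  Returns 0 or EINVAL.
 */
int sketch_strict_check(long offset, long count)
{
    if (count < SKETCH_DIRBLKSIZ)
        return EINVAL;
    if ((offset & (SKETCH_DIRBLKSIZ - 1)) != 0)
        return EINVAL;
    return 0;
}

/*
 * Lenient behavior: accept any read size and round the offset down to the
 * start of the directory block that contains it.
 */
long sketch_lenient_offset(long offset)
{
    assert(offset >= 0);
    return offset & ~(SKETCH_DIRBLKSIZ - 1);
}
```

With these definitions a 512-byte request at offset 0 fails the strict
check, which matches what csh's readdir(3B) ran into.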

Brendan Eich
Silicon Graphics, Inc.
brendan at sgi.com