FileNames with the high bit set.

Guy Harris guy at gorodish.Sun.COM
Mon Apr 11 10:38:31 AEST 1988


> On our 4.3+NFS (Mt. Xinu) system on a Vax780 and also on a Sun 3/60
> running SunOS 3.5, open(2) and creat(2) return EINVAL if the pathname
> supplied to them has a character with the high order bit set.
> 
> Why is this ? Has this behaviour been added by Berkeley Unix or has
> it "always" been there in Unix ?

It was added in 4.2BSD.
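
To predict which path names such a kernel would reject, a user-level check
along the following lines should do.  This is only a sketch: the name
"has_high_bit" is made up for illustration, and it is not the kernel's own
code, just a test for the condition described above (some byte in the path
name has its high-order bit set).

```c
/*
 * Hypothetical helper, not taken from any kernel source.
 * Returns 1 if any byte of the NUL-terminated path name has its
 * high-order (8th) bit set, 0 otherwise.
 */
int has_high_bit(const char *path)
{
    /* Go through unsigned char so that bytes >= 0x80 are not negative. */
    const unsigned char *p = (const unsigned char *)path;

    for (; *p != '\0'; p++) {
        if (*p & 0x80)
            return 1;
    }
    return 0;
}
```

A program could call has_high_bit() on a name before handing it to open(2) or
creat(2), rather than finding out from EINVAL afterwards.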

> Is it because sh(1) uses the parity bit for its own purposes and the
> kernel does not want to create files that the shell might not be able
> to handle in this manner ?

In addition to pre-S5R3 "sh", the C shell also uses the 8th bit internally, to
mark quoted characters.  The 8th-bit check was probably thrown in for precisely
the reason you list.

> In any case, this seems like an arbitrary restriction.

It is.

> I can imagine applications which might want to create files that have
> names with arbitrary bytes in them (if you used a hashing function
> on some key to come up with a filename, you can get an "invalid"
> pathname).

Hell, I have a symbolic link to "/vmunix" on my machine named "/UNIX(r)", where
"(r)" refers to the ISO Latin #1 "registered trademark" character, which has
the hexadecimal code 0xAE.  SunOS 4.0 removed the restriction in question.  It
uses the S5R3 Bourne shell as its Bourne shell, and that shell doesn't have
problems with file names containing 8-bit characters.  So if you have files
like that lying around, "rm -i *" (or "rm -i .*" if the file name begins with
".") can clean them up from the Bourne shell.  The 4.0 C shell still can't
handle file names such as that; this is a restriction we currently plan to
lift in a future release.

Creating file names containing arbitrary character codes is probably not a good
idea; if you have an OS and file system that allow you to create very long file
names, you should use that capability.  The reason we removed the restriction
was not so that you could create files with binary names; it was as a first
step towards supporting larger character sets than ASCII, such as the ISO 8859
character sets and the various EUC-derived Asian character sets, in file
names.
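
For the hashed-key case, one way to use long file names instead of binary ones
is to spell the bytes out in hexadecimal.  The following is a minimal sketch;
"hex_file_name" and its parameters are invented for this example, and any
other printable encoding would serve equally well.

```c
#include <stddef.h>

/*
 * Hypothetical helper: writes the len bytes at key into out as 2*len
 * lowercase hex digits followed by a terminating '\0'.
 * Returns 0 on success, -1 if out (outsize bytes) is too small.
 */
int hex_file_name(const unsigned char *key, size_t len,
                  char *out, size_t outsize)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    /* Need 2*len + 1 bytes; phrased this way to avoid overflow. */
    if (outsize == 0 || len > (outsize - 1) / 2)
        return -1;

    for (i = 0; i < len; i++) {
        out[2 * i]     = digits[key[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[key[i] & 0x0f];  /* low nibble */
    }
    out[2 * len] = '\0';
    return 0;
}
```

The resulting name is twice as long as the key, but it contains only ASCII
digits and letters, so every shell and every kernel mentioned here can cope
with it.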

(BTW, you *can't* create files that have names with truly arbitrary bytes in
them; '/' and '\0' are not valid in UNIX file names - '/' separates *file*
names in a *path* name, and '\0' terminates a path name.)
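
The rule in the parenthetical above can be written down as a small check.
This is a sketch only; "is_valid_file_name" is a made-up name, it takes an
explicit length so that an embedded '\0' can be detected, and it does not
look at length limits or treat "." and ".." specially.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the len bytes at name could serve
 * as a single UNIX file name (one component of a path name), 0 if not.
 * Every byte value is accepted except '/' and '\0'.
 */
int is_valid_file_name(const unsigned char *name, size_t len)
{
    size_t i;

    if (len == 0)
        return 0;               /* an empty component is not a name */

    for (i = 0; i < len; i++) {
        if (name[i] == '/' || name[i] == '\0')
            return 0;           /* separator or terminator */
    }
    return 1;
}
```

Note that a byte such as 0xAE passes this check; whether the kernel and the
shells then accept it is the separate question discussed above.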


