looking for info on builtin hashing of command in csh

Chris Torek chris at mimsy.UUCP
Wed Mar 22 20:36:40 AEST 1989


In article <111 at VAX1.CC.UAKRON.EDU> tim at VAX1.CC.UAKRON.EDU
(Timothy H Smith) writes:
>	Im looking for a very specific reference of how the builtin
>hash function for csh works. ... I am able to creat a new executable
>file and csh will find it, but if a move a file to a different location
>csh cannot find it.  I then have to do a rehash.

No and yes.  [Now I will see if seanf is reading this.]

>I believe that csh looks at the location of where it thinks the
>file is in it's hash table,  and when it's not there it complanes.
>Is this true?  How does it know the difference between a newly 
>created file and one not in its hash table.  Does csh check that path
>if the named executable is not in its table?

Forget everything you think you know about csh's hashing.

The C shell uses a technique whose name I either never learned or
have forgotten.  It works like this:  Choose ten or fifteen hash
functions, each of which map a pair ({"/bin", "ls"}, {"/bin", "cat"},
{"/usr/ucb", "rlogin"}) to a number in $0..k-1$.  Now make an array
of $k$ bits, and do the following:

	foreach directory in $path
		foreach file in $directory
			array[hash_0($directory, $file)] <- 1
			array[hash_1($directory, $file)] <- 1
			array[hash_2($directory, $file)] <- 1
				...
			array[hash_9($directory, $file)] <- 1
		rof
	rof

Now, if given a command name without a `/', you can decide whether it
*cannot* be in some directory.  So instead of trying everywhere, you
instead say:

	foreach directory in $path
		if array[hash_0($directory, $command)] = 1 and
		   array[hash_1($directory, $command)] = 1 and
		   array[hash_2($directory, $command)] = 1 and
			...
		   array[hash_9($directory, $command)] = 1 then
			if execute($directory/$command) fails then
				if errno = ENOENT then continue
				else complain about not-executable
				fi
			fi
		fi
	rof
	complain about not-found

What this does is eliminate some (but not all) attempts to execute
a nonexistent file.

The smaller the constant $k$, and the fewer hash functions $h_i$ you
apply, the more likely you are to find `false hits', where a string
like `uItrfZgnby' happens to have all $i$ array[$hash_i$()] bits set.
The C shell was written to run on small PDP-11s, so it uses a small $k$
(8192) and a single hash function (!).  (It saves time and code space.
The function hash_0() operates on the *index* of the path component and
a true hash of the trailing command part; the real hash need only be
computed once, and hash_0() is then a macro.  To make things better,
the path index is used to select which of 8 bits is set, so that bit
collisions cannot happen between path components, if $#path <= 8.
This keeps csh from trying too many wrong directories.)

My own path (which has more than 8 components, so I am not saved by
that last trick) produces over 1400 bits that would have to be set;
assuming a no-collision distribution, there is then a 1400/8192 chance
that a random string would appear to be runnable via some directory.
>From my own experience, I would guess that csh's hash function produces
many collisions for my path and the strings [a-z]+, so that while fewer
than 1400 bits are set, the chances of a random string producing at
least one exec() call is better than the 17% suggested by the above.

[Warning: there is a four-letter word in the following story.  You
can stop reading now if such things offend you.  There is a control-L
on the next line to stop pagers here.]

Incidentally, csh's treatment of errors above is also not the best.  We
used to turn off access to /usr/games from 0900 to 1700 (or something
like that), and if you logged in before cron turned them off, csh would
set bits for the various games.  If you then tried to run one later
(rogue, for instance), it would attempt an exec(/usr/games/rogue) and
get EACCES, and print `/usr/games/rogue: Permission denied.'  All well
and good, until you tried to run `blorfloogle' or something else
nonexistent, and csh thought it might be in /usr/games.  It would
attempt an execl(/usr/games/blorfloogle) and get EACCESS, and assume
that this meant the file existed, rather than that the directory was
off-limits.  This really confused someone who had gotten mad about
something and gave the command `fuck off'.  It just happened that he
had gotten the appropriate bit set, and csh printed: `/usr/games/fuck:
Permission denied.'  He was rather disappointed to find out that there
really was no such game after all.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris at mimsy.umd.edu	Path:	uunet!mimsy!chris



More information about the Comp.unix.wizards mailing list