Why is find so slow (Re: Why use find?)

Wed Oct 10 14:39:21 AEST 1990

In article <F6.rhxm2 at cs.psu.edu> flee at guardian.cs.psu.edu (Felix Lee) writes:
>"descend" is fast because it recognizes leaf directories and avoids
>stat()ing the files in that directory.  This is usually a big win,
>since most files tend to be in leaf directories.
>
>"find" can't do this in general, since most of its predicates require
>stat()ing each file, but it wouldn't be too hard to add lazy stat()ing
>to find.  And it may even be worth it.

Quite likely.

4.3BSD-reno's `find' reimplementation (a redistributable version) uses
the C library `fts' routines (from POSIX) which take flags indicating
whether `stat's are desired.  If you say `no', it stats only when this
is required to search the directory.  The new find sets the `I need
stat information' flag only when at least one predicate requires it.
The result:

	% cd /usr/src/local/games
	% time find . name obj
	./umoria/obj
	0.3u 0.8s 0:01 91% 35+56k 0+0io 2pf+0w
	% time find . type l
	./umoria/obj
	0.5u 5.2s 0:09 63% 42+66k 94+1io 3pf+0w
	% 

It makes a pretty big difference.  If find always said `no stat' and
did the stats itself, per file, only when a predicate being evaluated
actually required them, that would help make things like

	find . \( name foo or name bar \) and type l

run much faster.  Uncommon?  Not really:

	find / \( fstype local or prune \) \( \
		   \( name '[#,]*'				atime +1 \) \
		or \( \( name '*.bak' or name '.*.bak' \)	atime +3 \) \
		or \( name '.emacs_[0-9]*'			atime +7 \) \
		or \( name core						 \) \
	\) print exec /bin/rm -f {} \;
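
The lazy-stat idea can be sketched roughly as follows.  This is an
illustration in Python, not find's or fts's actual code: the names
`Entry', `name', `type_l' and `foo_or_bar_link' are made up for the
example, and directory traversal is left out entirely.  Each entry
defers its lstat() until some predicate touches the inode, so a
name-only test never pays for one.

```python
import fnmatch
import os
import stat


class Entry:
    """One directory entry; lstat() is deferred until a predicate asks."""

    stat_calls = 0  # class-wide counter, kept only for illustration

    def __init__(self, path):
        self.path = path
        self.name = os.path.basename(path)
        self._st = None

    @property
    def st(self):
        if self._st is None:          # first use: pay for the lstat now
            self._st = os.lstat(self.path)
            Entry.stat_calls += 1
        return self._st


def name(pattern):
    # Needs only the file name, never the inode.
    return lambda e: fnmatch.fnmatchcase(e.name, pattern)


def type_l(e):
    # Needs the inode, so this triggers the deferred lstat.
    return stat.S_ISLNK(e.st.st_mode)


def foo_or_bar_link(e):
    # ( name foo or name bar ) and type l
    # `and` short-circuits: type_l runs only if a name matched.
    return (name("foo")(e) or name("bar")(e)) and type_l(e)
```

Given a directory holding a plain file `foo', a plain file `baz' and a
symlink `bar', filtering with foo_or_bar_link() should lstat only `foo'
and `bar'; `baz' fails the name test and is never touched.  A real find
still has to learn which entries are directories in order to descend,
which this sketch ignores.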

Incidentally, yes, find still accepts the old syntax (find . -name ...).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris at cs.umd.edu	Path:	uunet!mimsy!chris