Fast file scan (long)

John B. Milton jbm at celebr.uucp
Thu Oct 18 04:33:10 AEST 1990


In article <1990Oct2.192028.29731 at sco.COM> georgn at sco.COM (Georg Nikodym) writes:
>In article <299 at lysator.liu.se> pen at lysator.liu.se (Peter Eriksson) writes:
>>I'd like to know how to scan all files in a directory (and its sub-
>>directories) for a specific string (without regular expressions) as fast
>>as possible with standard Unix tools and/or some special programs.
>>
>>(I've written such a program, and would like to compare my implementation
>>of it with others.)
>>
>>(Using the find+fgrep combination is slooooow....)
>
...
>	fgrep "SEARCH_STRING" `find . -type f -print`
...
>	FILES=`find . -type f -print`
>	for FILENAME in $FILES
>	do
>	  fgrep "SEARCH_STRING" $FILENAME
>	done
...
>	find . -type f -exec fgrep "SEARCH_STRING" {} \;
...
>	dirlist=`find . -type d -print`
>	for dir in $dirlist
>	do
>	  fgrep "SEARCH_STRING" $dir/*
>	done

And the winner is:

find . -type f -print | xargs grep SEARCH

Use whichever grep works best (fastest, does what you want, etc.)

It will work on an unlimited number of files. The fork/exec of the grep is not
too bad (xargs does NOT build 5k arg lists, but rather 470 characters, so that
part could be improved). The grep will put out file names with each match,
which the examples that run grep on one file at a time (the for loop and the
-exec) won't do. The list of files is easily reduced with
filters between the find and the xargs (grep -v spool/news). All the find
qualifiers are available. You can watch the progress with ps -f. If there are
a lot of files in a directory, the $dir/* in the last example will blow up.
Watch (") on grep args, as shell variables and backquotes are expanded inside
("), but not (').
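
As a rough sketch of the filter-between-find-and-xargs idea, the following
builds a small scratch tree and searches it while skipping spool/news. The
directory layout, the file names and the search string "needle" are made up
for the demonstration, and mktemp(1) is assumed to be available. The search
string is single-quoted so the shell passes it to grep untouched.

```shell
# Hypothetical scratch tree, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/spool/news"
echo 'needle here'    > "$demo/src/a.txt"
echo 'needle in news' > "$demo/spool/news/b.txt"
echo 'nothing'        > "$demo/src/c.txt"

# Drop unwanted paths before xargs hands the remaining names to grep.
# /dev/null is an extra file argument so that grep prints file names
# even when xargs happens to pass it only one file.
matches=$(cd "$demo" &&
    find . -type f -print | grep -v spool/news | xargs grep 'needle' /dev/null)
echo "$matches"

rm -rf "$demo"
```

Note that plain find | xargs splits names on white space, so this sketch
assumes file names without blanks or newlines.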

Another fun one is to avoid binary executables, which can slow down grep a lot:

find . -type f -print | xargs file | grep -v ':.*executable' | cut -d: -f1 |
  xargs grep SEARCH

You may have to tune the "executable" to whatever file(1) puts out. You could
also add a "! -name '*.[aZz]'" to the find to avoid more non-text stuff.
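
A minimal sketch of the "! -name" exclusion, again on a made-up scratch tree
(the file names and the string "needle" are assumptions for illustration, and
mktemp(1) is assumed to exist). It uses grep -l so only the names of matching
files are printed.

```shell
# Hypothetical scratch tree: one text file plus two files whose names
# end in .a and .Z, standing in for an archive and a compressed file.
demo=$(mktemp -d)
echo 'needle' > "$demo/notes.txt"
echo 'needle' > "$demo/libfoo.a"
echo 'needle' > "$demo/dump.Z"

# The pattern is single-quoted so find, not the shell, expands it.
# Names ending in .a, .Z or .z are never handed to grep.
found=$(cd "$demo" &&
    find . -type f ! -name '*.[aZz]' -print | xargs grep -l 'needle' /dev/null)
echo "$found"

rm -rf "$demo"
```

All three files contain the string, but only ./notes.txt is searched and
reported.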

John

-- 
John Bly Milton IV, jbm at uncle.UUCP, n8emr!uncle!jbm at osu-cis.cis.ohio-state.edu
(614) h:252-8544, w:469-1990; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!



More information about the Comp.unix.misc mailing list