Fast file scan (long)

Bill Stewart 201-949-0705 erebus.att.com!wcs wcs at cbnewsh.att.com
Sun Oct 21 10:25:04 AEST 1990


In article <1990Oct17.183310.29684 at celebr.uucp>, jbm at celebr.uucp (John B. Milton) writes:
> >In article <299 at lysator.liu.se> pen at lysator.liu.se (Peter Eriksson) writes:
> >>I`d like to know how to scan all files in a directory (and it's sub-
> >>directories) for a specific string (without regular expressions) as fast
> >>as possible with standard Unix tools and/or some special programs.
> And the winner is:
> find . -type -f -print | xargs grep SEARCH
> Use whichever grep works best (fastest, does what you want, etc.)

For this case, you probably want a grep based on the Boyer-Moore
pattern-matching algorithm; several were posted to *.sources a few years ago.
Essentially, what B-M does is preprocess the search pattern, and
start looking in the text where the END of the pattern would be, e.g.
	Pat.:	foobarbaz
	Ptr:            |
	Text:	Now is the time for all good parties to foobarbaz to an end.
			 foobarbaz
The 'h' in the text doesn't match the 'z', and since there are no 'h's
anywhere in the pattern, you can advance the pointer 9 positions,
avoiding a lot of comparisons.  The next look gets you:
	Pat:		 foobarbaz
	Ptr:	                 |
	Text:	Now is the time for all good parties to foobarbaz to an end.
'o' doesn't match 'z', but IS in the pattern, so you advance by 6.
	Pat:		       foobarbaz
	Ptr:	                       |
	Text:	Now is the time for all good parties to foobarbaz to an end.
And so on, until you find it or reach the end.
-- 
					Thanks; Bill
# Bill Stewart 908-949-0705 erebus.att.com!wcs AT&T Bell Labs 4M-312 Holmdel NJ
Government is like an elephant on drugs: It's very confused, makes lots of noise,
can't do anything well, stomps on anyone in its way, and it sure eats a lot.



More information about the Comp.unix.misc mailing list