grep

Alex Martelli alex at am.sublink.org
Wed Apr 17 08:54:08 AEST 1991


akbloom at aplcen.apl.jhu.edu (Keith Bloom) writes:
:mmoore%hellgate.utah.edu at cs.utah.edu (Michael Moore) writes:
:>	Does anyone know if there is an easy way to recursively search for a 
:>pattern down the entire file tree of a directory?  
:If your system has xargs, you could try:
:find . -name '*' -print | xargs grep pattern
:If you have a huge directory tree with thousands of files in it, this
:may not work.

Why not, pray?  xargs is supposed to chop its stdin into pieces that
are short enough that they can be passed as arguments to the target
command, here 'grep pattern'.
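
A small sketch of that chopping, for illustration only: the -n 2 option
below is an artificial cap of two arguments per invocation, standing in
for the much larger system limit on argument-list size, and "echo"
stands in for 'grep pattern'.

```shell
# Five names arrive on stdin; xargs runs the target command once per batch.
# -n 2 is an assumed, artificially small batch size chosen for the demo.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$batches"
```

Each output line corresponds to one separate invocation of the target
command, which is why a huge tree should not overflow the argument list.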

A slight improvement: use "grep pattern /dev/null" as the target 
command; by making grep look into more than one file, it will print
the name of the file where the pattern is found (in the original,
if grep happened to be called with just one file, for example at
the very end of the search process, it might find lines and print
them out without identifying where they came from).
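
The effect can be seen with a single file, as in this minimal sketch;
the scratch directory and the file name "only.txt" are made up for the
demonstration, and mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)                 # hypothetical scratch directory
printf 'needle\n' > "$tmpdir/only.txt"

# One file operand: grep prints only the matching line.
without=$(grep needle "$tmpdir/only.txt")

# Two operands, one of them the always-empty /dev/null:
# grep now prefixes each match with the file name.
with=$(grep needle /dev/null "$tmpdir/only.txt")

printf '%s\n%s\n' "$without" "$with"
rm -rf "$tmpdir"
```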

A second improvement: omit the -name "*"; at best it does nothing,
and on some find implementations all it's doing is not making grep
look into files whose names start with a dot; and why wouldn't you
want to grep inside .netrc, for example?

A third improvement: add a -type f flag; avoid grepping into
directories by mistake, and particularly avoid grepping into
device-files - grepping into /dev/tty, for example, can hang the
procedure until EOF is forced on the terminal...
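
Putting the three improvements together gives something like the
following sketch. The scratch tree, the file names and the literal
search string "pattern" are invented for the demonstration, and
mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)                          # hypothetical scratch tree
mkdir "$tmpdir/sub"
printf 'pattern here\n' > "$tmpdir/sub/.hidden"   # a dot file that matches
printf 'nothing\n'      > "$tmpdir/plain"         # a file that does not

# No -name test, -type f to skip directories and devices,
# /dev/null so grep always names the file it matched in.
found=$(cd "$tmpdir" && find . -type f -print | xargs grep pattern /dev/null)
printf '%s\n' "$found"
rm -rf "$tmpdir"
```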

There are many other things one might wish to do (for example,
only grep into files which are readable by you), but find does not
support them easily.  Unfortunately, some greps will just fail if
ONE of their target files is unreadable - and not even bother looking
into the other ones!

The best fix for this specific problem is probably to also attack
another desideratum - NOT grepping into non-text files.  The "file"
command, on many systems, will emit a description containing the
keyword "text" for a text file (in variations such as "English text",
"ascii text", etc), but not for non-text files (it will say "data",
or describe the type of executable, etc), and for non-readable files
it will say something like "cannot open for reading" (if you're
unlucky enough that your "file" command says, for example, "sh
commands" instead of "sh command text" for a shell script, you will
have to get a little fancier in the following, but the basic
idea still applies).
So, we want to xargs the files emitted by find, first into file,
then remove all non-text ones, and finally grep on the remainder
only; we can both select for "text", and remove the descriptions,
at one gulp with, for example, sed.

find . -type f -print |
	xargs file |
	sed -n '/:.*text/s/:.*//p' |
	xargs grep pattern /dev/null 
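
To see what the sed step keeps and drops, here is a sketch that feeds
it a few lines shaped like "file" output. The sample names and
descriptions are made up; real descriptions vary from system to system.

```shell
# Hypothetical "file" output: name, colon, description.
sample='./notes.txt: ascii text
./a.out: data
./secret: cannot open for reading
./run.sh: sh command text'

# -n suppresses default printing; for lines where "text" follows a colon,
# strip everything from the first colon on and print what is left.
texts=$(printf '%s\n' "$sample" | sed -n '/:.*text/s/:.*//p')
printf '%s\n' "$texts"
```

Only the two names whose descriptions contain "text" survive, so the
data file and the unreadable file never reach grep.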

This is still NOT perfect - filenames containing newlines will
typically give problems with any find ... -print | xargs (one should
use find ... -print0 and matching xargs -0, if lucky enough to have
them, for example GNU versions of find and xargs), and here the 
further trip through file and sed will further mess things up if
the filename contains a colon (and is a text file, or has the string
"text" in the filename after the colon); one COULD get fancier, with
a sed expression to exclude lines with two or more colon characters,
but it's getting a bit late at night for me to figure out how to handle
a filename such as "joke: ascii text\nfooled you!" even with the
-print0 and -0... there is a point of diminishing returns where perl
gets simpler than this sort of thing...:-).
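
For the plain find | xargs grep pipeline (without the trip through file
and sed), the NUL-separated variant might look like the sketch below.
It assumes a find with -print0 and an xargs with -0, which are
extensions and not available everywhere; the file name with an embedded
newline is invented for the demonstration. grep -l is used so that only
the name of the matching file is printed.

```shell
tmpdir=$(mktemp -d)                 # hypothetical scratch directory
# A file whose name contains a newline character.
printf 'pattern\n' > "$tmpdir/odd
name"

# NUL-terminated names survive the pipe intact, newline and all.
hits=$(cd "$tmpdir" && find . -type f -print0 | xargs -0 grep -l pattern /dev/null)
printf '%s\n' "$hits"
rm -rf "$tmpdir"
```

This does nothing for the colon problem in the file/sed version, which
parses line-oriented text and so still gets confused by such names.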

:If you don't have xargs, there's:
:
:find . -name '*' -print -exec grep pattern {} \;
:
:but this is more cumbersome, because it will print the names of all 
:your files, whether they contain the pattern or not.  (I assume you
:want to know the name of the file that 'pattern' is in.)

You can omit the -print and just have /dev/null as an argument to
grep just after the pattern, as I suggested above.  It's still 
"more cumbersome" in the sense of overloading your CPU, since a
fork and exec is done for each file, rather than processing them
en masse via xargs... still, my suggestions about removing the
'-name *' and inserting a '-type f' would also apply here.
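
As a sketch, the xargs-free version with those changes applied could
read as follows; the scratch directory and file names are made up, and
mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)                 # hypothetical scratch directory
printf 'pattern\n' > "$tmpdir/a"
printf 'other\n'   > "$tmpdir/b"

# No -print and no -name; grep is run once per regular file, and
# /dev/null makes it name the file whenever it finds a match.
hits=$(cd "$tmpdir" && find . -type f -exec grep pattern /dev/null {} \;)
printf '%s\n' "$hits"
rm -rf "$tmpdir"
```

Where find accepts "+" in place of "\;" as the -exec terminator (it is
in current POSIX, but older systems may lack it), files are batched
much as xargs would batch them, which avoids the per-file fork and exec.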

-- 
Alex Martelli - (home snailmail:) v. Barontini 27, 40138 Bologna, ITALIA
Email: (work:) martelli at cadlab.sublink.org, (home:) alex at am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).


