do we need (or want) context in grep and diff?

Paul Gluckauf Haahr haahr at phoenix.Princeton.EDU
Thu Jun 16 08:33:20 AEST 1988


there has been quite a lot of discussion following research!andrew's
outline of gre (and his comment that v9 diff no longer has -c) on
whether grep and diff should provide context.  my two cents worth follows.

context printing might not be hard with grep, but it may require grep
to do some extra work that it doesn't do right now, namely figuring
out the previous n lines.  andrew's description of the one limit
that exists in gre (64k lines), together with some knowledge of how one
does a boyer-moore style grep (find the pattern, then back up to find the
beginning of the line, rather than searching each line individually),
leads me to believe that getting context right, at least in the presence
of very long lines, isn't as easy as one would expect.

on context, by the way, i can't think of a simple definition.  if i
am printing a context of 2 (2 lines before and after, plus the given
line), what should a context grep (assume each character is a line)
for X in the file "abcXdXefgXhi" print? "bcXdX XdXef fgXhi" or "bcXdXefgXhi"?
and should the matched line be flagged, a la diff -c?  i like having this
in a separate program which could handle those questions as options.
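
to make the two readings concrete, here is a small sketch (the function
names separate and merged are made up for illustration, and a "match" is
hard-wired to a line that is exactly X).  separate prints one window per
match and repeats the overlaps; merged joins windows that overlap or touch.

```shell
#!/bin/sh
# both functions read one "line" per input line and take the context size
# as their only argument.  hypothetical helpers, not from any posted program.

# reading 1: one window per match, overlapping lines printed again
separate() {
	awk -v n="$1" '
	{ line[NR] = $0; if ($0 == "X") hit[++nhits] = NR }
	END {
		for (h = 1; h <= nhits; h++) {
			lo = hit[h] - n; if (lo < 1) lo = 1
			hi = hit[h] + n; if (hi > NR) hi = NR
			out = ""
			for (i = lo; i <= hi; i++) out = out line[i]
			print out
		}
	}'
}

# reading 2: windows that overlap or touch are merged into one group
merged() {
	awk -v n="$1" '
	{ line[NR] = $0; if ($0 == "X") hit[++nhits] = NR }
	END {
		last = 0; out = ""
		for (h = 1; h <= nhits; h++) {
			lo = hit[h] - n; if (lo < 1) lo = 1
			hi = hit[h] + n; if (hi > NR) hi = NR
			# a gap before this window closes the current group
			if (last > 0 && lo > last + 1) { print out; out = "" }
			# never print a line twice
			if (lo <= last) lo = last + 1
			for (i = lo; i <= hi; i++) out = out line[i]
			last = hi
		}
		if (out != "") print out
	}'
}

printf '%s\n' a b c X d X e f g X h i | separate 2
printf '%s\n' a b c X d X e f g X h i | merged 2
```

on the example above, separate 2 prints bcXdX, XdXef and fgXhi on three
lines, and merged 2 prints the single line bcXdXefgXhi.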

as for handling input from a pipe, yes, it does mean putting the input somewhere.  is that so
awful?  remember, context grep output would largely be for humans, and in this
case i don't see the harm in using /tmp (besides, if it's small, it will all
be in the buffer cache anyhow).  here is a quickly hacked cgrep:

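	#! /bin/sh
	# cgrep: print nlines of context around each line matching pattern
	case $# in
	0|1)	echo >&2 "usage: cgrep nlines pattern [file...]"; exit 2;;
	esac
	n="$1"
	pattern="$2"
	shift; shift
	case $# in
	0)
		# no files: save stdin completely before grep and context
		# read it, so context never sees a half-written copy
		tmp=/tmp/cgrep$$
		trap 'rm -f "$tmp"' 0
		trap 'exit 1' 1 2 15
		cat > "$tmp"
		grep -n "$pattern" "$tmp" | context "$n" "$tmp"
		;;
	*)
		for i
		do
			grep -n "$pattern" "$i" | context "$n" "$i"
		done
		;;
	esac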

ideally it would do some argument processing, pass the options along to
grep or context, and supply a default nlines.  context is easy and left as an
exercise for the reader, but several more than capable ones have been
floating around over the past couple of days.  anyway, this is just an
outline.
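
for concreteness, one possible shape for such a context filter (a rough
sketch, not any of the versions that have been posted).  it assumes the
calling convention used in cgrep above: "grep -n" output on stdin, then
nlines and the file as arguments.  the "* " flag on matched lines, the
"--" group separator and the merging of overlapping windows are arbitrary
choices made here.

```shell
#!/bin/sh
# context nlines file
# stdin: "grep -n" output for the same file, i.e. lines of "lineno:text"
context() {
	# keep only the line numbers from grep -n
	hits=$(sed 's/:.*//' | tr '\n' ' ')
	awk -v n="$1" -v hits="$hits" '
	BEGIN {
		nh = split(hits, h, " ")
		for (i = 1; i <= nh; i++) {
			want[h[i] + 0] = 1
			# mark every line within n of a match for printing
			for (j = h[i] - n; j <= h[i] + n; j++) show[j] = 1
		}
	}
	FNR in show {
		# a jump in line numbers starts a new group
		if (prev && FNR > prev + 1) print "--"
		flag = (FNR in want) ? "* " : "  "
		print flag $0
		prev = FNR
	}' "$2"
}
```

with this defined, grep -n X file | context 1 file prints each match with
one line either side, flagged matches, and "--" between groups that do
not touch.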

on the argument for whether or not diff should support context, first
note that the problem of input from a pipe being lost is a chimera:
all a context diff producer needs is diff output and one of the
filenames (and knowledge of whether that was the first or second file
from the diff command line).
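
a rough sketch of why that is enough, assuming normal (non -c, non -e)
diff output and that the file we still have is the first one: every hunk
header such as 3,4c3,5, 2a3 or 7d6 already names the affected lines of
the first file, so the surrounding lines can be pulled from it afterwards.
oldranges and diffcontext are made-up names, and the "== old" header is
not real context diff format.

```shell
#!/bin/sh
# oldranges: stdin is normal diff output; prints "lo hi cmd" per hunk,
# where lo,hi are line numbers in the first file and cmd is a, c or d
oldranges() {
	sed -n '
		s/^\([0-9][0-9]*\),\([0-9][0-9]*\)\([acd]\).*/\1 \2 \3/p
		s/^\([0-9][0-9]*\)\([acd]\).*/\1 \1 \2/p
	'
}

# diffcontext nlines file1: stdin is the output of "diff file1 file2";
# prints each hunk's lines from file1 with nlines of context either side
diffcontext() {
	oldranges | while read lo hi cmd
	do
		from=$((lo - $1))
		[ "$from" -lt 1 ] && from=1
		to=$((hi + $1))
		echo "== old $from,$to ($cmd)"
		sed -n "${from},${to}p" "$2"
	done
}
```

the diff output itself still carries the new text (the "> " lines), so a
fuller version could interleave that with the old lines; this sketch only
shows that the pipe's contents need not be kept around.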

[ that is for conventional diff.  the diff i run supports input from
two pipes: a filename enclosed in parentheses is passed as a
command to sh -c, so that
	$ diff '(first command)' '(second command)'
works rather nicely.  i've been thinking of hacking together a shell
that takes (command) as a command argument, changes it to a named
pipe or /dev/fd/n, and executes the command at the other end of the
pipe, though rm '(cat file)' might not do what one really expects.  i heard
that one of the research shells, maybe one of korn's, supported
this, but never saw it in a release.  and there are very few commands
which this would be useful with other than diff, although awk -f
(command) might be nice every once in a while.  by the way, my diff
doesn't have -c, and i haven't really missed it.  but i don't send out
patches often. ]
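
for anyone whose diff lacks this, the effect can be approximated from
outside.  the wrapper below is a sketch under stated assumptions: pdiff is
a made-up name, it uses temporary files rather than named pipes or
/dev/fd/n, and it relies on mktemp -d, which is common but not universal.
it is not how the diff described above is implemented.

```shell
#!/bin/sh
# pdiff: any argument of the form "(command)" is run with sh -c and
# replaced by a temporary file holding its output; everything else is
# handed to diff unchanged.
pdiff() {
	dir=$(mktemp -d) || return 2
	i=0
	for arg
	do
		case $arg in
		"("*")")
			cmd=${arg#"("}
			cmd=${cmd%")"}
			i=$((i + 1))
			sh -c "$cmd" > "$dir/$i"
			arg=$dir/$i
			;;
		esac
		# rotate: append the (possibly rewritten) argument,
		# then drop the original from the front
		set -- "$@" "$arg"
		shift
	done
	diff "$@"
	status=$?
	rm -rf "$dir"
	return $status
}
```

usage is then pdiff '(first command)' '(second command)', with diff's
usual exit status passed back.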

for those of you who are complaining about writing a /tmp file, look
at what diff does if its input is from a pipe.  if we had good vm
implementations, diff could get away with reading everything into core.
alas, we don't, and that would cause large files to make diff thrash.

my personal feeling is that it is probably not harder for any reasonable
diff to handle context, and there are enough programs that look for context
diffs that it isn't unreasonable to keep it there.  patch fudge factors
sound like a good argument for keeping them around, though a diffc might
provide a good place to start for playing with new, more readable forms
of context diff, allowing some new creatures to feep into context diffs
(standout mode or nice output for troff, side-by-side format).

on the other hand, there is very little reason that i see to put context
in grep, and a good context tool would probably make us all forget that
anyone had ever asked for it.

but then again, i'm from the school of thought that doesn't use cat -v.
[ by the way, cat -v would be useful if it gave a one-to-one mapping of
its input to its output.  but if your file contains "M-" or "^", it
doesn't.  my own vis program is reversible (with an unvis), which is
useful for editing binary files (ever needed to change a hard coded
pathname?) though, of course, many editors handle binary files ok. ]

paul haahr
princeton!haahr or haahr at princeton.edu


