Proofreading documents with awk

Arnold Robbins arnold at audiofax.com
Thu Dec 21 07:23:19 AEST 1989


:In article <25 at meme.stanford.edu> heit at psych.Stanford.EDU (Evan Heit) writes:
:: I am looking for someone who has written a program in awk that will
:: will allow me to proofread my papers by by looking for word repetitions.

In article <6612 at jpl-devvax.JPL.NASA.GOV> lwall at jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>How about filtering through
>
>    tr -cs "A-Za-z" "\012" | uniq -d
>
>(Sys V'ers will have to make that [A-Z][a-z]).
>
>I sincerely doubt that any awk (or perl) solution will do as well.

Well, yes and no.  The following should work in GNU Awk and possibly
the V.4 nawk.  It is untested though.  Its advantage is that it
provides line number and file name information.

	#! /path/to/gawk -f

	{
		gsub(/[^A-Za-z0-9 \t]/, "");	# delete non-alphanumerics
		$0 = tolower($0)		# go to all one case
		if (NF > 0 && $1 == last)	# NF test: on a blank line $1 and last may both be ""
			printf "Duplicate '%s' line %d, file %s\n",
				last, FNR, FILENAME
		for (i = 2; i <= NF; i++)
			if ($(i-1) == $i)
				printf "Duplicate '%s' line %d, file %s\n",
					$i, FNR, FILENAME
		last = $NF
	}

As Jeff Lee points out, this IS slower than the tr | uniq solution.
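
For comparison, here is a small sketch of what the pipeline reports on a
made-up three-line sample.  The sample text and the name dup_words are
mine, purely for illustration.  It catches a repeat within a line and one
split across a line break, but it prints only the words themselves, and
since it does no case folding, something like "The the" would slip through.

```shell
# dup_words is a hypothetical helper name, not something from the thread.
# It wraps the pipeline quoted above: put each run of letters on its own
# line, then let uniq -d print one copy of each adjacent repeated line.
dup_words() {
	tr -cs 'A-Za-z' '\012' | uniq -d
}

# Made-up sample: "is is" repeats within a line, "the / the" across lines.
printf 'This is is a test\nof the\nthe pipeline.\n' | dup_words
# prints:
# is
# the
```

The awk script trades that brevity for the FNR and FILENAME it prints with
each hit.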
-- 
Arnold Robbins -- Senior Research Scientist - AudioFAX | Laundry increases
2000 Powers Ferry Road, #220 / Marietta, GA. 30067     | exponentially in the
INTERNET: arnold at audiofax.com	Phone: +1 404 933 7600 | number of children.
UUCP:	  emory!audfax!arnold	Fax:   +1 404 933 7606 |   -- Miriam Hartholz


