Puzzled by A Regexp...

Thu Mar 7 08:58:30 AEST 1991

In article <10469 at ncar.ucar.edu> tres at virga.rap.ucar.edu (Tres Hofmeister) writes:
>
>	I've run across a regular expression that I don't quite understand.
>Not that this hasn't happened before, but this seems like it should be
>fairly straightforward...
>
>	I'm trying to match entries in /etc/group which have one or more
>members.  The following works just fine, matching each of the colon
>delimited fields individually followed by one or more characters:
>
>	grep '^.*:.*:.*:..*' /etc/group

This one will find any line with three or more colons and at least one
character of any type (another colon included) after the third colon.
This re means

	From start of line
	zero or more of any characters
	a colon
	zero or more of any characters
	a colon
	zero or more of any characters
	a colon
	any single character
	zero or more of any characters

It'll match good group entries and

	:::::
	:::.:   ::: --:
	::::
	a:::a
	:::a
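
A quick way to check this at the prompt, as a sketch: feed a few lines
to the same grep through printf.  The sample lines are made up for
illustration (news:*:6: is the entry from the question); only the regexp
is the one being discussed.

```shell
# Hypothetical sample lines, one per argument; not taken from a real /etc/group.
sample=$(printf '%s\n' 'wheel:*:0:root' 'news:*:6:' ':::::' 'a:::a' 'a::b')

# Keep only the lines the first regexp accepts.
matched=$(printf '%s\n' "$sample" | grep '^.*:.*:.*:..*')
printf '%s\n' "$matched"
# prints wheel:*:0:root, ::::: and a:::a
```

news:*:6: drops out because nothing follows its third colon, and a::b
drops out because it has only two colons.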

>	What I don't understand is why the following doesn't work the same
>way:
>
>	grep '^.*:..*' /etc/group

This one will find any record that includes a : before the last char in 
the line. The re means

	From start of line,
	zero or more of any characters
	a colon
	any single character
	zero or more of any characters

It matches the following

	::
	:::a
	:b:::
	:b
	gigo:1123

>	It grabs entries with one or more members, true, but also grabs
>entries with no members, e.g. "news:*:6:".  I figured that this regexp
>would match the longest possible string at the beginning of a line,
>terminated by a colon, which in the group file should include the first
>two colons, followed by at least one character.  It seems to be doing
>something else, given that it will also match a line with no members.

The only lines it *won't* match are those with no colons or where the
only colon in the line is the last character.  What it's looking for
is a line with a colon followed by a character.
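
To see where the match lands on the entry from the question, here is a
small sketch that runs the same regexp through sed and brackets the two
halves.  The \( \) groups and the [ ] markup are added only for display;
they are not part of the original command.

```shell
line='news:*:6:'

# The line is selected by the second regexp...
hit=$(printf '%s\n' "$line" | grep '^.*:..*')

# ...because ".*:" backs off to the second colon, leaving "6:" for "..*".
split=$(printf '%s\n' "$line" | sed 's/^\(.*:\)\(..*\)/[\1][\2]/')

printf '%s\n' "$hit" "$split"
# prints news:*:6: and then [news:*:][6:]
```

The leading part cannot swallow the final colon, since "..*" still needs
at least one character, so it settles for the colon before it.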

>	Any ideas?

Use    [^:]*   instead of    .*   in there, on the first (field matching) version:

	grep '^[^:]*:[^:]*:[^:]*:..*' /etc/group
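
What [^:]* buys is that each piece is pinned to exactly one field.  A
sketch using sed to bracket the pieces, on a made-up five-field line
(the \( \) groups and [ ] markup are for display only):

```shell
# Hypothetical line with one colon more than a group entry has.
line='a:b:c:d:e'

# With .* the first piece is free to run across a colon.
loose=$(printf '%s\n' "$line" | sed 's/^\(.*\):\(.*\):\(.*\):\(..*\)/[\1][\2][\3][\4]/')

# With [^:]* every piece stops at the next colon.
tight=$(printf '%s\n' "$line" | sed 's/^\([^:]*\):\([^:]*\):\([^:]*\):\(..*\)/[\1][\2][\3][\4]/')

printf '%s\n' "$loose" "$tight"
# prints [a:b][c][d][e] and then [a][b][c][d:e]
```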

Even better for the second example is to anchor at the END instead of the
BEGINNING of the data lines:

	egrep ':[^:]+$' /etc/group

will match any line with at least one non-colon character following the
last colon in the line.  The + is egrep syntax; plain grep treats it as
an ordinary character.  Alternatives for plain grep that mean the same:

	:[^:]\{1,\}$
	:[^:][^:]*$
	^.*:[^:][^:]*$
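
A sketch that runs all four forms over the same made-up lines to check
that they pick the same entries.  The + form is extended-regexp syntax,
so it is run with grep -E here (egrep on older systems); the other three
go through plain grep.  The wheel and staff lines are hypothetical.

```shell
sample=$(printf '%s\n' 'wheel:*:0:root' 'news:*:6:' 'staff:*:10:root,uucp')

ere=$(printf '%s\n' "$sample" | grep -E ':[^:]+$')
bre1=$(printf '%s\n' "$sample" | grep ':[^:]\{1,\}$')
bre2=$(printf '%s\n' "$sample" | grep ':[^:][^:]*$')
bre3=$(printf '%s\n' "$sample" | grep '^.*:[^:][^:]*$')

printf '%s\n' "$ere"
# prints wheel:*:0:root and staff:*:10:root,uucp
```

The memberless news:*:6: entry is left out by all four.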

Finally, any line not matching the following (egrep syntax, because of
the +) is either a group with no members or a badly formed line in the
file:

	^[^:]+:[^:]*:[0-9]+:[^:]+$

which matches

	From start of line
	at least one non-colon
	a colon
	any number of non-colons
	a colon
	a decimal number
	a colon
	at least one non-colon
	end of line

Note that it won't see other anomalies like a group with too big a gid
(system dependent and we can't check to see if it's 65536, for instance,
if 65535 is the biggest) or usernames that are too long or weird stuff
in the userids field (we could exclude spaces, for instance, by testing
in each case    [^: ]   instead of    [^:]   ), but any line *not* found
by the above is either a group with no members or a badly formed line.
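
One way to list the lines that are *not* found, sketched with grep's -v
option and -E for the + syntax.  The sample lines are invented to show
one well-formed entry and several kinds of failure; against a real
system the input would be /etc/group instead.

```shell
# Hypothetical entries: one good, one memberless, one with a non-numeric
# gid, one with an empty group name, one with too few fields.
sample=$(printf '%s\n' 'wheel:*:0:root' 'news:*:6:' 'bogus:*:x:root' ':*:5:root' 'short:6')

# -v inverts the test, so only lines that fail the pattern are printed.
suspect=$(printf '%s\n' "$sample" | grep -E -v '^[^:]+:[^:]*:[0-9]+:[^:]+$')
printf '%s\n' "$suspect"
# prints every sample line except wheel:*:0:root
```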

...Kris
-- 
Kristopher Stephens, | (408-746-6047) | krs at uts.amdahl.com | KC6DFS
Amdahl Corporation   |                |                    |
     [The opinions expressed above are mine, solely, and do not    ]
     [necessarily reflect the opinions or policies of Amdahl Corp. ]