POSIX Regular Expression Funnyness

Geoff Clare gwc at root.co.uk
Wed Feb 1 21:02:22 AEST 1989


In article <4118f7b1.ae48 at apollo.COM> arnold at apollo.COM (Ken Arnold) writes:
>The POSIX proposal [] has a rework of regular expressions.
>(stuff deleted)
>
>They have added a new set of bracket expressions which stand for
>pre-defined sets of characters.  For example, "[:alpha:]" is all
>alphabetic characters, "[.ch.]" is the character string ch treated as a
>single character (which is useful for sorting in many languages), and
>"[=a=]" refers to all variants of a, i.e., a, a with a circumflex, a
>with an umlaut, etc.
>
>(stuff deleted)... these new bracket expressions only have their new
>meaning inside outer brackets.
>
>Why?  The only existing expressions you would break if you allowed "top
>level" [::] expressions (or [..] or [==] expressions) would be
>expressions which currently existed that contained *two* colons (or
>dots or equals), on either side.  Since this is currently pointless
>redundancy, I can't believe this is a serious problem.

There are more serious problems with the new expressions than just the
obscure syntax.  A short while ago I had to design some verification
tests for these new regular expressions as part of the X/Open verification
suite (the latest X/Open standard incorporates POSIX).  I found some
ambiguity in the area of 2-to-1 character mappings, where two characters
collate as a single element.  For example, if ch collates between c and d,
which of the following REs should match the string "xchy"?

	x[a-[.ch.]]y
	x[a-[.ch.]]hy

The first RE matches if the bracket expression consumes "ch" as a single
collating element; the second matches if it consumes only the "c".  The
simple answer would be to create some rule about 2-to-1 character
mappings to eliminate the ambiguity.  However, whichever rule is chosen,
there will be many cases where the actual behaviour is non-intuitive, and
users will not get the results they expect.
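
To make the two readings concrete, here is a minimal Python sketch of a toy
matcher.  It is not any real regex engine: the collation table, the rule
names "longest" and "single", and the function names are all hypothetical,
and it only handles whole-string matches of a literal/range sequence.  It
assumes a locale where ch collates between c and d.

```python
import string

# Hypothetical collation order: a, b, c, ch, d, e, ... z
ORDER = []
for letter in string.ascii_lowercase:
    ORDER.append(letter)
    if letter == "c":
        ORDER.append("ch")
RANK = {elem: i for i, elem in enumerate(ORDER)}


def next_element(text, pos, rule):
    """Return the collating element a bracket expression would consume.

    rule == "longest": prefer a two-character element such as "ch".
    rule == "single":  always consume exactly one character.
    """
    if rule == "longest" and text[pos:pos + 2] in RANK:
        return text[pos:pos + 2]
    one = text[pos:pos + 1]
    return one if one in RANK else None


def matches(pattern, text, rule):
    """Whole-string match; pattern items are literals or (low, high) ranges."""
    pos = 0
    for item in pattern:
        if isinstance(item, str):          # literal text
            if not text.startswith(item, pos):
                return False
            pos += len(item)
        else:                              # range such as [a-[.ch.]]
            low, high = item
            elem = next_element(text, pos, rule)
            if elem is None or not RANK[low] <= RANK[elem] <= RANK[high]:
                return False
            pos += len(elem)
    return pos == len(text)


RE1 = ["x", ("a", "ch"), "y"]        # x[a-[.ch.]]y
RE2 = ["x", ("a", "ch"), "h", "y"]   # x[a-[.ch.]]hy

for rule in ("longest", "single"):
    print(rule, matches(RE1, "xchy", rule), matches(RE2, "xchy", rule))
```

Under the "longest" rule only the first RE matches "xchy"; under the
"single" rule only the second does.  Either choice surprises somebody,
which is the point.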

We have informed X/Open of the problem, and are waiting to see what they
come up with.

Geoff.
-- 

Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc at root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-606-7799  FAX: +44-1-726-2750


