string compare with wildcards

Lars Henrik Mathiesen thorinn at rimfaxe.diku.dk
Wed Dec 19 09:35:13 AEST 1990


staff at cadlab.sublink.ORG (Alex Martelli) writes:
>claus at iesd.auc.dk (Claus S. Jensen) writes:
>>   Does anybody out there have a routine that compares
>>two strings, one of which contains wildcards (*,?).

>It's real easy, thanks to recursion (no "stylistic" flames, please!-):

>int wildmatch(pat, str) char *pat, *str; {
>	. . .

>Optimizing this, and in particular eliminating the recursive call,
>is, on the other hand, a rather interesting exercise, but one I'd
>undertake only if wildcard-matching was proven-by-profiling to be
>a bottleneck in my application... which is rather unlikely, I think.

There's nothing wrong with recursive calls. The problem with most
implementations of this is nested _loops_, and it doesn't matter if
they occur in recursive instances of a function, or are simulated with
a stack. Try "echo ********.c" in Bourne shell for a demonstration. X
users may have noticed the problem using font specifications with too
many asterisks.

Eliminating these nested loops changes the worst-case time complexity
from O(n^(m-1)) to O(n*m), where n and m are the lengths of the string
and the pattern, respectively.

The point is that once the part of a pattern between two asterisks
matches at some position in the string, there is no need to retry it
at any later position: The rest of the pattern (after the second
asterisk) will already have been tried (and have failed) at every
position after the first match, so it will only fail again.

Implementation hint: Let wildmatch return one of SUCCESS, RETRY or
FAIL. FAIL is returned when the string is exhausted before the
pattern, RETRY when it otherwise doesn't match the pattern. Both
SUCCESS and FAIL from the recursive call are returned to the caller
immediately, and the loop runs only on RETRY. (Once the loop is
entered, the function will only return SUCCESS or FAIL.) Once a
recursive version is finished, you can easily construct an iterative
one (Hint: you only need to save one (pattern,string) position pair).

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn at diku.dk



More information about the Comp.lang.c mailing list