isalpha in ctype.h

Lars Wirzenius wirzenius at cc.helsinki.fi
Thu Mar 21 10:42:07 AEST 1991


In article <1991Mar20.112543.5515 at ericsson.se>, etxnisj at eos8c21.ericsson.se (Niklas Sjovall) writes:
> #define	_U	01
> #define	_L	02
> extern	char	_ctype_[];
> #define	isalpha(c)	((_ctype_+1)[c]&(_U|_L))
> 
> It's the part (_ctype_+1)[c] i don't understand. Could there be any
> segmentation errors using this?

Since isalpha is a standard library facility (and a common one at that;
here it happens to be implemented as a macro), there shouldn't be any
errors if you use it correctly, i.e. only give it valid arguments. In
this case, the argument has to be a character value (one that fits in
an unsigned char) or the value of EOF (as defined in <stdio.h>).
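
To make "valid arguments" concrete, here is a small sketch. The helper
count_letters is a made-up name for illustration, not something from
Sun's library. It assumes plain char may be signed on your system, which
is why each character is cast to unsigned char before it reaches
isalpha.

```c
#include <ctype.h>

/* Hypothetical helper: counts the letters in a NUL-terminated string. */
int count_letters(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        /* Plain char may be signed, so cast to keep the argument
           from ever being a negative value other than EOF. */
        if (isalpha((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a character with the high bit set could turn into a
negative int on some systems and index in front of the table.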

The way this is (or seems to be) implemented by Sun: _ctype_ is an
array, which is subscripted with the character argument (henceforth
referred to as c), and each element of the array is a collection of
flag bits that identify various characteristics of the character, such
as whether it is a letter or not.  In the macro above, _U and _L are
presumably the flags for upper and lower case letters, so the & tests
whether either one is set.
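
A minimal sketch of such a flag table follows. All the names (MY_U,
MY_L, MY_N, my_flags, my_flags_init, my_is_letter) are invented for
this example, and the loops over 'A'..'Z' and 'a'..'z' assume a
character set like ASCII where the letters are contiguous. It does not
handle EOF yet; it only works for c in the range 0..255.

```c
#define MY_U 01   /* upper case letter */
#define MY_L 02   /* lower case letter */
#define MY_N 04   /* digit (extra flag, just to show there can be more) */

/* One byte of flags per character value 0..255. */
static unsigned char my_flags[256];

/* Fill in the table; assumes letters are contiguous, as in ASCII. */
void my_flags_init(void)
{
    int c;

    for (c = 'A'; c <= 'Z'; c++)
        my_flags[c] |= MY_U;
    for (c = 'a'; c <= 'z'; c++)
        my_flags[c] |= MY_L;
    for (c = '0'; c <= '9'; c++)
        my_flags[c] |= MY_N;
}

/* Nonzero if c (0..255 only) has the upper or lower case flag set. */
int my_is_letter(int c)
{
    return my_flags[c] & (MY_U | MY_L);
}
```

One table lookup and one bitwise AND answer the question, whichever
combination of flags is being tested.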

As long as you only need to test real characters, you can simply use
_ctype_[c].  However, isalpha should handle the value of EOF also.  We
could first test whether c == EOF, and use _ctype_ only if it isn't, but
in a macro that means evaluating c twice, which isn't good, because of
possible side effects (isalpha(getchar()) is quite reasonable
sometimes).
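
The side effect problem can be shown with a small sketch. NAIVE_ISALPHA,
next_char and demo are hypothetical names; next_char stands in for
getchar() and counts how often it is called.

```c
#include <stdio.h>
#include <ctype.h>

/* Hypothetical macro that tests for EOF first; it names c twice. */
#define NAIVE_ISALPHA(c) ((c) == EOF ? 0 : isalpha(c))

static const char *src = "ab";
static int reads = 0;

/* Stand-in for getchar(): returns the next character of src
   (or EOF at the end) and counts the calls. */
int next_char(void)
{
    reads++;
    return *src ? (unsigned char)*src++ : EOF;
}

/* Uses the macro once and reports how many characters were consumed. */
int demo(void)
{
    /* Expands to two calls of next_char(): the first result is
       compared with EOF, the second one is what gets classified. */
    (void)NAIVE_ISALPHA(next_char());
    return reads;
}
```

A single use of the macro eats two characters: 'a' is compared with EOF
and thrown away, and 'b' is the one actually tested.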

What we do instead is define EOF as -1 (we can do that, since we're
writing the whole library), and arrange so that EOF's flags come at the
beginning of the array (_ctype_[0]), then the real characters' flags,
each at an index one greater than the numeric value of the character.
This means that we can write _ctype_[c+1] to access the flags for
character c; EOF is -1 so its flags come at _ctype_[-1+1], i.e.
_ctype_[0]. 
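
As a sketch of this layout (my_table, my_table_init and my_isalpha are
invented names, EOF is assumed to be -1, and the letters are assumed to
be contiguous as in ASCII):

```c
#include <stdio.h>   /* EOF, assumed here to be -1 */

#define MY_U 01   /* upper case letter */
#define MY_L 02   /* lower case letter */

/* Slot 0 holds the flags for EOF; character c lives at index c + 1. */
static unsigned char my_table[1 + 256];

/* Fill in the letters; the EOF slot keeps its initial value 0. */
void my_table_init(void)
{
    int c;

    for (c = 'A'; c <= 'Z'; c++)
        my_table[c + 1] |= MY_U;
    for (c = 'a'; c <= 'z'; c++)
        my_table[c + 1] |= MY_L;
}

/* Evaluates c exactly once; valid for EOF and for 0..255. */
#define my_isalpha(c) (my_table[(c) + 1] & (MY_U | MY_L))
```

Because the argument appears only once in the macro body, something
like my_isalpha(getchar()) reads just one character.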

Another way to write the expression is to use pointer arithmetic.  This
is what Sun has done.  In most expressions the name of an array,
_ctype_, is converted to a pointer to the first element of the array,
&_ctype_[0].  If we add 1 to this pointer, we get a pointer to the next
element, &_ctype_[1].  This pointer is then subscripted with the
character argument, since relative to it the flags for character c are
at offset c.  The flags for EOF are at index -1, which in this case is a
valid index, since it is still inside the real array, _ctype_.  However,
subscripting _ctype_ itself with -1 (i.e. _ctype_[-1]) is quite illegal,
and can very well result in a segmentation error; the same happens if
you call isalpha(-2), since (_ctype_+1)[-2] is the same element as
_ctype_[-1].  Exactly what happens depends on the system; I believe
'undefined behaviour' is the phrase used in the ANSI standard for C
(there have been many nice suggestions for this behaviour, ranging from
mailing a complaint to Dennis Ritchie, to launching a nuclear attack;
segmentation errors and system crashes are more normal ones (I hope
:-)).
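
To check the index arithmetic, here is a sketch with hypothetical names
(my_table, shifted, in_range, same_element), assuming EOF is -1 and a
table of 1 + 256 entries. It only compares addresses of elements that
really exist, so it never steps outside the array itself.

```c
#include <stdio.h>   /* EOF, assumed here to be -1 */

/* Hypothetical table; slot 0 is for EOF, then characters 0..255. */
static unsigned char my_table[1 + 256];

/* Pointer to element 1; shifted[c] is the same object as my_table[c + 1]. */
static unsigned char *const shifted = my_table + 1;

/* Returns 1 when c may be used as an index into shifted, i.e. -1..255. */
int in_range(int c)
{
    return c >= -1 && c <= 255;
}

/* Returns 1 if both spellings name the same element for this c.
   The range is checked first so no out-of-bounds address is formed. */
int same_element(int c)
{
    return in_range(c) && &(my_table + 1)[c] == &my_table[c + 1];
}
```

With this layout shifted[-1] is my_table[0], which exists, while
shifted[-2] would be my_table[-1], which does not.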

-- 
Lars Wirzenius    wirzenius at cc.helsinki.fi


