Out of range pointers

Paul S. R. Chisholm psrc at poseidon.UUCP
Tue Sep 27 14:18:59 AEST 1988


< "NO toon can resist the old shave-and-a-haircut gag!" >

In article <8557 at smoke.ARPA>, gwyn at smoke.ARPA (Doug Gwyn ) writes:
> In article <33547 at XAIT.XEROX.COM> g-rh at XAIT.Xerox.COM (Richard Harter) writes:
> >However it would be very nice if there were a library routine that would
> >tell you whether a pointer was legal or not.
> I think if your code has to worry about this, it is ALREADY in trouble.
> Pointers should come in two flavors: null (which is easy to test) and
> valid (which you should be able to assume when non-null).

Well, maybe your code *is* in trouble.  Maybe you're being called by
some other slob's function, and he* can't tell '\0' from NULL.  Or
maybe you've got mysterious core dumps, and would like to at least
printf( "Goodbye, cruel world!\n" ) before you exit() off that mortal
coil.  (Or, who knows, even tell your MS-DOS tester where the software
was right before the PC suddenly froz

Anyway, you do have a few sources of data on your data.  Note that
*all* of these are compiler and operating system dependent to
*implement*, but once implemented, could be used in a fairly portable
function.

Everything's either global/static, automatic, or malloc'ed, right?
You may be able (by staring at the output of nm, or at the .MAP files
your linker generates) to find a relationship between some names that
often (always?) show up.  Is one symbol always the first or last in
initialized (global) memory?  Then you know one limit of the address
range of extern's and static's.

The symbols end, etext, and edata go *way* back in the history of the
UNIX(R) operating system.  "The address of etext is the first address
above the program text [instructions; that is, a limit of the range of
function pointers], edata above the initialized data region [extern's
and static's], and end above the uninitialized data region. . . .  the
current value of the program break [initially &end] should be
determined by 'sbrk(0)' (see brk(2))."  [end(3C), from an old UNIX
system manual.]  (This one looks UNIX-system specific, I'm afraid.)
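
As a rough sketch of how those symbols might be used (assuming a
linker that still defines etext and end, as traditional UNIX linkers
and the GNU linker do, and assuming text sits below data in one
contiguous image), two of the range checks could look like this.  The
function names are taken from the list further down; the bodies are
only an illustration, and the alignment check is left out because it
is machine specific.

```c
#include <stdint.h>

/* Linker-defined symbols described in end(3).  Only their addresses
   mean anything.  Assumption: the toolchain provides them. */
extern char etext, end;

/* Possibly a pointer to an extern or static object: at or above the
   end of the program text and below the end of uninitialized data. */
int valid_extern(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)&etext && a < (uintptr_t)&end;
}

/* Conceivably a function pointer: non-null and below the end of the
   program text.  etext gives only the upper limit; the lower limit is
   system specific and is not checked here. */
int valid_fpointer(void (*f)(void))
{
    uintptr_t a = (uintptr_t)f;
    return a != 0 && a < (uintptr_t)&etext;
}
```

Functions that live in shared libraries fall outside this range, so
on such systems valid_fpointer() will reject them.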

It's very common for at least two of these areas to share a common
boundary.  (For example, the stack begins just above the instructions.)
So one number gives you two boundaries.

A final trick is to put a magic auto variable in main(), such that
it's the very first object on the stack (this may be the lexically
first or last variable in main()), and store its address in an extern
for later checking.  A checking function can define a local variable of
its own, if only to measure the extent of the stack.
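
A minimal sketch of that trick, assuming the stack is one contiguous
region and that main() stores the address of its magic variable in
stack_base before calling anything else.  The name stack_base and the
helper check_own_local() are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extern: main() sets this to the address of its magic
   auto variable before doing anything else. */
const void *stack_base;

/* Is p in the right range to be the address of a local variable of
   some active function?  No assumption is made about which way the
   stack grows: the range runs between main()'s marker and a local of
   our own, whichever is lower. */
int valid_auto(const void *p)
{
    char probe;                         /* marks the current extent */
    uintptr_t a = (uintptr_t)p;
    uintptr_t base = (uintptr_t)stack_base;
    uintptr_t here = (uintptr_t)&probe;
    uintptr_t lo = base < here ? base : here;
    uintptr_t hi = base < here ? here : base;

    if (stack_base == NULL)
        return 0;                       /* main() never recorded it */
    return a >= lo && a <= hi;
}

/* Example callee: checks the address of one of its own locals. */
int check_own_local(void)
{
    int local = 0;
    return valid_auto(&local);
}
```

In main(), "char magic; stack_base = &magic;" would come first.  Other
locals of main() may land on the wrong side of the marker, which is
why the marker has to be the first object on the stack.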

Between this flood of numbers, and some system-specific experimentation
to see how they work together, we could produce the following checking
functions:

valid_fpointer():  Is the argument conceivably a valid function
pointer?  (In this case, make sure it's on a valid boundary, too.)

valid_extern():  Is the argument possibly a valid pointer to an
extern or static object?

valid_auto():  Is the argument in the right range to be the address of
a local variable of some active function?

valid_alloc():  Is the argument a value malloc() or one of its cousins
has returned?  (There are all sorts of ways of beefing up this one.)

valid_heap():  More lenient than valid_alloc(), is this possibly the
address of an object, or part of an object, allocated off of the heap
by the malloc() family?

#define valid_data( p ) \
	( valid_extern( p ) || valid_auto( p ) || valid_heap( p ) )
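
One way valid_alloc() and valid_heap() might be built, sketched here
under the assumption that the program calls wrappers instead of
malloc() and free() directly.  The wrapper names checked_malloc() and
checked_free(), and the fixed-size table, are inventions for the
example; a range test against the program break would be another
approach.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_TRACKED 1024        /* arbitrary table size for the sketch */

static struct { void *ptr; size_t size; } tracked[MAX_TRACKED];
static size_t n_tracked;

/* Hypothetical wrapper: use in place of malloc() so the block is
   recorded.  Blocks allocated after the table fills are not recorded. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && n_tracked < MAX_TRACKED) {
        tracked[n_tracked].ptr = p;
        tracked[n_tracked].size = size;
        n_tracked++;
    }
    return p;
}

/* Hypothetical wrapper for free(): forget the block, then release it. */
void checked_free(void *p)
{
    size_t i;
    for (i = 0; i < n_tracked; i++) {
        if (tracked[i].ptr == p) {
            tracked[i] = tracked[--n_tracked];  /* fill the hole */
            break;
        }
    }
    free(p);
}

/* Exactly a value checked_malloc() returned and not yet freed? */
int valid_alloc(const void *p)
{
    size_t i;
    if (p == NULL)
        return 0;
    for (i = 0; i < n_tracked; i++)
        if (tracked[i].ptr == p)
            return 1;
    return 0;
}

/* More lenient: the start of a recorded block, or anywhere inside it? */
int valid_heap(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    size_t i;
    for (i = 0; i < n_tracked; i++) {
        uintptr_t start = (uintptr_t)tracked[i].ptr;
        if (a == start || (a > start && a - start < tracked[i].size))
            return 1;
    }
    return 0;
}
```

The linear search is slow for large programs; a sorted table or a
header in front of each block would be the obvious ways of beefing
it up.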

Paul S. R. Chisholm, psrc at poseidon.att.com (formerly psc at lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.
UNIX(R) is a registered trademark of AT&T
(*"he":  No female programmer would ever do that!-)


