Diatribe on uninitialized externs

Kevin Martin kpmartin at watmath.UUCP
Thu Oct 25 09:51:50 AEST 1984


The following article refers to the entire C environment, including the
compiler, the linker, and the operating system.

There seem to be four alternatives for what to do with externs and statics
which are not explicitly initialized:

1) Have their value be undefined (i.e. garbage).
   Disadvantages:
      Breaks many current programs. It could be argued that well-written
            programs (as opposed to 'correct' programs) would not be broken,
            since a well-written program initializes variables explicitly
            if it cares about the initial value.
      Arrays of unknown size become effectively impossible to
            initialize at all (note 1).
   Advantages:
      Consistent behaviour with autos and malloc'ed space
      Consistent with normal reason (i.e. the variable contains
            a predictable value ONLY IF it has been initialized in the
            C source).
      Tends to encourage easy-to-read code: the reader can tell (or
            *should* be able to tell, if coded cleanly) whether there
            is initialization *code* somewhere. e.g. you are sure that in
               int x;
               int y = 5;
            there must be initialization code (somewhere) for 'x', while
            'y' gets its value from its initializer and needs none.
      Makes object and a.out files smaller, thus program load time is
            also reduced (note 2)(note 4).
      Allows the programmer to get genuine "bss" (un-initialized) space.
            This becomes especially important if overlays are being used,
            since it may be desired that an overlay be loaded without re-
            initializing all the variables it contains (note 4).
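
As a rough sketch of the "well-written program" style described under (1),
the fragment below gives every variable whose starting value matters either
an initializer or explicit initialization code. The names init_x and sum_xy
are hypothetical, made up for illustration.

```c
/* Hypothetical file-scope state, for illustration only. */
static int x;       /* no initializer: under (1) its starting value is garbage */
static int y = 5;   /* initializer in the source: always starts as 5 */

/* The initialization *code* for x; it must run before x is read. */
void init_x(void)
{
    x = 0;
}

/* Reads both variables; meaningful under (1) only after init_x(). */
int sum_xy(void)
{
    return x + y;
}
```

A caller would invoke init_x() once at startup and only then use sum_xy().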

2) Have their value be the 0 bit pattern.
   Disadvantages:
      Programs which don't explicitly initialize their pointers and
            floats would not port to any more machines than they currently
            do (note 3)
      Arrays of unknown size containing floats, doubles or pointers
            cannot be initialized (note 1).
   Advantages:
      This is the current method (i.e. inertia reigns)
      Makes object and a.out files smaller, thus program load time is
            also reduced (note 4).
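
To make the difference between the zero bit pattern of (2) and the typed
zero of (3) concrete, here is a small sketch that clears a structure both
ways and compares the results. The structure and the function names are
assumptions chosen for illustration. On most current machines the two
agree, which is why code relying on (2) appears to work there.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical structure mixing an integer, a float type and a pointer. */
struct node {
    int          count;
    double       weight;
    struct node *next;
};

/* Alternative (2): every byte of the object is set to zero. */
void clear_bits(struct node *n)
{
    memset(n, 0, sizeof *n);
}

/* Alternative (3): each member gets a zero of its own type. */
void clear_typed(struct node *n)
{
    n->count  = 0;
    n->weight = 0.0;
    n->next   = NULL;
}

/* Returns 1 if both methods give equal member values on this machine,
   0 otherwise. */
int bits_match_typed_zero(void)
{
    struct node a, b;

    clear_bits(&a);
    clear_typed(&b);
    return a.count == b.count && a.weight == b.weight && a.next == b.next;
}
```

A result of 0 would mark a machine on which programs depending on (2) do
not port.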

3) Have their value set to a zero of the appropriate type.
   Disadvantages:
      Requires a somewhat arbitrary rule on "what is the appropriate type
            for a union?"
      Generates larger object files, etc (note 4).
      The programmer cannot signal to the reader that a variable is
            deliberately being left un-initialized.
      Arrays of unknown size cannot be initialized if they contain
            non-zero values.
   Advantages:
      Allows old code to be ported to new machines (note 3).
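
Which member of a union counts as "the appropriate type" is the arbitrary
part. The sketch below assumes, purely for illustration, a rule that picks
the first declared member, and shows a second choice that would be just as
defensible. The union and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical union whose members have different notions of zero. */
union value {
    char  *p;   /* first declared member */
    double d;
    long   l;
};

/* Assumed rule: give the first declared member its typed zero. */
void zero_first_member(union value *v)
{
    v->p = NULL;
}

/* An equally plausible rule: zero the floating-point member instead. */
void zero_double_member(union value *v)
{
    v->d = 0.0;
}
```

On a machine where a null pointer and 0.0 have different bit patterns, the
two rules leave different contents in the same storage.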

4) A combination of (1) and (2):  Un-initialized variables start off as
   zero in the first overlay that is loaded. Subsequent overlays get whatever
   was left in the storage location by previous overlays.
   Disadvantages:
      Same as for (1), except that existing programs are not broken.
   Advantages:
      Same as (1), except that sloppy coding has a better chance of
            running.

Note 1:
   By "array of unknown size", I mean, for example, an array whose size
is a #define'd constant. There is currently no method of giving explicit
initializers to such an array in its entirety, unless the source file is
heavily modified each time the #define'd constant is changed.
   Note that the improved CPP facilities (#eval and genuine macros) which
I described in an earlier article would allow such arrays to be initialized
to *any* value (not just zero bit pattern or zero of the appropriate type),
thus making the variations on this disadvantage go poof.
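
A minimal sketch of the problem, with a hypothetical size and hypothetical
names: no initializer list in the source can follow the #define'd
constant, so a non-zero starting value has to come from code run at
startup.

```c
#define NITEMS 8   /* hypothetical size; may change between builds */

/* An initializer list here would need exactly NITEMS entries and would
   have to be edited by hand every time NITEMS changes. */
static double weights[NITEMS];

/* Run-time initialization: sets every element to 1.0, whatever NITEMS is. */
void init_weights(void)
{
    int i;

    for (i = 0; i < NITEMS; i++)
        weights[i] = 1.0;
}

/* Accessor so that other files can read the table. */
double weight_at(int i)
{
    return weights[i];
}
```

The loop tracks NITEMS automatically, at the cost of start-up code that a
static initializer would have avoided.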

Note 2:
   Since most systems clear the memory before a program is loaded, for
security purposes, method (1) often flukes out to be method (2).

Note 3:
   If the purpose of the standard does not include porting existing (old)
programs to new C implementations on "hostile" hardware, this advantage/
disadvantage does not exist. I believe the new standard should allow NEW
programs to be written portably, and let old programs continue to work,
but *only on machines on which they already work*.

Note 4:
   These features (reduced object or a.out size, and overlays) may or may
not exist on any particular system, and they may be non-issues to many
users (because they have lots of disk space, or they think overlays are for
the birds). However, these features *do* exist on some systems, and the
users *do* find them useful, and it would be desirable that the standard
*not* be written such that a compiler has to be non-conforming to take
advantage of such features.


If overlays are going to be ignored, (2) and (4) are equivalent.

Ignoring the problems of upward compatibility and lazy programming
styles, choice (1) is the winner. However, given that old
programs must continue to work, choice (4) looks like the best one.

The only bad problem with (4) is that of array initialization. As mentioned
above, this can be solved much more generally with an improved CPP.
This standard will probably not include such features, or a method of choosing
which union member to initialize. But there will be more C standards down
the road, and these features may appear, making one of (1), (2) or (4) the
clear winner.
If the committee goes for choice (3) now, this will only encourage code
which doesn't explicitly initialize things, and make for an even larger
base of software to break when the next standard tries to go back to
choice (1) or (2).

I consider (4) with improved CPP to be the long-range goal, and
adopting (3) in the current standard would prevent changing to (4)
in the next standard.
We can either let it sit as is for now, and fix it properly when the
facilities become available, or we can (for the feeble reason of
porting old shit code to new machines) paint ourselves into yet another
corner by fixing it poorly immediately.
                       Kevin Martin, UofW Software Development Group


