Byte order (retitled)

Richard Harter g-rh at cca.UUCP
Thu Apr 10 17:48:18 AEST 1986


In article <> ggs at ulysses.UUCP (Griff Smith) writes:
>...
>> 	Well, no, little-endian came about because the engineers at DEC
>> who designed the PDP-11 made an arbitrary decision that was not well
>> thought out.  I will not essay to defend the sanity of DEC engineers,
>> and cannot recommend that any one else do so (:-)).  It was a bad
>> decision.
>.
>...
>> In short, little-endian was a mistake, is a mistake, and will continue
>> to be a mistake.
>> 
>> 		Richard Harter, SMDS Inc.
>
>As an old PDP-11 hacker, I can't agree with the condemnation of the
>DEC engineering decision.  You are looking at it from the perspective
>of a modern software engineer who wouldn't think of type punning and
>other hacks.  To an assembly language programmer, however, the ability
>to use the same address to test the low and high bytes of a device
>status register meant that code would be shorter and faster.  It also
>increased the number of cases where indirect addressing could be used
>with register pointers.  You can't expect the engineers to have
>anticipated that high-level languages would discredit these practices.

	Ah, but I too am an old PDP-11 hacker.  (In fact, my first
	DEC machine was a PDP-1!)  I've done all those good things
	you talk about -- however you could do exactly the same
	things in a correctly designed big-endian machine.  The
	issues at hand have nothing to do with modern software
	engineering and high-level languages.  See below.

>
>My own theory about big vs. little end usage is that the mistake was
>made hundreds of years ago when merchants started to adopt the Arabic
>(as adapted from earlier Hindu sources) number system.  Note that
>Arabic is written right to left; note that numbers are written right
>to left.  I think the Arabs knew what they were doing; they set the
>notation so that the natural computational order followed the
>conventional lexical order.  The European merchants missed the point
>and copied the notation verbatim instead of compensating for the
>opposite lexical convention.
>
>In summary, big-endian was a mistake, but there is no use fighting it.
>Any better-informed historical challenges will be cheerfully accepted;
>the best data I could get was from an expatriate of Iran.

	Since we (you and I and a select few others) are all reasonable
beings, let us all eschew slogans and epithets (especially me) and reason
together.  Perhaps we can find some truth.  Let us see.

	Discussions of little-endian vs big-endian are often muddied by
two collateral issues: the merits of coherent addressing and the merits
of byte addressing.  Coherent addressing is a neologism I have invented
for this discussion.  All it really means is that all addressing goes in
the same direction.  Thus bit 0 of byte 0 of a word is bit 0 of the word,
etc.  The diagram below illustrates coherent addressing:

Byte:	0 1 2 3 4 5 6 7 8 ....
Int*2:	0   1   2   3   4 ....
Int*4:	0       1       2 ....

Coherent addressing is clearly desirable.  However, coherent addressing
has nothing to do, per se, with little-endian or big-endian.  A common
point of confusion in these arguments is to argue for the advantages
of coherent addressing in the belief that one is thereby arguing for
one's favorite position.

	The PDP-11 uses uniform byte addressing.  The advantage of
uniform byte addressing is that the addresses are independent of the
size of the entity being addressed.  The disadvantage of uniform byte
addressing is that it consumes address bits -- one for shorts, two for
longs, and more if we extend the scheme to larger blocks.  This is not
critical in present architectures; it would be if we dropped to the
level of uniform bit addressing.  Again, the merits of little-endian
versus big-endian have nothing to do with the merits of uniform byte
addressing.
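
	The cost in address bits can be made concrete with a small
sketch in C.  The function names are illustrative only and appear
nowhere in the discussion above; the sketch assumes aligned 2-byte
and 4-byte units, as in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the 2-byte unit (Int*2) that contains the given byte. */
uint32_t short_index(uint32_t byte_addr)
{
    return byte_addr >> 1;      /* one address bit is spent on the byte */
}

/* Index of the 4-byte unit (Int*4) that contains the given byte. */
uint32_t long_index(uint32_t byte_addr)
{
    return byte_addr >> 2;      /* two address bits are spent */
}

/* Number of units of size 2^log2_size bytes that can be reached
 * with byte addresses that are addr_bits wide. */
uint32_t units_addressable(unsigned addr_bits, unsigned log2_size)
{
    assert(addr_bits < 32 && log2_size <= addr_bits);
    return (uint32_t)1 << (addr_bits - log2_size);
}
```

With 16-bit byte addresses, units_addressable(16, 0) is 65536 bytes,
units_addressable(16, 1) is 32768 shorts, and units_addressable(16, 2)
is 16384 longs.  short_index(4) is 2 and long_index(8) is 2, matching
the diagram.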

	Big-endian versus little-endian only arises when we decide
which byte (or bit) of a word is number 0 -- the most significant or
the least significant.  Either choice will do for coherent
addressing.  However, the choice does affect two areas, arithmetic
and comparison.  Let us consider the problem of doing arithmetic
on integers of indefinite size.  In that case the natural method
is to represent integers as polynomials in powers of two and do
the arithmetic starting with the lsb and working up (the point is
that the algorithms do not depend on knowing the location of the
msb's of the operands).  In short, little-endian is the correct
choice for doing arithmetic on integers of indefinite size.
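
	A minimal sketch in C of the arithmetic argument, assuming
the integers are stored as arrays of 8-bit digits with the least
significant digit at index 0.  The name le_add and the digit size
are choices made for illustration, not anything fixed by the
discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add two nonnegative integers stored as arrays of 8-bit digits,
 * least significant digit first.  a has na digits, b has nb digits;
 * sum must have room for max(na, nb) + 1 digits.
 * Returns the number of digits written to sum. */
size_t le_add(const uint8_t *a, size_t na,
              const uint8_t *b, size_t nb,
              uint8_t *sum)
{
    size_t i = 0;
    unsigned carry = 0;

    assert(a != NULL && b != NULL && sum != NULL);

    /* Walk upward from digit 0.  Neither operand's top digit has to
     * be located before the loop starts; each operand simply runs
     * out when its length is reached. */
    while (i < na || i < nb || carry) {
        unsigned t = carry;
        if (i < na)
            t += a[i];
        if (i < nb)
            t += b[i];
        sum[i++] = (uint8_t)(t & 0xFF);
        carry = t >> 8;
    }
    return i;
}
```

Adding {0xFF, 0xFF} (65535) and {0x01} (1) writes three digits,
{0x00, 0x00, 0x01}, i.e. 65536.  With the digits stored most
significant first, the two operands would have to be aligned at
their far ends before the first digit could be added.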

	For comparisons of strings of indefinite size, on the
other hand, the correct choice is big-endian.  The key point is
that the natural method for comparison is to first compare msb's
and work down.  It turns out, if one thinks about it, that the
natural model for strings is the binary fraction rather than
the binary polynomial.
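
	The comparison argument can be sketched in C with the
standard memcmp, which compares bytes in increasing address order.
The helper names below are made up for the illustration.  The
values are written out byte by byte, so the result does not depend
on the byte order of the machine running the code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v into out[0..3], most significant byte first. */
void put_be32(uint32_t v, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Write v into out[0..3], least significant byte first. */
void put_le32(uint32_t v, unsigned char *out)
{
    assert(out != NULL);
    out[0] = (unsigned char)v;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* Compare x and y as byte strings in big-endian layout. */
int cmp_be32(uint32_t x, uint32_t y)
{
    unsigned char bx[4], by[4];
    put_be32(x, bx);
    put_be32(y, by);
    return memcmp(bx, by, 4);
}

/* Compare x and y as byte strings in little-endian layout. */
int cmp_le32(uint32_t x, uint32_t y)
{
    unsigned char bx[4], by[4];
    put_le32(x, bx);
    put_le32(y, by);
    return memcmp(bx, by, 4);
}
```

cmp_be32(1, 256) is negative, agreeing with 1 < 256.  cmp_le32(1, 256)
is positive, because the first bytes compared are 0x01 and 0x00.  A
string comparison that runs from the low address upward therefore
orders big-endian integers correctly and little-endian ones wrongly.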

	The advantages of little-endian probably show up in
hardware design at the microcode level even though the machine
instructions for arithmetic operate on fixed size operands.  If
this is the case then this might have been a design factor when
the PDP-11 was first being designed.  (Cheap small computers
were a LOT slower in those days, and every trick to gain speed
and simplicity at the hardware level counted.)  At the machine
code level, however, all arithmetic instructions are fixed length
so little-endian/big-endian is, again, irrelevant.  In general
programming, indefinite length arithmetic is a rare animal.

	Comparison of strings, either bits or characters, is
ubiquitous.  And it is here that big-endian is preferable.
In view of this fact I conclude again that little-endian was a
mistake, albeit less of one than I made it out to be.

	Some side issues:  We write left to right because most
people are right handed.  Script style matters here -- if we
are drawing characters the direction doesn't matter.  In
choosing the representation of numbers the issues are the same
as they are for byte ordering.  If comparing sizes is the issue,
big-endian has the advantage; if arithmetic is the issue,
little-endian has the advantage.

	As a personal note, when I was young I drilled myself
on fast multiplication.  I could write down two numbers and
then write down their product underneath at handwriting speed.
When I was in practice I could do products of five-digit numbers
readily.  The trick isn't hard -- you just do cross multiplication.
However, you have to do it little-endian style, i.e. work from the
low end up.
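
	One way to read the cross multiplication trick, sketched in
C.  This is an interpretation, not a transcription of the mental
method: each digit of the product is the sum of the cross products
that fall in its column, plus the carry from the column below.
Decimal digits are stored least significant first, and the name
cross_multiply is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply two decimal numbers given as digit arrays (each digit
 * 0..9), least significant digit first.  prod must have room for
 * na + nb digits and is written from the low end up. */
void cross_multiply(const unsigned char *a, size_t na,
                    const unsigned char *b, size_t nb,
                    unsigned char *prod)
{
    unsigned long carry = 0;
    size_t i, k;

    assert(a != NULL && b != NULL && prod != NULL);
    assert(na > 0 && nb > 0);

    for (k = 0; k < na + nb; k++) {
        unsigned long column = carry;

        /* Sum every cross product a[i] * b[k - i] in column k. */
        for (i = 0; i < na && i <= k; i++) {
            if (k - i < nb)
                column += (unsigned long)a[i] * b[k - i];
        }
        prod[k] = (unsigned char)(column % 10);  /* digit to write */
        carry = column / 10;                     /* goes to column k+1 */
    }
}
```

For 123 times 45, with a = {3, 2, 1} and b = {5, 4}, the digits come
out in the order 5, 3, 5, 5, 0, which read from the high end is
05535.  Each digit is final as soon as it is written, which is only
possible when the work starts at the low end.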

		Richard Harter, SMDS Inc.


