Byte order (retitled)

Spencer W. Thomas thomas at utah-gr.UUCP
Thu Apr 10 00:23:10 AEST 1986


The best (and most tongue in cheek) discussion I have ever seen about
this issue is a USC/ISI memo written by Danny Cohen, titled "On Holy
Wars and a Plea for Peace".  The whole thing is about 33000 bytes, so I
won't post it, but I will give some excerpts.  [Well, even my excerpts
come to about 1/3 of the original.]

IEN 137                                                      Danny Cohen
                                                             U S C/I S I
                                                            1 April 1980


                   ON HOLY WARS AND A PLEA FOR PEACE



                              INTRODUCTION


This  is  an  attempt to stop a war.  I hope it is not too late and that
somehow, magically perhaps, peace will prevail again.

The latecomers into the arena believe that the issue is:  "What  is  the
proper byte order in messages?".

The root of the conflict lies much deeper than that.  It is the question
of  which  bit  should  travel first, the bit from the little end of the
word, or the bit from the big end of the  word?  The  followers  of  the
former  approach are called the Little-Endians, and the followers of the
latter are called the Big-Endians.  The details of the holy war  between
the  Little-Endians  and  the  Big-Endians  are  documented  in  [6] and
described, in brief, in the Appendix. I recommend that you  read  it  at
this point.
...
In a consistent order, the bit-order, the  byte-order,  the  word-order,
the  page-order, and all the other higher level orders are all the same.
Hence, when considering a serial bit-stream, along a communication  line
for example, the "chunk" size which the originator of that stream has in
mind is not important.

There  are  two  possible  consistent  orders.  One is starting with the
narrow end of each  word  (aka  "LSB")  as  the  Little-Endians  do,  or
starting with the wide end (aka "MSB") as their rivals, the Big-Endians,
do.

In  this note we usually use the following sample numbers: a "word" is a
32-bit quantity and is designated by a "W", and a  "byte"  is  an  8-bit
quantity  which  is  designated  by  a  "C"  (for "Character", not to be
confused with "B" for "Bit)".




                              MEMORY ORDER

The first  word  in  memory  is  designated  as  W0,  by  both  regimes.
Unfortunately, the harmony goes no further.

The Little-Endians assign B0 to the LSB of the words and B31 is the MSB.
The Big-Endians do just the opposite, B0 is the MSB and B31 is the LSB.

By  the  way,  if  mathematicians had their way, every sequence would be
numbered from ZERO up, not from ONE, as is traditionally done.   If  so,
the first item would be called the "zeroth"....

Since  most  computers  are not built by mathematicians, it is no wonder
that some computers designate  bits  from  B1  to  B32,  in  either  the
Little-Endians'  or the Big-Endians' order.  These people probably would
like to number their words from W1 up, just to be consistent.

...
On  the  other  hand,  the  Little-Endians  have  their  view,  which is
different but also self-consistent.

They believe that one should start with the narrow end  of  every  word,
and  that  low  addresses  are  of  lower  order  than  high  addresses.
Therefore they put their words on paper  as  if  they  were  written  in
Hebrew, like this:

                   ...|---word2---|---word1---|---word0---|

When they add the bit order and the byte order they get:

                   ...|---word2---|---word1---|---word0---|
                  ....|C3,C2,C1,C0|C3,C2,C1,C0|C3,C2,C1,C0|
                 .....|B31......B0|B31......B0|B31......B0|

In  this regime, when word W(n) is shifted right, its LSB moves into the
MSB of word W(n-1).

English  text  strings  are  stored  in  the  same order, with the first
character in C0 of W0, the next in C1 of W0, and so on.

This order is very consistent with itself, with the Hebrew language, and
(more importantly) with mathematics, because significance increases with
increasing item numbers (address).

It has the disadvantage that English  character  streams  appear  to  be
written backwards; this is only an aesthetic problem but, admittedly, it
looks funny, especially to speakers of English.

In  order  to  avoid  receiving  strange  comments about this orders the
Little-Endians pretend that they are Chinese, and write the  bytes,  not
right-to-left but top-to-bottom, like:

                        C0: "J"
                        C1: "O"
                        C2: "H"
                        C3: "N"
                        ..etc..

... (Discussion of PDP-11 Floating point unit leads into ...)

However, due to some oversights in the security screening  process,  the
Blefuscuians  took  over,  again.  They assigned, as they always do, the
wide end to the LOWer addresses in memory, and the narrow to the  HIGHer
addresses.

Let   "xy"   and  "abcd"  be  32-  and  64-bit  floating-point  numbers,
respectively.  Let's look how these numbers are stored in memory:

          ddddddddL ccccccccc bbbbbbbbb SMaaaaaaa yyyyyyyyL SMxxxxxxx
     ....|--word5--|--word4--|--word3--|--word2--|--word1--|--word0--|
    .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
   ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

Well, Blefuscu scores many points for this. The above reference  in  [3]
does not even try to camouflage it by any Chinese notation.

Encouraged by this success, as minor as it is, the Blefuscuians tried to
pull  another fast one.  This time it was on the VAX, the sacred machine
which all the Little-Endians worship.

Let's look at the VAX order. Again, we look at the way  the  above  data
(with xy being a 32-bit integer) is stored in memory:

               "N" "H"   "O" "J"  SMzzzzzzL SMxxxxxxx yyyyyyyyL
          ...ng2-------|-------long1-------|-------long0-------|
         ....|--word4--|--word3--|--word2--|--word1--|--word0--|
        .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
       ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

What a beautifully consistent Little-Endians' order this is !!!

So,  what  about  the infiltrators? Did they completely fail in carrying
out their mission?  Since the integer  arithmetic  was  closely  guarded
they  attacked  the  floating  point  and the double-floating which were
already known to be easy prey.

Let's  look, again, at the way the above data is stored, except that now
the 32-bit quantity xy is a floating point  number:  now  this  data  is
organized in memory in the following Blefuscuian way:

               "N" "H"   "O" "J"  SMzzzzzzL yyyyyyyyL SMxxxxxxx
          ...ng2-------|-------long1-------|-------long0-------|
         ....|--word4--|--word3--|--word2--|--word1--|--word0--|
        .....|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|-C1-|-C0-|
       ......|B15....B0|B15....B0|B15....B0|B15....B0|B15....B0|

Blefuscu  scores  again.    The  VAX  is  found guilty, however with the
explanation that it tries to be compatible with the PDP11.

Having found themselves there, the  VAXians  found  a  way  around  this
unaesthetic   appearance:  the  VAX  literature  (e.g.,  p. 10  of  [4])
describes this order by using the Chinese top-to-bottom notation, rather
than an embarrassing left-to-right or right-to-left one.  This page is a
marvel.  One has to admire the skillful way in which some quantities are
shown in columns 8-bit wide, some in 16 and other in 32, all in order to
avoid the egg-on-the-face problem.....

By the way, some engineering-type people complain  about  the  "Chinese"
(vertical)  notation  because usually the top (aka "up") of the diagrams
corresponds to "low"-memory (low addresses).  However,  anyone  who  was
brought  up by computer scientists, rather than by botanists, knows that
trees grow downward, having their roots at the top of the page and their
leaves down below. Computer scientists seldom remember  which  way  "up"
really is (see 2.3 of [5], pp. 305-309).

...
                 SUMMARY (of the Memory-Order section)


To  the best of my knowledge only the Big-Endians of Blefuscu have built
systems with a consistent order  which  works  across  chunk-boundaries,
registers,   instructions   and   memories.      I   failed  to  find  a
Little-Endians' system which is totally consistent.

...  (Discussion in similar vein of various transmission protocols)

              SUMMARY (of the Transmission-Order section)


There  are two camps each with its own language.  These languages are as
compatible with each other as any Semitic and Latin languages are.

All Big-Endians can talk to each other with relative ease.

So can all the Little-Endians, even though there  are  some  differences
among the dialects used by different tribes.

There is no middle ground. Only one end can go first.




                               CONCLUSION


Each  camp  tries  to convert the other.  Like all the religious wars of
the past, logic is not the decisive tool. Power is.  This  holy  war  is
not the first one, and probably will not be the last one either.

The  "Be reasonable, do it my way" approach does not work.  Neither does
the Esperanto approach of "let's all switch to yet a new language".

Our communication world may split according to the  language  used.    A
certain  book  (which  is  NOT  mentioned in the references list) has an
interesting story about a similar phenomenon, the Tower of Babel.

Little-Endians are Little-Endians and Big-Endians  are  Big-Endians  and
never the twain shall meet.

We  would like to see some Gulliver standing up between the two islands,
forcing a unified communication regime on all of us.  I do hope that  my
way  will  be chosen, but I believe that, after all, which way is chosen
does not make too much difference.  It is more important to  agree  upon
an order than which order is agreed upon.

How about tossing a coin ???
...


    For ease of reference please note that Lilliput  and  Little-Endians
    both start with an "L", and that both Blefuscu and Big-Endians start
    with a "B".  This is handy while reading this note.

                          R E F E R E N C E S



[1]   Bolt Beranek & Newman.
      Report No. 1822: Interface Message Processor.
      Technical Report, BB&N, May, 1978.

[2]   CCITT.
      Orange Book. Volume VIII.2:  Public Data Networks.
      International Telecommunication Union, Geneva, 1977.

[3]   DEC.
      PDP11 04/05/10/35/40/45 processor handbook.
      Digital Equipment Corp., 1975.

[4]   DEC.
      VAX11 - Architecture Handbook.
      Digital Equipment Corp., 1979.

[5]   Knuth, D. E.
      The Art of Computer Programming. Volume I:  Fundamental
         Algorithms.
      Addison-Wesley, 1968.

[6]   Swift, Jonathan.
      Gulliver's Travel.
      Unknown publisher, 1726.


               OTHER SLIGHTLY RELATED TOPICS (IF AT ALL)

Who's on first?   Zero or One ??

People  start  counting  from  the  number  ONE.  The very word FIRST is
abbreviated into the symbol "1st" which indicates ONE,  but  this  is  a
very modern notation.  The older notions do not necessarily support this
relationship.

In  English  and  French - the word "first" is not derived from the word
"one" but from an  old  word  for  "prince"  (which  means  "foremost").
Similarly,  the  English  word  "second"  is not derived from the number
"two" but from an old word which means "to follow".  Obviously there  is
an  close  relation between "third" and "three", "fourth" and "four" and
so on.

Similarly, in Hebrew, for example, the word "first" is derived from  the
word  "head",  meaning  "the foremost", but not specifically No. 1.  The
Hebrew word for "second" is specifically derived from  the  word  "two".
The same for three, four and all the other numbers.

...

SWIFT's POINT

It may be interesting to notice that  the  point  which  Jonathan  Swift
tried  to  convey  in  Gulliver's Travels in exactly the opposite of the
point of this note.

Swift's point is that the difference between breaking  the  egg  at  the
little-end  and  breaking  it  at the big-end is trivial.  Therefore, he
suggests, that everyone does it in his own preferred way.

We agree that the difference between sending eggs with  the  little-  or
the  big-end first is trivial, but we insist that everyone must do it in
the same way, to avoid anarchy.  Since the difference is trivial we  may
choose either way, but a decision must be made.


-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas at utah-cs.ARPA)



More information about the Comp.lang.c mailing list