A new encoding scheme

Fri Apr 1 18:55:11 AEST 1988

The below is the text of what we consider a revolutionary new concept
in computer hardware and software - the authors would appreciate any
comments on this proposal before attempting to `clean up' and buy out
IBM, DEC and Sun. Please make sure you understand all the concepts
before making any value judgements though. Thanks,

Julian.

---------------------------------------------------------------------

                 A New Data Encoding Scheme

            Andrew B. Cheese (abc at cs.nott.ac.uk)
            Julian P. Onions (jpo at cs.nott.ac.uk)

                Computer Science Department
                   Nottingham University
                        Nottingham.

                          ABSTRACT
                          --------

          The computer industry from its very inception
     has  used the binary system to represent all types
     of information. This system has much to  recommend
     it's use. It is simple to construct hardware based
     upon this system and all the normal arithmetic and
     logical operations can be performed upon it.

          However, it is the authors opinions that this
     system  is  not as optimal as it could be and that
     some rather simple changes to the method of encod-
     ing  data  can  bring  about  large  increases  in
     storage, communications and reliability.

1.  The Idea
-   --- ----

     The basic ideas stem from the seemingly simple idea  of
replacing  the  binary  method  of  encoding  by a different
scheme. In essence, all that is  required  is  to  take  the
value normally represented by a 1 or true state, and replace
this with two 0's or false states. e.g.

               decimal   binary    new scheme
               9         1 0 0 1   00 0  0 00
               13        1 1 0 1   00 00 0 00

At first sight there doesn't seem to be much of an advantage
in  this  scheme  of  encoding but as the rest of this paper
will attempt  to  prove,  the  benefits  are  enormous  when
applied in the proper way.

     This method, whilst not binary  is  also  not  strictly
unary.  It  has  therefore  been christened sesquinary (from
sesqui - one and a half).

                       April 1, 1988

                           - 2 -

2.  Applications
-   ------------

2.1.  File Storage.
- -   ---- -------

     An intelligent operating system can make great  use  of
the  encoding.  As  an example, a file need not be stored as
the complete set of bits. All that is required  is  for  the
operating  system  to keep count of the ``number'' of zero's
in the file. In the case of the this  would  mean  that  the
entire  disc would consist of inodes. Each inode, instead of
referencing blocks would keep  a  count  of  the  number  of
zeros.  For large files, double and triple indirection could
be applied - see the section  below  on  compression.  Obvi-
ously,  for  small  files,  the  single  indirection is more
cost-effective but with larger files it would  pay  to  move
towards more indirection as a saving of space. A flag in the
inode  could  keep  count  of  the  number  of  indirections
currently performed.

     This scheme does have some overhead in the updating  of
random access files, in that the operating system must first
``unpack'' the file, perform the update, and then repack the
file. This could probably be done in virtual memory for most
operations though.

2.2.  Networking.
- -   ----------

     In networking, this method really comes into it's  own.
To  begin  with,  there are practically no bandwidth limita-
tions. The problems inherent in  normal  communication  over
serial  and phone lines stems from the ability to detect the
transitions between two  states.  Once  this  transition  is
removed,  and  the  data  is  in  effect transitionless, the
bandwidth of the circuit is only reliant on the  speed  with
which zeros can be pumped down the line by the hardware (and
the rate at which they can be received of course).

     Another  advantage  comes  in  the  standard   ethernet
environment.  Normally an ethernet transceiver must wait for
a clear slot to arrive, transmit the packet and detect if  a
collision  occurred, if so it must retry.  With the all zero
encoding method transmission can take place  at  any  point,
there is in effect nothing on the ethernet that can scramble
the signal as all hosts are transmitting zeros.

2.3.  Compression
- -   -----------

     As hinted at above the  possibilities  for  compression
are  fantastic.  You can forget Huffman encoding and Lempel-
Ziv can take a walk!  The compression techniques can  reduce
any  amount of data to 1 number, although that number may be
larger than the convenient word size of  a  given  architec-
ture. The basic algorithm is outlined below.

                       April 1, 1988

                           - 3 -

        while (length(data) > 1) {
            data = count zeros(data);
                        -
            iteration ++;
        }
        return iteration;

This can be also be changed to do essentially the above  but
in N steps for large files.

2.4.  Hardware
- -   --------

     It is expected that there may  be  some  implementation
problems  associated  with the hardware of this device. How-
ever, the benefits appear to outweigh the drawbacks in  many
ways.  To begin with, the memory using this technique should
be simple. There is no need to invert bits or to even  sense
the  bits  -  they should all be zero anyway. Memory failure
can be detected very easily, no need for complex CRC  checks
- any 1 bits are obviously due to failing memory.

     Another advantage is that  all  memory  is  effectively
permanent, as there is no state to be saved. This means com-
puters built  using  this  model  should  be  unaffected  by
power-outs and be impervious to crashes.

2.5.  Encryption
- -   ----------

     This scheme also seems to lend itself to  data  encryp-
tion.  The  details  have  not been fully worked out and may
appear in a second paper once the decryption algorithms have
been straightened out.

2.6.  Parallelism and Data-flow
- -   ----------- --- ---- ----

     Again, this method has  more  advantages  for  parallel
hardware.  Shared  memory  is particularly easy to implement
for the same reasons that the ethernet is easy - effectively
there  are  no  changes  in memory state so collisions can't
happen (unless defective memory is present).

3.  Implementation
-   --------------

     There have been some doubts raised about  the  hardware
realisation of this technique, but in general this can prob-
ably be attributed either to the resistance to  change  gen-
erally  found,  or  by  manufacturers  protecting  their own
interests. The vast benefits that this method seems to  have
though should mean that once it is taken up it will clean up
in the computer industry.

                       April 1, 1988

-- 
Julian Onions