Boundary alignment: Ken Reek responds (LONG!)

kar at ritcv.UUCP
Tue May 8 01:57:55 AEST 1984



	Having collected all of the replies to my posting, it is now time
for a reply.  First, a disclaimer -- I am not advocating that
reduced-instruction-set machines themselves go away.  Whether the concept
is good or not is a completely different debate.  It is the Pyramid's
boundary alignment requirement that I object to.

	I'll answer arguments in more or less the order in which I received
the articles here.

Henry Spencer (Univ of Toronto Zoology) writes, in part:

> Simple hardware wins on more counts than just making life easier for
> lazy engineers.  It is simpler and cheaper to build, ..., more reliable,
> easier to fix when it breaks, etc etc.

Yes, simple hardware is easier to design, easier to fix, and all of the rest.
But it is a lot less useful.  How simple should it be?  An abacus is pretty
simple, for example.

> (omitted from above) simpler for the software to run (comparing an 11/44 to
> the nightmare complexity of the VAX i/o structure, for example, tells you
> why 4.2BSD is so bloated)

You are confusing implementation with interface.  The functions you choose
to implement do not determine the interface between the hardware and the
software; consider the many different ways that different machines support
I/O.  On the VAX, it is the complexity of the interface that causes the
problems, not the underlying implementation.

> Don't forget that magic word "cheaper".  It has become fashionable
> to say "software costs totally dominate hardware costs", but most
> people forget to add "...unless you can't afford the hardware in the
> first place".  Hardware and software money don't usually come out of
> the same pot, and the people who make decisions about such things are
> not necessarily as enlightened as we are.

Then enlighten them!  If you buy a machine on which software development is
difficult only because that machine is cheaper, you're not making a very
good decision.  It's up to those who know about such things to educate those
who "make the decisions" about the false economy of choosing a machine based
only on price.

> And once again, don't forget the example of the VAX:  sure, it looks
> like a nice machine, but it's grossly overpriced for its market now.
> This is despite massive use of [semi-]custom ICs on the more recent
> VAXen -- and you would not believe what a disaster that is for
> production and maintenance!  (There is an awful lot to be said for
> using standard parts, which means restricting yourself to things that
> can be built economically with them.)

Are you suggesting that it would have been better to build Vaxen from
7400 series logic?  I think not.

> I have heard, from reliable sources, that if/when the successor to the VAX
> emerges, the biggest difference will be that it will be much simpler.

That's the interface again, not necessarily the implementation.

> If you can show me a way to eliminate alignment constraints without a
> speed penalty, WITHOUT adding large amounts of hardware (which I could
> use better to make the aligned version faster), I'd love to hear about
> it.  It's hard.

Page 200 of the VAX Hardware Handbook describes how it is done with a cache
on the 780.  The same can be (is!) done with data, using a larger cache to
compensate for the less sequential nature of data accesses.
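
For the curious, here is a minimal C sketch of the general idea (my own
illustration, not the 780's actual mechanism): an unaligned longword fetch
can be serviced as two aligned fetches whose pieces are shifted and merged,
which is exactly the kind of work a cache or memory controller can hide
from the program.

    #include <stdint.h>

    /* Sketch: read a 32-bit value at an arbitrary byte address by
     * fetching the two naturally aligned words that contain it and
     * merging the pieces.  "mem" stands in for memory; the layout is
     * little-endian, as on the VAX.  Names are illustrative only.
     */
    static uint32_t fetch_aligned32(const uint8_t *mem, uint32_t addr)
    {
        /* addr is assumed to be a multiple of 4 */
        return (uint32_t)mem[addr]
             | (uint32_t)mem[addr + 1] << 8
             | (uint32_t)mem[addr + 2] << 16
             | (uint32_t)mem[addr + 3] << 24;
    }

    uint32_t read32_any_alignment(const uint8_t *mem, uint32_t addr)
    {
        uint32_t base = addr & ~3u;       /* first aligned word   */
        uint32_t off  = (addr & 3u) * 8;  /* shift count, in bits */

        if (off == 0)                     /* already aligned      */
            return fetch_aligned32(mem, base);

        uint32_t lo = fetch_aligned32(mem, base);      /* two fetches, */
        uint32_t hi = fetch_aligned32(mem, base + 4);  /* then combine */
        return (lo >> off) | (hi << (32 - off));
    }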

> But actually, most of this is beside the [original] point.  We are not
> talking about some decision which makes life a lot harder for the poor
> software jockey.  We are talking about a decision which requires more
> memory to get equivalent performance.  There is a perfectly straight-
> forward hardware-vs-hardware tradeoff here:  is it cheaper to build
> a machine that doesn't care about alignment, or to just stack more
> memory on a machine that does care?  I would give long odds that the
> latter approach wins, even just on initial cost.  When you think about
> things like reliability and maintenance, it wins big.

Good point.  However, for programs that dynamically allocate space, the size
of the largest problem that can be handled is determined by how efficiently
you use that space.  For ANY given memory size, the program that more
efficiently uses the space can handle larger problems.
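
As a concrete (hypothetical) C example: on a machine that enforces natural
alignment, the compiler has to pad the structure, so the same record costs
more memory and fewer of them fit in any given space.

    #include <stdio.h>

    /* Hypothetical record: a one-byte flag plus a 4-byte value.  With
     * natural-alignment rules the compiler inserts 3 bytes of padding
     * after "flag" so that "value" lands on a 4-byte boundary.
     */
    struct record {
        char flag;      /* 1 byte                    */
                        /* 3 bytes of padding here   */
        int  value;     /* assumed to be 4 bytes     */
    };

    int main(void)
    {
        /* Typically prints 8 where int must be 4-byte aligned, versus
         * the 5 bytes of real data the record actually contains.
         */
        printf("sizeof(struct record) = %lu\n",
               (unsigned long)sizeof(struct record));
        return 0;
    }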

Of greater concern, though, is where all that data resides when the program
is not actually running.  Can you add additional disk space to hold all of
the wasted space in your data structures as cheaply as you can add main
memory?  I would be upset if I had to buy another drive because all of my
existing ones were full of data that was 25% wasted space.  Sure, you can
pack them on disk and unpack them when you read them, but you are then
trading away execution efficiency.
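
A sketch of that trade-off, using the same hypothetical record as above:
packing by hand saves the padding on the disk, but every read and write
now pays for an extra copy.

    #include <string.h>

    struct record {
        char flag;
        int  value;          /* assumed to be 4 bytes */
    };

    #define PACKED_SIZE 5    /* 1 + 4 bytes of real data, no padding */

    /* Pack a record into a 5-byte buffer on its way to the disk... */
    void pack_record(const struct record *r, unsigned char buf[PACKED_SIZE])
    {
        buf[0] = (unsigned char)r->flag;
        memcpy(buf + 1, &r->value, sizeof r->value);
    }

    /* ...and unpack it again on the way back in.  This copying is the
     * execution efficiency traded away for the space saved on the disk.
     */
    void unpack_record(const unsigned char buf[PACKED_SIZE], struct record *r)
    {
        r->flag = (char)buf[0];
        memcpy(&r->value, buf + 1, sizeof r->value);
    }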

> I agree that this doesn't help the poor people who have made a big
> investment in data structures that assume no alignment constraints.
> These people have made a mistake, period:  they have imbedded a major
> machine-dependent assumption in software that obviously should have
> been portable.  

This is my whole point -- alignment should NOT be machine dependent.


This from Spencer W. Thomas:

> A current trend in computer design is to assume that the user will only
> be writing in a high-level language, and that the compiler will do the
> hard work of generating machine code.  This is the theory behind RISC
> machines, in particular.  Making the hardware simpler makes it run faster.

Note: RISC machines simplify the interface to the machine, the machine
language.  The point of this is to simplify the generation of optimal code.
The speed of the machine is determined by the implementation, not the
interface.

> Once we start getting really convoluted machines (such as ELI, or some
> pipelined machines which execute several instructions after a
> conditional branch, before actually branching), all your clever hacks
> based on assumptions about the hardware will just go straight down the
> tubes.  If the compiler were smart enough, it would say "Oh, he's trying to
> access a longword, but it's not on the right boundary", and generate a byte
> move instruction to align it before accessing.

Huh?  If the implementation is allowed to screw up the interface, then the
instructions won't be doing what you think they should (e.g. executing
several instructions after a conditional branch before actually branching).
To overcome this, the compiler would have to be pretty smart.

As for automatically checking whether a byte move is necessary, that's fine
for statically allocated structures.  For any structure accessed through a
pointer, however, a run-time check would be required.  Again, we're trading
away performance to keep the hardware simple.  If you want blinding speed,
do it in hardware.
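
To make that cost concrete, here is a minimal C sketch (my own illustration,
not anyone's actual compiler output) of what "check and fix it in software"
looks like for a longword reached through a pointer: a test, and a byte-wise
copy on every access that might be misaligned.

    #include <stdint.h>
    #include <string.h>

    /* Read a 32-bit value through a pointer that may or may not be
     * aligned.  A compiler that cannot prove alignment at compile time
     * would have to emit something like this for every such access.
     */
    uint32_t load32(const void *p)
    {
        if (((uintptr_t)p & 3u) == 0) {
            /* Aligned: a single longword fetch. */
            return *(const uint32_t *)p;
        }
        /* Misaligned: fall back to a byte-wise copy. */
        uint32_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }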

> The basic problem is that generality is slower.  For really blinding
> speed, you always have to give up something.  With big memories,
> arbitrary alignment is not too hard to give up.  (I bet that the
> original application never put longwords on odd byte boundaries, now did
> it?)

The original application DID have "longwords on odd byte boundaries" -- that's
what caused the whole discussion.  Given that, and the discussion above, the
fastest solution is to have the hardware (not the software) take care of
non-aligned data.


From mprvaxa!tbray:

> His argument is that byte addressability is a win because of the
> greater ease in writing software and the high cost today of software
> vs hardware.

> Not so!  Because...

> 1. All significant machine code today is generated by compilers, not
>    by people, and the compilers do the messy work of aligning everything.

Only if you're willing to pay the price -- see the discussion of disk space
and maximum problem size above.

> 2. Removing the byte-addressability constraint allows the hardware boys
>    to build much more cost-effective architectures, and to build them
>    quicker.

It should be no surprise that less useful hardware is cheaper and faster to
build.

> 3. Point number 2 is vital since the line about the rising cost of 
>    software with respect to hardware is so much horse puckey.  All those
>    graphs of the future that showed a graph that looked like an X, the
>    rising line being software and the falling line hardware, never happened.

Let's look at the VAX again.  Compare the cost of developing the hardware
to the cost of all of the software that runs on it, and see the complaint
above about how the complexity of the VAX resulted in a "bloated" 4.2 Unix.

>    The reason being that the demand grows at a phenomenal rate and every
>    year software becomes more layered, more functional, and less hardware-
>    efficient (*see note).  Which is as it should be.  So quick, cheap 
>    architectures are IMPORTANT.

If you're talking about operating systems, then yes.  How many operating
systems do you know of, though, that provide application-specific
functionality?  Until that happens, complex application systems will remain
expensive to implement, especially PORTABLE ones.  Reducing needless
differences between machines will make this simpler.

> If somebody can build, say, a Ridge 32 and it runs a really nice UNIX (it
> doesn't yet) and goes like hell for < $50K (it does), I'll cheerfully
> grapple with nUxi and alignment problems in my existing software.

I wouldn't.

> As to the reduced machine efficiency of modern software, this was really
> brought home to me one time I was touring a DEC manufacturing plant, and
> there were these 6 huge backplane-wiring machines running flat out, all
> being driven by a little wee PDP-8 (!).  When I expressed surprise, the
> manager explained that the 8 could do it with room to spare because there
> was no operating system to get in the way...

So what?  This particular application didn't need any of the capabilities
provided by a modern operating system, such as multiple users, paging,
device-independent I/O, networking, etc.


And finally, from hhb!bob:

> Now let me flame at the folkz who felt compelled to tell me that we had
> written the code completely wrong.  These responses were just typical
> (and as I had expected) of UN*X snob types with little understanding of
> what it takes to develop major software systems.  With attitudes like
> that we ought to just throw most of UN*X out the window.  Do you have
> any idea how much effort we spent making the UN*X utilities work on a
> machine that did not have character pointers the same size as all other
> pointers ?  (This was for the word addressed machine I had previously
> mentioned).  It was months, and an extremely tedious job.  So obviously
> they wrote UN*X wrong PERIOD.

Right on.  Remember that, prior to 4.2, block addresses in inodes were stored
on the disk as 3 bytes.  Why?  To save space on the disk, that's why!  Of
course Unix is not "wrong PERIOD" -- it is a REAL software product, the
result of compromise and continual change.
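
For illustration, here is a rough C sketch of the kind of conversion that
implies, in the spirit of the old l3tol()/ltol3() routines (this is my own
approximation, not the actual kernel code): block numbers live as 3 bytes
on the disk and are widened to a full long in memory.

    /* Widen a 3-byte on-disk block number to a long in memory... */
    long disk3_to_long(const unsigned char d[3])
    {
        return (long)d[0] | ((long)d[1] << 8) | ((long)d[2] << 16);
    }

    /* ...and squeeze it back down to 3 bytes on the way out.  The byte
     * order here is arbitrary; the point is the extra work per block
     * number, paid to save one byte of disk space per address.
     */
    void long_to_disk3(long blkno, unsigned char d[3])
    {
        d[0] = (unsigned char)(blkno & 0xff);
        d[1] = (unsigned char)((blkno >> 8) & 0xff);
        d[2] = (unsigned char)((blkno >> 16) & 0xff);
    }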

In conclusion -- the fewer differences there are between machines, the
easier it will be to port software.  I do not mean to imply that all 
machines should be identical; it still makes sense to have "small" machines
for small applications and large machines for large applications, or
machines with special quirks for quirky applications.  Within any given
group, however, non-essential differences should be eliminated from the
architectures!  Manufacturers that do this will be rewarded with increased
sales IF the software engineers educate those who hold the purse strings
about the economics of producing software.

	Ken Reek, Rochester Institute of Technology
	{allegra,seismo}!rochester!ritcv!kar


