Summary for Protection in Cray

Fri Jan 11 10:55:03 AEST 1991

> On 10 Jan 91 23:07:15 GMT,chiueh at sprite.Berkeley.EDU (Tzi-cker Chiueh) said:

chiueh> chiueh at sprite.Berkeley.EDU (Tzi-cker Chiueh) writes:
> So why does Cray get rid of virtual memory altogether ?  Or does anybody 
> know how much performance improvement can we gain from getting rid of VM 

chiueh> I suggest you see the IEEE proceedings from the Supercomputing
chiueh> conference that was held last month in New York.  Cray
chiueh> published an article in these proceedings that describes their
chiueh> memory architecture and gives clock timings for current and
chiueh> future memory architectures.

chiueh> Summary of what follows:
chiueh> - Memory speed is THE supercomputing bottleneck.
chiueh> - Cray can fetch from memory in 17 cycles.  Demand paging would
chiueh>   lengthen this time significantly.
chiueh> - Virtual memory trades speed for money.  Supercomputers do not
chiueh>   compromise on speed.
chiueh> - Cray Y-MP/8s have 4 gigabyte per second memory bandwidths.
chiueh> - Supercomputing working sets and problem sizes tend to be equal.
chiueh> - Demand paging would complicate an already very complicated
chiueh>   instruction scheduler.

chiueh> Memory speed is THE bottleneck in supercomputing.  It is what
chiueh> makes Cray king of the hill.  The Japanese have faster peak
chiueh> CPU speeds, but their memory bandwidths are inferior.  This is
chiueh> a key reason why Cray machines are the fastest computers
chiueh> available for most production benchmarks (with notable
chiueh> exceptions.)

chiueh> The number of cycles needed to transfer the first word from
chiueh> memory to a register is one of the most critical timings in
chiueh> the supercomputer.  Cray can do this in 17 cycles.  An SX3
chiueh> requires 70 cycles.  An ETA 10 needed hundreds of cycles.
chiueh> Adding demand paging will significantly lengthen this cycle
chiueh> time.  If you can add demand paging without adding cycles to
chiueh> this memory fetch time, then I am sure Cray will make you a
chiueh> rich person.

chiueh> Supercomputers with virtual memories have been tried.  The CDC 205 and the 
chiueh> ETA10 are examples.  When these machines ran codes where the problem size 
chiueh> exceeded the RAM size (paging), they ran 10 times slower than when paging did 
chiueh> not occur.

chiueh> Virtual memory is a technique of trading time for money.  Virtual memory 
chiueh> costs less than real memory, but is slower.  Slower memory is not an 
chiueh> option for supercomputing.   Witness the success of Cray and the demise of 
chiueh> ETA.

chiueh> The Cray achieves two words read and one word written per clock per CPU.  
chiueh> On a Y-MP/8 this is a memory bandwidth of 4 gigabytes per second.  Disk 
chiueh> bandwidths are not adequate to keep up with this type of demand.
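
The bandwidth figure above can be checked with a little arithmetic. The
sketch below is only a back-of-envelope calculation: the 64-bit word size
and the 6 ns clock period are assumptions that the post does not state,
and the function name is made up for illustration. Under those
assumptions, two reads plus one write per clock works out to about
4 gigabytes per second for a single CPU, so the quoted figure looks like
a per-CPU number and the eight-CPU aggregate would be eight times larger.

```python
# Back-of-envelope check of the bandwidth figure quoted above.
# ASSUMPTIONS (not stated in the post): 64-bit words, 6 ns clock period.
WORD_BYTES = 8            # assumed: one word is 64 bits
CLOCK_PERIOD_NS = 6.0     # assumed: Y-MP clock period in nanoseconds
WORDS_PER_CLOCK = 2 + 1   # two words read + one word written (from the post)
CPUS = 8                  # Y-MP/8


def bandwidth_gb_per_s(words_per_clock, word_bytes, clock_period_ns, cpus=1):
    """Return memory bandwidth in GB/s.

    Bytes per nanosecond is numerically the same as 1e9 bytes per second.
    """
    return cpus * words_per_clock * word_bytes / clock_period_ns


per_cpu = bandwidth_gb_per_s(WORDS_PER_CLOCK, WORD_BYTES, CLOCK_PERIOD_NS)
aggregate = bandwidth_gb_per_s(WORDS_PER_CLOCK, WORD_BYTES, CLOCK_PERIOD_NS, CPUS)

print(f"per CPU:   {per_cpu:.1f} GB/s")
print(f"aggregate: {aggregate:.1f} GB/s")
```

Changing either assumed constant changes the result proportionally, so
the numbers are only as good as the assumed word size and clock period.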

chiueh> The theory of virtual memory depends on the working set being smaller than 
chiueh> the problem size.  In most supercomputer applications working set is the 
chiueh> problem size.  I am sure the architecture of these applications was 
chiueh> influenced by programming for real-memory machines, so this is somewhat of 
chiueh> a circular argument.  However, for the status quo, this is true.

chiueh> Crays are vector machines with extremely sophisticated instruction 
chiueh> schedulers.  The Cray often has several instructions issued at once in the 
chiueh> same CPU.  X-MPs and Y-MPs scoreboard conflicts between instructions and 
chiueh> are able to compensate for bank and section memory delays.  These delays 
chiueh> tend to be one to four cycles.  The instruction scheduler architecture 
chiueh> would be even more difficult if it had to account for page-fault delays of 
chiueh> many thousands of cycles.  An approach to this problem would be to require 
chiueh> the compilers to never allow a vector sub-section to cross a page 
chiueh> boundary.  
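
The compiler restriction suggested above can be illustrated with a small
sketch. This is not how any Cray compiler works; it only shows the idea
of cutting a vector operation into sub-sections that each stay inside
one page. The function name, the 1024-word page size in the usage
example, and the unit-stride access pattern are all assumptions made for
illustration. The default sub-section length of 64 matches the vector
register length of the X-MP and Y-MP.

```python
def vector_subsections(start, length, page_words, max_vl=64):
    """Split the word range [start, start + length) into sub-sections.

    Each sub-section holds at most max_vl words and never crosses a page
    boundary. Addresses and sizes are in words; unit stride is assumed.
    Returns a list of (first_word_address, word_count) pairs.
    """
    sections = []
    addr = start
    end = start + length
    while addr < end:
        # First word address of the page following the one holding addr.
        next_page = (addr // page_words + 1) * page_words
        # Stop at whichever limit comes first: end of data, end of page,
        # or a full vector register.
        stop = min(end, next_page, addr + max_vl)
        sections.append((addr, stop - addr))
        addr = stop
    return sections


# Hypothetical example: 100 words starting at word 1000, 1024-word pages.
for first, count in vector_subsections(1000, 100, page_words=1024):
    print(first, count)
```

With this split, any page fault would be taken at the start of a
sub-section rather than in the middle of a vector load, at the price of
some short sub-sections near page boundaries.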

chiueh> -- Kent 

chiueh> --------------------------------------------------------------------------------
chiueh> Saw your information request about Crays, and thought that I might be
chiueh> able to point you towards some useful information:

chiueh> I suggest that you check up on Control Data's Cyber 180-series
chiueh> (currently Cyber 2000-series) machines - they are a full hardware
chiueh> Multics implementation, and have some truly "unique" virtual memory
chiueh> hardware. I can personally vouch that the address translation
chiueh> hardware, which also is doing access control checking, is VERY fast,
chiueh> and it has several more levels of indirection than most
chiueh> other folks' virtual memory architectures. Cyber 180 is such a
chiueh> complete Multics that there is actually NO REAL MEMORY ADDRESSING
chiueh> MODE. It is NOT POSSIBLE to access memory by real memory address, the
chiueh> hardware doesn't have the capability!

chiueh> It is also interesting that when a Cyber 180 is emulating Cyber 170
chiueh> mode, it ALSO has base/limit register hardware in operation, since the
chiueh> 170 architecture is real-memory, and only has base/limit restrictions.
chiueh> When a Cyber 180 is running in 170 mode, it really is running a
chiueh> virtual real-memory machine on its virtual memory hardware (just
chiueh> saying this makes my mind feel like a pretzel).

chiueh> If nothing else, the CDC stuff should make interesting counter-culture
chiueh> reading material for you. It was/is truly different.

chiueh> I also suspect that in the Crays (although I have never read the
chiueh> hardware prints of a Cray, only the CDC machines), the bounds checking
chiueh> is being done on the VIRTUAL address, as it were, not the real memory
chiueh> address. This method allowed the old CDC machines (the ones Seymour
chiueh> Cray designed) to do their access checking in the CPU, not the memory
chiueh> controller, and thus kill off the references earlier in the
chiueh> instruction.
chiueh>  
chiueh> -- Gregory 

chiueh> ----------------------------------------------------------------------------
> Furthermore, this check is done for EVERY reference. 
>If this is indeed the case, this protection check process should be as 
>expensive as address mapping in machines that have VM. 

chiueh> Why do you assume this? Given that the latency of Cray memory is 4
chiueh> cycles or so, the check can be done after the address is sent off to
chiueh> memory and can generate a fault before the data gets back.

>So why does Cray get rid of virtual memory altogether ?

chiueh> Well, many supercomputer applications can't page and have to swap. In
chiueh> that case, why provide VM?

chiueh> -- greg

chiueh> In article <1990Dec19.181343.10365 at agate.berkeley.edu> you write:
 > The kind of protection I have in mind is access right control (e.g., read-only)
 > "Normal virtual memory systems" perform this kind of protection check while 
 > doing logical-physical address mapping. The protection bits are either in page 
 > tables or TLB.  Now, since Cray doesn't have virtual memory, the question is 
 > does it provide access control, if so, where does it put this check ?
chiueh> The Cray does not provide extensive access control.  For each running program
chiueh> a (consecutive) part of actual memory is mapped to the logical address space
chiueh> of the program (which starts at 0).  With each reference the logical address
chiueh> is compared to the logical bounds register, and the base register is added
chiueh> to it before going to memory.
 > From the previous responses, it seemed that Cray only provides out-of-bound
 > protection check. Furthermore, this check is done for EVERY reference. 
 > If this is indeed the case, this protection check process should be as 
 > expensive as address mapping in machines that have VM. 
chiueh> Clearly this is much less expensive than true VM; only two registers are needed
chiueh> to do everything (address translation and bound checking), and those two
chiueh> registers reside directly in the CPU.
 > So why does Cray get rid of virtual memory altogether ?  Or does anybody 
 > know how much performance improvement can we gain from getting rid of VM ?
chiueh> This is much less expensive because check and translation go on in parallel
chiueh> within a single clock cycle.

chiueh> -- dik 
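
The two-register scheme described in the last reply can be sketched in a
few lines. This is a software illustration only: the names `translate`,
`base`, `limit`, and `BoundsError` are made up here, and the code does
the check and the add one after the other, whereas the reply says the
hardware does both in parallel within a single clock cycle.

```python
class BoundsError(Exception):
    """Raised when a logical address falls outside the program's region."""


def translate(logical, base, limit):
    """Translate a logical address using a base/bounds register pair.

    logical: address as seen by the program (its address space starts at 0)
    base:    start of the program's consecutive region in real memory
    limit:   size of that region (hypothetical bounds register value)
    """
    # Bounds check: the logical address must lie inside [0, limit).
    if not 0 <= logical < limit:
        raise BoundsError(f"address {logical} outside [0, {limit})")
    # Relocation: add the base register to form the real memory address.
    return base + logical


# Hypothetical program loaded at real address 4096 with a 1000-word region.
print(translate(0, base=4096, limit=1000))
print(translate(999, base=4096, limit=1000))
```

Note that nothing here distinguishes reads from writes, which matches
the remark that the Cray does not provide extensive access control: the
only protection is that a program cannot reach outside its own region.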
--
John D. McCalpin			mccalpin at perelandra.cms.udel.edu
Assistant Professor			mccalpin at brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET