Swap questions

David Hinds dhinds at elaine23.stanford.edu
Sat Dec 1 13:16:08 AEST 1990


In article <1539 at contex.UUCP> james at contex.UUCP (James McQueston) writes:
>
>Example: your server is used to run a simulation that takes hours or days to
>compute, and you have tuned the size of your finite-element mesh to just
>barely fit within the capabilities of that machine.  N hours later, someone
>else innocently runs some unimportant program on the server and causes page
>deadlock.  The O.S. blindly decides which process to kill and ... pow!  Chance
>determines that the simulation gets killed and you lose N hours of work.
>Too bad that the other user was just checking his mail.

    We had something bad happen yesterday that I think was a result of this
problem.  My advisor has written a graphics program for manipulating the
results of protein molecular dynamics (MD) calculations; it reads entire
dynamics trajectories into memory.  It is written in Fortran and has huge
static, zero-initialized data areas, so it takes about 48MB of virtual memory.
We have 32MB of main memory and 48MB of swap space presently.  Yesterday,
someone started this program, began reading in an MD dataset, and
walked away.  When she came back, the machine was apparently deceased.  The
mouse cursor could still move around the screen, but the buttons and console
keyboard were useless.  We couldn't get any response from the system over
the network.  We had to power down to reset things, and I lost a simulation
that had logged about 120 hours of CPU time.  My only guess is that when
the virtual memory limit was reached, something important was killed,
crippling the system.  This was under 3.3.1, by the way.

 -David Hinds
  dhinds at cb-iris.stanford.edu
