Help! Altos 5.3.1 fork is failing!

Jim Rosenberg jr at oglvee.UUCP
Fri Oct 20 02:14:29 AEST 1989


In article <2296 at hcr.UUCP> larry at zeus.UUCP (Larry Philps) writes:
>Getcpages, is indeed get "contiguous" physical pages.  There are parts of the
>paging system on some processors that require this.  The complaint about a
>failure on 1 page simply means that ALL RAM was being used when the fork
>appeared and the system needed a page to hold page tables or the like.
>
>Now, for some reason unknown to me, in fork (procdup actually), dupreg is
>called with arguments that specify that it is not to sleep.  I couldn't come
>up with any sensible reason why this had to be, so I changed the call to
>allow sleeps.  The fork failure problems simply went away, and no other
>problems manifested.

OK, kernel gurus, what's the word:  *is there* a good reason why the call to
dupreg shouldn't sleep???

We are also running V.3.2 on a bunch of AT&T 6386en.  Those machines have only
2M RAM.  I know damn well that we're just on the borderline of what's doable
with that little memory -- it's a budget issue, not a technical issue.
Although I do often suffer from the overhead of paging, I've *NEVER* seen a
fork failure on these machines.  Admittedly this is V.3.2 and not V.3.1.  But
I wonder if AT&T did go ahead and change the dupreg call to allow a sleep.
Can someone from AT&T comment?

I must say this, though:  while I've never seen an identifiable fork failure
on one of the 6386en, I *have* seen a phenomenon which I call Kernel
Narcolepsy: the whole system just seems to fall asleep now and then.  I had
one machine a couple of months ago that had an extremely sick disk.  To make
sure another machine didn't have the problem I intentionally loaded it with
enough continuous compiles of our database language (Progress) to cause solid
thrashing.  Every now and then the thrashing would just stop.  After about 5
minutes it would pick up again.  I don't know for a fact that it was really
sleeping:  it could have been a kind of beat frequency where the processes
just happened to hit on the same pages.  But I did suffer one definite case
where the whole system went to sleep:  even though characters would echo, I
could get no response from any getty, and the system was definitely just plain
stuck.  This took a full reboot; fsck found minor damage, etc. etc.

So I guess the question is this:  If the dupreg call from fork allows sleeps,
could this lead to a deadlock?  Is it possible I may be seeing this on V.3.2?
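
To make the two policies concrete, here is a toy user-space model in C.  It is
only a sketch under stated assumptions:  it is NOT V.3 kernel source, and every
name in it (get_page, pager_run, model_reset, the PAGE_* results) is made up
for illustration.  The real arguments to getcpages and dupreg are not modelled.
The "pager" stands in for the asynchronous process that frees pages.

```c
#include <assert.h>

/* Toy model only: these names are illustrative, not SVR3 kernel symbols. */
enum page_result { PAGE_OK, PAGE_FAILED, PAGE_WOULD_HANG };

static int free_pages;    /* pages available right now */
static int reclaimable;   /* pages the pager could free if it got to run */

/* Set up a scenario: how many pages are free, how many could be freed. */
void model_reset(int nfree, int nreclaimable)
{
    assert(nfree >= 0 && nreclaimable >= 0);
    free_pages = nfree;
    reclaimable = nreclaimable;
}

/* Stand-in for the asynchronous page-freeing process: frees one page. */
static void pager_run(void)
{
    if (reclaimable > 0) {
        reclaimable--;
        free_pages++;
    }
}

/* Ask for one page.  can_sleep selects between the two policies. */
enum page_result get_page(int can_sleep)
{
    while (free_pages == 0) {
        if (!can_sleep)
            return PAGE_FAILED;      /* no-sleep: give up at once, fork fails */
        if (reclaimable == 0)
            return PAGE_WOULD_HANG;  /* a real kernel would sleep forever here */
        pager_run();                 /* "sleep": the pager gets a turn */
    }
    free_pages--;
    return PAGE_OK;
}
```

In this model, with no free pages but two reclaimable ones, get_page(0) fails
immediately while get_page(1) waits for the pager and succeeds.  With nothing
reclaimable at all, get_page(1) reports PAGE_WOULD_HANG, which is the deadlock
case being asked about.  Whether that last case can actually arise in V.3 is
exactly the open question.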

If the dupreg call *can be* safely changed to allow sleeping then my Altos
problem is a flat out case of a bug in their System V.3.1.  If it *can't*
safely be changed, then as I understand the situation V.3 DOES NOT RELIABLY
IMPLEMENT VIRTUAL MEMORY!!  Is it not true that pages are freed by an
asynchronous kernel process?  Is it not true that, given the indeterminate way
things work in UNIX, one cannot absolutely guarantee when this process will
run?  If you can't allow a sleep from fork in dupreg, then the only way to
guarantee that fork will never fail is to guarantee that you don't page, no
matter how much swap space you have.  I.e. don't really exercise virtual
memory.  I.e. V.3 virtual memory is NOT RELIABLE, because if you use it you
may trigger fork failures.
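
If the kernel can't be changed, an application can at least paper over
transient failures by retrying.  The following is a minimal sketch in C, not a
fix for the underlying problem.  fork_retry is a hypothetical helper, not a
library routine, and it assumes that the failure shows up to the caller as
fork returning -1 with errno set to EAGAIN (or possibly ENOMEM);  which errno
this particular kernel path really returns is an assumption here.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * fork_retry: hypothetical helper (not part of any system library).
 * Calls fork(); if it fails with EAGAIN or ENOMEM, sleeps and tries
 * again, doubling the delay each time, up to max_tries attempts in all.
 * Returns what fork() returns: 0 in the child, the child's pid in the
 * parent, or -1 with errno set if every attempt failed.
 */
pid_t fork_retry(int max_tries, unsigned int first_delay_secs)
{
    unsigned int delay = first_delay_secs;
    pid_t pid = -1;
    int attempt;

    assert(max_tries >= 1);

    for (attempt = 1; attempt <= max_tries; attempt++) {
        pid = fork();
        if (pid != -1)
            break;                      /* success (parent or child) */
        if (errno != EAGAIN && errno != ENOMEM)
            break;                      /* an error that waiting won't fix */
        if (attempt < max_tries) {
            sleep(delay);               /* give the pager a chance to run */
            delay *= 2;
        }
    }
    return pid;
}
```

A caller would use fork_retry(5, 1) wherever it now calls fork().  This only
makes a failure less likely;  it cannot guarantee success, since nothing
promises that pages will have been freed by the time of the next attempt.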

Please tell me it ain't so!!!!!
-- 
Jim Rosenberg                        pitt
Oglevee Computer Systems                 >--!amanue!oglvee!jr
151 Oglevee Lane                      cgh
Connellsville, PA 15425                                #include <disclaimer.h>


