PANIC - Non-recoverable kernel page fault...

Thu Mar 1 08:53:36 AEST 1990

In article <77688 at tut.cis.ohio-state.edu>, mowgli at sioux.cis.ohio-state.edu (Mowgli Assor) writes:

> The machine we are using is an IBM PS/2 Model 80, 6Meg RAM, ~100Meg HD, w/3
> Digiport smart modem boards. Thursday, we got the following error messages:
> 
> Trap 0000000E in SYSTEM     error = 00000000
	<registers deleted>
>    pc  = 00000020:0001D40D
>    ksp = 060006D0

This part of a panic trap is described (and I use that word loosely) in
messages(M). The number in "Trap 0000000E in SYSTEM" is the trap number
given by the processor, in this case 0x0E hex. According to Intel's data
specs on the 386, that is exception 14 (decimal), which is a page fault.
Therefore, from XENIX you get ...

> Panic - Non-recoverable Kernal Page Fault
> 
> Now, the fact that it seems to die with the PC in the same place each time
> makes me very suspicious. Of course, it is likely that only SCO can tell me
> where the OS is dying (as far as what program causes it).

I must agree: SCO should include more info on panic traps with the Run Time
System. In fact, there is more info under messages(M) in 2.3.2, and even
more about the specific errors in the appendices of the Development
System:
	panic:non-recoverable kernel page fault
	(The system could not process a page fault.)

Doesn't tell you much, eh? I'm no guru (yet), but a page fault has to do
with virtual memory processing. According to Intel's data manual on the 386
processor, many things can cause a page fault. When one occurs, the
processor "traps" to the operating system, which is supposed to swap the
page of memory back in from the disk. If this succeeds you'll not see an
error, but if it doesn't ... well, you know that part. See 8.6.1 thru 8.6.3
of the Operations Guide for more info on how it works when it ain't broke,
and how to improve page swapping.
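
To make the numbers in the quoted trap line a bit more concrete, here is a
rough sketch (in Python, just for illustration) that decodes a trap number
and, for a page fault, the low three bits of the 386 error code. It assumes
that XENIX prints the raw processor error code after "error =", which I have
not confirmed. The function name decode_trap and the partial exception table
are my own, not anything from SCO.

```python
# Subset of the exception numbers defined for the Intel 386.
I386_EXCEPTIONS = {
    0: "divide error",
    6: "invalid opcode",
    8: "double fault",
    10: "invalid TSS",
    11: "segment not present",
    12: "stack exception",
    13: "general protection",
    14: "page fault",
}


def decode_trap(trap_hex, error_hex):
    """Decode the hex fields of a 'Trap ... error = ...' line.

    Assumption: the 'error' field is the unmodified error code that the
    386 pushes for the exception.
    """
    trap = int(trap_hex, 16)
    error = int(error_hex, 16)
    info = {"trap": trap, "name": I386_EXCEPTIONS.get(trap, "unknown")}
    if trap == 14:
        # 386 page-fault error code: bit 0 = P, bit 1 = W/R, bit 2 = U/S.
        info["cause"] = "protection violation" if error & 1 else "page not present"
        info["access"] = "write" if error & 2 else "read"
        info["mode"] = "user" if error & 4 else "supervisor"
    return info


if __name__ == "__main__":
    # Values taken from the trap line quoted above.
    print(decode_trap("0000000E", "00000000"))
```

If that assumption holds, an error code of 00000000 would mean a read of a
page that was not present, made while the processor was in supervisor mode,
which at least fits the "in SYSTEM" part of the message.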

I know this still doesn't explain WHY it is happening in the first place.
The problem could be either hardware or software related. What you need to
do is establish more of a reference point first. When *exactly* does it
happen? What's the system usage when it happens? Does it happen only during
certain applications? Also, have you run a read-only surface scan
of the hard disk? You should be seeing other errors if there is a bad
sector, even if it is in the swap area of the hard disk. You may also
want to consider using custom(C) to re-install the LINK module so you
can build a known clean kernel, then re-add the drivers for the serial
cards, tape drive, etc.
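
One cheap way to start building that reference point is to write down each
panic as it happens and tally the pc values. The sketch below (Python again)
does that for hand-typed notes. The first entry is the trap quoted above; the
other two entries, the day labels, and the name tally_pcs are made up purely
to show the idea and are not real data from this system.

```python
import re
from collections import Counter

# Matches lines such as "pc  = 00000020:0001D40D" (selector:offset).
PC_LINE = re.compile(r"\bpc\s*=\s*([0-9A-Fa-f]+):([0-9A-Fa-f]+)")


def tally_pcs(text):
    """Count how often each selector:offset pc value appears in the notes."""
    return Counter(
        "{}:{}".format(sel.upper(), off.upper())
        for sel, off in PC_LINE.findall(text)
    )


# Hand-transcribed notes. Only the first entry comes from the post;
# the second and third are HYPOTHETICAL placeholders for illustration.
notes = """\
Thu  Trap 0000000E in SYSTEM     error = 00000000
     pc  = 00000020:0001D40D
     ksp = 060006D0
Fri  Trap 0000000E in SYSTEM     error = 00000000
     pc  = 00000020:0001D40D
Mon  Trap 0000000E in SYSTEM     error = 00000000
     pc  = 00000020:0001A000
"""

if __name__ == "__main__":
    for pc, count in tally_pcs(notes).most_common():
        print(pc, count)
```

If one pc value dominates the tally, that leans toward a particular piece of
kernel or driver code; if the values are scattered, flaky hardware becomes a
more likely suspect. Neither is proof, just a place to start.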
-- 
I'm just a wanna be UNIX guru (IJWBUG)               | Micro Maintenance, Inc.
						     | 2465 W. 12th St. #6
	   -== Brad Fisher ==- 		             | Tempe, Arizona  85281
     ...!asuvax!mcdphx!hrc!microm		     | 602/894-5526