various boot-related questions.

Thu Sep 14 01:28:11 AEST 1989

In article <89Sep12.214425edt.2245 at neat.cs.toronto.edu>, lamy at ai.utoronto.ca (Jean-Francois Lamy) writes:
> People on the phone on the hotline are giving me blank stares :-) when
> asked the following:
> 
> a) on an SGI 4D/240 using a tty as a console, how does one forcibly enter
>    the PROM monitor? (this is often accomplished by BREAK on ttys).  While we
>    understand this may not be desirable by default, there oughta be a way
>    to make the machine pay attention without having to depress its belly
>    button...

I'm not surprised you got an (over-the-phone) blank stare on this one.  The
PROM monitor is only active when the OS isn't.  There isn't any magic key
to press to get into it, although the magic key sequence "init 0" will
get you there - without UNIX :-).

To get more complex, you can get into the built-in kernel debugger by
lboot'ing a system with 'idbg' INCLUDE'd.  Look in /usr/sysgen, and edit the
'system' file there to INCLUDE idbg.  Then execute /etc/init.d/autoconfig and
reboot.  This will also allow you to use the IRIX command 'idbg' to poke
around kernel data structures.  A warning, though: don't call the Hotline if
you have problems while playing around with this - it's not part of the
normal operating mode at most sites.

> 
> b) How does one do the equivalent of a "savecore" upon reboot (the goal is to
>    be able to save memory state and peek around to see where things hang).
> 
>    Since someone will ask why on earth we'd want to do that - we
>    have a machine that runs for several days and all of the sudden refuses
>    to fork off new commands (i.e. after typing a command to the shell, all
>    you can do is interrupt it; you can connect to the telnet daemon but
>    won't get a shell; ping replies do get answered, etc.).  There is plenty
>    of memory and plenty of swap space and plenty of process slots.

I haven't heard of this problem running released software for the 240.  Be
sure you are running 3.1F, or preferably 3.1G.  The 3.1D release will run
a 240, but only poorly, and it has the bug you mention.  Even better, bug
your SE to get 3.2 (see below).

The system ALREADY does a 'savecore' on reboot if it crashed before.  That
doesn't help with a hang, though, since you have to push reset to get the
machine back.  If you install the debugger as above, typing '^A' on the
console will drop you into the debugger, from which you can get a stack
trace and poke around the kernel data structures.  Of course, you will
recognize little, since IRIX is a V.3 kernel with many 4.3 extensions
re-written for a multiprocessor ...

> And while I'm at it, the whole fsck picture makes me and a bunch more people
> shudder.
> 
> c) What exactly are the "minor repairs" where fsck -b will seek to reboot?
> d) What exactly is checked by fsstat? Is the dirty bit just that, a bit, that
>    could, say happen to be in a bad block and be wrong?  Is fsck under efs
>    guaranteed to clean the filesystem in one pass, or does it suffer from
>    the berkeley heritage and therefore sometimes two or three fscks of the
>    same partition are required before fsck succeeds twice in a row?
>    Why did a message arrive in my mailbox this very second suggesting we
>    comment out all the calls to fsstat in the boot sequence and get rid of
>    the -c flags in mountall?
> e) Can we get a -p flag on fsck (i.e. repair minor damage, report error
>    on major damage, causing machine to stay in single user mode).  Using
>    -y in the boot sequence sounds like a bit trusting.

The 3.2 release re-mounts the root filesystem instead of rebooting.  The
dirty bit is kept in the superblock; it stays set (dirty) unless the
filesystem is unmounted, at which point it is marked clean.  Fsck WILL clean
the filesystem in one pass.  And I'd ignore the mail message, unless you'd
like even more pain and suffering in a few days.  Finally, you are certainly
welcome to remove the '-y' option, but remember that all of us UNIX gurus out
here rely on it.  After all, how many people have the sophistication to
understand when to say 'no' to fsck?  (There are maybe two or three such
people out here.)  EFS is pretty robust, and getting better, and I haven't
heard of data losses after a crash being a big problem.
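
As a rough illustration of the dirty-bit rule, here is a toy model in
Python.  It is only a sketch and not EFS code: the Superblock class and the
mount, unmount, needs_fsck and simulate functions are all hypothetical names
made up for this example, and it assumes that mounting is what marks the
filesystem dirty.

```python
from dataclasses import dataclass


@dataclass
class Superblock:
    """Toy stand-in for an on-disk superblock; NOT the real EFS layout."""
    dirty: bool = False  # False means the last unmount completed cleanly


def mount(sb: Superblock) -> None:
    # Assumption: mounting marks the filesystem dirty until a clean unmount.
    sb.dirty = True


def unmount(sb: Superblock) -> None:
    # A clean unmount is the only thing that marks the filesystem clean.
    sb.dirty = False


def needs_fsck(sb: Superblock) -> bool:
    # A boot-time check only has to look at the flag.
    return sb.dirty


def simulate(clean_shutdown: bool) -> bool:
    """Return True if the next boot would want to run fsck."""
    sb = Superblock()
    mount(sb)
    if clean_shutdown:
        unmount(sb)
    # On a crash, or a hang followed by reset, unmount never runs,
    # so the flag is still set when the machine comes back up.
    return needs_fsck(sb)
```

Here simulate(True) models an orderly shutdown and simulate(False) models a
crash or a reset after a hang; only the second leaves the flag set for the
boot-time check to find.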

> Brrrrrr.
> 
> Jean-Francois Lamy               lamy at ai.utoronto.ca, uunet!ai.utoronto.ca!lamy
> AI Group, Department of Computer Science, University of Toronto, Canada M5S 1A4

-- Jim Barton
Silicon Graphics Computer Systems    "UNIX: Live Free Or Die!"
jmb at sgi.sgi.com, sgi!jmb at decwrl.dec.com, ...{decwrl,sun}!sgi!jmb