The Answer to All Man's Problems (part 4 of 6)

Tom Christiansen tchrist at convex.COM
Tue Jan 8 09:22:02 AEST 1991


	Xsection would be recognized, a 
	X.B manq
	Xdirectory would not be, and while a 
	X.B man3f
	Xdirectory would be recognized, a
	X.B man3x11
	Xdirectory would not be.  
	X.PP
	XLikewise, the possible subsections
	Xfor a man page were also embedded in the source code, so 
	Xa man page named something like 
	X.I /usr/man/man3/XmLabel.3x11
	Xwould not be found because 
	X.B 3x11
	Xwas not in the hard-coded list of viable subsections.
	XSome systems install all man pages stripped of subsection
	Xcomponents in the file name.  This situation is less than optimal
	Xbecause it proves useful to be able
	Xto supply both a 
	X.M getc 3f
	Xand a 
	X.M getc 3s .
	XDistinguishing between subsections is 
	Xparticularly convenient with the ``intro'' man pages;
	Xa vendor could supply
	X.M intro 3 ,
	X.M intro 3a ,
	X.M intro 3c ,
	X.M intro 3f ,
	X.M intro 3m ,
	X.M intro 3n ,
	X.M intro 3r ,
	X.M intro 3s ,
	Xand
	X.M intro 3x 
	Xas introductory man pages for the various libraries.
	XHowever, the task of running
	X.M access 2
	Xon all possible subsections is slow and tedious, requiring
	Xrecompilation whenever a new subsection is invented.
	X.NH
	XReferences in the Filesystem
	X.PP
	XThe existing man system had no elegant way to handle
	Xman pages containing more than one entry.  For example, 
	X.M string 3
	Xcontains references to 
	X.M strcat 3 ,
	X.M strcpy 3 ,
	Xamongst others.  Because the \fIman\fP program looks for
	Xentries only in the file system, these extra references must be
	Xrepresented as files that reference the base man page.  The most
	Xcommon practice is to have a file consisting of
	Xa single line
	Xtelling
	X.I troff
	Xto source the other man page.
	XThis file would read something like:
	X.sp
	X.ti 5
	X.CW
	X\&.so man3/string.3
	X.CE
	X.sp 
	XOccasionally,
	Xextra references are created with a link in the file 
	Xsystem (either a hard link or a symbolic one).  Except when 
	Xusing 
	Xhard links, this method wastes
	Xdisk blocks and inodes.  In any case,
	Xthe directory gains more entries, slowing
	Xdown accesses to files in those directories.  Logic 
	Xmust be built into the \fIman\fP program to 
	Xdetect these extra references.
	XIf not, when man pages are reformatted into their 
	Xcat directories, separate formatted man pages are stored 
	Xon disk, wasting substantial amounts of disk space 
	Xon duplicate information.
	XOn systems with numerous man pages, the directories can grow 
	Xso large that all man 
	Xpages for a given section cannot be listed on the command line 
	Xat one time because of kernel restrictions on the total length of the
	Xarguments to 
	X.M exec 2 .
	XBecause of the need to store reference information 
	Xin the file system, the problem is only made worse.
	XThis often happens in 
	Xsection 3 after the man pages for the X 
	Xlibrary have been installed, but
	Xcan occur in other sections as well.
	X.PP
	XThe 
	X.M makewhatis 8
	Xprogram is a Bourne shell script that generates the 
	X.I /usr/lib/whatis
	Xindex, which is used by 
	X.M apropos 1
	Xand 
	X.M whatis 1
	Xto provide one-line summaries of man pages.  These
	Xprograms are part of the 
	X.I man 
	Xsystem
	Xand are often links to each other and sometimes to 
	X.I man
	Xitself.
	XIf any of 
	Xthe man subdirectories contain more files than the shell 
	Xcan successfully expand on the command line, the 
	X.I makewhatis
	Xscript fails
	Xand no index is generated.  When this occurs, 
	X.I whatis
	Xand 
	X.I apropos
	Xstop working.  The 
	X.M catman 8 
	Xprogram, used to pre-format raw man pages, suffers
	Xfrom the same problem.
	X.PP
	XOf course,
	X.I makewhatis
	Xwasn't working all that well, anyway. 
	XIt was a wrapper around many calls to little programs
	Xthat each did a small piece of the work, making it
	Xrun slowly.
	XIt, too, had a hard-coded pathname for where man pages resided
	Xon disk and which sections were permitted.
	X.I Makewhatis
	Xdidn't always extract the proper information 
	Xfrom the man page's \s-1NAME\s0
	Xsection.  When it did, this information was sometimes 
	Xgarbled due to embedded
	X.I troff
	Xformatting information.
	XBut even garbled information was better
	Xthan none at all.  
	XEven so, these programs left some things to be desired.
	X.I Apropos 
	Xdidn't understand regular expression searches, and both
	Xit and 
	X.I whatis
	Xpreferred to do their own lookups using basic, unoptimized C functions
	Xlike 
	X.M index 3
	Xrather than using a general-purpose optimized string search program
	Xlike 
	X.M egrep 1 .
	X.NH
	XThe Solution 
	X.NH 2
	XA Real Database
	X.PP
	XThe problem in all these cases appeared to be that the filesystem
	Xwas being used as a database, and that this paradigm did not hold
	Xup well to expansion.  Therefore the solution was to move
	Xthis information into a database for more rapid access.  
	XUsing this database, 
	X.I man
	Xand 
	X.I whatis
	Xneed no longer call
	X.M access 2
	Xto test all possible locations for the desired man page.
	XTo solve the other problems,
	X.M makewhatis 8
	Xwould be recoded so it didn't rely on the shell 
	Xfor looking at directories.
	X.NH 2
	XCoding in Perl
	X.PP
	XWhen the project was first contemplated, the
	Xperl programming language by Larry Wall was rapidly 
	Xgaining popularity as an alternative to C for tasks that
	Xwere either too slow when written as shell scripts, or
	Xsimply exceeded the shells' somewhat limited capabilities.
	XBecause perl was 
	Xoptimized for parsing text and had convenient
	X.M dbm 3x 
	Xsupport built in, and because the task did not seem complex 
	Xenough to merit a full-blown treatment in C or C++,
	Xperl was selected as the language of choice.
	XHaving all code written in perl would also help support 
	Xheterogeneous environments because the resulting scripts could
	Xbe copied and run on any hardware or software platform supporting
	Xperl.  No recompilation would be required.
	X.PP
	XSome concern existed about choosing
	Xan interpreted language when one of the issues to address was 
	Xthat of speed.  It was decided to do the prototype in perl
	Xand, if necessary, translate this into C should performance 
	Xprove unacceptable.
	X.PP
	XThe first task was to recode 
	X.M makewhatis 8
	Xto generate the new
	X.I whatis
	Xdatabase using \fIdbm\fP.  The
	X.M directory 3
	Xroutines were used rather than shell globbing to circumvent
	Xthe problem of large directories breaking shell wildcard
	Xexpansions.  Perl proved to be an appropriate choice for this
	Xtype of text processing (see Figure 1).
	X.BF "\fImakewhatis\fP excerpt #1"
	Xs/\e\ef([PBIR]|\e(..)//g;      # kill font changes
	Xs/\e\es[+-]?\ed+//g;           # kill point changes
	Xs/\e\e&//g;                   # and \e&
	Xs/\e\e\e((ru|ul)/_/g;          # xlate to '_'
	Xs/\e\e\e((mi|hy|em)/-/g;       # xlate to '-'
	Xs/\e\e\e*\e(..//g  &&           # no troff strings
	X    print STDERR "trimmed troff string macro in NAME section of $FILE\en";
	Xs/\e\e//g;                    # kill all remaining backslashes
	Xs/^\e.\e\e"\es*//;              # kill comments
	Xif (!/\es+-+\es+/) {
	X    #   ^ otherwise L-devices would be L
	X    print STDERR "$FILE: no separated dash in $_\en";
	X    $needcmdlist = 1;       # forgive their braindamage
	X    s/.*-//;
	X    $desc = $_;
	X} else {
	X    ($cmdlist, $desc) = ( $`, $' );
	X    $cmdlist =~ s/^\es+//;
	X}
	X.EF
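	X.PP
	XThe directory scan itself is not part of the excerpt above.
	XA minimal sketch of the idea follows, assuming a hypothetical
	X.B list_pages
	Xsubroutine (a name chosen here for illustration, not taken from
	X.I makewhatis ).
	XIt reads a section directory through the
	X.M directory 3
	Xroutines, so the file names never pass through a shell wildcard expansion:
	X.br
	X.CW
	X.nf
	X.na
	X.in +.5i
	Xsub list_pages {             # hypothetical helper, for illustration
	X    local($dir) = @_;        # for example /usr/man/man3
	X    local(@pages);
	X
	X    opendir(DIR, $dir) || die "can't opendir $dir: $!\en";
	X    @pages = grep(!/^\e./, readdir(DIR));   # skip dot files
	X    closedir(DIR);
	X    return sort @pages;
	X}
	X.in -.5i
	X.CE
	X.ad 
	X.fi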
	X.NH 2
	XDatabase Format
	X.PP
	XThe database entries themselves are conveniently
	Xaccessed as arrays from perl.  To save space and
	Xaccommodate man pages with multiple references, two
	Xkinds of database entries exist: direct and indirect.
	XIndirect entries are simply references to direct entries.
	XFor example, indirect entries for 
	X.M getc 3s ,
	X.M getchar 3s ,
	X.M fgetc 3s , 
	Xand 
	X.M getw 3s
	Xall point to the real entry, which is
	X.M getc 3s .
	XIndirect entries are created for multiple entries in 
	Xthe \s-1NAME\s0 section, for symbolic and hard links, and
	Xfor 
	X.B \&.so
	Xreferences.  Using the \s-1NAME\s0 section is the preferred 
	Xmethod; the others are supported for backwards compatibility.
	X.PP
	X.ne 4
	XAssuming that the \s-1WHATIS\s0 array has been bound to the
	Xappropriate
	X.I dbm
	Xfile, storing indirect entries is trivial:
	X.sp
	X.CW	
	X.ti 1i
	X$WHATIS{'fgetc'} = 'getc.3s';
	X.sp
	X.CE
	XWhen a program encounters an indirect entry, such as
	Xfor \fIfgetc\fP, it must make another lookup based on 
	Xthe return value of the first lookup (stripped of its 
	Xtrailing extension) until it finds a direct entry.  The
	Xtrailing extension is kept so that an indirect reference
	Xto 
	X.M gtty 3c
	Xdoesn't accidentally pull out 
	X.M stty 1
	Xwhen it really wanted 
	X.M stty 3c .
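	X.PP
	XThe lookup loop just described might be sketched as follows.
	XThe subroutine name
	X.B resolve
	Xand the limit of eight hops are assumptions made for illustration,
	Xand the sketch assumes that each value holds a single entry.
	X.br
	X.CW
	X.nf
	X.na
	X.in +.5i
	Xsub resolve {                # hypothetical name
	X    local($topic) = @_;      # for example 'fgetc'
	X    local($entry, $ext, $hops);
	X
	X    $entry = $WHATIS{$topic};
	X    # direct entries contain control-A; indirect ones do not
	X    while ($entry ne '' && index($entry, "\e001") < 0) {
	X        return '' if ++$hops > 8;    # assumed guard against loops
	X        ($topic, $ext) = ($entry =~ /^(.+)\e.([^.]+)$/);
	X        $entry = $WHATIS{$topic};
	X    }
	X    return ($entry, $ext);   # $ext: section the reference asked for
	X}
	X.in -.5i
	X.CE
	X.ad 
	X.fi
	X.PP
	XThe caller can then use the returned extension to choose among 
	Xseveral pages of the same name.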
	X.PP
	XThe format of a direct entry is more complicated, because
	Xit needs to encode the description to be used by 
	X.M whatis 1
	Xas well as the section and subsection information.
	XIt can be distinguished from an indirect entry because
	Xit contains four fields delimited by control-A's (\s-1ASCII 001\s0), 
	Xwhich are themselves prohibited from being in any
	Xof the fields.  The fields are as follows:
	X.br
	X.in +5n
	X.IP 1
	XList of references that point to this man page; this
	Xis usually everything to the left of the hyphen
	Xin the \s-1NAME\s0 section.
	X.IP 2
	XRelative pathname of the file the man page is kept in;
	Xthis is stored for the indirect entries.
	X.IP 3
	XTrailing component of the directory in which the
	Xman page can be found, such as 
	X.B 3
	Xfor \fBman3\fP.  
	X.IP 4
	XDescription of the man page for use by 
	Xthe 
	X.I whatis 
	Xand 
	X.I apropos
	Xprograms; basically everything to the right of the hyphen in the
	X\s-1NAME\s0 section.
	X.in -5n
	X.PP
	XAt first glance, the third field would
	Xseem redundant.  It would appear that you could 
	Xderive it from the character after the dot in the second field.
	XHowever, to support arbitrary subdirectories like
	X.B man3f
	Xor 
	X\fBman3x11\fP, you must also know the name of the 
	Xdirectory so you don't look in
	X.B man3 
	Xinstead.  Additionally, a long-standing tradition exists 
	Xof using the
	X.B mano
	Xsection 
	Xto store old man pages from arbitrary sections.  
	XFurthermore, man pages are sometimes installed in the
	Xwrong section.  To support these scenarios, restrictions
	Xregarding the format of filenames used for man pages were
	Xrelaxed in \fIman\fR,
	X\fImakewhatis\fR, and \fIcatman\fR,
	Xbut warnings would be issued by 
	X.I makewhatis
	Xfor man pages installed in directories that don't have
	Xthe same suffix as the man pages.
	X.NH 2
	XMultiple References to the Same Topic
	X.PP
	XA problem arises from the fact that the same topic 
	Xmay exist in more than one section of the manual.
	XWhen a lookup is performed on a topic,
	Xyou want to retrieve all possible man page locations
	Xfor that topic.  The 
	X.I whatis
	Xprogram wants to display them all to the user, while
	Xthe 
	X.I man
	Xprogram will either show all the man pages 
	X(if the 
	X.B \-a
	Xflag is given) or
	Xsort what it has retrieved according to a particular section and
	Xsubsection precedence, by default showing entries from section
	X1 before those from section 2, and so forth.  Therefore, 
	Xeach lookup may actually return a list of direct and
	Xindirect lookups.  This list is delimited by control-B's
	X(\s-1ASCII 002\s0), which are stripped from the data fields, should
	Xthey somehow contain any.  The code for storing a direct entry
	Xin the 
	X.I whatis
	Xdatabase is featured in Figure 2.
	X.BF "\fImakewhatis\fP excerpt #2"
	Xsub store_direct {
	X    local($cmd, $list, $page, $section, $desc) = @_; # args
	X    local($datum);
	X
	X    $datum = join("\e001", $list, $page, $section, $desc);
	X
	X    if (defined $WHATIS{$cmd}) {
	X        if (length($WHATIS{$cmd}) + length($datum) + 1 > $MAXDATUM) {
	X            print STDERR "can't store $page -- would break DBM\en";
	X            return;
	X        }
	X        $WHATIS{$cmd} .= "\e002";  # append separator
	X    }
	X    $WHATIS{$cmd} .= $datum;  # append entry
	X}
	X.EF
	X.KE
	X.PP
	XNotice the check of the combined length of the existing value
	Xand the new datum against the value of \s-1MAXDATUM.\s0  This check is needed because of the
	Xinherent limitations in the implementation of the 
	X.M dbm 3x
	Xroutines.  The limit is 1k for 
	X.I dbm 
	Xand 4k for
	X.I ndbm .
	XThis restriction will be relaxed 
	Xif a \fIdbm\fR-compatible set of routines is written without 
	Xthese size limitations.  The \s-1GNU\s0 
	X.I gdbm 
	Xroutines hold promise, but they were released after the 
	Xwriting of these programs and haven't been investigated yet.
	XIn practice, these limits are seldom if ever reached, especially 
	Xwhen 
	X.I ndbm 
	Xis used.
	X.NH 
	XOther Problems, Other Solutions
	X.PP
	XThe rewrite of 
	X.I makewhatis ,
	X.I catman ,
	Xand 
	X.I man
	Xto understand multiple man trees and to use a database
	Xfor topic-to-pathname mapping
	Xdid much to alleviate the most important problems
	Xin the existing man system, but several minor problems 
	Xremained.  Since this was a complete rewrite of the entire
	Xsystem, it seemed an appropriate time to address these as well.
	X.NH 2
	XIndexing Long Pages
	X.PP
	XSeveral of the most frequently consulted man pages on the system 
	Xhave grown beyond the scope of a quick reference guide, 
	Xinstead filling the function of a detailed user manual.
	XMan pages of this sort include those for shells, window
	Xmanagers, 
	Xgeneral purpose 
	Xutilities such as awk and perl,
	Xand the \s-1X11\s0 man pages. 
	XAlthough these man pages
	Xare internally organized into sections and subsections that
	Xare easily visible on a hard-copy printout, the on-line 
	Xman system could not recognize these internal
	Xsections.  Instead, the user was forced to search through pages
	Xof output looking for the section of the man page containing
	Xthe desired information.  
	X.PP
	XTo alleviate this time-consuming tedium, the man program 
	Xwas taught to parse the 
	X.I nroff
	Xsource for man pages in order to build up an index of these sections
	Xand present them to the user on demand.  
	XSee Figure 3 for an excerpt from the 
	X.M ksh 1
	Xindex page, displayable via the new
	X.B \-i 
	Xswitch.
	X.BF "\fIksh\fP index excerpt"
	XIdx  Subsections in ksh.1                   Lines
	X 1   NAME                                       3
	X 2   SYNOPSIS                                  22
	X 3   DESCRIPTION                               15
	X 4   Definitions.                              43
	X 5   Commands.                                338
	X 6   Comments.                                  6
	X 7   Aliasing.                                107
	X 8   Tilde Substitution.                       47
	X 9   Command Substitution.                     28
	X10   Process Substitution.                     49
	X11   Parameter Substitution.                  645
	X12   Blank Interpretation.                     15
	X13   File Name Generation.                     87
	X.EF
	X.PP
	XThe 
	X.I /usr/man/idx*/
	Xdirectories
	Xserve the
	Xsame function for saved indices
	Xas
	X.I /usr/man/cat*/
	Xdirectories do for saved formatted man pages.
	XThese are regenerated as needed according to
	Xthe same criteria used to regenerate the cat pages.
	XThey can be used to index into a given man page or
	Xto list a man page's subsections.  
	XTo begin at a given subsection, the user appends
	Xthe desired subsection to the name of the man page
	Xon the command line,
	Xusing a forward slash as a delimiter.   Alternatively, 
	Xthe user can just supply a trailing slash on the man page
	Xname, in which case they are presented with the index listing
	Xlike the one the
	X.B \-i
	Xswitch provides, then prompted for the section 
	Xin which they are interested.  A double slash indicates
	Xan arbitrary regular expression, not a section name.
	XThis is merely a short-hand notation for first running
	Xman and then typing 
	X.CW
	X/expr
	X.CE 
	Xfrom within the user's pager.
	XSee Figure 4
	Xfor example usages of the indexing features.  
	X.BF "Index Examples"
	Xman -i ksh      # show sections
	Xman ksh/        # show sections, prompt for which one
	X
	Xman ksh/tilde
	Xman ksh/8       # equivalent to preceding line
	X
	Xman ksh/file
	Xman ksh/generat # equivalent to preceding line
	Xman ksh/13      # so is this
	X
	Xman ksh//hangup # start at this string
	X.EF
	X.PP
	XThis indexing scheme is implemented by searching the index stored in 
	X.I /usr/man/idx1/ksh.1
	Xif it exists, or generated dynamically otherwise,
	Xfor the requested subsection.  A numeric subsection is
	Xeasily handled.  For strings, a case-insensitive
	Xpattern match is first
	Xmade anchored to the front of the string, then \(em failing
	Xthat \(em anywhere in the section description.  This way
	Xthe user doesn't need to type the full section title.
	XThe 
	X.I man 
	Xprogram starts up the pager with a 
	Xleading argument to begin at that section.  Both
	X.M more 1
	Xand 
	X.M less 1
	Xunderstand this particular notation.
	XFor the ``man ksh/tilde''
	Xexample given above, this would be
	X.sp
	X.CW
	X.ti +.5i
	Xless '+/^[ \et]*Tilde Substitution' /usr/man/cat1/ksh.1
	X.sp
	X.CE
	X.PP
	XOnce again, perl proved 
	Xuseful for coding this algorithm concisely.  The 
	Xsubroutine for doing this is given in 
	XFigure 5.  Given an expression such as ``5''
	Xor ``tilde'' or ``file'' and a pathname of the man 
	Xpage,
	X.I man
	Xloads
	Xan array of subsection
	Xindex titles and quickly retrieves the proper
	Xheader to pass on to the pager.  Perl's built-in 
	X.B grep
	Xroutine for selecting from arrays those elements 
	Xconforming to certain criteria made the coding easy.
	X.BF "Locate Subsection by Index"
	Xsub find_index {
	X    local($expr, $path) = @_;  # subroutine args
	X    local(@matches, @ssindex);
	X    @ssindex = &load_index($path);
	X
	X    if ($expr > 0) {            # test for numeric section
	X        return $ssindex[$expr];
	X    } else {
	X        if (@matches = grep (/^$expr/i, @ssindex)) {
	X            return $matches[0];
	X        } elsif (@matches = grep (/$expr/i, @ssindex)) {
	X            return $matches[0];
	X        } else {
	X            return '';
	X        }
	X    }
	X}
	X.EF
	X.NH 2
	XConditional Tbl and Eqn Inclusion
	X.PP
	XSeveral other relatively minor enhancements were made 
	Xto the man system in the course of its rewrite.  
	XOne of these
	Xwas to include calls to 
	X.M eqn 1
	Xand 
	X.M tbl 1
	Xwhere appropriate.  For instance, the \s-1X11\s0 man pages use 
	X.I tbl
	Xdirectives to construct a number of tables.
	XIt was not sufficient to supply 
	Xthese extra filters for all man pages.  Besides the
	Xslight performance degradation this would incur, a 
	Xmore serious problem exists: some systems have man pages that 
	Xcontain embedded
	X.LB .TS
	Xand 
	X.LB .TE
	Xdirectives; however, the data between them was not
	X.I tbl 
	Xinput, but rather its output.  They have already 
	Xbeen pre-processed in the unformatted versions.
	XTo do so again causes 
	X.I tbl 
	Xto complain bitterly, so heuristics to check for this condition
	Xwere built in to the function that determines which filters 
	Xare needed.
	X.PP
	XTo support tables and equations in man pages when viewed on-line,
	Xthe output must be run through
	X.M col 1
	Xto be legible.  Unfortunately, this strips the man pages
	Xof any bold font changes, which is undesirable because it is 
	Xoften important to distinguish between bold and italics for 
	Xclarity.  Therefore, before the formatted man page is fed to 
	X\fIcol\fP, all text in bold (between escape sequences)
	Xis converted to character-backspace-character combinations.  These
	Xcombinations
	Xcan be recognized by the user's pager as a character in 
	Xa bold font, just as underbar-backspace-character is recognized
	Xas an italic (or underlined) one.  Unfortunately, while 
	X.I less
	Xdoes recognize this convention, 
	X.I more
	Xdoes not.  By storing the formatted versions with all escape-sequences
	Xremoved, the user's pager can be invoked without a pipe to 
	X.I ul 
	Xor
	X.I col
	Xto fix the reverse line motion directives.  This provides the pager with
	Xa handle on the pathname of the cat page, allowing users to back up
	Xto the start of man pages, even exceptionally long ones, without exiting the 
	X.I man 
	Xprogram.  This would not be feasible if the pager were being fed
	Xfrom a pipe.
	X.NH 2
	XTroffing and Previewing Man Pages
	X.PP
	XNow that many sites have high-quality laser printers
	Xand bit-mapped displays, it seemed desirable for 
	X.I man
	Xto understand how to direct 
	X.I troff
	Xoutput to these.  A new option, \fB-t\fR,
	Xwas added to mean that 
	X.I troff 
	Xshould be used instead of 
	X\fInroff\fR.
	XThis way users can easily get pretty-printed versions of
	Xtheir man pages.
	X.PP
	XFor workstation or X-terminal users,
	X.I man
	Xwill recognize
	Xa \s-1TROFF\s0 environment variable or 
	Xcommand line argument to indicate an 
	Xalternate program to use for typesetting.  
	X(This presumes that the program recognizes 
	X.I troff
	Xoptions.)  This method often produces more legible output
	Xthan 
	X.I nroff
	Xwould, allows the user to stay in their office, and saves
	Xtrees as well.
	X.NH 2
	XSection Ordering
	X.PP 
	XThe same topic can occur in more than one section of 
	Xthe manual, but
	Xnot all users on the system want the same default
	Xsection ordering that 
	X.I man 
	Xuses to sort these possible pages.
	XFor instance,
	XC programmers who want to look up the man page for
	X.M sleep 3
	Xor 
	X.M stty 3
	Xfind that by default, 
	X.I man 
	Xgives them 
	X.M sleep 1
	Xand
	X.M stty 1
	Xinstead.  A \s-1FORTRAN\s0 programmer may want to see
	X.M system 3f ,
	Xbut instead gets 
	X.M system 3 .
	XTo accommodate these needs, the 
	X.I man 
	Xprogram will honor a \s-1MANSECT\s0 environment 
	Xvariable (or a 
	X.B \-S 
	Xcommand line switch) containing a list of section suffixes.
	XIf subsection or multi-character section ordering 
	Xis desired, this string should be colon-delimited.
	XThe default ordering is ``ln16823457po''.  
	XA C programmer might set his \s-1MANSECT\s0 to be ``231'' instead to access
	Xsubroutines and system calls before commands of the same name.
	XA \s-1FORTRAN\s0 programmer might prefer ``3f:2:3:1'' to get
	Xat the \s-1FORTRAN\s0 versions of subroutines before the standard
	XC versions.
	XSections absent from the \s-1MANSECT\s0 have a sorting priority 
	Xlower than any that are present.
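	X.PP
	XOne way such a precedence string could be turned into a sort order
	Xis sketched below.
	XThe names
	X.B load_order ,
	X.B rank ,
	X.B by_sect ,
	Xand the
	X.B %RANK
	Xtable are hypothetical, and the fallback from a subsection
	Xto its leading character is an assumption, not necessarily what 
	X.I man
	Xdoes.
	X.br
	X.CW
	X.nf
	X.na
	X.in +.5i
	Xsub load_order {             # hypothetical: fill %RANK
	X    local($mansect) = @_;
	X    local(@order, $i);
	X
	X    @order = ($mansect =~ /:/) ? split(/:/, $mansect)
	X                               : split(//, $mansect);
	X    %RANK = ();
	X    for ($i = $#order; $i >= 0; $i--) {   # first mention wins
	X        $RANK{$order[$i]} = $i;
	X    }
	X}
	X
	Xsub rank {
	X    local($sect) = @_;
	X
	X    return $RANK{$sect} if defined($RANK{$sect});
	X    $sect = substr($sect, 0, 1);     # assumed: 3f falls back to 3
	X    return defined($RANK{$sect}) ? $RANK{$sect} : 1000;
	X}
	X
	Xsub by_sect { &rank($a) <=> &rank($b); }
	X
	X&load_order('3f:2:3:1');
	X@sorted = sort by_sect ('1', '3', '2', '3f', '8');
	X.in -.5i
	X.CE
	X.ad 
	X.fi
	X.PP
	XSections missing from the string receive the rank 1000 here, so they
	Xsort after every section that is listed.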
	X.NH 2
	XCompressed Man Pages
	X.PP
	XBecause man pages are \s-1ASCII\s0 text files, they stand to benefit from 
	Xbeing run through the 
	X.M compress 1
	Xprogram.
	XCompressing man pages 
	Xtypically yields disk space savings of around 60%.
	XThe start-up time for decompressing the man page when 
	Xviewing is not enough to be bothersome.  However, running
	X.I makewhatis
	Xacross compressed man pages takes significantly longer
	Xthan running it over uncompressed ones, so some sites may wish to 
	Xkeep only the formatted pages compressed, not the unformatted
	Xones.
	X.PP
	XTwo different
	Xways of indicating compressed man pages seem to exist
	Xtoday.  One is where the man page itself has an attached
	X.B .Z 
	Xsuffix, yielding pathnames like
	X\fI/usr/man/man1/who.1.Z\fR.  
	XThe other way is to have 
	Xthe section directory contain the 
	X.B .Z 
	Xsuffix
	Xand have the files named normally, as in 
	X\fI/usr/man/man1.Z/who.1\fR.  
	XEither strategy is supported to ease porting 
	Xthe program to other systems.
	XAll programs dealing with man pages have been updated to 
	Xunderstand man pages stored in compressed form.
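	X.PP
	XA minimal sketch of how both naming conventions might be detected
	Xwhen a page is opened is given below.
	XThe subroutine name
	X.B open_page
	Xis hypothetical, and the sketch assumes that the
	X.I zcat
	Xprogram distributed with 
	X.I compress
	Xis in the user's path and that pathnames contain no quote characters.
	X.br
	X.CW
	X.nf
	X.na
	X.in +.5i
	Xsub open_page {              # hypothetical helper
	X    local($path) = @_;
	X
	X    # matches man1/who.1.Z as well as man1.Z/who.1
	X    if ($path =~ m#\e.Z$# || $path =~ m#\e.Z/[^/]+$#) {
	X        return open(PAGE, "zcat < '$path' |");
	X    }
	X    return open(PAGE, "< $path");
	X}
	X.in -.5i
	X.CE
	X.ad 
	X.fi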
	X.NH 2
	XAutomated Consistency Checking
	X.PP
	XAfter receiving a half-dozen or so bug reports regarding 
	Xnon-existent man pages referenced in \s-1SEE\s0 \s-1ALSO\s0 sections,
	Xit became apparent that the only way to verify that all
	Xbugs of this nature had really been expurgated would be to automate the process.
	XThe 
	X.I cfman
	Xprogram
	Xverifies that man pages
	Xare mutually consistent in their \s-1SEE\s0 \s-1ALSO\s0 references.  It
	Xalso reports man pages whose
	X.LB .TH 
	Xline claims the man page is in
	Xa different place than 
	X.I cfman 
	Xfound it.  
	X.I Cfman
	Xcan locate man pages
	Xthat are improperly referenced rather than merely missing.  It 
	Xcan be run on an entire man tree, or on individual files as 
	Xan aid to developers writing new man pages.
	X.BF "Sample \fIcfman\fP run"
	Xat.1: cron(8) really in cron(1)
	Xbinmail.1: xsend(1) missing
	Xdbadd.1: dbm(3) really in dbm(3x)
	Xksh.1: exec(2) missing
	Xksh.1: signal(2) missing
	Xksh.1: ulimit(2) missing
	Xksh.1: rand(3) really in rand(3c)
	Xksh.1: profile(5) missing
	Xld.1: fc(1) really in fc(1f)
	Xsccstorcs.1: thinks it's in ci(1)
	Xuuencode.1c: atob(n) missing
	Xyppasswd.1: mkpasswd(5) missing
	Xfstream.3: thinks it's in fstream(3c++)
	Xftpd.8c: syslog(8) missing
	Xnfmail.8: delivermail(8) missing
	Xversatec.8: vpr(1) missing
	X.EF
	X.PP
	XThe amount of output produced by 
	X.I cfman 
	Xis startling.
	XA portion of the output of a sample run 
	Xis seen in Figure 6.
	XSome of its complaints are relatively harmless, such as
	X.I dbm
	Xbeing in section 
	X.B 3x
	Xrather than section 
	X\fB3\fR, because the 
	X.I man 
	Xprogram can find entries with the subsection left off.
	XHaving inconsistent
	X.LB .TH
	Xheaders is also harmless, although the printed
	Xman pages will have headers that do not reflect their
	Xfilenames on the disk.
	XHowever, entries that refer to pages that are truly absent, like
	X.M exec 2
	Xor 
	X.M delivermail 8 ,
	Xmerit closer attention.
	X.NH 2
	XMultiple Architecture Support
	X.PP
	XAs mentioned in the discussion of the need for a \s-1MANPATH\s0, 
	Xa site may for various reasons wish to maintain several 
	Xcomplete sets of man pages on the same machine.  Of course,
	Xa user could know to specify the full pathname of the 
	Xalternate tree on the command line 
	Xor set up their environment appropriately, but this is
	Xinconvenient.  Instead, it is preferable
	Xto specify the machine type on the command line and let
	Xthe system worry about pathnames.  
	X.ne 5
	XConsider these examples:
	X.br
	X.CW
	X.nf
	X.na
	X.in +.5i
	Xman vax csh
	Xapropos sun rpc
	Xwhatis tahoe man
	X.in -.5i
	X.CE
	X.ad 
	X.fi
	X.PP 
	XTo implement this, 
	Xwhen presented with more than one argument,
	X.I man
	X(in any of its three guises)
	Xchecks to see whether the first non-switch argument
	Xis a directory beneath
	X.I /usr/man .  
	XIf so, it automatically adjusts its \s-1MANPATH\s0 to that subdirectory.
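	X.PP
	XIn outline, and assuming that the switches have already been
	Xremoved from the argument list, the test might look like the 
	Xfollowing sketch.
	XThe restriction to plain word characters is an added assumption
	Xthat keeps pathnames out of the test.
	X.br
	X.CW
	X.nf
	X.na
	X.in +.5i
	Xif (@ARGV > 1 && $ARGV[0] =~ /^\ew+$/ && -d "/usr/man/$ARGV[0]") {
	X    $MANPATH = '/usr/man/' . shift(@ARGV);   # e.g. /usr/man/vax
	X}
	X.in -.5i
	X.CE
	X.ad 
	X.fi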
	X.PP 
	XNot all vendors use precisely the same set of 
	X.M man 7
	Xmacros for formatting their man pages.  Furthermore, it's 
	Xhelpful to see in the header of the man page which manual
	Xit came from.  The 
	X.I man 
	Xprogram therefore looks for a local 
	X.I tmac.an
	Xfile in the root of the current man tree for alternate macro
	Xdefinitions.  If this file exists, it will be used rather than
	Xthe system defaults for passing to 
	X.I nroff
	Xor 
	X.I troff
	Xwhen reformatting.
	X.NH 
	XPerformance Analysis
	X.PP
	XThe 
	X.I man
	Xprogram is one that is often used on the system, 
	Xso users are sensitive to any significant degradation
	Xin response time.  Because it is written in perl (an 
	Xinterpreted language) this was cause for concern.
	XOn a \s-1CONVEX C2\s0, the C version runs faster when only
	Xone element is present in the \s-1MANPATH\s0.
	XHowever, when the \s-1MANPATH\s0 contains four
	Xelements, the C version bogs down considerably because of
	Xthe large number of 
	X.M access 2
	Xcalls it must make.  
	X.PP
	XThe start-up time on the parsing
	Xof the script, now just over 1300 lines long, is around
	X0.6 seconds.  This time can be reduced by dumping the 
	Xparse tree that perl generates to disk and executing that instead.
	XThe expense of this action is disk space, as the current implementation
	Xrequires that the whole perl interpreter be included in the 
	Xnew executable, not just the parse tree.  This method
	Xyields performance superior to that of the C version,
	Xirrespective of the number of components in the user's \s-1MANPATH\s0,
	Xexcept occasionally on the initial run.  This is because the 
	Xprogram needs to be loaded
	Xinto memory the first time.  If perl itself is installed ``sticky''
	Xso it is memory resident, start-up time improves considerably.  
	XIn any case, the 
	Xtotal variance (on a \s-1CONVEX\s0) is 
	Xless than two seconds in the worst case (and often 
	Xunder one second), so it was deemed acceptable, particularly
	Xconsidering the additional functionality the perl version offers.
	X.PP
	XNothing in the algorithms employed in the
	X.I man 
	Xprogram requires that it be written in perl;
	Xit was just easier this way.  It could be rewritten in C 
	Xusing 
	X.M dbm 3x
	Xroutines, although the development time would probably 
	Xbe much longer.  
	X.PP
	XThe 
	X.I makewhatis
	Xprogram was originally a conglomeration of many calls to various individual
	Xutilities such as 
	X\fIsed\fP,
	X\fIexpand\fP,
	X\fIsort\fP, and others.  The perl rewrite runs in less than half the time
	Xof the original, and does a much better job.  There are two
	Xreasons for the speed increase.  The first is the cost of the numerous 
	X.M exec 2
	Xcalls made via the shell script used by the old version of 
	X.I makewhatis .
	XThe second is that 
	Xperl is optimized for text processing, which is most of what
	X.I makewhatis
	Xis doing.
	X.PP
	XTotal development time was only a few weeks, 
	Xwhich was much shorter than originally anticipated.  The short
	Xdevelopment cycle was chiefly attributable to
	Xthe ease of text processing in perl, the many built-in 
	Xroutines for doing things that in C would have required 
	Xextensive library development, and, last but not at all least,
	Xthe omission of the compilation stage in the normal edit-compile-test
	Xcycle of development when working with non-interpreted languages.
	X.NH
	XConclusions
	X.PP
	XThe system described above has been in operation for the last
	Xsix months on a large local network consisting of three dozen 
	X\s-1CONVEX\s0 machines, a token \s-1VAX\s0, quite a few \s-1HP\s0 workstations
	Xand servers, and innumerable Sun workstations, all running different
	Xflavors of \s-1UNIX\s0.  Despite this heterogeneity,
	Xthe same code runs on all systems without alterations.
	XFew problems have been seen, and those that did arise were quickly
	Xfixed in the scripts, which could be immediately redistributed
	Xto the network.  The principal project goals of improved functionality, 
	Xextensibility, and execution time were adequately met, and the 
	Xexperience of rewriting a set of standard \s-1UNIX\s0 utilities
	Xin perl was an educational one.
	XMan pages stand a much better chance of being internally consistent
	Xwith each other.
	XResponse from the user and development community has 
	Xbeen favorable. They have
	Xbeen relieved by the many bug fixes and pleasantly surprised
	Xby the new functionality.  The suite of man programs will replace
	Xthe old man system in the next release of \s-1CONVEX\s0 utilities.
	X.\" Should be .BB here but that seems to mutilate my last BF figure
	X.sp 3
	X.QP
	X.I 
	X.SM
	XTom Christiansen left the University of Wisconsin,
	Xwhere he had been a system administrator for 6 years,
	Xwith an \s-1MS-CS\s0 in 1987 to join
	X\s-1CONVEX\s0
	XComputer Corporation in Richardson, Texas.
	XHe is a software development engineer
	Xin the Internal Tools Group there, designing software tools
	Xto streamline software development and systems administration
	Xand to improve overall system security.
	X.BE
SHAR_EOF
if test 34978 -ne "`wc -c < 'man.ms'`"
then
	echo shar: "error transmitting 'man.ms'" '(should have been 34978 characters)'
fi
chmod 664 'man.ms'
fi
echo shar: "extracting 'COPYING'" '(151 characters)'
if test -f 'COPYING'
then
	echo shar: "will not over-write existing file 'COPYING'"
else
sed 's/^	X//' << \SHAR_EOF > 'COPYING'
	X#	You are free to use, modify, and redistribute these scripts
	X#	as you wish for non-commercial purposes provided that this 
	X#	notice remains intact.  
SHAR_EOF
if test 151 -ne "`wc -c < 'COPYING'`"
then
	echo shar: "error transmitting 'COPYING'" '(should have been 151 characters)'
fi
chmod 664 'COPYING'
fi
echo shar: "extracting 'man'" '(39119 characters)'
if test -f 'man'
then
	echo shar: "will not over-write existing file 'man'"
else
sed 's/^	X//' << \SHAR_EOF > 'man'
	X#!/usr/local/bin/perl 
	X# 
	X# man - perl rewrite of man system
	X# tom christiansen <tchrist at convex.com>
	X#
	X# Copyright 1990 Convex Computer Corporation.
	X# All rights reserved.
	X#
	X# --------------------------------------------------------------------------
	X# begin configuration section
	X#
	X# this should be adequate for CONVEX systems.  if you copy this script 
	X# to non-CONVEX systems, or have a particularly outre local setup, you may
	X# wish to alter some of the defaults.
	X# --------------------------------------------------------------------------
	X
	X$PAGER = $ENV{'PAGER'} || 'more';
	X
	X# assume "less" pagers want -sf flags, all others must accept -s.
	X# note: some less's prefer -r to -f.  you might also add -i if supported.
	X#
	X$is_less = $PAGER =~ /^\S*less(\s+-\S.*)?$/;
	X$PAGER    .= $is_less ? ' -si' : ' -s';       # add -f if using "ul"
	X
	X# man roots to look in; you would really rather use a separate tree than 
	X# manl and mann!  see %SECTIONS and $MANALT if you do.
	X$MANPATH  = &config_path;
	X
	X# default section precedence
	X$MANSECT  = $ENV{'MANSECT'} || 'ln16823457po';
	X
	X# colons optional unless you have multi-char section names
	X# note that HP systems want this:
	X#	$MANSECT  = $ENV{'MANSECT'} || '1:1m:6:8:2:3:4:5:7';
	X
	X# alternate architecture man pages in 


