bugfix and extension to slice

Michael Greim greim at sbsvax.UUCP
Tue May 17 19:06:42 AEST 1988


Hello netland,

Some months ago a program called 'slice' was posted to comp.sources.unix.
We tried it here, but found a bug almost instantly. Our system
administrator wanted to use slice to extract tarmail pieces from
a mailbox, so I made two extensions to slice for him.

I sent the following to Rich Salz, suggesting a reposting, but did
not hear anything from him. So I assume it's OK if I present my changes.

Here are
1.) the BUG
2.) 2 extensions to slice, description
3.) context diff of slice.c (a cure for the ailment)
4.) context diff of slice.1
5.) the tarmail extracting script

1.) the BUG

1.1) Symptoms

	I tried "slice -f file -n100 A#n" and expected slice to produce
	some files Ann. But to my surprise it said: "can not use -n option
	together with pattern" or some such.

1.2) Diagnosis

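	The first command line argument not starting with '-' was considered
	a pattern regardless of other options specified. In the example above,
	"A#n" was therefore taken as a pattern instead of a format, which
	clashed with -n.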

1.3) Therapy

	Apply the context diff in 3.
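
	For readers who only want the idea behind the fix: the rule the patch
	introduces can be sketched as a small predicate. This is only an
	illustration, not code from slice.c; the helper name
	first_arg_is_pattern is made up here, and the flags are passed in as
	arguments instead of being globals as in the real program.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of slice.c).
 * Returns true if the first non-flag argument should be taken as the
 * split pattern. That is only the case when no pattern has been set
 * yet (e.g. by -e) and none of -m, -s or -n was given, because each
 * of those options already decides how the input is split.
 */
static bool first_arg_is_pattern(const char *pattern, bool m_flag,
                                 bool s_flag, bool n_flag)
{
    return pattern == NULL && !m_flag && !s_flag && !n_flag;
}
```

	With -n100 given, first_arg_is_pattern(NULL, false, false, true) is
	false, so "A#n" is left over as a format, which is what I expected
	in the first place.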

2.) 2 extensions to slice, description

	Both extensions concern parameter substitution in format strings.
	#0nn : with this format you can specify up to 99 parameters instead
		of only 9. We needed this!
	#-nn : take the nn'th parameter counted back from the last. nn=00
		means the last parameter. This is equal to #$ when you have
		fewer than 99 parameters.

	NOTE:
		To make this work properly set MAXPARM in opts.h to 99.
		(no context diff included because of {what-you-like} :-)

	Apply the context diff in 3.
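
	To make the numbering concrete, here is a minimal sketch of how a
	#0nn or #-nn spec maps to a zero-based token index. It follows the
	arithmetic of the patch but is not the patch itself: the function
	resolve_param and its -1 error return are invented for this example,
	"last" stands in for what lastparm() returns (assumed here to be the
	zero-based index of the last token), and unlike the patch this
	sketch rejects #000.

```c
#include <ctype.h>

#define MAXPARM 99  /* the value this post recommends for opts.h */

/*
 * Hypothetical helper (not part of slice.c).
 * spec points at the character following '#', i.e. at '0' or '-'.
 * last is the zero-based index of the last token on the matched line.
 * Returns the zero-based index of the selected token, or -1 on error.
 */
static int resolve_param(const char *spec, int last)
{
    int nn;

    if (spec[0] != '0' && spec[0] != '-')
        return -1;
    /* exactly two digits must follow, as the manual page demands */
    if (!isdigit((unsigned char)spec[1]) || !isdigit((unsigned char)spec[2]))
        return -1;
    nn = (spec[1] - '0') * 10 + (spec[2] - '0');
    if (nn > MAXPARM)
        return -1;
    if (spec[0] == '-') {
        /* #-nn: count back from the last token; #-00 is the last one */
        return (last < nn) ? -1 : last - nn;
    }
    /* #0nn: parameters are numbered from 1, indices from 0 */
    return (nn == 0) ? -1 : nn - 1;
}
```

	On a line with 7 tokens (last == 6), resolve_param("-02", 6) gives
	index 4 and resolve_param("-00", 6) gives index 6, the same token
	that #$ selects. This is how the script in 5.) picks the part number
	and the total out of the subject line.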

3.) context diff of slice.c (a cure for the ailment)

*** slice.c.old	Wed Mar 23 18:41:41 1988
--- slice.c	Wed Mar 23 18:40:42 1988
***************
*** 43,48 ****
--- 44,52 ----
  bool exclude = FALSE;			/* exclude matched line from o/p files */
  bool split_after = FALSE;		/* split after matched line */
  bool m_flag = FALSE;			/* was -m option used */
+ bool s_flag = FALSE;			/* was -s option used */
+ bool n_flag = FALSE;			/* was -n option used */
+ bool e_flag = FALSE;			/* was -e option used */
  
  FILE *output = (FILE *) NULL;	/* fd of current output file */
  FILE *rejectfd = (FILE *) NULL;	/* fd of reject file */
***************
*** 105,110 ****
--- 109,115 ----
  					usage(1);
  				}
  				pattern = *argv;
+ 				e_flag = TRUE;
  				break;
  			}
  			case 'm': {				/* mailbox pattern */
***************
*** 113,119 ****
  				break; 
  			}
  			case 's': {				/* shell pattern */
! 				pattern = "^#! *\/bin\/sh";
  				break; 
  			}
  			case 'n': {				/* -n n_lines -- split every n lines */
--- 118,125 ----
  				break; 
  			}
  			case 's': {				/* shell pattern */
! 				pattern = "^#! *\\/bin\\/sh";
! 				s_flag = TRUE;
  				break; 
  			}
  			case 'n': {				/* -n n_lines -- split every n lines */
***************
*** 123,128 ****
--- 129,135 ----
  					error("-n: number must be at least 1\n");
  					exit(EXIT_SYNTAX);
  				}
+ 				n_flag = TRUE;
  				break;
  			} 
  			case 'f': {
***************
*** 163,179 ****
  		    }
  		}			/* end switch */
  	  } else {	
! 		if (!pattern) pattern = *argv;	/* first non-flag is pattern */
  		else break;						/* break while loop */
  	  }			/* end if */
       }		/* end while */
  
  	 if (!argc) {
! 		if (m_flag) {
  			format = mboxformat;
! 		} else {
  			format = defaultfmt;
- 		}
  		n_format = 1; 
  	 } else {
  		format = argv;
--- 170,195 ----
  		    }
  		}			/* end switch */
  	  } else {	
! 		/*
! 		 * mg, 22.mar.88
! 		 * the first non-flag is pattern, if not one of -s -n or -m
! 		 * was specified or -e pattern
! 		 */
! 		if (!pattern && !m_flag && !s_flag && !n_flag)
! 			pattern = *argv;	/* first non-flag is pattern */
  		else break;						/* break while loop */
  	  }			/* end if */
       }		/* end while */
  
+ 	if (e_flag && (m_flag || s_flag || n_flag)) {
+ 		error("don't use -e  together with -m, -s or -n flags\n");
+ 		usage(EXIT_SEMANT);
+ 	}
  	 if (!argc) {
! 		if (m_flag)
  			format = mboxformat;
! 		else
  			format = defaultfmt;
  		n_format = 1; 
  	 } else {
  		format = argv;
***************
*** 486,491 ****
--- 506,539 ----
  					q += strlen(tempbuf);
  					break;
  				}
+ 				/*
+ 				 * mg, 18.mar.88
+ 				 * - use #0nn to specify parameter numbers greater than 9
+ 				 * - use #-nn to select the nn'th parameter from the last
+ 				 *		#-00 is equivalent to #$
+ 				 */
+ 				case '-':
+ 				case '0':
+ 					if (!isdigit(*(p+1)) || !isdigit(*(p+2))) {
+ 						error("Invalid use of #%cnn format in '%s'\n", *p, *format);
+ 			 			exit(EXIT_RUNERR);
+ 					}
+ 					i = (*(p+1) - '0') * 10 + *(p+2) - '0';
+ 					if (i > MAXPARM) {
+ 						error("Number of parameter (%1d) exceeds max (%1d)\n", i, MAXPARM);
+ 			 			exit(EXIT_RUNERR);
+ 					}
+ 					if (*p == '-') {
+ 						j = lastparm ();
+ 						if (j < i) {
+ 							error ("Not enough parameters to take difference.\n");
+ 							exit (EXIT_RUNERR);
+ 						}
+ 						i = j - i;
+ 					} else
+ 						i--;
+ 					p += 2;
+ 					goto do_form;
  				case '1':
  				case '2':
  				case '3':
***************
*** 501,506 ****
--- 549,555 ----
  					} else {
  						i = (*p) - '1';
  					}
+ do_form:
  					if (*(p+1) == '%') {
  						p++;
  						fmtcode = getfmt(fmt,p);


4.) context diff of slice.1

*** slice.1.old	Wed Mar 23 18:42:28 1988
--- slice.1	Wed Mar 23 18:40:42 1988
***************
*** 38,45 ****
  into one or more output files.  The output files are named according
  to the \fIformat\fR strings provided.  The input file is split
  whenever a pattern is matched or every \fIn\fR lines, depending on the
! options selected.  Because some of the options are mutually exclusive,
  there are three forms of the command.
  .LP
  Whenever a pattern match is used to slice the file, lines occurring
  before the first match are sent to the \fIreject\fR file (which is
--- 38,47 ----
  into one or more output files.  The output files are named according
  to the \fIformat\fR strings provided.  The input file is split
  whenever a pattern is matched or every \fIn\fR lines, depending on the
! options selected.
! Because some of the options are mutually exclusive,
  there are three forms of the command.
+ It is an error to specify a pattern together with options -m, -s or -n.
  .LP
  Whenever a pattern match is used to slice the file, lines occurring
  before the first match are sent to the \fIreject\fR file (which is
***************
*** 111,119 ****
  output file produced by the current output format.  When an output
  format produces the same name twice, a new format is selected and
  numbering begins again with the initial value.
! .IP "#\&1, #\&2 ..."
! Parameters of the form #\&1, #\&2, ... #\&9 are replaced by corresponding
  tokens drawn from the source line which matched the slice pattern.
  For example, if each procedure in a C program began with a comment
  line of the following form:
  .sp
--- 113,129 ----
  output file produced by the current output format.  When an output
  format produces the same name twice, a new format is selected and
  numbering begins again with the initial value.
! .IP "#\&1, #\&2 ..., #\&0nn, #\&-nn"
! Parameters of the form #\&1, #\&2, ... #\&9 or #\&0nn, where 'nn' is
! a 2 digit number are replaced by corresponding
  tokens drawn from the source line which matched the slice pattern.
+ If you specify #\&-nn, you can select a parameter relative to
+ the last token on the line. #\&-00 is the last token on the line,
+ #\&-01 the last but one, ...
+ .br
+ Note that it is an error to not specify two digits when using #\&0nn
+ or #\&-nn.
+ .br
  For example, if each procedure in a C program began with a comment
  line of the following form:
  .sp
***************
*** 131,136 ****
--- 141,149 ----
  \ \ \ \ \From garyp at cognos Tue Sep 15 15:08:23 EDT 1987
  .sp
  then "#$" would select "1987", the last token on the line.
+ .br
+ Currently there are 99 addressable tokens on an input line. If a line
+ is split in more tokens, #$ will hold the last one.
  .SH FORMAT SPEC's
  .LP
  Substitution parameters can be followed by an optional 
***************
*** 240,245 ****
--- 253,264 ----
  generate the correct filenames, either slice has to lookahead to find
  the next match line or it has to direct lines for the current slice
  into a temporary file until it finds the line matching the pattern.
+ .IP c) 4
+ When you use slice on machines with a filesystem which allows you
+ only a (usually small) number of characters for filenames (e.g. 14),
+ slice might not detect that it is overwriting a file and/or
+ its diagnostic output is false. Especially filenames generated by the -m
+ option are too long. Just specify a format when slicing a mailbox.
  .SH DIAGNOSTICS
  ``Internal Error'' indicates a bug in \fIslice\fR, and should be reported.
  Exit status 1 indicates an error parsing options \- for example, if an unknown
***************
*** 249,254 ****
--- 268,279 ----
  be opened.
  .LP
  If a reject file is not provided, a count of rejected lines is reported.
+ .SH "AUTHOR"
+ Originally written by Russell Quinn as "mailsplit".
+ .sp
+ Revised and extended by Gary Puckering <cognos!garyp>.
+ .sp
+ Extended some more by Michael Greim.
  .SH "SEE ALSO"
  .I cat (1),
  .I ed (1),


5.) the tarmail extracting script

The author, Bernard Sieloff (bs at sbsvax.UUCP), says it could be improved,
but it is already 30% faster than the version using csplit.


#! /bin/sh
# @(#)untarpack 2.1 (UniSB[bs]) 88/03/20
PATH=/usr/ucb:/bin:/usr/bin:/usr/local
if [ $# -lt 1 -o $# -gt 2 ]; then
	echo "Usage: untarpack \"subject-string\"[ your-tarmailbox]"
	exit 1
fi
trap 'echo "untarpack: cancelled"; exit 9' 1 2 3 15
TS=$1;
if [ $# -eq 2 ]; then
	MB=$2
else
	MB=/usr/spool/mail/$USER
fi
if [ ! -s $MB ]; then
	echo "untarpack: no such file: $MB"
	exit 1
fi
rm -f utm.boxfile.???-of-???
echo "starting unpacking now---please wait..."
sed -n -e "/^Subject: $TS - part/,/^---end beef/p" $MB |
slice "^Subject: $TS - part" 'utm.boxfile.#-02%03d-of-#$%03d'
if [ $? -ne 0 ]; then
	echo "untarpack: slice error"
	exit 2
fi
if [ ! -s utm.boxfile.001-of-??? ]; then
	echo "untarpack: can't find subjects \"$TS\" in file \"$MB\""
	exit 3
fi
FOUND=`ls utm.boxfile.???-of-??? | wc -l`
PACKS=`expr substr utm.boxfile.001-of-??? 20 3`
if   [ $FOUND -lt $PACKS ]; then
	FOUND=`expr $FOUND + 0`
	PACKS=`expr $PACKS + 0`
	echo "untarpack: lack of tarmail packets ($FOUND instead of $PACKS)"
	exit 4
elif [ $FOUND -gt $PACKS ]; then
	FOUND=`expr $FOUND + 0`
	PACKS=`expr $PACKS + 0`
	echo "untarpack: packet overrun?!? ($FOUND instead of $PACKS)"
	exit 5
fi
echo '---end beef' > utm.boxfile.000-of-$PACKS
echo -n "Done---do you want to UNTARMAIL the tarmail? [y/n]:"
read answer junk
answer=${answer}x
if expr $answer : '[yY].*x'>/dev/null; then
	echo "OK---UNTARMAILing your tarmail..."
	exec untarmail utm.boxfile.???-of-???
else
	echo 'Use "untarmail utm.boxfile.???-of-???" to reconstruct the TARMAIL'
fi
exit 0


Absorb, apply and enjoy,

		Michael

-- 

snail-mail : Michael Greim,
			 Universitaet des Saarlandes, FB 10 - Informatik (Dept. of CS),
             Bau 36, Im Stadtwald 15, D-6600 Saarbruecken 11, West Germany
E-mail     : greim at sbsvax.UUCP


