Cron bug on the SS-1 under 4.0.3c [summary, long]

ajudge at maths.tcd.ie
Fri Feb 9 20:21:54 AEST 1990


Here is a summary of the replies I have received about a cron bug which
causes some cron jobs to be run twice.

The bug is acknowledged by Sun and a patch is available, but even after
the patch the problem still recurs.

>>> cron:1

X-From:      bengts at Sweden.Sun.com

It's a bug in the cron program, bugid #1022379. You can get a new cron
program from your local answercentre.  This new cron also works on the
4/390 but not on the 4/330.  On the 4/330 you patch a value in the kernel.

>>> cron:3

X-From:      jay at silence.princeton.nj.us

Yes, this is a known problem.  It affects all Suns (bug in the SysV
version of cron in SunOS) but it bites the 4/60 and the 386i more than
others because of some kernel workaround for a hardware problem (the
details I've forgotten).  On the 4/60, I believe that the problem is
partly that the clock is frequently reset.  Sun has supplied fixes for
some architectures but not, to my knowledge, for the 4/60.  If you run ntp
the problem will become even more severe.

The only current workaround (necessary even with Sun's patch if you run
ntp) is to wrap each cron job with a shell script or program which creates
a lockfile to prevent duplicate invocations.

Here's an example locker.  Fancier than anything you really need, but you
can weed out the cruft:
*** cut ***
#! /bin/csh -f

# Prevent cron from executing jobs twice

unset MAILTO
set JOBSHELL = "/bin/sh -c"

goto start
usage:
echo Usage: `basename $0` '[options] lockname command ...\
Options:\
	-m mailto	mail output to "mailto"\
	-s shell	execute command with "shell"\
	-c		execute command with "csh -c"\
	-C		execute command with "csh -cf"'
exit

start:
set CMD = "$0 $*"
set parsing = 1
while ( $parsing )
	if ( $#argv < 2 ) then
		goto usage
	endif
	switch ( x$1 )
		case x-m:
			set MAILTO = "$2"
			shift; shift
			breaksw
		case x-s:
			set JOBSHELL = "$2"
			shift; shift
			breaksw
		case x-c:
			set JOBSHELL = "/bin/csh -c"
			shift
			breaksw
		case x-C:
			set JOBSHELL = "/bin/csh -cf"
			shift
			breaksw
		case x-*:
			goto usage
			breaksw
		default:
			set parsing = 0
			breaksw
	endsw
end

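# Every copy of the job writes its PID to the lock file, then waits a
# minute so that a duplicate started by cron has time to overwrite it.
set LOCK = /tmp/$1.cronlock.$LOGNAME
echo $$ > $LOCK
sleep 60

# Private scratch file for the output of the job.
set OUT = /tmp/$1.$$.$LOGNAME
touch $OUT
chmod 600 $OUT
shift

# Only the copy whose PID is still in the lock file runs the command.
if ( -e $LOCK ) then
	if ( x$$ == x`cat $LOCK` ) then
		$JOBSHELL "$*" >& $OUT
		rm -f $LOCK
		goto wrapup
	endif
endif
echo "Passing the buck." > $OUT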

wrapup:
if ( ! -z $OUT ) then
	if ( $?MAILTO ) then
		/usr/ucb/Mail -s "Cron job (`hostname`): $CMD" "$MAILTO" < $OUT
	else
		echo "Cron job (`hostname`)"
		cat $OUT
	endif
endif
rm -f $OUT

>>> cron:6

X-From:      alex <alexl%daemon.cna.tek.com at RELAY.CS.net>

> From: Ed Anselmo <anselmo-ed at yale.edu>
> Subject: Re: cron running twice
>
> Sun is offering a patched version of cron.  Part of the README file follows:
>
> Bugs Fixed:
> ------------
> 1.  cron.c:
>     1019719:  print at(1) job number in syslog messages
>     1023418:  cron queue handling and scheduling is broken
>     1012011:  Initialize USER as well as LOGNAME environment variable
>     1017698:  cron sends erroneous error message when job can't be executed
>     1014181:  add pid and queue name to the CMD syslog message
>     1012398:  "cron"/"at"/"batch" runs more jobs than queue limit
>     1022379:  cron executes crontab entries twice  (duplicate of 1027075)
>
> 2.  funcs.c:
>     1011113:  invalid sys_errlist message number is >= sys_nerr,
>               not > sys_nerr
>
> (We received this through the standard support channels, i.e.
> hotline at sun.com)
> --
> Ed Anselmo   anselmo-ed at cs.yale.edu   {harvard,decvax}!yale!anselmo-ed
>
>
> From: Dan Lorenzini <uunet.uu.net!gcm!amadeus!dal at tektronix.TEK.COM>
> To: uunet!eecs.nwu.edu!sun-managers at uunet.uu.net
> Subject: Re: Double Cron
> In-Reply-To: Your message of Tue, 07 Nov 89 14:45:07 -0800.
> Date: Wed, 08 Nov 89 11:42:51 -0500
> Status: OR
>
>
> Re: cron doing things twice:
>
> The way I heard it, it is a hardware problem (the Sparcstation is too
> fast) :-)
>
> Apparently, there was a problem with 4.0 cron executing jobs twice
> (there was (still is?) a problem with calendar also).  Sun patched it,
> but it still has the problem on the Sparcstation-1's that we have
> here.
>
> Sun sent me a workaround -- I haven't used it yet, but here it is in
> case anybody needs it:
>
> ------------------------------------------------------------------------
> 	#!/bin/sh
> 	LOCK=/tmp/.mumble-lock
> 	echo $$ > ${LOCK}
> 	sleep 60
> 	if [ $$ = `cat ${LOCK}` ]; then
> 		# I get to do it
> 		rm ${LOCK}
> 	else
> 		# The other process gets to do it
> 		exit
> 	fi
> 	# Actually do whatever you wanted to do...
> ------------------------------------------------------------------------
>
> Dan Lorenzini
> uunet!gcm!dal
>

I put in a fixed cron and still get the problem on occasion.
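
For anyone who wants to try the lock-file election by hand, here is a
minimal sketch of the logic that the workarounds in this summary share,
written as a Bourne shell function.  The function name "elect", the
configurable delay and the optional token argument are assumptions made
for illustration; the scripts quoted here always use $$ and a fixed
"sleep 60".

```shell
#!/bin/sh
# elect LOCKFILE DELAY [TOKEN]
# Returns 0 if this caller was the last one to write LOCKFILE, 1 otherwise.
# TOKEN defaults to the PID of the shell, as in the quoted scripts; the
# optional argument is a hypothetical addition so that two callers can be
# started from the same shell when experimenting.
elect() {
    lockfile=$1
    delay=$2
    token=${3:-$$}

    # Announce ourselves, then give any duplicate time to overwrite us.
    echo "$token" > "$lockfile"
    sleep "$delay"

    # A missing file means the winner has already removed it.
    if [ "$token" = "$(cat "$lockfile" 2>/dev/null)" ]; then
        rm -f "$lockfile"
        return 0
    fi
    return 1
}
```

Usage would look like "elect /tmp/.mumble-lock 60 && mycommand blah blah".
Note that the copy that starts last is the one that wins, and that the
election only helps if the duplicate starts while the first copy is still
sleeping; a duplicate that arrives after the delay has passed would find
no competing PID and run as well.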

>>> cron:9

X-From:      dmc%cam.sri.com at Warbucks.AI.SRI.com

You probably know this by now, but this is a Sun bug (Sparcstations are
too fast for the software or something). There is a workaround; replace
"mycommand blah blah" by "safe_cron mycommand blah blah" in your crontab,
where safe_cron is the following script:

#!/bin/sh
# Workaround for the bug where cron jobs sometimes get run twice, a
# minute apart, on Sparcstations.
if [ `arch` = sun4 ]
then
  LOCK=/tmp/.`basename $1`.lock
  echo $$ > ${LOCK}
  sleep 60
  if [ "$$" = "`cat ${LOCK}`" ]; then
    # I get to do it
    rm ${LOCK}
  else
    # The other process gets to do it
    exit
  fi
  # Actually do it
  $*
else
  # not a Sun4, just do it
  $*
fi
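
A possible alternative, not taken from any of the replies above, is to
rely on mkdir instead of a PID file: creating a directory either succeeds
or fails as a single step, so exactly one of two competing copies gets
the lock.  The sketch below is only an illustration under that
assumption; the name "run_once" and its lock-directory and hold-time
arguments are hypothetical.

```shell
#!/bin/sh
# run_once LOCKDIR HOLD command [args ...]
# Runs the command only if LOCKDIR could be created, then keeps the lock
# for HOLD more seconds so that a late duplicate still finds it in place.
run_once() {
    lockdir=$1
    hold=$2
    shift 2

    if mkdir "$lockdir" 2>/dev/null; then
        # We own the lock: run the job and remember its exit status.
        "$@"
        status=$?
        # Hold the lock for a while before releasing it.
        sleep "$hold"
        rmdir "$lockdir"
        return $status
    fi

    # Another copy owns the lock; quietly do nothing.
    return 0
}
```

In a wrapper script this might be called as
"run_once /tmp/.mycommand.lock 120 mycommand blah blah".  Unlike the
election scripts, the first copy to start is the one that runs the job.
A copy that is killed before it reaches rmdir leaves the directory
behind, so the lock would then have to be removed by hand.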

=======================================================================

Alan Judge, SysAdmin, Dept. of Maths, Trinity College, Dublin, Ireland.
    ajudge at maths.tcd.ie  a.k.a.  amjudge at cs.tcd.ie
also, Distributed System Group, Dept. of Computer Science, TCD.


