unbatcher out of sync - another cure

Eamonn McManus em at dce.ie
Tue Mar 26 04:44:34 AEST 1991


rob at mtdiablo.Concord.CA.US (Rob Bernardo) writes:
>I've also had some difficulty lately with 'out of sync' unbatching
>problems. Unfortunately, Eamonn McManus's patchbatch didn't work.
>Below is a shell archive for a more robust program to fix batches
>with bad article character counts.

There are advantages and disadvantages to each of our programs.  Patchbatch
is designed to be run automatically on all incoming batches, whereas Rob's
program (rebatch) is to be run by hand on known bad batches.  Running
automatically from newsrun means that the fixer doesn't have to worry
about decompression and the like.

I wrote patchbatch to fish around in the vicinity of the supposed article
end, rather than scanning through every line as rebatch does, because that
makes it more transparent: it is less likely to alter articles that were
never damaged.  If an
article happens to contain the string "#! rnews" at the beginning of a
line, rebatch will assume it ends there.  Patchbatch is only susceptible
to problems if an article contains such a string very near the end.  Also,
if an article is truncated in the middle of a line, so that the "#! rnews"
of the following article is not preceded by a newline, rebatch will not
find that article.  Of course if it were changed to look for "#! rnews"
anywhere in a line it would go ape on articles like this one.
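A minimal C sketch of this kind of windowed search follows. It is not patchbatch's actual code: the function name find_header_near, the whole-batch-in-memory interface and the closest-first search order are assumptions made for illustration. It only accepts "#! rnews " at the start of a line, and it only looks within fudge bytes of where the claimed count says the next article should begin, so a spurious header far from that point is never considered.

```c
#include <stddef.h>
#include <string.h>

#define RNEWS_TAG "#! rnews "

/* Hypothetical helper: return the offset of the "#! rnews " header that is
 * closest to `expected` (where the claimed byte count says the next article
 * starts), looking no more than `fudge` bytes either side.  A candidate only
 * counts if it begins a line: offset 0, or preceded by '\n'.
 * Returns -1 if no header lies inside the window. */
long find_header_near(const char *batch, size_t len, size_t expected,
                      size_t fudge)
{
    size_t taglen = strlen(RNEWS_TAG);

    for (size_t d = 0; d <= fudge; d++) {
        /* Try expected - d first, then expected + d. */
        for (int side = 0; side < 2; side++) {
            if (side == 0 && d > expected)
                continue;               /* would run off the front */
            if (side == 1 && d == 0)
                continue;               /* expected itself already tried */
            size_t p = (side == 0) ? expected - d : expected + d;
            if (p + taglen > len)
                continue;               /* tag would run off the end */
            if (p > 0 && batch[p - 1] != '\n')
                continue;               /* not at the start of a line */
            if (memcmp(batch + p, RNEWS_TAG, taglen) == 0)
                return (long)p;
        }
    }
    return -1;
}
```

With a batch whose first header claims 5 bytes for a 7-byte article, a fudge of 4 finds the real header two bytes past the expected offset, while a fudge of 1 gives up and leaves the batch alone.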

The difficulty with hacks like these is striking a balance between fixing
corrupt batches and leaving correct ones alone.  Patchbatch stays
closer to the latter at the expense of sometimes failing to do the former.
However, I think people should try increasing the value of FUDGE before
resorting to a more promiscuous program like rebatch.  You might also need
to change the size of the buf[] array when doing this; I can't remember if
the version I posted had a magic constant 64 as the size (ugh).
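One way to avoid the magic-constant trap is to derive the buffer size from FUDGE, so that raising FUDGE cannot overrun buf[]. The C sketch below assumes the batch is read from a stdio FILE and that FUDGE is a compile-time constant; the value 64 and the names WINDOW and read_window are hypothetical, not taken from patchbatch. The window starts one byte before the earliest candidate so that a line-start check on that candidate is possible, and it extends far enough to hold a complete tag at the latest candidate.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tuning value: how far either side of the claimed article
 * end we are willing to look for the next header. */
#define FUDGE 64
#define TAG "#! rnews "

/* Bytes needed: 1 byte before the earliest candidate, 2*FUDGE + 1 candidate
 * positions, plus strlen(TAG) - 1 more so the last candidate has room for a
 * whole tag.  That sums to 2*FUDGE + strlen(TAG) + 1, which is exactly
 * 2*FUDGE + sizeof TAG, since sizeof counts the trailing NUL. */
#define WINDOW (2 * FUDGE + sizeof TAG)

/* Read the bytes around file offset `expected` into buf.  Stores the file
 * offset of buf[0] in *start and returns the number of bytes read, which
 * is less than WINDOW near the end of the file.  Returns 0 on seek failure. */
size_t read_window(FILE *fp, long expected, char buf[WINDOW], long *start)
{
    long from = expected > FUDGE ? expected - FUDGE - 1 : 0;

    if (fseek(fp, from, SEEK_SET) != 0)
        return 0;
    *start = from;
    return fread(buf, 1, WINDOW, fp);
}
```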

Another noteworthy difference between the programs is that patchbatch
modifies the batch in place rather than creating a replacement.  This
means that it is much faster.  In particular, if you only occasionally get
corrupt batches you can afford to run patchbatch over every incoming
batch, since there is very little overhead in checking through a correct
batch.  There is a theoretical problem: because the corrected count is
written over the old one, patchbatch will fail if the size of an article
changes from an n-digit number to an (n+1)-digit number.  I never saw this
happen in practice.

,
Eamonn


