Sort bug causes data loss
Griff Smith
ggs at ulysses.att.com
Wed Sep 19 06:20:56 AEST 1990
In article <2675 at crdos1.crd.ge.COM>, davidsen at crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>
> I have discovered what appears to be a serious bug in the sort
> routine used in several SysV variants including Stellar. Since it
> causes silent loss of data I am cross posting a bit more than I usually
> do.
>
[deleted some details to save space, followed by test script...]
>
> sort -nu <<XX >x$$.tmp
> 1: a
> 3: b
> 2: c
> 1: a
> 10: x
> XX
>
> Of course someone may tell me it's supposed to work that way, and that
> the BSD version is broken.

I suspect this may be the case. The System V manual page says this about
the -u option:

    -u   Unique: suppress all but one in each set of lines having
         equal keys.

This doesn't agree with the code, though. The real behavior matches what
I find in the BSD manual page:

    u    Suppress all but one in each set of equal lines.
         Ignored bytes and bytes outside keys do not participate
         in this comparison.

The next clue is from the System V manual page again:

    -n   An initial numeric string, consisting of optional
         blanks, optional minus sign, and zero or more digits
         with optional decimal point, is sorted by arithmetic
         value. The -n option implies the -b option (see
         below). Note that the -b option is only effective when
         restricted sort key specifications are in effect.

The tricky point is that a numeric comparison stops as soon as it finds
a non-numeric character. Since your test file has leading blanks, and
you didn't specify a sort key, the implied -b option never takes effect
and the numeric comparison stops when it sees the leading blank in each
record; as seen by the numeric comparison code, the test file contains
five records with empty, and therefore equal, numeric keys.
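
To make the rule concrete, here is a rough Python model of the numeric
parse as the manual page describes it. It is a sketch of the documented
behavior, not the System V source: the name `numeric_prefix` and the
`skip_blanks` switch are invented for illustration, and the leading
blanks in `SAMPLE` are an assumption about how the test file was laid out.

```python
def numeric_prefix(line, skip_blanks=False):
    """Return the value of the initial numeric string of `line`.

    Models the -n rule: optional minus sign, then digits with an
    optional decimal point.  Parsing stops at the first other character.
    `skip_blanks` (hypothetical) stands in for the -b behavior.
    """
    if skip_blanks:
        line = line.lstrip(" \t")
    i = 0
    if i < len(line) and line[i] == "-":
        i += 1
    seen_point = False
    while i < len(line):
        ch = line[i]
        if "0" <= ch <= "9":
            i += 1
        elif ch == "." and not seen_point:
            seen_point = True
            i += 1
        else:
            break  # first non-numeric character ends the key
    text = line[:i]
    # No digits at all (empty string, lone "-" or "."): treat as zero.
    if not any("0" <= c <= "9" for c in text):
        return 0.0
    return float(text)


# Assumed layout of the test file: numbers preceded by blanks.
SAMPLE = ["  1: a", "  3: b", "  2: c", "  1: a", " 10: x"]

print([numeric_prefix(s) for s in SAMPLE])
# [0.0, 0.0, 0.0, 0.0, 0.0]
print([numeric_prefix(s, skip_blanks=True) for s in SAMPLE])
# [1.0, 3.0, 2.0, 1.0, 10.0]
```

Without blank skipping every record parses to the same empty key, which
is why all five lines look equal to the numeric comparison.
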
Furthermore, the -u option suppresses the following escape clause in
the manual page:

    When there are multiple sort keys, later keys are compared
    only after all earlier keys compare equal. Lines that
    otherwise compare equal are ordered with all bytes significant.

Translation: if a numeric comparison, or a set of keyed comparisons,
shows that two records match, `sort' then compares both records as
simple text to determine whether the records are really identical.
This `tie breaking' test is suppressed if the -u option is enabled.
Since all five of your test lines appear to be identical, the -u
option deletes all but one of them. I think the command you want
to use is

    sort -nu +0

This forces a trip through the key finder, which activates the code
that strips leading blanks.
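
The whole chain of reasoning can be sketched as a small Python model.
This is only an illustration of the behavior described above, not the
real `sort' code: the function names, the `strip_blanks` switch (standing
in for the blank stripping that `+0' activates), and the leading blanks
in `SAMPLE` are all assumptions. Which of several equal lines survives
is also a modeling choice; here it is simply the first one in the input.

```python
import re

# Initial numeric string: optional minus, digits, optional decimal point.
_NUM = re.compile(r"-?[0-9]*\.?[0-9]*")


def numeric_key(line, strip_blanks):
    """Numeric key of a line; parsing stops at the first non-numeric char."""
    if strip_blanks:
        line = line.lstrip(" \t")
    text = _NUM.match(line).group(0)
    return float(text) if any(c.isdigit() for c in text) else 0.0


def sort_n(lines, strip_blanks):
    """Model of `sort -n`: equal keys are tie-broken on the whole line."""
    return sorted(lines, key=lambda s: (numeric_key(s, strip_blanks), s))


def sort_nu(lines, strip_blanks):
    """Model of `sort -nu`: no tie-break, one line kept per distinct key."""
    kept, seen = [], set()
    for line in sorted(lines, key=lambda s: numeric_key(s, strip_blanks)):
        k = numeric_key(line, strip_blanks)
        if k not in seen:
            seen.add(k)
            kept.append(line)
    return kept


# Assumed layout of the test file: numbers preceded by blanks.
SAMPLE = ["  1: a", "  3: b", "  2: c", "  1: a", " 10: x"]

print(sort_nu(SAMPLE, strip_blanks=False))
# ['  1: a']
print(sort_nu(SAMPLE, strip_blanks=True))
# ['  1: a', '  2: c', '  3: b', ' 10: x']
```

With blank stripping off, every key is empty, so -u keeps a single line
and the other four vanish silently. With it on, only the genuine
duplicate "1: a" is dropped.
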
> --
> bill davidsen (davidsen at crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
> VMS is a text-only adventure game. If you win you can use unix.
Flames, counter arguments, cheerfully accepted. I didn't write the
rules, I just work here.
--
Griff Smith AT&T (Bell Laboratories), Murray Hill
Phone: 1-201-582-7736
UUCP: {most AT&T sites}!ulysses!ggs
Internet: ggs at ulysses.att.com