Need help ** removing duplicate rows **

Randal Schwartz merlyn at iwarp.intel.com
Wed Oct 31 11:36:27 AEST 1990


In article <1990Oct30.234654.23547 at agate.berkeley.edu>, c60b-3ac at web (Eric Thompson) writes:
| I have a few very long files that contain rows of ASCII data.  Each row
| looks something like this (not the actual data here):
| 
| a:A:b:c:d:e:f:g:h:i:j:k:l:m
| a:B:b:c:d:e:f:g:h:i:j:k:l:m
| a:C:b:c:d:e:f:g:h:i:j:k:l:m
| a:D:b:c:d:e:f:g:h:i:j:k:l:m
| b:A:n:o:p:q:s:t:u:v:w:x:y:z
| c:A:x:a:x:b:x:c:d:a:m:l:v:x
| d:A:m:l:k:j:i:h:g:f:e:d:c:b
| d:B:m:l:k:j:i:h:g:f:e:d:c:b
| d:C:m:l:k:j:i:h:g:f:e:d:c:b
| 
| It's the second column that's important.  If there are multiple rows that
| are exactly the same except for the second column, I want to GET RID of them.
| If the row is unique (for example, the ones starting with "b" and "c" above)
| then it should stay.  Sounds like what I need is a way to filter out rows
| that are duplicate except in the second column.

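A one-liner in Perl (it reads standard input and prints only the first row
of each group, where the group key is every column except the second):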

perl -ne '($a,$b,$c) = split(":",$_,3); print unless $seen{$a,$c}++;'
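
To see what that does, here is a small sketch that runs the same one-liner over the sample rows from the question. The file names rows.txt and deduped.txt are made up for the illustration; nothing in the original post names a file.

```shell
# Sample rows copied from the question into a scratch file
# (rows.txt is a hypothetical name, not from the original post).
cat > rows.txt <<'EOF'
a:A:b:c:d:e:f:g:h:i:j:k:l:m
a:B:b:c:d:e:f:g:h:i:j:k:l:m
a:C:b:c:d:e:f:g:h:i:j:k:l:m
a:D:b:c:d:e:f:g:h:i:j:k:l:m
b:A:n:o:p:q:s:t:u:v:w:x:y:z
c:A:x:a:x:b:x:c:d:a:m:l:v:x
d:A:m:l:k:j:i:h:g:f:e:d:c:b
d:B:m:l:k:j:i:h:g:f:e:d:c:b
d:C:m:l:k:j:i:h:g:f:e:d:c:b
EOF

# split(":",$_,3) yields column 1, column 2, and everything after it.
# The key ($a,$c) therefore ignores column 2; a row is printed only
# the first time its key is seen.
perl -ne '($a,$b,$c) = split(":",$_,3); print unless $seen{$a,$c}++;' < rows.txt > deduped.txt

cat deduped.txt
# a:A:b:c:d:e:f:g:h:i:j:k:l:m
# b:A:n:o:p:q:s:t:u:v:w:x:y:z
# c:A:x:a:x:b:x:c:d:a:m:l:v:x
# d:A:m:l:k:j:i:h:g:f:e:d:c:b
```

The first row of each group survives (here the "A" rows), and the input needs only one pass, at the cost of holding one hash entry per distinct key in memory.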

Fast enough?

print "Just another Perl hacker,"
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn at iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Intel put the 'backward' in 'backward compatible'..."=========/


