Need help **removing duplicate rows**

Gary Weimer weimer at ssd.kodak.com
Thu Nov 8 07:56:44 AEST 1990


In article <10182 at jpl-devvax.JPL.NASA.GOV> lwall at jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
>In article <1990Oct31.003627.641 at iwarp.intel.com> merlyn at iwarp.intel.com (Randal Schwartz) writes:
>: In article <1990Oct30.234654.23547 at agate.berkeley.edu>, c60b-3ac at web (Eric Thompson) writes:
>: | Sounds like what I need is a way to filter out rows
>: | that are duplicate except in the second column.
>: 
>: A one-liner in Perl:
>: 
>: perl -ne '($a,$b,$c) = split(":",$_,3); print unless $seen{$a,$c}++;'
>: 
>: Fast enough?
>
>Maybe, but he said they were very long files, and that may mean more than
>you'd want to store in an associative array, even with virtual memory.
>Presuming the files are sorted reasonably, you can get away with this:
>
>perl -ne '($this = $_) =~ s/:[^:]*//; print if $this ne $that; $that = $this'
>
>Of course, someone will post a solution using cut and uniq, which will be
>fine if you don't mind losing the second field.  Or swapping the first
>two fields around.  I'll leave the awk and sed solutions to someone else.
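
As predicted, cut and uniq will do it in one line; a minimal sketch, assuming
the same sorted, colon-separated InFile (the second field is lost outright):

cut -d: -f1,3- InFile | uniq > OutFile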

Who needs sed?

awk -F: '{cur=$1; for(i=3;i<=NF;i++) cur=cur FS $i;
          if(cur!=prev){prev=cur; print}}' InFile > OutFile

NOTE: split across two lines to fit in 80 columns; the newline falls inside
the quotes, so the command runs as shown.
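
For example, with a hypothetical sorted InFile of

fred:1001:sales
fred:1002:sales
fred:1003:shipping

only the first and third lines survive: the key is field 1 plus fields 3
onward, rejoined with the separator, so "fred:sales" repeats and the second
line is dropped. Keeping FS in the key also stops "a:bc" and "ab:c" from
colliding, which plain concatenation would allow, and the loop over NF
handles any number of fields.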


