cmpall -- find identical (duplicated) files

Randal Schwartz merlyn at iwarp.intel.com
Fri Aug 3 14:34:22 AEST 1990


In article <1990Aug2.212411.3078 at sq.sq.com>, lee at sqarc (Liam R. E. Quin) writes:
| I wrote cmpall some time ago, when I found that I had lots of copies and
| duplicated directory hierarchies.

"That's not a knife... *This* (pulls out his piece) is a knife!"

Here's a little ditty called "findsame".  Like yours, it first finds
files that are the same length.  For each group of same-length files,
it then runs "sum" to see which ones are actually identical, and then
runs "ls -li" on each group of files whose checksums match.

Perl, of course.

================================================== cut here
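#!/usr/bin/perl

$| = 1;		# unbuffer STDOUT so our blank lines stay in step with ls output

@ARGV = ('.') unless @ARGV;

# Bucket every plain file by size in bytes; only same-size files can match.
# List-form pipe open passes @ARGV to find without going through the shell.
open(F, "-|", "find", @ARGV, "-type", "f", "-print") || die "Cannot run find ($!)";
while (<F>) {
	chomp;
	@stat = stat($_);
	$bysize{$stat[7]} .= "$_\n";
}
close(F);

for $asize (sort { $a <=> $b } keys %bysize) {
	@files = split(/\n/, $bysize{$asize});
	next if $#files <= 0;		# a unique size cannot have a duplicate
	# Checksum this size group; sum prints "checksum blocks filename".
	# (The old fork-and-exec form also exec'd in the parent if fork failed.)
	open(S, "-|", "sum", @files) || die "Cannot run sum ($!)";
	%bysum = ();
	while (<S>) {
		chomp;
		# Limit of 3 keeps file names containing spaces intact.
		($sum, $blocks, $name) = split(' ', $_, 3);
		$bysum{$sum} .= "$name\n";
	}
	close(S);
	for $asum (sort { $a <=> $b } keys %bysum) {
		@files = split(/\n/, $bysum{$asum});
		next if $#files <= 0;
		system 'ls', '-li', @files;	# -i shows inodes, so hard links stand out
		print "\n"; # to separate them by blank lines
	}
}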
==================================================

It's not the best Perl code (I wrote it a long time ago).  I'd
probably rewrite it quite a bit now. :-)
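A present-day rewrite of the same size-then-checksum idea could be done in pure Perl 5. The following is a minimal sketch, not the script posted above. It assumes the core modules File::Find and Digest::MD5 are installed. It swaps MD5 in for "sum", whose 16-bit checksum collides far more easily, and walks the tree with File::Find instead of an external "find", skipping symbolic links the way "find -type f" does. The subroutine name find_duplicates is hypothetical.

```perl
#!/usr/bin/perl
# Sketch only: a pure-Perl variant of findsame. Digest::MD5 stands in for
# the external sum(1), and File::Find for the external find(1).
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Hypothetical helper: return a list of array refs; each array holds the
# paths of files that have the same size and the same MD5 digest.
sub find_duplicates {
    my @dirs = @_;
    my %bysize;

    # Stage 1: bucket plain files by size (symlinks skipped, like find -type f).
    find({
        no_chdir => 1,
        wanted   => sub {
            return if -l $_;           # lstat: skip symbolic links
            return unless -f _;        # reuse the lstat buffer: plain files only
            my $size = (-s _) || 0;    # -s is false for empty files; use 0
            push @{ $bysize{$size} }, $File::Find::name;
        },
    }, @dirs);

    # Stage 2: within each same-size bucket, bucket again by MD5 digest.
    my @groups;
    for my $size (sort { $a <=> $b } keys %bysize) {
        my @files = @{ $bysize{$size} };
        next if @files < 2;            # a unique size cannot have a duplicate
        my %bydigest;
        for my $file (@files) {
            open(my $fh, '<', $file) or do { warn "Cannot open $file: $!\n"; next };
            binmode($fh);
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close($fh);
            push @{ $bydigest{$digest} }, $file;
        }
        for my $digest (sort keys %bydigest) {
            my @same = sort @{ $bydigest{$digest} };
            push @groups, \@same if @same > 1;
        }
    }
    return @groups;
}

# Run as a script (not when loaded with require): list each group with ls -li.
unless (caller) {
    $| = 1;                            # keep our blank lines in step with ls output
    my @dirs = @ARGV ? @ARGV : ('.');
    for my $group (find_duplicates(@dirs)) {
        system('ls', '-li', @$group);
        print "\n";                    # blank line between groups
    }
}
```

Given directory arguments it scans those; with none it scans the current directory, as the original does. Matching MD5 digests are very strong evidence that two files are identical, though not proof; running "cmp" on a reported pair settles it.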

Just another Perl hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn at iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/