new survey to supplement arbitron. Please run this program.

Joe Buck jbuck at epimass.EPI.COM
Sat May 20 08:08:10 AEST 1989


Brian, your program, if invoked in the way you request, will process
crossposted articles N times, where N is the number of groups posted
to: a crossposted article is stored as a link under each group's spool
directory, so a "find" over the spool sees it once per group.  Please,
let's not waste net resources by conducting a large-scale survey with
a basic error in it.

Rather than do a "find" to locate article names, you can count
crossposted articles only once by reading the history file to obtain
article filenames.  Since this is going to alt.sources, I obviously
need to include a source: here is a Perl program that eats a history
file and spits out a sorted list of host pairs, showing the links your
news has travelled through.
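For reference, the program expects history lines with four
blank-separated fields -- message-ID, date, time, and
group/article-number -- roughly like this (the exact layout varies
between news versions, so take this sample line as an approximation):

```
<5679@chinet.UUCP>	05/18/89	12:34	comp.sources.unix/2345
```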

------------------------------ cut here ------------------------------
#! /usr/bin/perl

# This perl program scans through all the news on your spool
# (using the history file to find the articles) and prints
# out a sorted list of frequencies that each pair of hosts
# appears in the Path: headers.  That is, it determines how,
# on average, your news gets to you.
#
# If an argument is given, it is the name of a previous output
# of this program.  The figures are read in, and host pairs
# from articles newer than the input file are added in.
# So that this will work, the first line of the output of the
# program is of the form
# Last-ID: <5679 at chinet.UUCP>
# (without the # sign).  It records the last Message-ID in the
# history file; to add new articles, we skip in the history file
# until we find the message-ID that matches "Last-ID".

$skip = 0;
if ($#ARGV >= 0) {
    $ofile = $ARGV[0];
    die "Can't open $ofile!\n" unless open (of, $ofile);
# First line must contain last msgid to use.
    $_ = <of>;
    ($key, $last_id) = split (' ');
    die "Invalid input file format!\n" if ($key ne "Last-ID:");
    $skip = 1;
# Read in the old file.
    while (<of>) {
	($cnt, $pair) = split(' ');
	$pcount{$pair} = $cnt;
    }
}
# Let's go.

die "Can't open history file!\n" unless open (hist, "/usr/lib/news/history");
die "Can't cd to news spool directory!\n" unless chdir ("/usr/spool/news");

$np = $nlocal = 0;
while (<hist>) {
#
# $_ contains a line from the history file.  Parse it.
# Skip it if the article has been cancelled or expired
# If the $skip flag is true, we skip until we have the right msgid
#
    ($id, $date, $time, $file) = split (' ');
    next if ($file eq 'cancelled' || $file eq '');
    if ($skip) {
	if ($id eq $last_id) { $skip = 0; }
	next;
    }
#
# format of field is like comp.sources.unix/2345 .  Get ng and filename.
#
    ($ng, $n) = split (/\//, $file);
    $file =~ tr%.%/%;
#
# The following may be used to skip any local groups.  Here, we
# skip group names beginning with "epi" or "su".  Change to suit taste.
#
    next if $ng =~ /^epi|^su/;
    next unless open (art, $file);	# skip if cannot open file
#
# Article OK.  Get its path.
    while (<art>) {
	last if /^$/;		# headers end at the first blank line
        ($htype, $hvalue) = split (' ');
	if ($htype eq "Path:") {
# We have the path, in hvalue.
	    $np++;
	    @path = split (/!/, $hvalue);
# Handle locally posted articles.
	    if ($#path < 2) { $nlocal++; last;}
# Create and count pairs.  The loop stops one pair short because the
# last Path: component is normally the poster's name, not a host.
	    for ($i = 0; $i < $#path - 1; $i++) {
		$pair = $path[$i] . "!" . $path[$i+1];
		$pcount{$pair} += 1;
	    }
	    last;
	}
    }
}
# Make sure print message comes out before sort data.
$| = 1;
print "Last-ID: $id\n";
$| = 0;
# write the data out, sorted.  Open a pipe.
die "Can't exec sort!\n" unless open (sortf, "|sort -nr");

while (($pair, $n) = each (%pcount)) {
    printf sortf ("%6d %s\n", $n, $pair);
}
close sortf;
-- 
-- Joe Buck	jbuck at epimass.epi.com, uunet!epimass.epi.com!jbuck


