shell script to...

Fri Apr 12 03:10:43 AEST 1991

From the keyboard of neil at ms.uky.edu (Neil Greene):
:Any sed gurus that would like to explain how to accomplish the following.  I
:have not mastered the art of sed or awk.
:
:I have a file that contains drug names and next to the drug name is the drug
:group.
:
:> Dipyrone		Analgesic
:> Nefopam		Analgesic
:> Thiosalicylic Acid	Analgesic
:> Xylazine		Analgesic
:> Chloramphenicol	Antibiotic 
:
:I need a shell script that will read from another (ascii) data file, find an
:occurrence of a DRUG_NAME, write the line to another (ascii) file and append
:the appropriate DRUG_TYPE to the new line.
:
:# line with drug name in it
:xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx 
:
:# rewrite new line to new ascii file
:xxxx 01/02/90 xxxxxx xxxxx xxx x xxxxxxx Dipyrone .... xxxx xxxxx Analgesic

Here's a simple-minded perl script to do this.  It reads from
"drugs.types" to load the table, then reads stdin (or any files named
on the command line) and writes stdout according to your spec:

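    open (TYPES, "drugs.types") || die "can't open drugs.types: $!";
    while (<TYPES>) {
        chomp;
        # name and group are tab-separated; split on tabs, not on all
        # whitespace, so "Thiosalicylic Acid" stays one name
        ($name, $type) = split(/\t+/);
        $type =~ s/\s+$//;      # drop trailing blanks ("Antibiotic ")
        $types{$name} = $type;
    }
    close (TYPES);

    while (<>) {
        chomp;                  # chomp, not chop: safe on a last line with no newline
        print;
        study;                  # compile pattern space for speed
        foreach $name (keys %types) {
            if (/\b\Q$name\E\b/) {      # \Q...\E: match the name literally
                print ' ', $types{$name};
                last;
            }
        }
        print "\n";
    }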

No checking is done on the input validity in the TYPES file.  This would
also be a bit slow if you had a big table: the interpolated pattern changes
on every pass through the inner loop, so perl has to recompile it (one
re_comp() per name, per input line).  A faster, albeit less obvious, way to
do this would be to use an eval.  That turns each pattern into a constant
string that is compiled only once, which when combined with the "study"
statement does Boyer-Moore one better, and really blazes.  Another possible
speed optimization would be to turn the if's into a cascading if/elsif
block, which would get internalized into one big switch statement, so that
perl would jump directly to the right case.

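    open (TYPES, "drugs.types") || die "can't open drugs.types: $!";
    while (<TYPES>) {
        chomp;
        ($name, $type) = split(/\t+/);  # tab-separated, as in the table above
        $type =~ s/\s+$//;
        $types{$name} = $type;
    }
    close (TYPES);

    # Build the whole filter as a string of perl code, one if-block per
    # drug with its name spliced in as a constant pattern.  \Q...\E is
    # applied while the string is built, so regex metacharacters in a
    # name come out escaped.  (The indented <<~ here-docs need perl 5.26+.)
    $code = <<~EO_CODE;
        while (<>) {
            chomp;
            print;
            study;
        EO_CODE
    for $name (keys %types) {
        $code .= <<~EO_CODE;
                if (/\\b\Q$name\E\\b/) {
                    print ' ', \$types{"\Q$name\E"}, "\\n";
                    next;
                }
            EO_CODE
    }
    $code .= <<~EO_CODE;
            print "\\n";
        }
        EO_CODE

    # print STDERR $code;   # uncomment to see the generated program

    eval $code;
    die $@ if $@;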
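
The cascading if/elsif variant described above could be generated along the same lines as the eval script.  This is a sketch under stated assumptions, not the code from this post: the three-entry %types table is sample data standing in for drugs.types, and tag_line is a made-up name for the generated sub.  It only builds the chain; it does not show whether a particular perl turns that chain into a jump table.

```perl
use strict;
use warnings;

# Hypothetical sample table; the real scripts load %types from drugs.types.
our %types = (
    'Dipyrone'           => 'Analgesic',
    'Thiosalicylic Acid' => 'Analgesic',
    'Chloramphenicol'    => 'Antibiotic',
);

# Build "if (...) {...} elsif (...) {...} ..." with each name spliced in
# as a constant pattern.  quotemeta() escapes regex metacharacters, and
# the same escaped text is also safe inside the double-quoted hash key.
my $chain = '';
my $kw    = 'if';
for my $name (sort keys %types) {
    my $q = quotemeta $name;
    $chain .= qq{    $kw (/\\b$q\\b/) { \$_ .= ' ' . \$types{"$q"} }\n};
    $kw = 'elsif';
}

# Wrap the chain in a sub (tag_line is a hypothetical name) that returns
# the line with its drug group appended, or unchanged if nothing matches.
my $code = "sub tag_line {\n    local \$_ = shift;\n" . $chain
         . "    return \$_;\n}\n";
eval $code;
die $@ if $@;

# To filter stdin to stdout like the scripts above:
# while (<>) { chomp; print tag_line($_), "\n" }
```

Because the chain stops at the first match, a line that names two drugs gets only one group, just as with `last` in the first script.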

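Since perl 5.005, qr// offers another way around the per-line recompilation without building code in a string: compile each pattern once, keep the compiled regexes in a list, and reuse them for every input line.  This is a swapped-in technique rather than what the post uses, and the sketch assumes a made-up sample table and a hypothetical tag_line sub.

```perl
use strict;
use warnings;

# Hypothetical sample table; the real scripts load %types from drugs.types.
my %types = (
    'Dipyrone'           => 'Analgesic',
    'Nefopam'            => 'Analgesic',
    'Thiosalicylic Acid' => 'Analgesic',
    'Chloramphenicol'    => 'Antibiotic',
);

# Compile every pattern once, up front, rather than once per name per line.
# Each rule pairs a precompiled regex with the group to append.
my @rules = map { [ qr/\b\Q$_\E\b/, $types{$_} ] } sort keys %types;

# tag_line (a hypothetical name) returns the line with the first matching
# drug group appended, or the line unchanged if no name matches.
sub tag_line {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $group) = @$rule;
        return "$line $group" if $line =~ $re;
    }
    return $line;
}

# To filter stdin to stdout like the scripts above:
# while (<>) { chomp; print tag_line($_), "\n" }
```
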
--tom