perl

Fri Jun 15 00:46:08 AEST 1990

In article <18498 at well.sf.ca.us> gregs at well.sf.ca.us (Greg Strockbine) writes:
>I'm just starting to look at perl. Is there a good reason
>to use it instead of sed, awk, etc.?

That's a good question, the quick answer to which, IMHO, is yes.  I know
this'll probably spark yet another net jihad, but I'm nonetheless going to
try to substantiate that claim.

Most of us have written, or at least seen, shell scripts from hell.  While
often touted as one of UNIX's strengths because they're conglomerations of
small, single-purpose tools, these shell scripts quickly grow so complex
that they're cumbersome and hard to understand, modify, and maintain.

Because perl is one program rather than a dozen others (sh, awk, sed, tr,
wc, sort, grep, ...), it is usually clearer to express yourself in perl
than in sh and allies, and often more efficient as well.  You don't need
as many pipes, temporary files, or separate processes to do the job.  You
don't need to go shoving your data stream out to tr and back and to sed
and back and to awk and back and to sort and back and then back to sed and
back again.  Doing so can often be slow, awkward, and/or confusing.
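To make that concrete, here is a minimal sketch of a word-frequency count
done in a single process, roughly the job a tr | sort | uniq -c | sort -rn
pipeline does in the shell.  It assumes a current Perl 5 interpreter; the
sub name word_counts and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: tally the words in a list of lines.
sub word_counts {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        $line = lc $line;                   # what tr A-Z a-z would do
        $count{$_}++ for $line =~ /\w+/g;   # split into words and tally
    }
    return %count;
}

# Made-up sample input standing in for a file or pipe.
my %count = word_counts("The cat saw the dog", "the dog ran");

# Highest count first, ties broken alphabetically (the sort -rn step).
for my $word (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    printf "%3d %s\n", $count{$word}, $word;
}
```

No temporary files and no extra processes: the tallying, the case folding,
and the final sort all happen on data already in memory.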

Anyone who's ever tried to pass command line arguments into a sed script
of moderate complexity or above can attest to the fact that getting the
quoting right is not a pleasant task.  In fact, quoting in general in the
shell is just not a pleasant thing to code or to read.

In a heterogeneous computing environment, the available versions of many
tools vary too much from one system to the next to be utterly reliable.
Does your sh understand functions on all your machines?  What about your
awk?  What about local variables?  It is very difficult to do complex
programming without being able to break a problem up into subproblems of
lesser complexity.  You're forced to resort to using the shell to call
other shell scripts and allow UNIX's power of spawning processes to serve
as your subroutine mechanism, which is inefficient at best.  That means your
script will require several separate scripts to run, and getting all these
installed, working, and maintained on all the different machines in your
local configuration is painful.  

Maybe if nawk had been available sooner and for free and for all
architectures, I would use it for more, but it isn't free (yes, there's
gawk, but that's not been out long) and actually isn't powerful enough for
some of the things I need to do.  Perl is free, and its Configure script
has knowledge of how to compile perl for a veritable plethora of different
hardware and software platforms.

Besides being faster, perl is a more powerful tool than sh, sed, or awk.
I realize these are fighting words in some camps, but so be it.  There
exists a substantial niche between shell programming and C programming
that perl conveniently fills.  Tasks of this nature seem to arise
extremely often in the realm of systems administration.  Since a system
administrator almost invariably has far too much to do to devote a week to
coding up every task before him in C, perl is especially useful for him.
Larry Wall, perl's author, has been known to call it "a shell for C
programmers."

In what ways is perl more powerful than the individual tools?  The list
is pretty long, so what follows is not necessarily exhaustive.
To begin with, you don't have to worry about arbitrary and annoying
restrictions on string length, input line length, or number of elements in
an array.  These are all virtually unlimited, i.e. limited to your
system's address space and virtual memory size.

Perl's regular expression handling is far and away the best I've ever
seen.  For one thing, you don't have to remember which tool wants which
particular flavor of regular expressions, or lament the fact that one
tool doesn't allow (..|..) constructs or +'s or \b's or whatever.  With
perl, it's all the same, and as far as I can tell, a proper superset of
all the others.
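As a small illustration, the sketch below uses alternation, + and \b
together in one pattern.  It assumes a current Perl 5 interpreter, and the
sample lines are invented.

```perl
use strict;
use warnings;

# Made-up log lines for illustration.
my @lines = ("error: disk full", "warning: low memory", "terror alert", "ok");

# (..|..), + and \b all in the same pattern.
my @hits = grep { /\b(error|warning):\s+\w+/ } @lines;

print "$_\n" for @hits;
```

The \b keeps "terror alert" from matching on its embedded "error".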

Perl has a fully functional symbolic debugger (written, of course, in
perl) that is an indispensable aid in debugging complex programs.  Neither
the shell nor sed/awk/sort/tr/... has such a thing.

Perl has a loop control mechanism that's more powerful even than C's.  You
can do the equivalent of a break or continue (last and next in perl) of
any arbitrary loop, not merely the nearest enclosing one.  You can even do
a kind of continue that doesn't trigger the re-initialization part of a
loop, something you do from time to time want to do.
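Here is a minimal sketch of what that looks like, assuming a current
Perl 5 interpreter.  The labels ROW and OUTER and the sample data are
arbitrary, and I'm taking the "continue that doesn't re-initialize" to be
redo.

```perl
use strict;
use warnings;

# Made-up data: keep only the rows with no negative entries.
my @rows = ([1, 2, 3], [4, -1, 6], [7, 8, 9]);
my @clean;
ROW: for my $row (@rows) {
    for my $cell (@$row) {
        next ROW if $cell < 0;    # abandon the whole row, not just the cell
    }
    push @clean, $row;
}

# last with a label leaves both loops at once.
my $found;
OUTER: for my $i (1 .. 9) {
    for my $j (1 .. 9) {
        if ($i * $j == 42) { $found = "$i x $j"; last OUTER; }
    }
}

# redo restarts the loop body without re-evaluating the loop condition.
# Here it joins lines that end in a backslash with the line that follows.
my @input = ("a \\", "b", "c");
my @joined;
my $line;
while (defined($line = shift @input)) {
    if ($line =~ s/\s*\\$//) { $line .= " " . shift @input; redo; }
    push @joined, $line;
}

print scalar(@clean), " clean rows, 42 = $found, joined: @joined\n";
```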

Perl's data-types and operators are richer than the shells' or awk's,
because you have scalars, numerically-indexed arrays (lists), and
string-indexed (hashed) arrays.  Each of these holds arbitrary data
values, including floating point numbers, for which mathematical built-in
subroutines and power operators are available.

As for operators, to start with, you've got all of C's (except for
addressing operators, which aren't relevant) so you don't have to
remember whether ~ or ^ or ^= or whatever are really there, as you do in
awk.  Furthermore, you've got distinct relational operators for strings
versus numeric operations: == for numeric equality (0x10 == 16) and 'eq'
for string equality ('010' ne '8'), and all the other possibilities as
well.  You've got a range operator, so you can have expressions like
(1..10) or even ('a'..'zzz').  You can use it to say things like
    if (/^From/ .. /^$/) { # process mail header
or 
    if (/^$/ .. eof) { # process mail body

There's a string repetition operator, so ('-' x 72) is a row of dashes.
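The operators just mentioned can be tried out together.  This is only a
sketch, assuming a current Perl 5 interpreter; the mail message is invented
sample data.

```perl
use strict;
use warnings;

# Numeric versus string comparison.
print "0x10 == 16\n"   if 0x10 == 16;     # hex literal, compared as a number
print "010 == 8\n"     if 010 == 8;       # octal literal, compared as a number
print "'010' ne '8'\n" if '010' ne '8';   # compared as strings they differ

# Range and repetition operators.
my @digits  = (1 .. 10);
my @letters = ('a' .. 'e');
my $rule    = '-' x 72;

# Made-up message, split into header and body with the range operator.
my @message = ("From: tom", "Subject: perl", "", "Body line 1", "Body line 2");
my (@header, @body);
for (@message) {
    if (/^From/ .. /^$/) { push @header, $_; next; }
    push @body, $_;
}

print scalar(@header), " header lines, ", scalar(@body), " body lines\n";
```

The range test turns true at the From line and stays true through the
first empty line, so the blank separator is counted with the header here.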

You can operate on entire arrays conveniently, and not just with things like
push and pop and join and split, but also array slices:
    @a = @b[$i..$j];
and built-in mapcar-like abilities for arrays, like
    for (@list) { s/^foo//; }
and
    for $x (@list) { $x *= 3; }
or
    @x = grep(!/^#/, @y);
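Put together with some made-up data, and assuming a current Perl 5
interpreter (hence the "my" declarations), the fragments above behave like
this:

```perl
use strict;
use warnings;

my @b = qw(zero one two three four five);
my ($i, $j) = (1, 3);
my @a = @b[$i .. $j];              # slice: elements 1 through 3

my @list = qw(foobar foobaz quux);
for (@list) { s/^foo//; }          # $_ aliases each element, so edits stick

my @nums = (1, 2, 3);
for my $x (@nums) { $x *= 3; }     # same aliasing with a named variable

my @y = ("# comment", "code 1", "# another", "code 2");
my @x = grep(!/^#/, @y);           # keep only the non-comment lines

print "@a\n@list\n@nums\n";
print "$_\n" for @x;
```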

Speaking of lisp, you can generate strings, perhaps with sprintf(), and
then eval them.  That way you can generate code on the fly.  You can even
do lambda-type functions that return newly-created functions that you can
call later. The scoping of variables is dynamic, fully recursive subroutines
are supported, and you can pass or return any type of data into or out 
of your subroutines.
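A minimal sketch of both ideas follows.  It assumes a current Perl 5
interpreter and uses anonymous subs and lexical "my" variables, which are
features of later perl versions rather than the dynamic scoping described
above; the names make_adder, $triple and $add5 are invented.

```perl
use strict;
use warnings;

# Build source text with sprintf, then compile it with eval.
my $code   = sprintf 'sub { my ($n) = @_; return $n * %d; }', 3;
my $triple = eval $code or die "eval failed: $@";

# A function that returns a newly created function.
sub make_adder {
    my ($increment) = @_;
    return sub { return $_[0] + $increment; };
}
my $add5 = make_adder(5);

print $triple->(14), " ", $add5->(10), "\n";
```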

You have a built-in automatic formatter for generating pretty-printed
forms with automatic pagination and headers and center-justified and
text-filled fields like "%(|fmt)s" if you can imagine what that would
actually be were it legal.

There's a mechanism for writing suid programs that can be made more secure
than even C programs thanks to an elaborate data-tracing mechanism that
understands the "taintedness" of data derived from external sources.  It
won't let you do anything really stupid that you might not have thought of.

You have access to just about any system-related function or system call,
like ioctl's, fcntl, select, pipe and fork, getc, socket and bind and
connect and attach, and indirect syscall() invocation, as well as things
like getpwuid(), gethostbyname(), etc.  You can read in binary data laid
out by a C program or system call using structure-conversion templates.
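For instance, here is a sketch of the structure-conversion templates using
pack and unpack.  The record layout is hypothetical: a real C struct would
normally use the machine's native byte order and padding, so the template
would have to be adjusted to match.

```perl
use strict;
use warnings;

# Hypothetical record: 16-bit id, 32-bit size, 8-byte name,
# big-endian ("network" order), no padding.
my $template = 'n N A8';

my $record = pack($template, 7, 123456, 'perl');
my ($id, $size, $name) = unpack($template, $record);

printf "%d bytes: id=%d size=%d name=%s\n",
    length($record), $id, $size, $name;
```

The A8 field is padded with spaces on the way out and stripped of trailing
spaces on the way back in.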

At the same time you can get at the high-level shell-type operations like
the -r or -w tests on files or `backquote` command interpolation.  You can
do file-globbing with the <*.[ch]> notation or do low-level readdir()s as
suits your fancy.
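A short sketch of those shell-type operations, assuming a current Perl 5
interpreter on a UNIX-like system.  The file names are invented and live
in a scratch directory that is removed when the script exits.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with a few made-up files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.h notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "cannot create $dir/$name: $!";
    close $fh;
}

my @sources  = sort glob("$dir/*.[ch]");   # same idea as <*.[ch]>
my @readable = grep { -r $_ } @sources;    # -r file test

# The low-level route: read the directory yourself.
opendir(my $dh, $dir) or die "cannot open $dir: $!";
my @entries = sort grep { !/^\.\.?$/ } readdir($dh);
closedir $dh;

my $echoed = `echo hello`;                 # backquotes capture command output
chomp $echoed;

print scalar(@sources), " C files, ", scalar(@entries), " entries, $echoed\n";
```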

Dbm files can be accessed using simple array notation.  This is really
nice for dealing with system databases (aliases, news, ...), efficient
access mechanisms over large data-sets, and for keeping persistent data.

Don't be dismayed by the apparent complexity of what I've just discussed.
Perl is actually very easy to learn because so much of it derives from 
existing tools.  It's like interpreted C with sh, sed, awk, and a lot
more built into it.

I hope this answers your question.

--tom
--

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist at convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"