shell compiler?

Mon Apr 14 20:37:08 AEST 1986

In article <96 at cstvax.UUCP> scott at cstvax.UUCP (Scott Larnach) writes:
> ...  Would a program which turned a shell
>script into an equivalent C program (which would handle i/o
>redirections, fork/exec the appropriate commands, etc.) usefully
>improve the speed of my scripts? 

I am fairly sure that if your shell has "test" built in then a simple
translation of shell control constructs into C will gain you nothing
significant on large scripts, since most of the time is spent doing
fork/exec.  If you have a "clever" translator that knows about a few
common "grep", "sed", "awk" and "ls" usages in shell scripts, and puts them
directly into C code without exec-ing the corresponding program, you may
gain quite a lot more.  However, I don't really think such a translator
would be a very good idea, since it would almost inevitably be large,
clumsy and ad hoc (and buggy!).

As a very rough and ready estimate of the cost of interpreting control
constructs relative to that of path search and fork/exec, I created the
following trivial script, "shtest":

	#!/bin/sh

	ECHO="$1"

	for i in 0 1 2 3 4 5 6 7 8 9 ; do
		for j in 0 1 2 3 4 5 6 7 8 9 ; do
			for k in 0 1 2 3 4 5 6 7 8 9 ; do
				$ECHO $i$j$k
			done
		done
	done

the following script, "nul.sh":

	#!/bin/sh
	exit 0

and the following C program, "nul":

	main() { exit(0); }

and got the following timings (take with the usual pinch of salt,
especially "elapsed" since the system was not single-user):

command             user   system  elapsed
shtest :            10.4      2.1     0:18
shtest echo         57.2    306.0     7:17
shtest /bin/echo    52.2    240.9     8:16
shtest nul          57.9    356.4    11:04
shtest nul.sh      100.8    441.9    15:43

Notes:
-	"." is the LAST directory on my path, so "nul" requires the longest
	path search.
-	"echo" is NOT built in to our version of the shell, which is
	the Bourne shell as distributed with 4.2, without modification.
-	All commands were run by "time sh shtest COMMAND >/dev/null".
-	Timings were done on a VAX 11/750 running 4.2bsd; the kernel
	recognises "#!" at the start of a script as a "magic number" for
	exec(), which saves a few cycles in starting up a shell script.
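
For readers unsure what "path search" costs, here is a minimal sketch
of the idea in portable sh.  The function name "find_in_path" is my own
invention, and a real shell may do this differently (for instance by
attempting the exec in each directory in turn, or by caching command
locations).  It only shows why a command found in the LAST directory on
the path, such as "nul" in ".", needs the most work: every earlier
directory is examined first.

```shell
#!/bin/sh
# find_in_path NAME: print the first executable file called NAME found
# in the directories of $PATH, trying them in order.  Hypothetical
# helper, a sketch of the search done before a command can be exec'd.
find_in_path() {
    name=$1
    saved_ifs=$IFS
    IFS=:
    set -f                      # no filename expansion while splitting $PATH
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.  # an empty entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$saved_ifs
            set +f
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$saved_ifs
    set +f
    return 1                    # every directory was searched without success
}

find_in_path echo               # prints the full path of echo, if one exists
```

Giving the full path, as in "shtest /bin/echo", skips this loop
entirely.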

The conclusions seem clear:
-	The time spent in shell interpretation is negligible compared to
	the time taken to exec other programs.
-	Searching directories to find commands is cheap compared to
	fork/exec, but expensive compared to interpreting shell control
	constructs.
-	For very small scripts, the startup time of the shell compared
	to a C program is a significant overhead.
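
The conclusions can be checked with a little arithmetic on the timing
table above.  The sketch below assumes that the user and system columns
are CPU seconds (as "time" reports them) and that each run executes the
loop body 10 * 10 * 10 = 1000 times; the helper names are mine.  The
differences it prints are only rough estimates, since the machine was
not single-user.

```shell
#!/bin/sh
# per_call_ms USER SYSTEM COUNT: CPU milliseconds per loop iteration,
# given total user and system CPU seconds for COUNT iterations.
per_call_ms() {
    LC_ALL=C awk -v u="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", (u + s) * 1000 / n }'
}

# diff_ms A B: A minus B, to one decimal place.
diff_ms() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a - b }'
}

n=1000                                      # iterations per run of shtest
colon=$(per_call_ms 10.4 2.1 "$n")          # shtest :          -> 12.5
echo_path=$(per_call_ms 57.2 306.0 "$n")    # shtest echo       -> 363.2
echo_abs=$(per_call_ms 52.2 240.9 "$n")     # shtest /bin/echo  -> 293.1
nul_c=$(per_call_ms 57.9 356.4 "$n")        # shtest nul        -> 414.3
nul_sh=$(per_call_ms 100.8 441.9 "$n")      # shtest nul.sh     -> 542.7

echo "loop interpretation only (:):     $colon ms"
echo "fork/exec of /bin/echo:           $echo_abs ms"
echo "path search (echo - /bin/echo):   $(diff_ms "$echo_path" "$echo_abs") ms"
echo "shell startup (nul.sh - nul):     $(diff_ms "$nul_sh" "$nul_c") ms"
```

On these figures interpretation costs about 12.5 ms of CPU per
iteration, roughly 4% of a single fork/exec of /bin/echo, while the
path search adds about 70 ms and starting a shell for a null script
about 128 ms more than a null C program.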

Hence, if you have a very small script invoked from the inner loop
of another script, it might be worth trying to speed it up; however,
it may well be better to embed the inner script directly in its
caller than to translate it to C.
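
As an illustration of embedding, here is a sketch that assumes a shell
with functions and built-in arithmetic (any POSIX sh; older Bourne
shells may lack both).  The inner script "inner.sh" and its contents
are hypothetical.  Where the caller would otherwise run "sh inner.sh $i"
on every pass, paying for a fork/exec and a shell startup each time,
the body of the inner script becomes a function and the whole loop runs
in one process.

```shell
#!/bin/sh
# Body of the hypothetical inner.sh, now a function in the caller:
# classify its argument as odd or even by its last digit.
inner() {
    case "$1" in
        *[02468]) parity=even ;;
        *)        parity=odd ;;
    esac
}

count_even=0
for i in 0 1 2 3 4 5 6 7 8 9; do
    inner "$i"                  # a function call, no fork/exec
    if [ "$parity" = even ]; then
        count_even=$((count_even + 1))
    fi
done
echo "even digits: $count_even" # prints "even digits: 5"
```

Without functions the same effect is had by pasting the inner script's
commands straight into the loop body.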

Of course, there are many constructs other than "for" and simple
variable substitution, but I doubt that any of them is an order of
magnitude slower than those.  I would also expect exactly the same
conclusions to hold for ANY other shell used for script-writing.
-- 
	Chris Miller, Heriot-Watt University, Edinburgh
	...!ukc!hwcs!chris   chris at hwcs.uucp	chris at cs.hw.ac.uk