How to merge two files in awk??

Thu Jan 31 01:09:44 AEST 1991

In article <3404 at d75.UUCP> @xlab1.uucp () writes:
> Supposing I have two files with three columns in each. How do
> I merge the files and generate a single file with six or more
> columns using shell script?  For example if File A has columns a, c, e
> and File B has columns b, d, f. I want to generate File C
> with columns a,b,c,d,e,f.  Also it would be nice to be able to
> use the arithmetic feature in awk...

IMHO this is not feasible with OLD "awk" for LARGE files.

Small files could be saved in an associative array.

	awk '
	FILENAME == "first" {
		line[NR] = $0
	}
	FILENAME == "second" {
		print line[++i] " " $0
	}
	' first second

Of course, UNIX has enough friendly commands to help you, e.g.:

	pr -tm first second | awk '{ whatever you like }'
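For the column order asked for above (a,b,c,d,e,f), something along
these lines might do. This is only a sketch: the sample values are made
up, and it assumes that every line is short enough for "pr" not to
truncate it and that no field is empty.

```shell
# Hypothetical sample data: "first" holds columns a c e, "second" holds b d f.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '1 3 5\n10 30 50\n' > first
printf '2 4 6\n20 40 60\n' > second

# pr -tm puts the two files side by side, so awk sees six fields per line:
# $1..$3 come from "first", $4..$6 from "second".
merged=$(pr -tm first second |
	awk '{ print $1, $4, $2, $5, $3, $6, $1 + $4 }')   # 7th column: a + b
printf '%s\n' "$merged"
```

The seventh output column is only there to show that the usual awk
arithmetic is available once all six fields sit on one line.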

With NEW "awk" (nawk) merging is feasible, e.g.:

	nawk '{
		printf "%s ", $0
		getline < "second"
		print
	}' first
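One caveat: if "second" is shorter than "first", a failed getline leaves
$0 untouched, so the program above prints the line from "first" twice.
A possible variant reads into a variable and checks the return value.
The sample data and the variable name "other" are made up, and I assume
that plain "awk" on your system is a new awk (nawk, gawk, mawk):

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a1 c1\na2 c2\na3 c3\n' > first
printf 'b1 d1\nb2 d2\n' > second          # deliberately one line short

merged=$(awk '{
	# getline returns 1 on success, 0 at end of file, -1 on error
	if ((getline other < "second") > 0) {
		print $0 " " other
	} else {
		print $0                # "second" is exhausted
	}
}' first)
printf '%s\n' "$merged"
```

Reading into a variable also keeps $0 and $1..$NF of the line from
"first" intact, which is handy if you want to do arithmetic on them.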

> Finally, how do you specify the "rest of the line" in awk??

I don't quite understand this. Do you mean the following:

	33.5 ZZZ 4564.334 foo bar
			  ^^^^^^^--- processed as "rest of line"
	^^^^ ^^^ ^^^^^^^^ ---------- processed as $1, $2, $3

In this case there are several solutions: If in your input data the
first three fields always occupy the same space, say 18 chars, you
can access the "rest of line" as substr($0, 19).

If $1..$3 do not have a fixed width, but you are sure that exactly one
separator character follows each of them, you can add up their lengths
plus the three separators and get the rest of the line with
substr($0, length($1) + length($2) + length($3) + 4).
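Both substr() variants can be tried on the sample line from above. This
is a small sketch only; it assumes single-space separators and no
leading blanks. The three fields and their three separators take up
length($1) + length($2) + length($3) + 3 characters, so the rest starts
one character later:

```shell
line='33.5 ZZZ 4564.334 foo bar'

# Fixed-width case: the first three fields always fill 18 characters.
fixed=$(printf '%s\n' "$line" | awk '{ print substr($0, 19) }')

# Variable-width case: three fields plus three single separators,
# so the rest of the line starts one character after them.
varied=$(printf '%s\n' "$line" |
	awk '{ print substr($0, length($1) + length($2) + length($3) + 4) }')

printf '%s\n%s\n' "$fixed" "$varied"
```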

In any case my advice would be - if possible - to re-design your
input data, e.g. to put some unique separator before the "rest of
the line", say:

	33.5 ZZZ 4564.334 !foo bar
			  ^------------ unique, i.e. must not appear
as part of $1, $2, $3 or the rest of the line. Then you can use
split($0, xx, "!") and access the rest of the line with xx[2].
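As a sketch, with the sample line from above (the variable name "rest"
is made up for the illustration):

```shell
line='33.5 ZZZ 4564.334 !foo bar'

rest=$(printf '%s\n' "$line" | awk '{
	split($0, xx, "!")      # xx[1] = "33.5 ZZZ 4564.334 ", xx[2] = rest
	print xx[2]
}')
printf '%s\n' "$rest"
```

$1, $2 and $3 are still split on white space as usual, so the leading
fields need no extra handling.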

My general observation is that "awk" is a real "power tool", but to
get the most out of it with not too complicated programs you should obey
certain design criteria for your input data, e.g. you should use
unique separators in a hierarchical way:

	XXX:a,b,c:YYYYYYY ZZZZZ
	                  ^^^^^---- $2
	^^^^^^^^^^^^^^^^^ --------- $1
            ^^^^^ ----------------- split($1, xx, ":")     -> xx[2]
	        ^ ----------------- split(xx[2], yy, ",")  -> yy[3]
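The two-level split from the diagram could look like this when written
out. It is a sketch on the sample line only; the variable name "result"
is made up:

```shell
line='XXX:a,b,c:YYYYYYY ZZZZZ'

result=$(printf '%s\n' "$line" | awk '{
	split($1, xx, ":")      # xx[1]="XXX"  xx[2]="a,b,c"  xx[3]="YYYYYYY"
	split(xx[2], yy, ",")   # yy[1]="a"    yy[2]="b"      yy[3]="c"
	print $2, xx[2], yy[3]
}')
printf '%s\n' "$result"
```

Each level has its own separator (blank, then ":", then ","), so no
split can be confused by the data of another level.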

Another small hint: It's trivial to design a "comment feature" for
your input data using the familiar style that every line starting with
a "#" is thrown away. The following is an excerpt which can be found in
many of my "awk"-programs:

	awk '
	/^[ \t]*#/ { next; }
	......
	...... rest of program
	......
	'
-- 
Martin Weitzel, email: martin at mwtech.UUCP, voice: 49-(0)6151-6 56 83