1Gbyte file on a 130Mb drive (fsdb)

Wed Jun 26 05:52:46 AEST 1991

In article <124 at comix.UUCP> jeffl at comix.Santa-Cruz.CA.US (Jeff Liebermann) writes:
>How does one deal with a bogus 1Gigabyte file?
>I have a Xenix 2.3.3 system that has ls magically
>declare a 45Mb accounting file as 1Gbyte huge.
>
>ls	declares it to be 1Gb big.
>du	agrees.
>df -v	gives the correct filesystem size.
>fsck	"Possible wrong file size I=140" (no other errors).
>
>To add to the problem, I'm having difficulty doing
>a backup before attacking.

There may be nothing "wrong" with the inode. The file may simply
have a gap where nothing was written. For example, if you lseek
to offset 10,240 and write 1K, the utilities above will show the
file size as 11K, when in reality you have used only one block.
dbm did this at one time (I don't know whether it still does).
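
To see the effect for yourself, here is a small sketch that
reproduces the example: it seeks to 10,240, writes 1K, and then
compares the size ls would show with the space actually allocated.
make_sparse() and report() are hypothetical helpers, not part of any
library. It assumes a POSIX system where st_blocks counts 512-byte
units and a filesystem that supports holes.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: create `path` with a hole by seeking `gap`
 * bytes past the start and writing `len` bytes of non-zero data.
 * Returns 0 on success, -1 on error.
 */
int make_sparse(const char *path, off_t gap, size_t len)
{
	char data[4096];
	int fd;

	if (len > sizeof data)
		return -1;
	memset(data, 'x', len);
	if ((fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0)
		return -1;
	/* Seeking past EOF writes nothing; the skipped range becomes a hole. */
	if (lseek(fd, gap, SEEK_SET) != gap ||
	    write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Hypothetical helper: print the logical size (what ls shows) next to
 * the space allocated, assuming st_blocks is in 512-byte units.
 */
void report(const char *path)
{
	struct stat st;

	if (stat(path, &st) == 0)
		printf("%s: st_size=%lld bytes, allocated=%lld bytes\n",
		    path, (long long)st.st_size,
		    (long long)st.st_blocks * 512);
}
```

Calling make_sparse("demo", 10240, 1024) and then report("demo")
should show st_size=11264, while the allocated figure is typically
just the block(s) holding the data; the exact number depends on the
filesystem's block size.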

The real problem is that no utility recognizes that these
intervening empty blocks are not actually there. After all,
if you read back the above file, you get 10K of nulls
followed by your data. So cp, tar, cpio, etc. all "fill in
the blanks": the copy occupies the full logical size, with
nulls written into every empty block.
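
A copy program can avoid filling in the blanks by checking each chunk
it reads and seeking over all-zero chunks instead of writing them.
The sketch below shows the idea; sparse_copy() is a hypothetical name,
and it assumes a POSIX system with ftruncate(). Note that it also turns
runs of genuine zero data into holes, which read back identically.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

#define CHUNK 1024

/* Return 1 if the first n bytes of p are all zero. */
static int all_zero(const char *p, ssize_t n)
{
	ssize_t i;

	for (i = 0; i < n; i++)
		if (p[i])
			return 0;
	return 1;
}

/*
 * Hypothetical helper: copy `from` to `to`, seeking over all-zero
 * chunks instead of writing them so the copy keeps its holes.
 * Returns 0 on success, -1 on error.
 */
int sparse_copy(const char *from, const char *to)
{
	char buf[CHUNK];
	ssize_t n;
	off_t size = 0;
	int in, out, rc = 0;

	if ((in = open(from, O_RDONLY)) < 0)
		return -1;
	if ((out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		close(in);
		return -1;
	}
	while ((n = read(in, buf, sizeof buf)) > 0) {
		if (all_zero(buf, n)) {
			/* Skip ahead; this range is never written. */
			if (lseek(out, n, SEEK_CUR) < 0) {
				rc = -1;
				break;
			}
		} else if (write(out, buf, n) != n) {
			rc = -1;
			break;
		}
		size += n;
	}
	if (n < 0)
		rc = -1;
	/*
	 * A trailing hole leaves nothing written at the end of the
	 * output, so set its final length explicitly.
	 */
	if (rc == 0 && ftruncate(out, size) < 0)
		rc = -1;
	close(in);
	if (close(out) < 0)
		rc = -1;
	return rc;
}
```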

There are two ways of dealing with this problem:

1) Most databases have a utility which dumps records and loads
   them. Use this to back up the data. To restore the data you
   make empty data files and use the load utility. This is the
   preferred method if available. It also has the side benefit
   of (usually) making the new files run faster.

2) This is ugly but effective. It only works if the file really is
   sparse, as described above. Use the attached program to compress it.

   To read it back, read in each struct, lseek to the offset
   in lseekval, and then write the block (actually cnt bytes) to the
   output file.

-------------------------- sparsecp.c ---------------------------
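#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

#define BSIZE 1024

/* One output record per non-empty block of the input file. */
struct orec {
	long lseekval;			/* offset of this block in the input */
	int cnt;			/* number of valid bytes in buf */
	unsigned char buf[BSIZE];
} rec;

static int notempty(const unsigned char *s);

int main(int argc, char *argv[])
{
	int infd, outfd;
	ssize_t n;
	long offset;

	if (argc != 3) {
		fprintf(stderr, "usage: %s infile outfile\n", argv[0]);
		exit(1);
	}
	/* further error checking is left as an exercise for the reader */
	if ((infd = open(argv[1], O_RDONLY)) < 0 ||
	    (outfd = creat(argv[2], 0644)) < 0) {
		perror("open");
		exit(1);
	}
	offset = 0;
	while ((n = read(infd, rec.buf, BSIZE)) > 0) {
		if (notempty(rec.buf)) {
			rec.lseekval = offset;
			rec.cnt = (int)n;
			write(outfd, &rec, sizeof (struct orec));
			/* re-zero buf so a short final read leaves no stale bytes */
			memset(rec.buf, 0, BSIZE);
		}
		if (n != BSIZE)		/* partial block at EOF */
			break;
		offset += n;
	}
	close(infd);
	close(outfd);
	return 0;
}

/* Return 1 if any byte in the block is non-zero. */
static int notempty(const unsigned char *s)
{
	int i;

	for (i = 0; i < BSIZE; ++i)
		if (s[i])
			return 1;
	return 0;
}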
---------------------------------- end ------------------------------
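
The read-back step described in item 2 could look something like
this. unsparse() is a hypothetical name; the sketch assumes the dump
was written by sparsecp on the same machine, so struct orec has the
same size and layout. Because it lseeks to each record's offset before
writing, the skipped ranges stay holes in the restored file. As with
sparsecp itself, a hole at the very end of the original file is not
recorded, so the restored file can come out shorter than the
original's logical size.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

#define BSIZE 1024

/* Must match the record layout written by sparsecp.c. */
struct orec {
	long lseekval;
	int cnt;
	unsigned char buf[BSIZE];
};

/*
 * Hypothetical helper: rebuild a sparse file from a sparsecp dump.
 * Returns 0 on success, -1 on error or a truncated record.
 */
int unsparse(const char *dump, const char *to)
{
	struct orec rec;
	ssize_t n;
	int in, out, rc = 0;

	if ((in = open(dump, O_RDONLY)) < 0)
		return -1;
	if ((out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
		close(in);
		return -1;
	}
	while ((n = read(in, &rec, sizeof rec)) == (ssize_t)sizeof rec) {
		/* Seek to the block's original offset, then write cnt bytes. */
		if (rec.cnt < 0 || rec.cnt > BSIZE ||
		    lseek(out, (off_t)rec.lseekval, SEEK_SET) < 0 ||
		    write(out, rec.buf, rec.cnt) != rec.cnt) {
			rc = -1;
			break;
		}
	}
	if (rc == 0 && n != 0)	/* read error or partial record */
		rc = -1;
	close(in);
	if (close(out) < 0)
		rc = -1;
	return rc;
}
```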

-- 
Roger Knopf                             "Oh my...."
SCO Consulting Services	                    -- Our Pal, Marty Stevens
uunet!sco!rogerk or rogerk at sco.com     408-425-7222 (voice) 408-458-4227 (fax)