SUMMARY: which is more efficient...

Paul D. Anderson pda at stiatl.UUCP
Fri Mar 2 07:22:34 AEST 1990


Here is the original question I posed:

|Which is more efficient (ie: cheaper and faster) in a program?
|  - to chdir("/path") and then unlink("file") for each file in the ...
|  - to specify the entire path in each call to unlink()?
|
|I know of namei() in the kernel; it has been suggested that doing
|the chdir() first will mean namei() will not have to be invoked as
|often.  But does not unlink() have to chase back up the directory
|structures (re-construct the directory tree) in order to ascertain
|if the correct permissions exist to allow removal of the file?  Therefore,
|would not calling unlink() with the full pathname be more efficient,
|since all information is provided to unlink() about the path?

The responses were thorough and unique, and some even included test
source.  Make sure you see the timing test results near the end of this
summary. My thanks to everyone who sent me the information!  

What follows are the [edited] summaries... 

--
>From emory!ulysses.att.com!ekrell Tue Feb 20 22:15:24 1990
Eduardo Krell, AT&T Bell Laboratories, Murray Hill, NJ

It's cheaper to chdir to that directory first and then unlink the
files using a relative pathname. How much cheaper will depend on
the version of Unix. It makes a bigger difference in old versions
of System V. BSD uses a namei cache and that diminishes the benefits
of doing the chdir(), but it will still run faster.
    
unlink() doesn't need to check the permissions of the directory
hierarchy up to the root; it only cares about the permissions
on the directories you specified in the argument. If you call
unlink() with just a simple relative file name like "foo.c",
it will check the permissions of the current working directory,
for which there's already an inode pointer in the user structure.
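
	[Editor's sketch -- my own illustration, not from Eduardo's reply.
	The two strategies side by side; function names are mine and
	error handling is trimmed for brevity.]

```c
#include <stdio.h>
#include <unistd.h>

/* Absolute-path approach: namei walks every path component on
 * every unlink() call. */
int unlink_absolute(const char *dir, const char *const names[], int n)
{
	char path[1024];
	int removed = 0;
	int i;

	for (i = 0; i < n; i++) {
		snprintf(path, sizeof path, "%s/%s", dir, names[i]);
		if (unlink(path) == 0)
			removed++;
	}
	return removed;
}

/* chdir approach: the path is resolved once; each unlink() then
 * hands the kernel a single component.  Note this changes the
 * process's working directory as a side effect. */
int unlink_relative(const char *dir, const char *const names[], int n)
{
	int removed = 0;
	int i;

	if (chdir(dir) != 0)
		return -1;
	for (i = 0; i < n; i++)
		if (unlink(names[i]) == 0)
			removed++;
	return removed;
}
```

	[Both remove the same files; the second just asks the kernel to
	resolve far fewer path components overall.]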

-- 
>From gatech!ames!pacbell!ctnews!unix386!dick Wed Feb 21 03:00:30 1990
Dick Hacking

Why not put all the files in their own filesystem and then
do a mkfs when you are finished with them.... that is *MUCH*
faster than trying to unlink anything.  [novel solution!]

--
>From emory!cs.UMD.EDU!chris Wed Feb 21 04:40:22 1990
Chris

chdir() is more efficient.

No, unlink need chase nothing: permission to unlink files is based on
the permissions of the directory in which the files reside.  If that
directory can be written (by user/group/other, whichever matches first),
files in that directory may be unlinked.
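
	[Editor's sketch -- a small demonstration of Chris's point, not
	part of his reply: a mode-000 file can still be unlinked as long
	as the directory holding it is writable.  Names are mine.]

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file, strip all of its permission bits, and try to
 * unlink it.  Returns 1 if the unlink succeeded, showing that
 * only the directory's write permission matters. */
int unlink_ignores_file_perms(const char *path)
{
	int fd = creat(path, 0666);

	if (fd < 0)
		return 0;
	close(fd);
	chmod(path, 0);			/* no read/write/execute on the file */
	return unlink(path) == 0;	/* directory permission decides */
}
```

	[rm(1) will prompt before removing such a file precisely because
	it is unwritable, but the unlink() call itself does not care.]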

--
>From gatech!mailrus!uunet!prcrs!wrwalke Wed Feb 21 05:00:17 1990
bill.

on all the systems i work on (bsd 4.2, ultrix, hp-ux), "file"
and "/full/path/to/file" both resolve to an inode number.  the
unlink merely drops the link count on that inode.  if it has no
remaining links, the inode is freed for later use.  the chdir()
call merely causes another inode lookup and a change to the current
process info.  in other words, the chdir() would be a waste of time.

	[i presume he means it would be a waste of time if I only 
	deleted one file in a directory before moving to the next.]

another note: be very careful in unlinking dirs.  you are freeing
the inode of a file that must still remain in order to physically
access the other files under it.  fsck will fix it, but it can be
messy on the screen and cause a shortage of inodes (in sysV at
least), as the numbers are not freed.

--
>From stiatl.uucp!bgi Tue Feb 20 16:32:25 1990
Brad

Well... according to _The Design of the UNIX Operating System_ by
Maurice J. Bach, the most efficient is to chdir first.  Reason being
that unlink doesn't have to reconstruct the path back to root checking
permissions.  I was suspicious of this until I tested it myself. 
  [Brad performed the same test as Jon Kamens, see further down.]


--
>From emory!cbnewsi!npl Wed Feb 21 15:30:09 1990
Nick Landsberg, AT&T Bell Laboratories

Just based on intuition and some knowledge of the internals, I would
venture that the "chdir" approach is MUCH faster, especially if "path"
is long.  If "path" is long then every element in the path must be
parsed by the kernel for every unlink call, the appropriate i-node
looked up, the next element found, etc.  With the chdir() approach,
the path lookup is only done once and then only the current directory
searched for every file to be unlinked.

You've stumbled into a "security hole" in the system.  The only check
the kernel makes before unlinking a file is whether you have write
permission in the current directory, since you are changing the
contents of the current directory.  (This may not be true of all
flavors of Unix; your mileage may vary.)  Thus, unlink() does not
traverse the directory tree backwards to check every directory for
write permission; you aren't changing any of them.  Whether this is
"correct" behavior is questionable, but the other way is inefficient.

--
>From gatech!uflorida!ki4pv!cdis-1!tanner Thu Feb 22 11:31:11 1990

If you unlink using the entire path each time, then the kernel has to
run through the entire set of directories (in a routine most likely
called "namei") to find the file, performing permission checks on
each component, every time.

If, on the other hand, you chdir() to the offending directory, you
can unlink the files by name without having to scan the path and read
each directory on the way.  However, you have to remember where you
were before (and change back), or relative pathnames will stop working.
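
	[Editor's sketch -- one way to "remember where you were": keep an
	open descriptor on the current directory and fchdir() back when
	done.  This assumes fchdir() is available, as on BSD; the names
	are mine.]

```c
#include <fcntl.h>
#include <unistd.h>

/* Unlink 'name' inside 'dir' without disturbing the caller's
 * working directory.  Returns 0 on success, -1 on failure. */
int unlink_in_dir(const char *dir, const char *name)
{
	int here = open(".", O_RDONLY);	/* remember where we were */
	int rc = -1;

	if (here < 0)
		return -1;
	if (chdir(dir) == 0) {
		rc = unlink(name);
		if (fchdir(here) != 0)	/* change back */
			rc = -1;
	}
	close(here);
	return rc;
}
```

	[Saving a descriptor is safer than saving a pathname: it still
	works if the old directory is renamed while you are away.]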

-- 
>From gatech!mailrus!uunet!attcan!telly!eci386!jmm Thu Feb 22 11:31:14 1990
John Macdonald

Why not use: system( "rm -rf /path/*" ) or system( "rm /path/*" )?
(The first for recursively removing sub-directories as well.)
Unless there was major foolishness on the part of the implementors
of your Unix system, this should be very close to optimal for large
numbers of files (as long as the number of files doesn't overflow
the command line; in that case you could use
system( "find /path -type f -print | xargs rm; rm -rf /path" ) or
system( "find /path -type f -print | sed -e '/^\/path\/.*\//d' | xargs rm" ),
although the second does have to traverse the entire tree even though
it is only removing the top-level files).
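
	[Editor's sketch -- Macdonald's suggestion from inside a C
	program, shelling out to rm(1) via system().  My own code: since
	'dir' is interpolated into a shell command it must be a trusted,
	metacharacter-free string.]

```c
#include <stdio.h>
#include <stdlib.h>

/* Remove every file directly under 'dir' by delegating to rm.
 * Returns 0 on success, -1 otherwise.  With -f, rm stays quiet
 * and exits 0 even when the glob matches nothing. */
int remove_files_with_rm(const char *dir)
{
	char cmd[1100];

	snprintf(cmd, sizeof cmd, "rm -f -- \"%s\"/*", dir);
	return system(cmd) == 0 ? 0 : -1;
}
```

	[The fork/exec of a shell costs something up front, but rm does
	its own chdir-style optimization over large file counts.]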

-- 
>From gatech!mit-eddie!pit-manager.MIT.EDU!jik Thu Feb 22 11:31:23 1990
jik at Athena.MIT.EDU, Office: 617-253-8495
Jonathan I. Kamens, MIT Project Athena

[doesn't unlink() have to chase back up the directory tree?]

No, unlink() doesn't have to do anything of the sort.  Directory
permissions in Unix are not based on permissions of higher-up directories
in any direct manner.  In other words, once you've managed to get
into a directory, you can perform operations in that directory paying
attention only to the permissions in that particular directory.  

Example:

    pit-manager% ls -ldg foo foo/bar
    drwxrwxr-x  3 jik      wheel         512 Feb 21 11:59 foo/
    drwxrwxr-x  2 jik      wheel         512 Feb 21 11:59 foo/bar/
    pit-manager% cd foo/bar
    pit-manager% chmod 000 ../../foo
    pit-manager% touch frep
    pit-manager% ls -ldg .
    drwxrwxr-x  2 jik      wheel         512 Feb 21 12:00 ./
    pit-manager% ls -ldg ..
    d---------  3 jik      wheel         512 Feb 21 11:59 ../

So, you see, I was able to perform operations in "bar" even after I
could no longer do so in "foo".

One of the advantages of this is that if you make a directory executable
but not readable, and then put readable+executable directories underneath
it, then people can cd into those directories and read from them
if they know the names, but not if they don't (i.e. they can't do
an "ls" on the parent directory, but they can use "cd parent/child"
if they know the name of the child).

--
>From gatech!BBN.COM!pineapple.bbn.com!bbn.com!rsalz Fri Feb 23 04:15:20 1990
Rich $alz

Once you chdir to the directory, all you need to remove a file is
write permission in the dir.  The whole path all the way down only
matters for the chdir.

Now, it is possible that
	cd /foo/bar/baz ; rm zap
will fail (no "r" perm in baz, I guess) while
	rm /foo/bar/baz/zap
would succeed.

Doesn't happen very often.

--
>From emory!uunet.UU.NET!prcrs!wrwalke Fri Feb 23 19:40:04 1990
William Walker

boy, you did it now: we have a battle going on as to whether the
chdir() system call eats more than tracing absolute pathnames.

apparently if the dirs are rather empty, fewer than say five or six
files each, the consensus is that the extra syscall is not efficient.
if the dir is very full, say in the 20-200 file range, the savings in
lookup overhead would make up for the second syscall.  everyone and
his brother will now be writing test code to prove a point.

on most systems only root can remove a non-empty dir; on some systems,
even a busy executable.  if you are running as root, make damn sure
you don't blow away a non-empty dir.  fsck will make you wish you were
sitting by the pool sipping margaritas (unless you run it in "SHUT-UP"
mode, -p ??).

--
>From emory!virtech!cpcahil Fri Feb 23 20:15:08 1990
Conor P. Cahill

Technically, the first method [chdir()/unlink()] will be slightly
more efficient, since namei() (or lookupname()) will only have to
parse the "file" portion at unlink time.  However, you will probably
not be able to demonstrate a difference in execution time between
the two because of namei() caching and/or buffer caching.

The real difference would come if the path to file was very long, your
system was very heavily loaded, and the directories along the path
were very large (had lots of entries).

So, for most cases, you won't be able to tell the difference.

--
>From emory!uunet.UU.NET!auspex!auspex.com!guy Tue Feb 27 18:10:14 1990
Guy Harris

> Let me clarify my question from the previous posting:
> I know of namei() in the kernel; it has been suggested that doing
> the chdir() first will mean namei() will not have to be invoked as
> often.

No.  It will just mean that "namei()", in those systems that have it,
will do less work looking up the file.  In those systems that don't
have it, e.g. any system with SunOS-style vnodes - including S5R4 - the
equivalent routine will also do less work looking up the file.  In both
cases, the reason is that it takes more work to look up

	foo/bar/bletch/mumble.c

than it does to look up

	mumble.c

because in the first case you have to search "foo" for "bar", and then
"bar" for "bletch", and then "bletch" for "mumble.c", while in the
second case you only have to search the current directory for
"mumble.c".  Directory name caches help here, but they don't completely
eliminate the slowdown.  In addition, it takes more CPU time to 1)
construct the longer path in your program and 2) copy it into the
kernel.
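
	[Editor's sketch -- a crude way to see the per-component cost Guy
	describes: time a burst of stat() calls on a deep path versus a
	bare name.  My own code; on systems with a big name cache the
	difference may be small.]

```c
#include <sys/stat.h>
#include <time.h>

/* CPU time, in clock ticks, spent doing 'n' stat() calls on
 * 'path'.  Compare a multi-component path against a single
 * component to see the extra lookup work. */
clock_t time_lookups(const char *path, int n)
{
	struct stat sb;
	clock_t start = clock();
	int i;

	for (i = 0; i < n; i++)
		stat(path, &sb);
	return clock() - start;
}
```

	[stat() exercises the same namei machinery as unlink() without
	actually removing anything.]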

[But does not unlink() have to chase back up the dir tree?...]

No.  The only permission "unlink()" requires to allow removal of the
file - or, more correctly, removal of a directory entry referring to a
file - is write permission on the directory containing that entry. 
"rm", if invoked without the "-f" flag, also checks whether the file is
writable, but that doesn't require any directory searching either.

In other words,

	unlink("foo/bar/bletch/mumble.c")

is less efficient than

	unlink("mumble.c")

>And does the recommended method used vary based on Sys V or BSD 4.2/3  ??

I don't expect so.

--
>From emory!lzga!bogatko Tue Feb 20 22:15:21 1990
GB

My vote is a chdir, followed by unlinks.

My authority is BACH - "Design of the Unix Op. Sys", section 4.4 pages
74 and 75. 

Short summary -- each path resolution requires a namei lookup starting
from the first path component; if there are three components to the
path, then three namei lookups are required.  If there is only one
component, then there is only one lookup.

As far as I could see from the 5.2 source, unlink calls iget on the
full path if it has a full path, and avoids iget if it has ".".
Admittedly I didn't spend a lot of time looking at the stuff (it
produces headaches), but it seems to imply that chdir is the better
way to go.
 
Incidentally, here's how I did it once.  The example suffers from
doing unlinks on complicated paths, but putting in a 'chdir' shouldn't
be too difficult.

This is System 5.3 specific.

/*
*
*                                 cleanfiles.c
*
*/

/* 		INITIAL CODING DATE
Sun Sep 25 15:45:45 EDT 1988 by George M. Bogatko

		HEADER FILES  
*/
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <time.h>

/*----------------  IMPORTED GLOBAL VARIABLE/FUNCTION DEF'S  ----------------*/
extern char *regcmp();
extern char *regex();

/*----------------  INTERNAL GLOBAL VARIABLE/FUNCTION DEF'S  ----------------*/

static	char	sccs_id[] = "cleanfiles.c 1 1/16/89 15:15:16";

/*----

SYNOPSIS:
	clean_files(directory, pattern, age)
	char *directory;
	char *pattern;
	long age;

DESCRIPTION:
	Clean_files removes all the files in 'directory' whose names
	match the regular expression 'pattern' and whose modification
	times exceed 'age' seconds.

RETURN:
	The function returns the number of files successfully removed.

EXAMPLE:
	to remove temporary files in '/usr/tmp', whose names begin with
	'ConV', and are more than 10 minutes old, call:

		num_removed = clean_files("/usr/tmp", "^ConV", 600);
						       |
						       notice the caret

CAVEATS:
	Accidentally removed files cannot be recovered.  Use this
	carefully.

	This function uses 'regcmp' and 'regex', thus the pattern used in
	the match uses the syntax of 'ed', 'ex', and 'vi' NOT 'sh'.

	'regcmp' and 'regex' are located in libPW.a (cc ... -lPW).

 ------- */
clean_files(directory, pattern, age)
char *directory;
char *pattern;
long age;
{
DIR *dir;
struct dirent *dir_ent;
struct stat buf;
char *cmp_pat;
time_t curtime;
int num = 0;
char tempbuf[100];

	if( (dir = opendir(directory)) == (DIR *)NULL )
		return -1;

	curtime = time( (time_t *)NULL );
	cmp_pat = regcmp(pattern, (char *)NULL);
	if( cmp_pat == (char *)NULL )	/* bad pattern */
	{
		closedir(dir);
		return -1;
	}
/*
 *
 *
 *
 * YOU COULD PUT A CHDIR HERE, but be careful if you want to
 * go back to where you were!
 *
 *
 *
 */
	while( (dir_ent = readdir(dir)) != NULL )
	{
		if( regex(cmp_pat, dir_ent->d_name) != (char *)NULL )
		{
			sprintf(tempbuf, "%s/%s", directory, dir_ent->d_name);
			if( stat(tempbuf, &buf) == 0 &&
			    (curtime - buf.st_mtime) > age )
			{
				if(unlink(tempbuf) != -1)
					num++;
			}
		}
	}
	closedir(dir);
	free(cmp_pat);
	return num;
}

--
>From emory!lzga!bogatko Wed Feb 21 10:40:13 1990
GB

Never one to believe straight theory, I've done a test.

This program creates 500 dummy files in "/usr/tmp/1/2/3/4/5/6/7/8/9"
and then cleans them out, all from a C program.  

I picked such a long path name under the assumption that the namei()
resolution would slow down the unlink process, and wanted to give
namei() a lulu of a path to chew on. 

The results are in 'edres'.  As you can see, CHDIR is the better way
to go.

This was written and tested on a 3B600, under System 5.3.1.  If you're
not on such a box, that may explain why you don't have 'opendir',
'readdir', or 'closedir'. 

Hope this helps.

***** CUT HERE ***** 


#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
#	makefile
#	cleanmain.c
#	edres
#	testrm
# This archive created: Wed Feb 21 09:36:37 1990
export PATH; PATH=/bin:/usr/bin:$PATH
echo shar: "extracting 'makefile'" '(309 characters)'
if test -f 'makefile'
then
       echo shar: "will not over-write existing file 'makefile'"
else
sed 's/^      X//' << \SHAR_EOF > 'makefile'
      XD=
      X
      XCFLAGS=-I. -O -Ksd $D
      XLIBS=-lPW
      X
      Xall:
      X	make cleanmainC D=-DCHDIR
      X	rm cleanmain.o
      X	make cleanmainNC
      X	make clean
      X
      XcleanmainC:	cleanmain.o
      X	cc -o cleanmainC cleanmain.o $(LIBS)
      X
      XcleanmainNC:	cleanmain.o
      X	cc -o cleanmainNC cleanmain.o $(LIBS)
      X
      Xclean:	
      X	rm cleanmain.o
      X
      Xclobber:
      X	rm cleanmainC
      X	rm cleanmainNC
SHAR_EOF
if test 309 -ne "`wc -c < 'makefile'`"
then
       echo shar: "error transmitting 'makefile'" '(should have been 309 characters)'
fi
fi
echo shar: "extracting 'cleanmain.c'" '(2114 characters)'
if test -f 'cleanmain.c'
then
       echo shar: "will not over-write existing file 'cleanmain.c'"
else
sed 's/^      X//' << \SHAR_EOF > 'cleanmain.c'
      X#include <stdio.h>
      X#include <sys/types.h>
      X#include <sys/times.h>
      X#include <sys/stat.h>
      X#include <dirent.h>
      X#include <time.h>
      X
      X#define DIRECTORY "/usr/tmp/1/2/3/4/5/6/7/8/9"
      X#define TMP	"_RmT_"
      X#define PAT	"^_RmT_"
      X
      Xextern char *regcmp();
      Xextern char *regex();
      X
      Xmain(argc, argv)
      Xint argc;
      Xchar *argv[];
      X{
      Xstruct tms tbuf;
      Xlong btime, etime;
      Xchar *tnam;
      Xint i;
      Xint fd;
      Xint num;
      Xchar *dir_tory;
      X
      X#ifdef CHDIR
      X	fprintf(stderr, "CHDIR METHOD\n");
      X	chdir(DIRECTORY);
      X	dir_tory = ".";
      X#else
      X	fprintf(stderr, "NON-CHDIR METHOD\n");
      X	dir_tory = DIRECTORY;
      X#endif
      X
      X	fprintf(stderr, "creating 500 files\n");
      X
      X	btime = times(&tbuf);
      X	for(i=0; i< 500; i++)
      X	{
      X		tnam = tempnam(dir_tory, TMP);
      X		if( (fd = creat( tnam, 0666 )) == -1 )
      X		{
      X			perror("creat failure");
      X			exit(-1);
      X		}
      X		close(fd);
      X	}
      X	etime = times(&tbuf);
      X
      X	fprintf(stderr, "creating took %ld clicks\n", etime-btime);
      X	fprintf(stderr, "usrtime - %ld\n", tbuf.tms_utime);
      X	fprintf(stderr, "systime - %ld\n", tbuf.tms_stime);
      X	fprintf(stderr, "\n\n");
      X
      X	fprintf(stderr, "calling clean_files\n");
      X
      X	btime = times(&tbuf);
      X	num = clean_files(DIRECTORY, PAT);
      X	etime = times(&tbuf);
      X	
      X	fprintf(stderr, "cleaned %d files\n\n",num);
      X	fprintf(stderr, "cleaning took %ld clicks\n", etime-btime);
      X	fprintf(stderr, "usrtime - %ld\n", tbuf.tms_utime);
      X	fprintf(stderr, "systime - %ld\n", tbuf.tms_stime);
      X	fprintf(stderr, "\n\n");
      X
      X	return 0;
      X}
      X
      X
      Xclean_files(directory, pattern)
      Xchar *directory;
      Xchar *pattern;
      X{
      XDIR *dir;
      Xstruct dirent *dir_ent;
      Xchar *cmp_pat;
      Xint num = 0;
      Xchar tempbuf[100];
      Xchar *dir_tory;
      X
      X#ifdef CHDIR
      X	chdir(directory);
      X	dir_tory = ".";
      X#else
      X	dir_tory = directory;
      X#endif
      X
      X	if( (dir = opendir(dir_tory)) == (DIR *)NULL )
      X		return -1;
      X	cmp_pat = regcmp(pattern, (char *)NULL);
      X
      X	while( (dir_ent = readdir(dir)) != NULL )
      X	{
      X		if( regex(cmp_pat, dir_ent->d_name) != (char *)NULL )
      X		{
      X/*
      X * we'll have to use this sprintf in the non-chdir method, so I'll keep
      X * the overhead in the chdir method.
      X */
      X			sprintf(tempbuf, "%s/%s", dir_tory, dir_ent->d_name);
      X			if(unlink(tempbuf) != -1)
      X				num++;
      X		}
      X	}
      X	closedir(dir);
      X	free(cmp_pat);
      X	return num;
      X}
SHAR_EOF
if test 2114 -ne "`wc -c < 'cleanmain.c'`"
then
       echo shar: "error transmitting 'cleanmain.c'" '(should have been 2114 characters)'
fi
fi
echo shar: "extracting 'edres'" '(637 characters)'
if test -f 'edres'
then
       echo shar: "will not over-write existing file 'edres'"
else
sed 's/^      X//' << \SHAR_EOF > 'edres'
      XCHDIR METHOD
      X
      XCREAT	USR	SYS		CLEAN	USR	SYS
      X
      X1510	63	493		3375	88	807
      X1949	49	665		3064	68	952
      X1717	55	638		3069	82	978
      X1813	51	653		3054	73	938
      X1683	62	633		3074	71	896
      X1676	61	634		3057	85	945
      X1681	43	648		3083	59	958
      X1736	54	643		3061	69	936
      X1687	65	626		3090	80	870
      X1682	58	640		3143	66	893
      X
      X
      X******************
      X
      X
      XNON-CHDIR METHOD
      X
      XCREAT	USR	SYS		CLEAN	USR	SYS
      X
      X1774	61	1170		3591	69	1638
      X1764	72	1179		3278	89	1601
      X1742	77	1162		3278	103	1616
      X1738	65	1181		3253	80	1609
      X1785	66	1193		3283	77	1651
      X1730	75	1175		3320	90	1628
      X1722	81	1177		3289	104	1633
      X1756	75	1173		3256	96	1606
      X1750	61	1177		3391	82	1594
      X1833	66	1183		3258	90	1619
SHAR_EOF
if test 637 -ne "`wc -c < 'edres'`"
then
       echo shar: "error transmitting 'edres'" '(should have been 637 characters)'
fi
fi
echo shar: "extracting 'testrm'" '(182 characters)'
if test -f 'testrm'
then
       echo shar: "will not over-write existing file 'testrm'"
else
sed 's/^      X//' << \SHAR_EOF > 'testrm'
      X> results
      Xfor i in 1 2 3 4 5 6 7 8 9 10
      Xdo
      X	echo CHDIR round $i
      X	cleanmainC  2>>results
      Xdone
      X
      Xfor i in 1 2 3 4 5 6 7 8 9 10
      Xdo
      X	echo NON-CHDIR round $i
      X	cleanmainNC  2>>results
      Xdone
SHAR_EOF
if test 182 -ne "`wc -c < 'testrm'`"
then
       echo shar: "error transmitting 'testrm'" '(should have been 182 characters)'
fi
fi
exit 0
#	End of shell archive

-- 
--
Paul Anderson * h:404-565-0761 w:404-841-4000
{mathcs.emory,gatech}.edu!stiatl!pda  ||  pda at SalesTech.Com



More information about the Comp.unix.questions mailing list