Is write(2) "atomic" ?

Wed Jul 13 04:29:20 AEST 1988

in article <11410005 at eecs.nwu.edu> naim at eecs.nwu.edu (Naim Abdullah) writes:
> Do UNIX semantics guarantee that write(2) calls will be "atomic" ?

in general, no.  it depends on the implementation.
use some synchronization primitive, or one-byte writes only.
worse than just mixing up data, if two processes are pounding away
at a file, data may be lost.  (see below)

> Suppose, process A executes write(fd, "123", 3) and process B
> executes write(fd, "456", 3) "concurrently". The file descriptor fd
> is shared between them (the file was creat(2)'ed for writing by the
> common parent of A and B). Does UNIX guarantee that the contents of
> the descriptor will be "123456" or "456123" (depending on which of
> A and B won the race) but never "124536" ? Does it make a difference
> whether the descriptor is a pipe or a terminal or a disk file or a
> tape drive or something else ?

some special file types may implement atomic writes.  notably, berkeley
sockets (at least under SunOS 3.5 and Ultrix 2.0) appear, empirically,
to be fully atomic.  this is probably to support "reliable" protocols
like tcp/ip.  (could someone who knows the tcp/ip protocol spec confirm
whether or not it requires atomic writes?)  as a side effect, pipes
when implemented by socketpair(2) seem atomic.

i don't know about system v streams, but version 9 streams and pipes
are non-atomic, and their manual pages warn about this (along with a
comment that a fast reader and slow writers can simulate atomicity).

there is a shell archive at the end of this article containing two
programs i have used to test the atomicity of writes.  the first,
write.c, creates two processes.  each writes n strings of A or B to
standard output, with the length and number of strings set from the
command line.  no synchronization is attempted.  the original process
writes strings of A, the child writes strings of B.  so, for example,
the output of "write 5 4" could be
	AAAAABBBBBAAAAABBBBBAAAAAAAAAABBBBBBBBBB
(where 5 is the length of the writes, and 4 is the number of strings)

count.c reads the output of write.c and counts the number of each
character (sort of like uniq -c for characters instead of lines).  the
shell script bufs translates the number of characters into number of
buffers, which will be fractional if there was a non-atomic write.
"count | bufs 5" for the previous data gives
       5   1        A
       5   1        B
       5   1        A
       5   1        B
      10   2        A
      10   2        B

the problem shows up with larger buffers.  on an nfs or nd filesystem,
large enough writes give fractional numbers in the second column.  on
a sun with a local disk, i have not been able to generate a partial
record (i.e. "124536" from the original article).

what i consider a more serious problem occurs much more frequently than
fractional writes.  data gets dropped.  this occurs with both local
and remote file systems.  using a "write 8193 15" to a local (smd) disk
with an 8192 byte filesystem blocksize on a sun (similar results were
seen on a vax) gave
   90123   11       A
    8193   1        B
    8193   1        A
    8193   1        	<<< empty
   16386   2        A
  114702   14       B
examining the file with od showed nul (0) characters in that area.  it
takes fewer writes to get a similar result repeatedly with an nfs or nd
filesystem.  what seems to be happening is that between the time one
process writes its data and when it updates the file pointer, the other
process gets scheduled to run.

to solve this problem, one would need to add locks or semaphores to
file table entries to guarantee exclusive access to the file pointers.
fortunately, the people who are doing (symmetric) multiprocessor unices
have to do this anyway.

paul haahr
princeton!haahr or haahr at princeton.edu

# to unbundle, sh this file
# bundled by haahr on dennis at Tue Jul 12 14:04:59 EDT 1988
# contents of bundle:
#	write.c
#	count.c
#	bufs
echo write.c >&2
sed 's/^-//' > write.c <<'end of write.c'
-#include <stdio.h>
-
-#define atoi(s)	(strtol((s), (char **) 0, 0))
-#define	streq(s, t)	(strcmp((s), (t)) == 0)
-
-extern char *malloc();
-extern long strtol();
-extern int strcmp();
-
-int main(argc, argv)
-	int argc;
-	char *argv[];
-{
-	int pid, wpid, i, c, n, bufsize;
-	char *buf;
-
-	if (argc != 3) {
-		fprintf(stderr, "usage: %s bufsize nwrites\n", argv[0]);
-		exit(1);
-	}
-	bufsize	= atoi(argv[1]);
-	n	= atoi(argv[2]);
-
-	if ((pid = fork()) == -1) {
-		perror("fork");
-		exit(1);
-	}
-
-	if (pid == 0)
-		c = 'B';
-	else
-		c = 'A';
-	if ((buf = malloc(bufsize)) == NULL) {
-		perror("malloc");
-		exit(1);
-	}
-	for (i = 0; i < bufsize; i++)
-		buf[i] = c;
-
-	for (i = 0; i < n; i++)
-		if (write(1, buf, bufsize) == -1) {
-			perror("write");
-			exit(1);
-		}
-
-	if (pid != 0)
-		do
-			if ((wpid = wait((int *) 0)) == -1) {
-				perror("wait");
-				exit(1);
-			}
-		while (wpid != pid);
-
-	return 0;
-}
end of write.c
echo count.c >&2
sed 's/^-//' > count.c <<'end of count.c'
-#include <stdio.h>
-
-main()
-{
-	/* lastc starts as EOF so the first comparison is well defined */
-	int c, lastc = EOF, n = 0;
-	do
-		if ((c = getchar()) == lastc)
-			n++;
-		else {
-			if (n > 0) {
-				printf("%6d %c\n", n, lastc);
-			}
-			n = 1;
-			lastc = c;
-		}
-	while (c != EOF);
-}
end of count.c
echo bufs >&2
sed 's/^-//' > bufs <<'end of bufs'
-#! /bin/sh
-n=$1
-shift
-awk 'NF > 0 { printf "%8d   %-8.3g %s\n", $1, $1/'$n', $2 }' $*
end of bufs
chmod +x bufs