shared libraries can be done right

Thu May 30 18:51:27 AEST 1991

In article <18370001 at hpfcso.FC.HP.COM>
	mjs at hpfcso.FC.HP.COM (Marc Sabatella) writes:
: For those of you who haven't put this discussion into your "kill" files by now:
: Bill, your proposal is in fact quite similar to the Apollo Domain system, and
: what I last heard proposed for OSF/1, except that you propose a finer (page)
: granularity.

There's something to be said for either end of the spectrum. With
a small granularity, you don't have to load the entire executable
(or at least all the pages with shared references); you can load
just what gets used. I think that is particularly important when
the thing being used is a huge library like the X libraries (or so
I'm told; I don't use X because I dislike hogs). With a large
granularity, you get to toss more data once you've loaded the
library, and you have less setup work to do for the fixups.
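
To make the small-granularity case concrete, here is a minimal sketch
of fixing up a shared text image one page at a time, on first
reference. It is a toy model, not any real loader: PAGE_SIZE, the
PagedText class and its touch() method are hypothetical names, and
"fixing up" just means recording the resolved address for each
relocation on the page.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # hypothetical page size, chosen for illustration


@dataclass
class PagedText:
    """Toy shared text image whose external references are fixed up per page."""

    relocations: dict  # offset in text -> external symbol referenced there
    symbols: dict  # symbol name -> resolved address
    fixed_pages: set = field(default_factory=set)
    patched: dict = field(default_factory=dict)  # offset -> address written there

    def touch(self, offset: int) -> None:
        """Reference `offset`; fix up its whole page the first time only."""
        page = offset // PAGE_SIZE
        if page in self.fixed_pages:
            return  # an earlier reference already fixed up this page
        lo, hi = page * PAGE_SIZE, (page + 1) * PAGE_SIZE
        for where, name in self.relocations.items():
            if lo <= where < hi:
                self.patched[where] = self.symbols[name]
        self.fixed_pages.add(page)


# A library with references on pages 0 and 2; the program only touches page 0.
text = PagedText(
    relocations={16: "malloc", 2 * PAGE_SIZE + 8: "free"},
    symbols={"malloc": 0x1000, "free": 0x1040},
)
text.touch(100)
```

Only page 0 gets fixed up here. Fixing up the whole library at load
time corresponds to touching every page up front, which is where the
larger granularity saves per-reference bookkeeping.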

There's obviously a cost minimum somewhere in between; my
intuition (and it is only intuition, so this should be tested) is
that it lies toward the smaller granularity.

:               At the heart of all of these is the concept of
: "pre-loading", where the first time a dynamically linked page is loaded, its
: external references are fixed up.  This assumes, as you explicitly stated, that
: the resolutions will be the same for each program.  Unfortunately this cannot
: be guaranteed.  The "malloc" example brought up by several people in response
: to Alex's claim that shared libraries should be "simple and elegant"
: demonstrates this well.  A library may make calls to malloc(), but different
: programs may provide their own definitions of malloc(), and the library's
: references would have to be resolved differently for each.  Some means must be
: provided for this.
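
The malloc problem can be sketched as a toy symbol lookup. This
assumes the linker searches the program's own definitions before its
libraries (an assumption matching the malloc example, not a statement
about any particular system); the symbol tables and addresses are
made up.

```python
def resolve(name, search_order):
    """Return (object, address) from the first object in search_order defining `name`."""
    for obj, definitions in search_order:
        if name in definitions:
            return obj, definitions[name]
    raise LookupError(f"undefined symbol: {name}")


# Hypothetical symbol tables; addresses are invented for illustration.
libc = ("libc", {"malloc": 0x1000, "printf": 0x2000})
prog_a = ("prog_a", {"main": 0x100})
prog_b = ("prog_b", {"main": 0x100, "malloc": 0x300})  # supplies its own malloc

# The library's reference to malloc, resolved on behalf of each program.
for_a = resolve("malloc", [prog_a, libc])
for_b = resolve("malloc", [prog_b, libc])
```

The same library reference resolves to libc for prog_a and to prog_b
itself for prog_b, so a library page pre-loaded (fixed up) for one
cannot be reused unchanged by the other.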

I had this pointed out to me in e-mail; here's what I had to say:

| Suppose you have two shared libraries that define the same
| symbols; perhaps they are different versions of the shared
| library. Some program runs once using the first and then again
| using the second. The second invocation of the program cannot
| share text with the first invocation; its shared text isn't, in
| this case, shared. Actually, you could
| share those pages of the text which don't make reference to the
| differing shared libraries. Life gets complicated if you do that,
| I think. Still, it might be worthwhile if it can be done
| efficiently, because it would mean that some of the more common
| situations don't cause problems with wasted memory.
|
| This situation, one would hope, doesn't occur often, whether it
| comes from the changed version I mentioned above or from a literal
| use of different libraries. Another possibility is that a shared
| library *could*
| refer to variables in the main program. In this case, you lose
| most of the value of shared libraries unless you do the page by
| page thing.
|
| This would get checked for during process startup but is a simple
| enough test, so I don't think it changes anything. You'd have to
| make most of the test anyway, just to open the shared libraries.

Anyway, this seems to solve the problem you mentioned, without
excessive hackery.
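
The startup test, including the page-by-page refinement, might look
roughly like this. It is a sketch under assumptions: each page's set
of external references is known, and the loader remembers which
addresses were used when the shared copy was fixed up. The function
name and data layout are hypothetical.

```python
def shareable_pages(page_refs, cached, current):
    """Pages whose fixups for this process match those in the already-shared copy.

    page_refs: page number -> set of external symbols referenced on that page
    cached:    symbol -> address used when the shared copy was fixed up
    current:   symbol -> address this process resolves the symbol to
    """
    return {
        page
        for page, names in page_refs.items()
        if all(cached.get(n) == current.get(n) for n in names)
    }


page_refs = {0: {"printf"}, 1: {"malloc"}, 2: set()}
cached = {"printf": 0x2000, "malloc": 0x1000}
current = {"printf": 0x2000, "malloc": 0x300}  # this process has its own malloc

shared = shareable_pages(page_refs, cached, current)
whole_text_shareable = shared == set(page_refs)  # the all-or-nothing test
```

The all-or-nothing test refuses to share anything here; the
page-by-page variant keeps pages 0 and 2 shared and gives this process
a private copy of page 1 only.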

:                                                                        However,
: there is a tradeoff here as well.  Since the mapping operation generally
: reserves swap for shared library data segments mapped copy on write, a program
: that uses only a little of a library's static data segment may need more swap
: space to execute than it would if it were linked with archive libraries.  In
: the shared case, swap is reserved for the whole library's data segment, but in
: the archive case, only those few modules needed by the program are copied into
: the a.out, so the data space for the rest of the library needs no swap at run
: time.  We measured up to 100K of "wasted" swap per process for Motif
: applications.
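
The swap arithmetic in that paragraph can be sketched as follows. The
module names and sizes are hypothetical, chosen only to show the
shape of the calculation; they are not the Motif measurements
mentioned above.

```python
def swap_shared(module_data_kb):
    """Shared library: the whole data segment is mapped copy-on-write, so all of it is reserved."""
    return sum(module_data_kb.values())


def swap_archive(module_data_kb, used):
    """Archive library: only the modules the program links in end up in the a.out."""
    return sum(module_data_kb[m] for m in used)


# Hypothetical per-module static data sizes (KB) for one library.
modules = {"core": 8, "widgets": 40, "dialogs": 52}
used = {"core"}

wasted = swap_shared(modules) - swap_archive(modules, used)
```

With these made-up sizes the shared case reserves 100 KB where the
archive case needs 8 KB, so 92 KB of swap is reserved but never
needed.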

The trade-off, then, is between the swap space allocated for each
running process and the disk space saved for the executables? Is
there any other way to avoid the swap deadlock that I assume is the
reason for allocating for the worst case?

If so, the solution would be to use that method and then allocate
swap space to meet the expected peak, not the worst case; if not,
the question is whether the extra swap reserved across all running
processes is going to be comparable to the disk space saved across
all commands. I suggest not. :-)
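
A toy comparison of the two allocation policies, assuming a simple
admission rule (a process starts only if its charge still fits in
swap). The function name, policy labels and sizes are hypothetical;
charging the expected peak is only safe if there is some other way to
cope when a process exceeds it, which is exactly the "if so" above.

```python
def admitted(capacity_kb, procs, policy):
    """Count processes (in arrival order) admitted before swap commitments exceed capacity.

    procs:  list of (worst_case_kb, expected_peak_kb) per process
    policy: "worst" charges the worst case, "expected" charges the expected peak
    """
    committed = 0
    count = 0
    for worst, expected in procs:
        charge = worst if policy == "worst" else expected
        if committed + charge > capacity_kb:
            break  # this process (and everything after it) must wait
        committed += charge
        count += 1
    return count


# Hypothetical: each process reserves 100 KB but is expected to touch 20 KB.
procs = [(100, 20)] * 10
```

With 300 KB of swap, worst-case charging admits 3 of these processes,
while expected-peak charging admits all 10.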

Another solution to this problem is to use smaller shared
libraries, instead of a monolithic library. At least in my scheme,
this doesn't involve much additional overhead, so it would be the
easy solution.
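
Splitting a library changes the reservation because only the pieces
a program actually maps are charged. A sketch, assuming each piece is
mapped only if it defines a symbol the program needs; the library and
symbol names and sizes are hypothetical.

```python
def reserved_kb(pieces, needed):
    """Swap reserved for the data of every library piece that defines a needed symbol."""
    return sum(data_kb for data_kb, defines in pieces.values() if defines & needed)


# Hypothetical library, first as one monolith, then split into three.
monolith = {"libbig": (100, {"init", "button", "file_dialog"})}
split = {
    "libcore": (10, {"init"}),
    "libbutton": (30, {"button"}),
    "libdialog": (60, {"file_dialog"}),
}
needed = {"init", "button"}
```

A program needing only init and button is charged 100 KB against the
monolith but 40 KB against the split version.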

: As for memory savings, I tend to side with Masataka-san on this - you'll have
: to prove it really does make a difference.  So far, I've seen little other than
: anecdotal evidence.  There was a discussion earlier as to whether most of real
: memory was being used for potential shareable text, or for clearly unshareable
: data, and I wish someone would produce some actual numbers.  My gut feel is
: that the savings from sharing even the X11 libraries' text won't amount to
: much as far as really reducing memory consumption as long as huge amounts of
: data are being hoarded.

I suppose this depends a lot on your mix of applications. On my
system, the total space is probably 50/50 between data and text,
if you ignore the savings obtained from shared text segments.
Counting shared text, this is probably more like 80/20 in favor
of data. So, in general, it would seem that for me there isn't
that much savings to be had.
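
Taking those two ratios at face value shows how much text sharing is
already buying. Write D for data, T for text counted once per process,
and T_s for text counted once per shared segment (symbols introduced
here for illustration only):

```latex
\frac{D}{D+T} \approx 0.5 \;\Rightarrow\; T \approx D,
\qquad
\frac{D}{D+T_s} \approx 0.8 \;\Rightarrow\; T_s \approx \tfrac{1}{4}D \approx \tfrac{1}{4}T,
\qquad
\frac{T_s}{D+T_s} \approx 0.2
```

So existing text sharing already removes roughly three quarters of the
text, and shared libraries can at most shave off part of the remaining
fifth of memory.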

However, when my system starts swapping at all, its performance
is significantly worse than when it doesn't swap. There is also a
knee in the curve, where performance falls off sharply, and shared
libraries can help keep you from running into it. So, while I
wouldn't say that the memory savings shared libraries provide are
always significant, there are definitely circumstances where they
are.