shared libraries can be done right

bill bill at franklin.com
Mon Jun 3 07:26:05 AEST 1991


In article <18370002 at hpfcso.FC.HP.COM>
	mjs at hpfcso.FC.HP.COM (Marc Sabatella) writes:
: >There's something to be said for either end of the spectrum. With
: >a small granularity, you don't have to load in the entire
: >executable (or the pages with shared references, anyway); you can
: >just load what gets used
:
: This happens trivially anyhow with demand loading.  Using the scheme we
: developed for HP-UX, you can get at least object module granularity on your
: relocations, so "demand" is only for a module at a time.

I think we just said the same thing. :-) In the scheme I proposed,
the "small granularity" is the page; in the one you describe, it
is the object module.

I've actually thought that maybe the right way to do shared
libraries is to make each module, together with its shared set of
globals, a separately pageable entity. But with the relatively
large pages that seem common, this would (as you point out)
increase the amount of paging.

Then again, maybe not. It really depends on, for example, whether
the modules in the shared library are linked in an order that
improves locality. In the typical case, a module starts at some
random place in a page and may extend past the page end, even if
it is less than a page long. If the following routine isn't used
(a good chance, with libc, for example), some of its code gets
loaded pointlessly. Whereas, if you start each module on a page
boundary, you minimize the paging for that module, while
potentially increasing the total paging for the library. Another
advantage of doing this is that the compiler may be able to do
things like not splitting loops across page boundaries, which
could also decrease the working set.
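
As a rough illustration (the page size and module sizes here are
invented; a real linker would do this arithmetic itself), here is a
small sketch comparing a packed layout against one that starts
every module on a page boundary:

/* Sketch: packed vs. page-aligned placement of library modules.
 * PAGE_SIZE and the module sizes are made up for illustration. */
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* round an offset up to the next page boundary */
static unsigned long page_align(unsigned long off)
{
	return (off + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

int main(void)
{
	/* hypothetical text sizes of some libc-like modules, in bytes */
	unsigned long size[] = { 1800UL, 5200UL, 900UL, 3100UL };
	unsigned long packed = 0, aligned = 0;
	int i;

	for (i = 0; i < 4; i++) {
		printf("module %d: packed offset %5lu, aligned offset %5lu\n",
		       i, packed, aligned);
		packed += size[i];
		aligned = page_align(aligned + size[i]);
	}
	printf("packed total: %lu bytes, aligned total: %lu bytes\n",
	       packed, aligned);

	/* The aligned layout wastes the tail of each module's last
	 * page, but a program that touches only module 2 faults in
	 * only module 2's pages, never part of a neighbor. */
	return 0;
}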

I'm still ambivalent about this.

: | Suppose you have two shared libraries that define the same
: | symbols; perhaps they are different versions of the shared
: | library. Some program comes along and runs the first and then runs
: | again using the second. The second invocation of the program has
: | to consider itself to be not shared with the first invocation; its
: | shared text isn't, in this case, shared. Actually, you could
: | share those pages of the text which don't make reference to the
: | differing shared libraries. Life gets complicated if you do that,
: | I think. Still, it might be worthwhile if it can be done
: | efficiently, because it would mean that some of the more common
: | situations don't cause problems with wasted memory.
:
: This is why we normally use a jump table.  They really aren't that big a deal.
: Share the whole text, no fixup necessary except to a table in the data segment.

I'll admit to a prejudice that says that reasonable fixed
overheads are better than overheads that increase indefinitely
(e.g., percentage overheads). For this discussion, that means I
feel paying at load time is better than paying at run time. My
feeling is that for "short" programs the difference is
negligible, but for "long" programs the long-run cost is lower
with my approach.
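
For concreteness, here is a toy version of the jump-table
arrangement (the names are invented; the real table is built by
the linker, not written by hand). The fixed cost is one pointer
per routine in the data segment plus one fixup at load time; the
recurring cost is the extra indirection on every call, which is
the long-run overhead I'm uneasy about:

/* Sketch of a data-segment jump table for a shared library. */
#include <stdio.h>
#include <string.h>

/* Imagine these routines living in shared, read-only library text. */
static unsigned long old_strlen(const char *s) { return strlen(s); }
static unsigned long new_strlen(const char *s) { return strlen(s); }

/* The table lives in each process's private, writable data
 * segment.  Calls from the shared text go through it, so the text
 * pages themselves never need fixing up and stay shared. */
static unsigned long (*jump_table[])(const char *) = { old_strlen };

/* What a call site compiled into the shared text amounts to. */
static unsigned long caller(const char *s)
{
	return jump_table[0](s);
}

int main(void)
{
	printf("%lu\n", caller("hello"));

	/* The loader (or a user override) patches only this one
	 * data-segment slot; no text page is written, but every call
	 * through the slot pays an extra indirection. */
	jump_table[0] = new_strlen;
	printf("%lu\n", caller("hello"));
	return 0;
}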

: | This situation, one would hope, doesn't occur often.
:
: No, but an analogous one does: some systems provide two versions of malloc()
: located in different libraries.

True, in one sense of often. However, by "often" I meant: often
enough that following my suggestion would result in wasting
something on the order of the amount of space that it saved. On my
system, for example, with just two system shared libraries, this
situation would occur only when the user's program overrode
something in the library, and that doesn't happen often (just as
often as I link with -lmalloc. :-)

Still, I can envision, for example, a testing environment where
nearly every program overrides something in the shared libraries.
We'd want any system to at least not be pathological when
confronted with that circumstance.

This also argues for a small granularity in my scheme: with a
large granularity, the amount that gets unshared in this
circumstance would also be large.

: : In
: : the shared case, swap is reserved for the whole library's data segment, but in
: : the archive case, only those few modules needed by the program are copied into
: : the a.out, so the data space for the rest of the library needs no swap at run
: : time.  We measured up to 100K of "wasted" swap per process for Motif
: : applications.
:
: >The trade-off, then, is between the allocated space in swap for
: >each running process, vs. the disk space saved for the
: >executables? Is there any other way to avoid the swap deadlock I
: >assume is the reason for allocating for the worst case?
:
: Sure - arbitrarily kill off a process.

I meant: besides that. :-)

:                                         But it is so much less painful to not
: let a process start up than to kill it once it has started.  Do you really want to
: see your X server die because some trivial "ls" is using all the swap at the
: moment?

No. (I had assumed that no one in his right mind would consider
killing off random processes as acceptable, so I just ignored that
obvious option.)

Anyway, I've demonstrated to my satisfaction that there is no way
to avoid the problem. Suppose you have N tasks, each needing its
full address space to complete, and each requiring the other N-1
tasks to have made progress after reaching their full allocations
(imagine N sorts connected by pipes). You'll need space for all
those processes, no matter what.
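
To make the pipe example concrete, here is a sketch (N and the use
of sort(1) are just for illustration): it connects N sorts in a
pipeline, and since sort cannot write a byte of output until it
has read all of its input, every stage must hold its full
allocation at the same time.

/* Sketch: N sort processes connected by pipes.  All N must be
 * resident with their full allocations before any can finish. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define N 4

int main(void)
{
	int in_fd = 0;		/* the first stage reads standard input */
	int i;

	for (i = 0; i < N; i++) {
		int pfd[2];
		pid_t pid;

		/* the last stage writes to standard output; the
		 * others write into a pipe to the next stage */
		if (i < N - 1 && pipe(pfd) < 0) {
			perror("pipe");
			exit(1);
		}
		pid = fork();
		if (pid < 0) {
			perror("fork");
			exit(1);
		}
		if (pid == 0) {
			if (in_fd != 0) {
				dup2(in_fd, 0);
				close(in_fd);
			}
			if (i < N - 1) {
				dup2(pfd[1], 1);
				close(pfd[0]);
				close(pfd[1]);
			}
			execlp("sort", "sort", (char *) NULL);
			perror("sort");
			_exit(1);
		}
		if (in_fd != 0)
			close(in_fd);
		if (i < N - 1) {
			close(pfd[1]);
			in_fd = pfd[0];	/* next stage reads this one's output */
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}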

: >Another solution to this problem is to use smaller shared
: >libraries, instead of a monolithic library. At least in my scheme,
: >this doesn't involve much additional overhead, so it would be the
: >easy solution.
:
: I agree with this, but note there is more VM overhead associated with having
: lots of small libraries than there is with a few monolithic ones.

Well, see my comments above. This may or may not be true, for
suitable values of "lots".


