Fundamental defect of the concept of shared libraries

Thu May 23 04:25:57 AEST 1991

>>You don't *have* to have PC-relative jumps and data access, although it is
>>convenient.
>
>No, I don't have to, but it is very inconvenient not to do so.

How inconvenient is it?  The main aggravations on, say, a System/3[679]0
for PC-relative jumps within a routine seem to me to be that

	1) you might have to do a BALR N,0 at the beginning of a routine
	   if the calling convention doesn't require that the address of
	   the routine be loaded by the caller (we're not just talking
	   IBM operating systems here; the convention used by some
	   particular UNIX flavor might or might not work that way);

	2) if the routine is larger than 4096 bytes, you might need more
	   than one base register;

but aren't those problems also present even with
*non*-position-independent code?
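
To put a number on point 2: S/360-style base+displacement addressing
has a 12-bit unsigned displacement, so one base register reaches 4096
bytes.  A minimal C sketch of the arithmetic (the helper names are made
up for illustration):

```c
#include <stddef.h>

/* A 12-bit unsigned displacement covers offsets 0..4095 from the base. */
enum { DISP_RANGE = 4096 };

/* Number of base registers needed to address every byte of a routine of
   the given size (hypothetical helper, for illustration only). */
static size_t base_regs_needed(size_t routine_bytes)
{
    return (routine_bytes + DISP_RANGE - 1) / DISP_RANGE;
}

/* Nonzero if `target` can be reached from `base` with one displacement. */
static int reachable(size_t base, size_t target)
{
    return target >= base && target - base < DISP_RANGE;
}
```

Nothing in that arithmetic depends on whether the base register was
loaded by a BALR or handed over by the caller.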

For PC-relative procedure calls, you're less likely to find the routine
within 4096 bytes.  And if the routine is external, you can't
necessarily know at compile time whether it's within 4096 bytes, so
you'd have to generate worst-case code anyway.  Again, the problems
would seem to be present with non-position-independent code as well.

>>When PC-relative addressing isn't available or usable, you just need
>>register+offset addressing, which most computers have.
>
>I was wrong here, yes, it is possible if we use indirect addressing to
>access global data, but it is slow.

But are references to global data common enough that the performance hit
is unacceptable?  Remember, even if your idea of "unacceptable" is
"greater than 0", not all of us share your idea of "unacceptable"....

>>The only tricky part is arranging for the register to be set
>>whenever an inter-module call or return takes place.
>
>The call overhead is six extra cycles with typical RISCs, whenever an
>inter-object-file (not inter-library) call-return takes place.

Well, a SPARC executes two "sethi" instructions and one "jmp", once the
link has been snapped; according to the cycle counts in the SPARC
Architecture Manual, Version 8, Appendix L, most implementations would
take 4, rather than 6, cycles for that, and the Matsushita MN10501 would
take 3 cycles.

Which *particular* "typical RISC" were you thinking of?
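
For anyone unfamiliar with "snapping" a link, here is a rough C analogue
of lazy binding through a jump slot.  It is only a sketch: the names
jump_slot, first_call_stub, and call_strcmp are invented, and a real
run-time linker patches instructions or table entries rather than
assigning to a C function pointer.

```c
#include <string.h>

typedef int (*cmp_fn)(const char *, const char *);

static int resolve_count = 0;   /* how many times the resolver ran */
static int first_call_stub(const char *a, const char *b);

/* Hypothetical one-entry jump table; it starts out pointing at the stub. */
static cmp_fn jump_slot = first_call_stub;

/* Stand-in for the run-time linker: find the real routine (trivially,
   here) and overwrite the slot so later calls skip the lookup. */
static int first_call_stub(const char *a, const char *b)
{
    ++resolve_count;
    jump_slot = strcmp;         /* "snap" the link */
    return jump_slot(a, b);
}

/* What a caller in another module does: always jump through the slot. */
static int call_strcmp(const char *a, const char *b)
{
    return jump_slot(a, b);
}
```

After the first call, the only overhead left is the jump through the
slot, which is the part the cycle counts above are about.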

>It is not negligible when we are heavily doing something like strcmp().

It depends on how long the strings are, and how heavily you're doing
"strcmp()".  Yes, there are cases where there's a large penalty, but
then there are also cases where a typical cache loses big, too.

>You may remember that the speed of Bnews was actually improved by
>in-lining the first part of strcmp(). In-lining of functions in
>shared libraries is, of course, impossible.

Well, in the version of Bnews we have here, that in-lining is done with
a "STRCMP()" macro that compares the first character of each string and
calls "strcmp()" only if they're equal.
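
A macro of that sort might look roughly like the following.  This is a
sketch, not the Bnews source: it assumes the macro compares the first
character of each string in line and falls through to strcmp() only
when they match, since a mismatch already decides the result.

```c
#include <string.h>

/* Sketch only: compare the first character of each string in line and
   call strcmp() only when those characters match.  The arguments are
   evaluated more than once, so they must be free of side effects. */
#define STRCMP(a, b) \
    (*(a) != *(b) ? (unsigned char)*(a) - (unsigned char)*(b) \
                  : strcmp((a), (b)))
```

The comparison of the first characters is compiled into the caller, so
it costs no inter-module call at all; only the tail of the work goes
through the shared library.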

Our Bnews programs are dynamically linked, and they have that in-lining,
which demonstrates that "in-lining of functions in shared libraries" is,
of course, *NOT* "impossible".

Perhaps you want to completely delete the Bnews example, as it doesn't
bolster your case, and change the statement following it to "in-lining
of functions in shared libraries cannot, of course, be done by the
compiler or compile-time linker"?

>>>Even worse, with some architecture, it is impossible to map several virtual
>>>addresses to a physical address. Virtually tagged cache and inverted
>>>page tables are notable examples.
>
>>Well, this kills any kind of shared text architecture, not just shared
>>libraries.
>
>You can always share text as usual UNIX box do, because it only requires
>to map a single virtual address of several different processes to a
>         ^^^^^^                            ^^^^^^^^^^^^^^^^^^^
>physical address.

Not necessarily.

In the Sun virtually-addressed cache, the "virtual address" includes a
context number; while the "virtual address" bits of the different
virtual addresses in the different processes are the same, the context
number bits aren't.

And, in the Sun virtually-addressed cache, the cache can handle aliases
that differ not only in the context number, but in the "virtual address"
bits, so the statement that "it is impossible to map several virtual
addresses to a physical address" with a virtually-tagged cache is, of
course, not true of the Sun cache.
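
As an aside, what "several virtual addresses for one physical address"
means at the program level can be sketched with POSIX mmap(): map the
same file page twice and the two mappings alias one another.  This
assumes a POSIX system and says nothing about how any particular cache
copes with the aliases; the function name is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of a temporary file twice with MAP_SHARED, store
   through one mapping, and read back through the other.  Returns 1 if
   two distinct virtual addresses alias the same memory, 0 if not, and
   -1 on any setup failure. */
static int aliases_share_memory(void)
{
    long page = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    if (page <= 0 || ftruncate(fileno(f), page) != 0) {
        fclose(f);
        return -1;
    }

    int fd = fileno(f);
    char *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    char *b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    int result = -1;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        a[0] = 'x';                         /* store via one alias */
        result = (a != b && b[0] == 'x');   /* load via the other  */
    }
    if (a != MAP_FAILED)
        munmap(a, (size_t)page);
    if (b != MAP_FAILED)
        munmap(b, (size_t)page);
    fclose(f);
    return result;
}
```

On a machine with a virtually-addressed cache, it is the hardware's or
the operating system's job to make the load through the second address
see the store through the first.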

It's not true of all inverted page table machines, either; cf. the RT PC
and RS/6000.