The Road to Hell Is Paved with Good Intentions
Before we are sucked screaming into the cesspool of reality concerning the implementations of shared libraries, here is a good cross section of why shared libraries are very desirable and often implemented in a multi-tasking operating system:
- Allow code to be shared between processes, shrinking the memory footprint of a collection of running programs
- Improve the performance of the paging system
- Allow the OS to upgrade system libraries (for bug fixes and improvements)
- Allow programs to have multiple implementations of an unchanging API alterable at runtime
- Allow a program to decide what functionality it should have loaded in memory at any given time, either because that functionality is not known until runtime, or because loading all of it at once would exceed the available physical RAM.
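The last two points are the job of dlopen() and dlsym(). Here is a minimal sketch, assuming a POSIX system with <dlfcn.h>; the helper name load_unary and the unary_fn type are hypothetical, made up for illustration. It loads a named library, looks up a function with a fixed signature, and hands back both the function and the handle (which must stay open for as long as the function is called). On glibc older than 2.34 you also need -ldl at link time.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical "unchanging API": a function taking and returning a double. */
typedef double (*unary_fn)(double);

/* Load `symbol` from `libname`. On success, store the library handle in
 * *handle_out (keep it open while the function pointer is in use) and
 * return the function; on failure, print the reason and return NULL. */
unary_fn load_unary(const char *libname, const char *symbol, void **handle_out)
{
    void *handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    dlerror(); /* clear any stale error so the check below is meaningful */
    unary_fn fn;
    /* POSIX-sanctioned way to turn dlsym()'s void * into a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return fn;
}
```

For example, on a typical glibc system load_unary("libm.so.6", "cos", &h) returns the math library's cos() (the name libm.so.6 is an assumption about the platform). Pointing the same call at another library that exports a function with the same signature swaps the implementation without recompiling, which is exactly the flexibility the list above is promising.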
So, with all of these benefits, why do I stare longingly at my recently emptied bottle of Mezcal and contemplate a career change whenever I think about this topic? Because I know the truth.
Linux is by far the worst when it comes to the implementation of shared libraries (especially regarding the third point, upgrading system libraries), but there are a few trouble areas that seem endemic to shared libraries and affect programs that use them under many OSes. I suspect this is because most people who develop these systems simply say, "Damn, that looks hard, I'll finish it later", which obviously means never.
Simply thinking about the legion of problems I've found with shared libraries across different OSes invokes severe cataplexy. This leaves me unable to muster the energy it would take to write the encyclopedic volumes of animosity and pure contempt for the monumental idiocy that surrounds this topic. Instead, I will only speak of a few encounters with the most foul and ruinous aberrations.
-
What is in my executable at link time?
Suppose we have the following code:
Linux black > cat foo-v1.c
#include <stdio.h>
#include <stdlib.h>

static char *version = "Version 1";

void foo(void)
{
    printf("foo()'s version is %s\n", version);
}

void bar(void)
{
    printf("bar()'s version is %s\n", version);
}
Linux black > cat foo-v2.c
#include <stdio.h>
#include <stdlib.h>

static char *version = "Version 2";

void foo(void)
{
    printf("foo()'s version is %s\n", version);
}

void bar(void)
{
    printf("bar()'s version is %s\n", version);
}
Linux black > cat bar.c
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

extern void foo(void);
extern void bar(void);

int main(void)
{
    char *name = "bar";
    void *handle;
    void (*func)(void);
    char *error;

    foo();

    handle = dlopen("./libfoo-v2.so", RTLD_NOW);
    if (handle == NULL) {
        printf("Can't dlopen: %s\n", dlerror());
        exit(EXIT_FAILURE);
    }

    dlerror(); /* clear any stale error before calling dlsym() */
    func = dlsym(handle, name);
    if ((error = dlerror()) != NULL) {
        printf("Can't dlsym: %s\n", error);
        exit(EXIT_FAILURE);
    }

    (*func)(); /* bar() as found in libfoo-v2.so */
    dlclose(handle); /* close only after we are done calling into the library */

    bar();
    foo();
    return 0;
}
Linux black > cat Makefile
bar: bar.o libfoo-v1.a libfoo-v2.so
	gcc -g -Wall -rdynamic -Wl,-rpath . bar.o -lfoo-v1 -L. -lfoo-v2 -o bar -ldl

bar.o: bar.c
	gcc -g -Wall -c bar.c -o bar.o

libfoo-v1.a: foo-v1.o
	ar qs libfoo-v1.a foo-v1.o

foo-v1.o: foo-v1.c
	gcc -g -Wall -c foo-v1.c -o foo-v1.o

libfoo-v2.so: foo-v2.c
	gcc -g -Wall -shared -fPIC foo-v2.c -o libfoo-v2.so

clean:
	rm -f bar bar.o libfoo-v1.a foo-v1.o libfoo-v2.so
Now, let's make the executable.
Linux black > make
gcc -g -Wall -c bar.c -o bar.o
gcc -g -Wall -c foo-v1.c -o foo-v1.o
ar qs libfoo-v1.a foo-v1.o
ar: creating libfoo-v1.a
gcc -g -Wall -shared -fPIC foo-v2.c -o libfoo-v2.so
gcc -g -Wall -rdynamic -Wl,-rpath . bar.o -lfoo-v1 -L. -lfoo-v2 -o bar -ldl
Linux black > ldd ./bar
	libfoo-v2.so => ./libfoo-v2.so (0xb7ffe000)
	libdl.so.2 => /lib/libdl.so.2 (0x00bda000)
	libc.so.6 => /lib/tls/libc.so.6 (0x00a89000)
	/lib/ld-linux.so.2 (0x00a6f000)
Notice how I bring in version 1 of foo as a static archive and version 2 of foo as a shared library. There are no errors while linking. This means that the symbols needed by bar.c were fully resolved by the static libfoo-v1.a archive, and the linker simply records libfoo-v2.so as a dynamic dependency. However, there is trouble brewing. Let's run the executable and see what we get.
Linux black > ./bar
foo()'s version is Version 1
bar()'s version is Version 2
bar()'s version is Version 1
foo()'s version is Version 1
Bet you didn't expect that, did you? Here is what happened: dlopen() opened the library, and dlsym() found bar() in it. However, the static qualifier on the "version" variable gives it internal linkage, meaning only code in that one file can see it. So the shared library carries its own private copy of the variable in its own data segment, set up when the library is loaded. Since the static archive and the shared library are two different objects, they get two different memory locations for "version". Normally, this is exactly what you want. In this case, however, it is extremely confusing, since bar() appears to have two behaviors depending on who calls it, when, and why. Under the right, and not too uncommon, conditions this will screw things up badly. Also notice that -rdynamic didn't help us, since it can't override static in this context. And if you think RTLD_GLOBAL is going to help you here when paired with -rdynamic, I have some money sitting in a Nigerian bank I'd like you to help me transfer.
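You can at least detect this situation at runtime. The sketch below compares the address a name resolves to through a dlopen() handle with the address it resolves to through the process-wide default search order (RTLD_DEFAULT, a glibc/Solaris extension, hence _GNU_SOURCE). The helper name symbol_is_shadowed is hypothetical. If the two addresses differ, two different copies of the code answer to the same name.

```c
#define _GNU_SOURCE /* RTLD_DEFAULT is a GNU extension in glibc */
#include <dlfcn.h>
#include <stddef.h>

/* Return 1 if `name` resolves to different addresses when looked up in
 * `handle` versus the default global search order (the executable first,
 * then its dependencies); return 0 if they agree or either lookup fails. */
int symbol_is_shadowed(void *handle, const char *name)
{
    void *in_handle = dlsym(handle, name);
    void *global = dlsym(RTLD_DEFAULT, name);

    if (in_handle == NULL || global == NULL)
        return 0; /* nothing to compare */
    return in_handle != global;
}
```

In bar.c, symbol_is_shadowed(handle, "bar") right after the dlopen() should return 1, because the Makefile's -rdynamic exports the executable's static-archive copy of bar() and the default search order finds it first. A symbol the executable does not export is invisible to the default lookup, so treat 0 as "no conflict found", not "no conflict".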
Of course, if you remove the static qualifier, everything seems to work: when the dynamic linker relocates libfoo-v2.so, its reference to "version" is bound to the executable's copy (the Makefile links with -rdynamic, so that copy is exported), and both versions of bar() end up reading the same variable. However, there is another pit trap here. If the code in version 2's bar() differs from version 1's bar(), it could corrupt the hell out of the global variable, since each may expect different values in it.
Either way, you're screwed. Such is the idiocy. The above screws up on both Solaris and Linux, and it is surprisingly easy to get into this conundrum in very complicated applications.
-
What is in my executable at runtime?
OK, so the above is pretty scary, but since when do you screw around like that with dlopen()? At least the above seems like a dangerous thing to do and perhaps best avoided. Isn't there a more insidious way to get the same result without actively trying?
You bet there is...
For some reason that escapes me, it is possible for a shared library to declare other shared libraries it depends upon, which are then loaded automatically along with it. Here is an example from Fedora Core 3 Linux:
Linux black > ldd /lib/libnss_wins.so.2
	libldap-2.2.so.7 => /usr/lib/libldap-2.2.so.7 (0xb7ef3000)
	liblber-2.2.so.7 => /usr/lib/liblber-2.2.so.7 (0xb7ee6000)
	libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0xb7ed2000)
	libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0xb7e6d000)
	libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0xb7e4c000)
	libcom_err.so.2 => /lib/libcom_err.so.2 (0xb7e49000)
	libresolv.so.2 => /lib/libresolv.so.2 (0xb7e35000)
	libc.so.6 => /lib/tls/libc.so.6 (0xb7d0a000)
	libsasl2.so.2 => /usr/lib/libsasl2.so.2 (0xb7cf6000)
	libssl.so.4 => /lib/libssl.so.4 (0xb7cc2000)
	libcrypto.so.4 => /lib/libcrypto.so.4 (0xb7bda000)
	/lib/ld-linux.so.2 (0x00a6f000)
	libdl.so.2 => /lib/libdl.so.2 (0xb7bd6000)
	libcrypt.so.1 => /lib/libcrypt.so.1 (0xb7ba7000)
	libz.so.1 => /usr/lib/libz.so.1 (0xb7b97000)
Now, this particular library is deep inside of the NSS system which will be manually dlopen()ed by libc if the sysadmin configures nsswitch.conf in the proper manner. You as an application writer/user have absolutely no control over the fact that the above libraries are going to be brought into your executable's space.
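You can't stop libc from dlopen()ing NSS modules, but you can at least find out whether it has. A minimal sketch, assuming glibc or Solaris, where RTLD_NOLOAD is an extension: with that flag, dlopen() succeeds only for an object that is already mapped, so the check never pulls the library or its dependencies in. The helper name library_is_loaded is hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Return 1 if a library matching `soname` is already mapped into this
 * process, 0 otherwise. Never loads anything new. */
int library_is_loaded(const char *soname)
{
    void *h = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);

    if (h == NULL)
        return 0;
    dlclose(h); /* drop the extra reference dlopen() just took */
    return 1;
}
```

For instance, on a machine whose nsswitch.conf lists wins for host lookups, calling library_is_loaded("libssl.so.4") after the first host lookup would tell you whether libnss_wins.so.2 has dragged its own copy of SSL (from the listing above) into your process.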
So, what happens when you've decided to use SSL in your executable by statically linking it in (a very common scenario) and the administrator has configured the machine to use WINS name resolution? Um. Yeah. Good luck with that. I hope your shrink is on speed-dial.
In addition, dynamically linking SSL into the executable might not fix a damned thing, simply because you might link in a different revision than the one used by the WINS module. In that case, you'll actually have TWO dynamic libraries screwing around in each other's global address spaces (depending upon how they were loaded). And even IF you link in the same version, through twisted uses of static and non-static global variables, you'll probably end up corrupting the state of the originally set up SSL library. I've actually seen this problem in the wild.
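To check for the two-copies situation, you can walk every object currently mapped into the process. Below is a minimal sketch using dl_iterate_phdr(), assuming glibc (where it is declared in <link.h> under _GNU_SOURCE). The names count_loaded, visit_object, and match_state are hypothetical. It counts the loaded objects whose path contains a given substring and optionally prints each one.

```c
#define _GNU_SOURCE /* dl_iterate_phdr() is a GNU extension in glibc */
#include <link.h>
#include <stdio.h>
#include <string.h>

struct match_state {
    const char *needle; /* substring to look for in each object's path */
    size_t matches;     /* how many loaded objects contained it */
    int verbose;        /* nonzero: print every loaded object's path */
};

/* Called once per loaded object; the main program has an empty name. */
static int visit_object(struct dl_phdr_info *info, size_t size, void *data)
{
    struct match_state *st = data;
    (void)size;

    if (st->verbose)
        printf("%s\n", info->dlpi_name[0] ? info->dlpi_name : "(main program)");
    if (strstr(info->dlpi_name, st->needle) != NULL)
        st->matches++;
    return 0; /* keep iterating */
}

/* Count loaded objects whose path contains `needle`, e.g. "libssl". */
size_t count_loaded(const char *needle, int verbose)
{
    struct match_state st = { needle, 0, verbose };

    dl_iterate_phdr(visit_object, &st);
    return st.matches;
}
```

A result above 1 for "libssl" means more than one revision is mapped at once. Note that a statically linked copy lives inside the executable and never shows up as a separate object, so this cannot see the nastiest case described above.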
This is reality. Get used to it.
-
Don't break APIs and behavior across revisions of dynamic libraries.
Yes, this means YOU, Linux. This is one of the single greatest causes of portability failures. Just Stop Doing It.
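One way to make that promise checkable at runtime, sketched here as a hypothetical convention rather than what any particular library does: the library reports its ABI version as major.minor, a major bump marks an incompatible change, and a minor bump only adds interfaces. ELF sonames encode the same idea in the major number (libssl.so.4, libcrypto.so.4 in the listing above). The names abi_version and abi_compatible are made up for illustration.

```c
/* Hypothetical ABI version a library might report about itself. */
struct abi_version {
    unsigned major; /* bumped on any incompatible API or behavior change */
    unsigned minor; /* bumped when only backward-compatible additions are made */
};

/* An application built against `built` can safely run against `running`
 * only if the major numbers match and the running library is not older. */
int abi_compatible(struct abi_version built, struct abi_version running)
{
    return running.major == built.major && running.minor >= built.minor;
}
```

The check is only as good as the library's discipline: ship a behavior change without bumping the major number and it will happily say yes.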
Oops! The crisis prevention center just picked up my call. I gotta go.
End of Line.