I hate compilers.
I'm responsible for the porting of
Condor to many different
flavors and revisions of OS. It is a challenging job in most respects that
Sisyphus would understand--though I do love it since it hones my technical
skills for use in other areas of my life. I spend a lot
of time with different revisions of the GNU Compiler Collection, the
system programming APIs of a lot of OSes--especially Linux--and I know a
fair amount about how vendor compilers and C preprocessors do their job.
The one pervading lesson that I have learned is that people who write
compilers probably don't use them.
For example, good old GNU g++ likes to put -lstdc++ (among other things)
at the end of the compile line like this (on a Redhat 7.2 x86 box while
compiling "Hello World"):
Linux rh7.2 > g++ -v hello.C -o hello
[ snip tangential garbage ]
/usr/lib/gcc-lib/i386-redhat-linux/2.96/collect2 -m elf_i386 \
-dynamic-linker /lib/ld-linux.so.2 \
-o hello \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crti.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtbegin.o \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96 \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../.. \
/tmp/ccnZj0aB.o \
-lstdc++ -lm -lgcc -lc -lgcc \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtend.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crtn.o
One might think this is exactly what you want since you shouldn't have to
figure out and supply that crap at the end of the line for symbol resolution.
And you'd be right, you shouldn't have to figure it out.
But, here we teeter on the brink of idiocy. This is pretty much a
catastrophic failure when you deal with binary compatibility between,
as if it matters, Linux distributions (which I'll assume
for the rest of this post). First off, my stupid little hello.C program
requires both the gcc and C++ runtimes (in the rh72 example, there is
no shared gcc runtime, but it will show up later in gcc's evolution) as
shared libraries which tie the executable to specific versions
of the compiler revision's libraries. See:
Linux rh7.2 > ldd ./hello
libstdc++-libc6.2-2.so.3 => /usr/lib/libstdc++-libc6.2-2.so.3 (0x40033000)
libm.so.6 => /lib/i686/libm.so.6 (0x40076000)
libc.so.6 => /lib/i686/libc.so.6 (0x40099000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
The libraries as they stand implement all sorts of goo (to make things
like dynamic_cast function) and for the most part have completely opaque
implementations to the end user. However, those implementations have
functions in them, and functions are the domain of the linker loader
during executable runtime. This is where the problem begins to show
itself.
It turns out that if you take the above program and run it on a different
Linux distribution, say SuSE 8.1, it'll work just fine.
Or will it?
If my little C++ program uses a tiny subset of C++, say no exceptions,
run time type information, or STL, it'll probably work just fine. However,
suppose I make my C++ program a little more complicated by adding in
a correct use of dynamic_cast and recompile it on the rh7.2 box. What
happens when I move it to the SuSE box?
Linux SuSE 8.1 > ./hello
./hello: relocation error: ./hello: undefined symbol: __dynamic_cast_2
Uh oh! What happened? The opaque runtime layer blew up. The dynamic
linker loader couldn't resolve this internal function at runtime, because
the function changed between the revision of the stdc++ library the
program was linked against and the revision it found during execution on
the different machine. That's right, my program could have been happily
running for days until it decided to do a dynamic_cast and BAM, it gets
shot right between the eyes. This implies that the rest of the program
might be subtly producing incorrect information, or not; it is undefined.
However, I only noticed this after adding a slightly more complex feature
of C++ which turned on a mishmash of internal behavior.
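For illustration, a correct use of dynamic_cast might look like the sketch
below. This is not the actual hello.C; the class names (Shape, Circle,
Square) and the describe() helper are made up. With g++, a cast like this
typically compiles into a call to a helper inside the C++ runtime library,
which is exactly the kind of internal symbol the loader choked on.

```cpp
// Hypothetical stand-in for the "slightly more complicated" hello.C.
// All names here are invented for illustration.
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() {}   // a virtual member makes the type polymorphic
};
struct Circle : Shape {};
struct Square : Shape {};

// Returns a label for the dynamic type behind `s`.
// Each dynamic_cast below needs run time type information, and the
// compiler normally implements it as a call into the C++ runtime.
std::string describe(Shape* s) {
    assert(s != 0);   // precondition: caller passes a real object
    if (dynamic_cast<Circle*>(s)) return "circle";
    if (dynamic_cast<Square*>(s)) return "square";
    return "unknown shape";
}
```

Calling describe() on a Circle from main() would be enough to bring the
runtime's cast helper into play.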
So, how do we fix this to achieve binary compatibility? Four options:
1) remove the dynamic_cast, 2) produce a statically linked executable,
3) statically link in only the gcc and C++ runtime libraries while
leaving everything else dynamically linked, or 4) recompile. I definitely
know option 2 is stupid since you can kiss goodbye to NSS lookups beyond
'files'. Option 1 is appealing to me, but due to some strange twist of
fate it isn't chosen. Option 4 is out of the question since I'd have to
port not only 400,000 lines of often deeply magical code to a new
compiler, but also the 9+ million lines of external third party libraries
(like Kerberos), all to 28+ different architectures. Option 3 becomes the
winner, mostly through forfeit of the other options.
So, let's try the obvious:
Linux rh7.2 > g++ -v hello.C -o hello -Wl,-Bstatic -lstdc++
[ snip extraneous junk ]
/usr/lib/gcc-lib/i386-redhat-linux/2.96/collect2 -m elf_i386 \
-dynamic-linker /lib/ld-linux.so.2 \
-o hello \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crti.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtbegin.o \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96 \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../.. \
/tmp/ccUrafey.o \
-Bstatic -lstdc++ -lstdc++ -lm -lgcc -lc -lgcc \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtend.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crtn.o
Linux rh7.2 > ldd ./hello
not a dynamic executable
Oops. What happened? Well, if you look carefully, the stdc++ I added
after the -Wl,-Bstatic is present, but so are the compiler supplied
libraries after it. Since -Wl,-Bstatic is a stateful flag, it turns off
dynamic linking for everything after it, so not only do I get my
requested static linkage of stdc++, I also get unrequested static
linkage of libc and libm. Kiss NSS goodbye.
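The stateful behavior can be mimicked with a toy model. The sketch below is
not how ld is implemented; it assumes only that -Bstatic and -Bdynamic flip
a mode that applies to every later -l option. The names modes_for and
LibMode are invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (library name, mode in force when its -l option was seen)
typedef std::pair<std::string, std::string> LibMode;

// Toy model only: walk the linker arguments left to right and record
// which mode applies to each -l option. A real linker does far more.
std::vector<LibMode> modes_for(const std::vector<std::string>& args) {
    std::string mode = "dynamic";   // GNU ld prefers shared libraries by default
    std::vector<LibMode> out;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-Bstatic") {
            mode = "static";        // sticks until something flips it back
        } else if (a == "-Bdynamic") {
            mode = "dynamic";
        } else if (a.compare(0, 2, "-l") == 0) {
            assert(a.size() > 2);   // "-l" must name a library
            out.push_back(LibMode(a.substr(2), mode));
        }
    }
    return out;
}
```

Fed the library portion of the link line above (-Bstatic -lstdc++ -lstdc++
-lm -lgcc -lc -lgcc), every entry comes back marked static, c and m
included, because nothing ever flips the mode back.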
Ok, what if I get smart and turn back on dynamic linking at the very end
of the link line? I would do this with the fool notion in my head that
since I'm resolving all dependencies in libstdc++ statically with the object
files beforehand, the compiler wouldn't bring in the dynamic version of the
libstdc++ since it wouldn't be needed. Let's see what happens:
Linux rh7.2 > g++ -v hello.C -o hello -Wl,-Bstatic -lstdc++ -Wl,-Bdynamic
[ snip extraneous junk ]
/usr/lib/gcc-lib/i386-redhat-linux/2.96/collect2 -m elf_i386 \
-dynamic-linker /lib/ld-linux.so.2 \
-o hello \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crti.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtbegin.o \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96 \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../.. \
/tmp/ccUrafey.o \
-Bstatic -lstdc++ -Bdynamic -lstdc++ -lm -lgcc -lc -lgcc \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtend.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crtn.o
Linux rh7.2 > ldd ./hello
libstdc++-libc6.2-2.so.3 => /usr/lib/libstdc++-libc6.2-2.so.3 (0x40033000)
libm.so.6 => /lib/i686/libm.so.6 (0x40076000)
libc.so.6 => /lib/i686/libc.so.6 (0x40099000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
Um. WTF! This is totally unexpected, and now I have no idea what is
actually in my executable. Do I have two competing versions of libstdc++?
How do they interact when the binary runs on another distribution (two
different versions of libstdc++ playing in the same process)? This is a
catastrophe. This is about 50% of the insidious idiocy of which I speak.
Ok, I've figured out this is terrible, so I need to stop the compiler from
bringing in its own libraries. I find an option: -nostdlib. Jeez.
I hope you didn't need crt1.o or anything like that, since not only does
this option get rid of the appended libstdc++ and friends, it gets
rid of everything else supplied by the compiler as well. In
short, I can find no method of turning off the stdc++
and gcc runtime inclusion while still keeping enough low level objects
(like crtn.o) there to produce an executable.
This leaves two options: 1) Only use gcc to link, or 2)
write our own ld script which does the right thing.
Option 1 is laughable from a user's point of view. "You mean to tell me
I cannot use g++ to link my objects when I not only compiled all of my
software with it, but all of the documentation I have says to do it that
way? How do I know I'm supplying the right libraries? Which libraries
do I use for which revision of the compiler?"
Option 2 is laughable from a system programmer's point of view. "You mean
I have to dig around in how each compiler revision on 28+ different
architectures interacts with the (potentially vendor) linker, with an eye
to the C++ features currently used in a codebase constantly modified by
40 people, and ensure I get the options correct? Oh, and it has to be
maintainable by someone who isn't me and nonfragile in our build system?"
That damned if you do, damned if you don't bind is the other 50% of the
idiocy. There is no good solution.
It gets even better. Since it was obvious to me that the stdc++ library
tried to resolve that __dynamic_cast_2 symbol at runtime, if I
manage to link the stdc++ statically through manually specifying the ld
link line, what happens when it hits it at runtime? Let's try it:
Call the linker by hand fixing up the static linking of the stdc++ library
but leaving dynamic libc and libm:
/usr/lib/gcc-lib/i386-redhat-linux/2.96/collect2 -m elf_i386 \
-dynamic-linker /lib/ld-linux.so.2 \
-o hello \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crt1.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crti.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtbegin.o \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96 \
-L/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../.. \
hello.o \
-Bstatic -lstdc++ -Bdynamic -lm -lgcc -lc -lgcc \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/crtend.o \
/usr/lib/gcc-lib/i386-redhat-linux/2.96/../../../crtn.o
Linux rh7.2 > ldd ./hello
libm.so.6 => /lib/i686/libm.so.6 (0x40033000)
libc.so.6 => /lib/i686/libc.so.6 (0x40056000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
That looks promising. The dynamic cast appears to function on the
machine it was compiled on when linked in this fashion, which is a tad
bit surprising. Let's see what happens when we move it to the SuSE 8.1
machine:
Hmm, it worked on the SuSE 8.1 machine. That's definitely
surprising. It sure beats the hell out of me why without serious time
investment.
Here is my line of reasoning which makes me not understand why it works:
If the dynamic linker was wanting to load the __dynamic_cast_2 function
at runtime before, it implies that the function wasn't there in
the original link pass to create the executable and so therefore wouldn't
be brought into the executable at all--which is why the dynamic linker
loader was trying to find it at runtime. I was pretty sure the link pass
to create the executable would not bring in the required object files
and the program would segfault since there wasn't a fancy linker loader
telling it something was wrong. So, why didn't it segfault?
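One plausible reading, offered as a toy model and not as a verified account
of what ld did here: the function was present at link time in both cases.
Against a shared library, the linker only records the symbol's name for the
dynamic loader to find later; against a static archive, it copies the
defining code into the executable, so nothing is left to look up. The names
Library, LinkResult, and resolve are invented for this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model, not real linker code. A library is reduced to the set of
// symbols it defines and a flag saying whether it is shared or an archive.
struct Library {
    bool shared;
    std::set<std::string> defines;
};

struct LinkResult {
    std::set<std::string> copied_in;   // definition copied into the executable
    std::set<std::string> runtime;     // left for the dynamic loader to look up
    std::set<std::string> unresolved;  // the library does not define it
};

// Resolve the program's undefined symbols against one library.
LinkResult resolve(const std::set<std::string>& undefined, const Library& lib) {
    LinkResult r;
    for (std::set<std::string>::const_iterator it = undefined.begin();
         it != undefined.end(); ++it) {
        if (lib.defines.count(*it) == 0) r.unresolved.insert(*it);
        else if (lib.shared)             r.runtime.insert(*it);
        else                             r.copied_in.insert(*it);
    }
    // Every symbol lands in exactly one bucket.
    assert(r.copied_in.size() + r.runtime.size() + r.unresolved.size()
           == undefined.size());
    return r;
}
```

In this model, linking against a shared libstdc++ leaves __dynamic_cast_2
in the runtime bucket, where another distribution's library can fail to
supply it. Linking against the archive puts it in copied_in, so the SuSE
box has nothing to resolve.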
Obviously the problem that started this whole thing was that the C++
ABIs changed radically a few times between gcc 2.96, found on the Redhat
7.2 machine, and gcc 3.2.2, found on the SuSE 8.1 machine. Evolution of
the compiler yadda yadda yadda. However, the thing that pisses me off is
that the runtime of the language, and other compiler internals,
are shared libraries at all. Sure, from the point of view of sharing
text when running multiple programs it makes sense, but from a binary
compatibility point of view it is a disaster. Why isn't it made easier
to package the run time statically into the binary? Why would
I have to hand invoke the linker to do something that any reasonable person
would have desired from the beginning?
This is a prime example of the insidious idiocy. More and more time is
spent understanding how to do something that should be simple or shouldn't
have to be done at all.
I'm sure I'll get an itch and figure out the exact mechanism for why
it ended up working (I already started poking it) and do another post
explaining it in the future. But for now, the post traumatic stress
disorder episode has passed and I am resting comfortably. The booze
helps. It helps a lot.
I hate compilers.
End of Line.