Testing whether combreloc works.

Testing if the runtime linker caches symbol resolutions.

On GNU/Linux 386, the following command shows whether the runtime linker has the symbol resolution cache:

   $ LD_DEBUG=statistics cat
   16551:
   16551:  runtime linker statistics:
   16551:    total startup time in dynamic loader: 875905 clock cycles
   16551:              time needed for relocation: 626575 clock cycles (71.5%)
   16551:                   number of relocations: 260
   16551:        number of relocations from cache: 99
   16551:             time needed to load objects: 146172 clock cycles (16.6%)

The important hint is the line labeled "number of relocations from cache". The presence of this line reveals that you have the symbol resolution cache.
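
This check can be scripted. The following is only a minimal sketch, assuming a POSIX shell and grep. The function name has_reloc_cache is made up for illustration. It reads the statistics output on standard input and succeeds when the cache line is present.

```shell
# Hypothetical helper (the name is an assumption, not part of any tool):
# succeeds if the LD_DEBUG=statistics output read from standard input
# contains the "number of relocations from cache" line.
has_reloc_cache() {
    grep -q 'number of relocations from cache'
}
```

The runtime linker writes its statistics to standard error, so the output must be redirected before it is piped into the helper:

   $ LD_DEBUG=statistics cat </dev/null 2>&1 | has_reloc_cache && echo "cache present"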

Testing if a library was compiled with combreloc.

The symbol resolution cache is not effective unless the shared libraries have been linked with the linker option -z combreloc. This is easily checked on GNU/Linux 386 using the following command:

   $ readelf -S /usr/lib/libkdecore.so | grep REL

The presence of section .rel.dyn indicates that this library has been built with option -z combreloc. Otherwise you will see a number of sections named .rel.text, .rel.data, etc.
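
To check many libraries at once, the same test can be wrapped in a small filter. This is a sketch under stated assumptions, not a definitive tool: the function name reloc_layout and its three answers are invented for illustration. It also accepts section names starting with .rela, which is what platforms using RELA relocations show, and it assumes that the PLT relocation section appears in both cases and therefore ignores it.

```shell
# Hypothetical helper (name and output words are assumptions): reads the
# output of `readelf -S` on standard input and prints
#   combreloc  if a .rel.dyn (or .rela.dyn) section is listed,
#   separate   if only other .rel.* sections such as .rel.text are listed,
#   none       if no such relocation section is found.
reloc_layout() {
    awk '
        /\.rela?\.plt/        { next }      # assumed present either way
        /\.rela?\.dyn/        { dyn = 1 }
        /\.rela?\.[A-Za-z_]/  { rel = 1 }
        END {
            if (dyn)      print "combreloc"
            else if (rel) print "separate"
            else          print "none"
        }'
}
```

For example:

   $ readelf -S /usr/lib/libkdecore.so | reloc_layout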


Preparing packages with objprelink2.

Do you really want to do this?

Program objprelink2 is solely designed for evaluating the benefits of the lazy resolution of virtual table entries. This is not a production tool. It works only on Linux/Intel platforms. The only reason to use it would be to quickly recompile KDE on a Linux box that does not offer the combreloc method. This will improve the responsiveness of KDE a little bit more than using the more tested combreloc method.

Program objprelink2 does not currently work with gcc-3.x.

Once more: if you have the updated GNU tools, do not expect obvious improvements in the KDE application startup time!

You have been warned!

Obtaining and compiling objprelink2.

You must first obtain the source code from the Sourceforge CVS. Get instructions from the Sourceforge objprelink project page.

Use the following commands to compile objprelink2:

  $ cd objprelink-2
  $ ./configure
  $ make
  $ make install

The last command installs a program named g++prelink. You must have the BFD library available to compile it. Make sure that the BFD include file bfd.h matches the installed version of the BFD library.

Preparing a package with objprelink2.

To prepare a package with objprelink2, you just have to use g++prelink instead of the g++ compiler. It works on my machine and might even work on yours.

The following commands show an easy way to achieve this:

   $ mkdir /tmp/bin
   $ ( cd /tmp/bin; ln -s /usr/local/bin/g++prelink g++ )
   $ export PATH="/tmp/bin:$PATH"
   $ g++ -v
   Objprelink2 version 1.5
   Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
   gcc version 2.96 ...

Then compile your package as usual.
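
Before starting a long build, it may be worth confirming that the shell really picks up the wrapper rather than the regular compiler. The following is a minimal sketch, assuming a POSIX shell. The function name resolves_in_dir is hypothetical.

```shell
# Hypothetical helper (the name is an assumption): succeeds if the command
# named $1 resolves, through the current PATH, to a file in directory $2.
resolves_in_dir() {
    found=$(command -v "$1") || return 1
    [ "$(dirname "$found")" = "$2" ]
}
```

With the setup shown above, the check would read:

   $ resolves_in_dir g++ /tmp/bin && echo "g++ is the wrapper"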

Understanding the gory details of objprelink2.

Program g++prelink usually invokes the g++ compiler with the same arguments. However, when preparing a shared library, it performs the following steps:

  • Call the GNU linker with option -r to incrementally link all object files together.
  • Pre-process the resulting object file by
    • locating all virtual table entries,
    • creating a section containing a ten-byte relay stub for each distinct virtual table symbol,
    • redirecting all virtual table entries to jump instructions located in the stub section,
    • creating an R_386_PLT32 relocation entry for each jump instruction (these relocation entries cause the GNU linker to create PLT stubs for all these symbols),
    • creating dummy R_386_32 relocation entries that the GNU linker will transform into R_386_RELATIVE relocations, which are used to patch the PLT.
  • Call the GNU linker to produce a shared library.
  • Post-process the shared library by patching all PLT stubs used by virtual table entries in order to make them work regardless of the content of register ebx. The dummy R_386_RELATIVE relocations are used to adjust the required absolute addresses at run time.
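
The order of these steps can be outlined as a dry run. This is only a sketch and not how g++prelink is actually implemented: the function name plan_prelinked_library and the intermediate file name combined.o are assumptions, and the two processing steps are printed as comments because they happen inside objprelink2 itself. Only the linker options -r, -shared and -o are real GNU ld options.

```shell
# Hypothetical dry-run planner (names are assumptions): prints an outline of
# the four steps for building shared library $1 from the object files given
# as the remaining arguments. Nothing is linked or modified.
plan_prelinked_library() {
    lib=$1
    shift
    echo "ld -r -o combined.o $*"
    echo "# pre-process combined.o: relay stubs, R_386_PLT32 and R_386_32 relocations"
    echo "ld -shared -o $lib combined.o"
    echo "# post-process $lib: patch the PLT stubs used by virtual table entries"
}
```

For example:

   $ plan_prelinked_library libfoo.so a.o b.o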

This complicated procedure is far from optimal:

Firstly, the virtual table entries should directly point to the PLT stub instead of using a relay stub in the text section. The relay stub is currently used by the post-processor to locate the relevant PLT stubs and also to generate dummy relocations to patch the PLT entries.

Secondly, the GNU linker seems unable to merge read-only constants during the incremental link step. This buglet uselessly increases the size of the data section.

Implementing the same ideas inside the GNU linker would remove the need for the ten-byte relay stubs and would also benefit from merged read-only constants.