Preload libraries - version 0.1

Brent Baccala <baccala@freesoft.org>
November, 2001

This code takes advantage of the dynamic C library's ability to "preload" other libraries which can override its base functionality and provide various useful services. Currently, a user-level directory overlay facility and an HTTP file system are (partially) implemented. The code is known to work under RedHat Linux with a patched GLIBC.

Download from: http://www.freesoft.org/software/preload/preload-0.1.tgz

INTRODUCTION

Most modern UNIX systems use dynamic libraries to reduce the memory requirements of programs. Instead of copying standard library functions into every program that needs them, the standard routines are centralized in a shared library, only one copy of which needs to be loaded into memory, no matter how many programs use its functions.

Dynamically linked programs are more complex than ordinary, statically linked programs, because they are not self-contained. They require any shared libraries they depend upon to be loaded into memory before they can operate. Furthermore, the memory location of a shared library may not be known ahead of time, so the addresses of function calls and variable references need to be computed at run time. All this complexity is usually hidden from the user and handled behind the scenes by a special program, the shared library loader (/lib/ld.so), which runs at the start of every dynamically linked program. These programs, and the shared libraries themselves, generally use a standard file format, Executable and Linking Format (ELF), documented in http://vyger.freesoft.org/PDFs/elf.pdf

The dynamic loader has a feature, originally introduced by Sun, which allows other libraries to be "preloaded" when the C library is loaded. These preload libraries can override functions provided by the standard C library, allowing bizarre and/or useful redefinitions of things like open(). There are several caveats, though. First, the standard GNU C library uses function names prefixed by "__" for internal functions. For example, the C library function fopen() uses __open() internally. So, if you want to override the system call open(), and catch everything that calls it, you need to override open(), to get all the calls to open from the user program, as well as overriding __open(), to get all the calls from within the C library itself. Furthermore, the GNU standard C library doesn't export the "__" functions, so the library has to be patched to export these functions - which rather defeats the purpose of preload libraries in the first place (not having to patch the standard C library), but that's the way it is.
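
As an illustration of the overriding described above, here is a minimal sketch of a preload library that traces calls to open(). It is not taken from this package. It assumes a glibc system where dlsym(RTLD_NEXT, ...) is available to find the next definition of open(), and it only sees calls made by the user program, not the internal __open() calls discussed above.

```c
#define _GNU_SOURCE             /* for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

typedef int (*open_fn)(const char *, int, ...);

/* Replacement for open(): log the path, then call the next open() in
 * the library search order (normally the C library's own). */
int open(const char *path, int flags, ...)
{
    static open_fn real_open;   /* looked up once, on first use */
    mode_t mode = 0;

    /* The third argument is only present when a file may be created.
     * Simplification: O_TMPFILE also takes a mode and is not handled. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    if (real_open == NULL) {
        real_open = (open_fn) dlsym(RTLD_NEXT, "open");
        if (real_open == NULL)
            return -1;          /* no underlying open() found */
    }

    fprintf(stderr, "open(\"%s\")\n", path);
    return real_open(path, flags, mode);
}
```

Built with something like "gcc -shared -fPIC -o trace_open.so trace_open.c -ldl" (the file names are made up for this example), it can be tried on any dynamically linked program that calls open(). Programs that call open64() or openat() instead will not be traced by this sketch.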

Once a shared preload library has been compiled, it can be loaded in one of two ways. First, the name of the file can be placed in the LD_PRELOAD environment variable, which allows preload libraries to be loaded on a per-shell basis. Second, the name of the file can be placed in /etc/ld.so.preload, which loads the library on a system-wide basis. However, since a user can always link a program statically, this system-wide feature can't be absolutely relied upon. Each preload library can define an _init function to be run when it loads.
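
The load-time hook can be sketched as follows. This is an assumption-laden example, not code from this package: on current GCC toolchains, defining _init directly usually collides with the startup files unless the library is linked with -nostartfiles, so the sketch uses GCC's constructor attribute, which serves the same purpose. The names preload_setup and preload_ready are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flag recording that the load-time hook has run. */
static int preload_ready;

/* Runs when the library is loaded, before the program's main(). */
__attribute__((constructor))
static void preload_setup(void)
{
    const char *list = getenv("LD_PRELOAD");

    preload_ready = 1;
    fprintf(stderr, "preload library loaded (LD_PRELOAD=%s)\n",
            list ? list : "(unset)");
}
```

To try it, build with "gcc -shared -fPIC -o hello.so hello.c" and run a command with LD_PRELOAD=./hello.so set (both file names are made up for this example).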

The Freesoft collection of preload libraries currently includes several useful preload libs. "cwd.so" is a utility library that implements the concept of a Current Working Directory in user space. It translates all filenames in system calls into absolute filenames, making it easier for the other libraries to work with them. "unions.so" is an overlay library, which allows a directory tree to be overlaid on top of another, allowing source code and object code to be separated, for example, while presenting the appearance that all the code resides in the same directory. "httpfs.so" implements an HTTP file system, which allows URLs to be treated as normal filenames. "trace.so" is a utility library which allows debugging of other preload libraries.

INSTALLATION

  1. Patch the standard GNU C library so that it exports "__" functions using the supplied glibc-2.1.1-LD_PRELOAD.patch, or something similar. Recompile the C library (perhaps using glibc-2.1.spec) and install it. Yes, this is a big pain, but the patch is very simple (as you can see). Email drepper@redhat.com if you'd like to see this requirement waived in the future.

  2. "make all" to build unions.so, httpfs.so, and cwd.so

  3. Using HTTPFS

    To try it out, add it to the environment variable LD_PRELOAD, e.g.:

    $ export LD_PRELOAD=/usr/local/lib/cwd.so:/usr/local/lib/httpfs.so

    The exact location of the files is unimportant, but cwd.so must be preloaded before httpfs.so. Now, try something like:

    $ cat http://www.freesoft.org/

  4. Using UNIONS

    Preload unions.so before libc (and after cwd.so) to perform "union" or "overlay" mounts, where one directory tree appears to be layered on top of another. This can be used to make CDs "writable", to conveniently separate source and object files in build trees, to allow package management software to "install" files that actually end up in packages, etc.

    unions.so depends on the cwd.so module, which implements the CWD in user space and translates all pathnames in system calls into absolute paths.

      
    $ export LD_PRELOAD=/usr/local/lib/cwd.so:/usr/local/lib/unions.so
    $ export UNIONS='overlay@target(options):...'
    

    For example, export UNIONS='/tmp/cdrom@/mnt/cdrom(create,copyonwrite)' causes all the files in /tmp/cdrom to appear overlaid on /mnt/cdrom (the quotes keep the shell from interpreting the parentheses). Any attempted changes to /mnt/cdrom will actually occur on /tmp/cdrom.

  5. System-wide usage:

    /etc/ld.so.preload should contain lines such as:

      /usr/local/lib/cwd.so
      /usr/local/lib/httpfs.so
      /usr/local/lib/unions.so
    

    /etc/unions should contain lines like:

            overlay:target(options)
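
To make the UNIONS syntax shown above concrete, here is a sketch of how a single overlay@target(options) entry could be split into its parts. It is not the parser used by unions.so. The names union_spec and parse_union_entry are hypothetical, and treating the (options) part as optional is an assumption.

```c
#include <string.h>

#define SPEC_MAX 256

/* The three parts of one "overlay@target(options)" entry. */
struct union_spec {
    char overlay[SPEC_MAX];
    char target[SPEC_MAX];
    char options[SPEC_MAX];     /* comma-separated, possibly empty */
};

/* Copy len bytes of src into dst (capacity SPEC_MAX). */
static int copy_field(char *dst, const char *src, size_t len)
{
    if (len >= SPEC_MAX)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Parse one entry. Returns 0 on success, -1 if it is malformed. */
int parse_union_entry(const char *entry, struct union_spec *out)
{
    const char *at = strchr(entry, '@');
    const char *lparen, *rparen;

    if (at == NULL || at == entry)
        return -1;              /* missing '@' or empty overlay */
    if (copy_field(out->overlay, entry, (size_t) (at - entry)) < 0)
        return -1;

    lparen = strchr(at + 1, '(');
    if (lparen == NULL) {
        /* No option list: everything after '@' is the target. */
        out->options[0] = '\0';
        if (at[1] == '\0')
            return -1;
        return copy_field(out->target, at + 1, strlen(at + 1));
    }

    rparen = strchr(lparen, ')');
    if (lparen == at + 1 || rparen == NULL || rparen[1] != '\0')
        return -1;              /* empty target or bad parentheses */
    if (copy_field(out->target, at + 1, (size_t) (lparen - (at + 1))) < 0)
        return -1;
    return copy_field(out->options, lparen + 1,
                      (size_t) (rparen - (lparen + 1)));
}
```

With the example entry /tmp/cdrom@/mnt/cdrom(create,copyonwrite), this yields the overlay /tmp/cdrom, the target /mnt/cdrom and the option string create,copyonwrite. Splitting a colon-separated list into entries is left out of the sketch.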
    

TODO

Code walkthrough/cleanup
Figure out how to handle deletes
Decide if I should "rm -rf" this whole directory :-)
Fixup httpfs module and add WebDAV support
Define an API to provide streams-like functionality - affect a single FD
Define an API to figure out what other modules are loaded
  (so unions.so can demand cwd.so)
Create a userspace NFS and/or Coda module
Create a module to auto-expand zip/tar files
Create a module to provide Hurd translator-type functionality

A SAD STORY

Here's an example of the kind of trouble you can get into using this program. I was running bash with "unions" preloaded, and was cd'ed inside my overlay tree. I had the binary code for a program overlaid on top of the source code, and most of the directories were replicated in both trees. The directory I was interested in was called "boot", and I had a backup of the source code in that directory in another dir called "boot.bak". I wanted to swap the two directories.

So I typed "mv boot boot.changed" and then "mv boot.bak boot". Well, the first mv moved the directory in the source partition (it moved it into the binary, overlay partition, but that's another, unrelated, story). But I still had the directory called "boot" in the binary partition, which I had forgotten about. So the second mv moved "boot.bak" into the existing directory "boot" and it became a subdirectory.

So, maybe "mv", or more precisely link()/unlink(), should operate on all files/dirs of the given name. Or maybe that would create more problems in another way. I relate this story to point out some of the problems you can get into using this code.

My advice? Use it for well-defined things like runs of make, but do any manual operations like edits and moves in the normal file structure.