
		   Preload libraries - version 0.1

		 Brent Baccala <baccala@freesoft.org>
			    November, 2001


This code takes advantage of the dynamic C library's ability to
"preload" other libraries which can override its base functionality
and provide various useful services.  Currently, a user-level
directory overlay facility and an HTTP file system are (partially)
implemented.  The code is known to work under RedHat Linux with a
patched GLIBC.

			     INTRODUCTION

Most modern UNIX systems use dynamic libraries to reduce the memory
requirements of programs.  Instead of copying standard library
functions into every program that needs them, the standard routines
are centralized in a shared library, only one copy of which needs to be
loaded into memory, no matter how many programs are using the
functions in the libraries.  Dynamically linked programs are more
complex than ordinary, statically linked programs, because they are
not self-contained.  They require copies of any shared libraries they
depend upon to be loaded into memory before they can operate.
Furthermore, the memory location of the shared library may not be
known ahead of time, so the memory addresses of function calls and
variable references need to be computed at run time.  All this
complexity is usually hidden from the user, done behind the scenes by
a special program, the shared library loader (/lib/ld.so), which is
run at the beginning of every dynamically linked program.  These
programs, and the shared libraries themselves, generally use a
standard file format, Executable and Linking Format (ELF), documented
in http://vyger.freesoft.org/PDFs/elf.pdf

The dynamic loader has a feature, originally introduced by Sun, which
allows other libraries to be "preloaded" when the C library is loaded.
These preload libraries can override functions provided by the
standard C library, allowing bizarre and/or useful redefinitions of
things like open().  There are several caveats, though.  First, the
standard GNU C library uses function names prefixed by "__" for
internal functions.  For example, the C library function fopen() uses
__open() internally.  So, if you want to override the system call
open(), and catch everything that calls it, you need to override
open(), to get all the calls to open from the user program, as well as
overriding __open(), to get all the calls from within the C library
itself.  Furthermore, the GNU standard C library doesn't export the
"__" functions, so the library has to be patched to export these
functions - which rather defeats the purpose of preload libraries in
the first place (not having to patch the standard C library), but
that's the way it is.
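
As a rough illustration of what an override looks like, here is a
minimal sketch of a preload library that wraps open().  It is not the
code in this package: it locates the C library's open() with
dlsym(RTLD_NEXT, ...), a GNU extension, instead of relying on exported
"__" functions, so it only catches calls that go through the open
symbol itself and not the C library's internal calls discussed above.
The counter intercepted_open_calls and the trace message are made up
for the example.

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical counter, present only so the effect can be observed. */
static unsigned long intercepted_open_calls = 0;

/* Same prototype as the C library's open().  When this object is
 * preloaded, the program's calls to open() resolve to this definition. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...) = NULL;
    mode_t mode = 0;

    /* The third argument is only supplied when a file may be created. */
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t) va_arg(ap, int);
        va_end(ap);
    }

    /* Look up the next definition of open() in search order (libc's). */
    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");
    if (real_open == NULL) {
        errno = ENOSYS;
        return -1;
    }

    intercepted_open_calls++;
    fprintf(stderr, "open(\"%s\")\n", path);

    /* Forward the call unchanged to the real implementation. */
    return real_open(path, flags, mode);
}
```

Built as a shared object (for example with "cc -shared -fPIC", plus
-ldl on older systems) and preloaded, this would log each open() a
program makes.  Programs may also reach the kernel through other entry
points such as open64() or openat(), which this sketch does not cover.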

Once a shared preload library has been compiled, it can be loaded in
one of two ways.  First, placing the name of the file in the
LD_PRELOAD environment variable loads the library on a per-shell
basis.  Second, placing the name of the file in the
/etc/ld.so.preload file loads the library on a system-wide basis.
However, since a user can always link a program statically, this
system-wide feature can't be absolutely relied upon.
Each preload library can define an _init function to be run when
it loads.
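
A small sketch of such an initialization hook follows.  Rather than
defining _init directly, which tends to collide with the startup code
the compiler links into every shared object, it uses the GCC/Clang
constructor attribute, which runs a function at load time to the same
effect.  The names preload_setup, preload_ready and
preload_unions_spec are invented for the example, and reading UNIONS
here is only a plausible use, not a description of what unions.so
actually does.

```c
#include <stdlib.h>

/* Hypothetical state, filled in once when the library is loaded. */
static int preload_ready = 0;
static const char *preload_unions_spec = NULL;

/* Runs when the dynamic loader maps the library, before main() starts. */
__attribute__((constructor))
static void preload_setup(void)
{
    /* Capture configuration early; getenv() returns NULL if unset. */
    preload_unions_spec = getenv("UNIONS");
    preload_ready = 1;
}
```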

			INSTALLATION

1. Patch the standard GNU C library so that it exports "__" functions
using the supplied glibc-2.1.1-LD_PRELOAD.patch, or something similar.
Recompile the C library (perhaps using glibc-2.1.spec) and install it.
Yes, this is a big pain, but the patch is very simple (as you can
see).  Email drepper@redhat.com if you'd like to see this requirement
waived in the future.

2. "make all" to build unions.so, httpfs.so, and cwd.so

3. using HTTPFS

   To try it out, add it to the environment variable LD_PRELOAD, e.g.:

   $ export LD_PRELOAD=/usr/local/lib/cwd.so:/usr/local/lib/httpfs.so

   The exact location of the files is unimportant, but cwd.so must be
   preloaded before httpfs.so.  Now, try something like:

   $ cat http://www.freesoft.org/

4. using UNIONS

   Preload before libc (and after cwd.so) to perform "union" or
   "overlay" mounts, where one directory tree appears to become layered
   on top of another.  Can be used to make CDs "writable", to conveniently
   separate source and object files in build trees, to allow package
   management software to "install" files that actually end up in packages,
   etc.

   Depends on cwd.so module, which implements CWD in user space, and
   translates all pathnames in system calls into absolute paths.
  
   NOTE: cwd.so must be loaded before unions.so
  
   $ export LD_PRELOAD=/usr/local/lib/cwd.so:/usr/local/lib/unions.so
   $ export UNIONS=overlay@target(options):...

   For example, export UNIONS=/tmp/cdrom@/mnt/cdrom(create,copyonwrite)
   causes all the files in /tmp/cdrom to appear overlaid on /mnt/cdrom.
   Any attempted changes to /mnt/cdrom will actually occur on /tmp/cdrom.
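
To make the UNIONS syntax concrete, here is a minimal sketch of how a
value of that form could be split into its parts.  This is not the
parser used by unions.so; the struct and function names are
hypothetical, and the sketch assumes that paths contain no ':', '@' or
'(' characters and that the "(options)" part may be left out.

```c
#include <stddef.h>
#include <string.h>

#define UNION_FIELD_MAX 256

struct union_entry {
    char overlay[UNION_FIELD_MAX];
    char target[UNION_FIELD_MAX];
    char options[UNION_FIELD_MAX];  /* raw text between the parentheses */
};

/* Copy len bytes from src into dst as a string; fail if it won't fit. */
static int copy_field(char *dst, const char *src, size_t len)
{
    if (len >= UNION_FIELD_MAX)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Parse one "overlay@target(options)" entry that is len bytes long. */
static int parse_union_entry(const char *s, size_t len, struct union_entry *out)
{
    const char *end = s + len;
    const char *at = memchr(s, '@', len);
    const char *paren, *target_end;

    if (at == NULL || at == s)
        return -1;                          /* no '@' or empty overlay */

    paren = memchr(at + 1, '(', (size_t) (end - (at + 1)));
    target_end = paren ? paren : end;
    if (target_end == at + 1)
        return -1;                          /* empty target */

    if (copy_field(out->overlay, s, (size_t) (at - s)) != 0 ||
        copy_field(out->target, at + 1, (size_t) (target_end - (at + 1))) != 0)
        return -1;

    if (paren == NULL)
        return copy_field(out->options, "", 0);
    if (end[-1] != ')')
        return -1;                          /* unbalanced parenthesis */
    return copy_field(out->options, paren + 1, (size_t) (end - 1 - (paren + 1)));
}

/* Split a UNIONS value on ':' and parse at most max entries.
 * Returns the number of entries parsed, or -1 on any malformed entry. */
static int parse_unions(const char *spec, struct union_entry *out, int max)
{
    int n = 0;

    while (*spec != '\0') {
        size_t len = strcspn(spec, ":");
        if (n == max || parse_union_entry(spec, len, &out[n]) != 0)
            return -1;
        n++;
        spec += len;
        if (*spec == ':')
            spec++;                         /* skip the separator */
    }
    return n;
}
```

With the example above, parse_unions() would yield one entry with
overlay "/tmp/cdrom", target "/mnt/cdrom" and options
"create,copyonwrite".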

5. System-wide usage:
      /etc/ld.so.preload should contain lines such as:
        /usr/local/lib/cwd.so
        /usr/local/lib/httpfs.so
        /usr/local/lib/unions.so

      /etc/unions contains lines like:
        overlay:target(options)

			TODO

Code walkthrough/cleanup
Figure out how to handle deletes
Decide if I should "rm -rf" this whole directory :-)
Fixup httpfs module and add WebDAV support
Define an API to provide streams-like functionality - affect a single FD
Define an API to figure out what other modules are loaded
  (so unions.so can demand cwd.so)
Create a userspace NFS and/or Coda module
Create a module to auto-expand zip/tar files
Create a module to provide Hurd translator-type functionality

			A SAD STORY

Here's an example of the kind of trouble you can get into using
this program.  I was running bash with "unions" preloaded, and
was cd'ed inside my overlay tree.  I had the binary code for a
program overlaid on top of the source code, and most of the
directories were replicated in both trees.  The directory
I was interested in was called "boot", and I had a backup of
the source code in that directory in another dir called "boot.bak".
I wanted to swap the two directories.

So I typed "mv boot boot.changed" and then "mv boot.bak boot".  Well,
the first mv moved the directory in the source partition (it moved
it into the binary, overlay partition, but that's another, unrelated,
story).  But I still had the directory called "boot" in the binary
partition, which I had forgotten about.  So the second mv moved
"boot.bak" into the existing directory "boot" and it became a
subdirectory.

So, maybe "mv", or more precisely link()/unlink(), should operate
on all files/dirs of the given name.  Or maybe that would create more
problems in another way.  I relate this story to point out some of
the problems you can get into using this code.

My advice?  Use it for well-defined things like make runs, but do any
manual operations like edits and moves in the normal file structure.
