Archive for the ‘Computer Networking’ Category

Cloud Computing

Friday, January 3rd, 2014

Cloud computing has become one of the biggest buzzwords in IT. What is it? How does it work? Is it for real?

Cloud computing is an old idea in a new guise. Back in the 70s, users bought time on mainframe computers to run their software. There were no PCs. There was no Internet. You punched your FORTRAN program on cards, wrapped a rubber band around them, and walked them over to the computer center.

Then came the PC revolution, followed hard on its heels by the Internet. Everyone could now have a computer sitting on his desk (or her lap). Sneaker net gave way to Ethernet and then fiber optics. Mainframes became passé.

Well, mainframes are back! Turns out that centralized computing resources still maintain some obvious advantages, mostly related to economies of scale. Once an organization’s computing needs can no longer be met by the computers siting on people’s desks, a significant investment is required to install a room full of servers, especially if the computing needs are variable, in which case the servers must generally be scaled to meet maximum load and then sit idle the rest of the time.

Several enterprising companies have installed large data centers (read: mainframes) and have begun selling these computing resources to the public. Amazon, for example, is estimated to have 450,000 servers in their EC2 data centers. EC2 compute time can be purchased for as little as two cents an hour; a souped-up machine with 32 processors, 244 GB of RAM and 3.2 TB of disk space currently goes for $6.82 an hour. Network bandwidth is extra.

Yet wasn’t the whole point of the PC revolution to get away from centralized hardware? Can you really take a forty-year-old idea, call it by a new name, hype it in the blogosphere, and ride the wave as everyone runs back the other way?

In its day, at least as much hype surrounded the client-server model as now envelopes the cloud. Information technology advances rapidly enough that every new development is trumpeted as the next automobile. A more balanced perspective is to realize that there are merits to both centralized and distributed architectures, and after two decades of R&D effort devoted to client-server, we’re now starting to see some neat new tools available in the data center.

One of the nicest features available in the cloud is auto-scaling. Ten years ago, I ran freesoft.org by buying a machine and finding somewhere to plug it into an Ethernet cable. The machine had to be paid for up front, and if it started running slow, the solution was to retire it and buy another one. Now, running in Amazon’s EC2 cloud, I pay two cents an hour for my baseline service, but with the resources of a supercomputer waiting in reserve if it starts trending on social media.

A supercomputer! That’s what lurks behind all of this! A great number of these competing cloud architectures boil down to competing proposals to build a supercomputer operating system, coupled with an accounting system that provides a predictable price/performance ratio. Virtualization is one of the most popular models to achieve this.

Yet virtualization has been around for decades! IBM’s VM operating system was first released in 1972! Running a guest operating system in a virtual machine has been a standard paradigm in mainframe data centers for over 40 years! IBM’s CMS operating system has evolved to the point where it requires a virtual machine – it can’t run on bare hardware.

I’d like to see an open source supercomputer operating system, capable of running a data center with 100,000 servers and supporting full virtualization, data replication, and process migration. Threads are the way to write applications that run across multiple processors, so we should have a supercomputer operating system that can run on 100,000 processors. GNU’s Hurd might be a viable choice for such an operating system.

Dynamic addressing

Saturday, July 26th, 2008

A theorem of Thorup and Zwick (Proposition 5.1 in 2001’s Approximate Distance Oracles) states that a routing function on a network with n nodes and m edges uses, on average, at least min(m,n^2) bits of storage if the “route stretch” (the ratio between actual path length and optimal path length) is less than 3 (i.e, if two nodes are two hops apart, the actual route taken between them must be less than six hops). On the Internet topology, we can expect the n^2 term to dominate, so spreading these n^2 bits out among n nodes yields an average of n bits per node – i.e, each router’s routing table has to hold one bit for every device on the network.

Not a very encouraging result for those of us designing routing protocols.

Yet there is hope. The result is only an average. We can do better than the average if we allow our routing function to be skewed towards certain network topologies. And it occurs to me that the Internet doesn’t change fast enough that we can’t skew our routing function towards the current network topology.

How can we do this? With dynamic addressing: skewing our address summarization scheme to reflect the current network topology.

(more…)

DNS-based identification

Tuesday, June 10th, 2008

I’ve written before about how the failure of source routing created the need for NAT, but that post didn’t address the basic security problem with source routing that caused ISPs to disable it. It allows a man-in-the-middle attack where a machine can totally fabricate a packet that claims to come from a trusted source. There’s no way that the destination machine can distinguish between such a rogue packet and a genuine packet that the source actually requested be source routed through the fabricating machine. At the time, Internet security was heavily host-based (think rsh), so this loophole became perceived as a fatal security flaw that led to source routing being derogated and abandoned.

A quarter century later, I think we can offer a more balanced perspective. Host-based authentication in general is now viewed as suspect and has largely been abandoned in favor of cryptographic techniques, particularly public-key cryptosystems (think ssh) which didn’t exist when TCP/IP was first designed. We are better able to offer answers to key questions concerning the separation of identity, address, and route. In particular, we are far less willing (at least in principle) to confuse identity with address, if for no other reason than an improved toolset, and thus perhaps better able to judge source routing, not as a fundamental security loophole, but as a design feature that became exploitable only after we began using addresses to establish identity.

Can we “fix” source routing? Perhaps, if we will completely abandon any pretext of address-based authentication. What, then, should replace it? I suggest that we already have our address-less identifiers, and they are DNS names. Furthermore, we already have a basic scheme for attaching cryptographic keys to DNS names (DNSSEC), so can we put all this together and form a largely automated DNS-based authentication system?

(more…)

Java Caches

Wednesday, August 23rd, 2006

I’ve been thinking for several years about the flaws in the Internet’s (nearly non-existant) caching model, and have reached several conclusions. First, caching policy is very difficult, basically impossible, to specify in some arbitrary protocol. This is one of the biggest problems we’ve got with caching – the cache manager has a lot of settings he can adjust, but the data provider has almost none – basically cache or don’t cache (oh yeah, he can specify a timeout, too). So, I’m led to conclude that data providers need a very flexible way to inform caches of their data’s caching policy, like a remote executable format, i.e. Java. My second conclusion is that what limited caching we’ve got is destroyed when people want to provide dynamic content. The only way I can see to cache dynamic content is to cache not the data, but the program used to create the data, and expect the client to run the program in order to display the data. And we don’t want to do that on the caches (if we can help it) for performance reasons. Again, a remote executable format, this time on the client, i.e. Java. My third major conclusion is that caching is a multicast operation and requires multicast support to be done, but that’s for a diferent paper. Thus, I’m proposing an integrated Java-based caching architecture using what I call “cachelets” on the caches to provide a flexible and usable caching architecture.

(more…)

Will Multicast kill Packet Switching?

Thursday, May 18th, 2006

With the demise of ATM, many technologists believe that circuit switching is dead and packet switching has won. Yet the current Internet multicast architecture is essentially circuit switching in disguise, since every router along the path of a multicast session has to maintain explicit connection state. Which leads me to wonder… can packet switching support multicast, or is circuit switching required?

(more…)

NAT and the Failure of Source Routing

Monday, May 15th, 2006

Paul Francis, in the conclusion of his 1994 Ph.D. thesis, traces the evolution of the IPv4 address scheme. After quoting a June 1978 Clark/Cohen paper (IEN 46), Francis notes:

    Well, something happened here. An argument was put forth that 32 bits is enough because the address does not have to do routing – the source route can handle the rest. Clearly it was recognized that a variable length something was needed, but the source route was deemed sufficient for that, and the 32-bit address won out in the end. So, perhaps what killed IP is not that the address is too short (though probably it is), but that the ability for DNS to hand a host a source route (which it could then put in the header so that the right thing could happen in the network) was not created.

    (p. 177)

Not only did the failure to fully implement source routing (in DNS) make it impossible to address into a private network, it also created the situation where NAT had to be implemented as it was.

(more…)

Lightweight Multicast (LWM)

Sunday, May 14th, 2006

The Internet has become infamous among network professionals for its near pathological inability to deploy multicast. The Internet’s current multicast specifications place significant demands on the network, most significantly the need for routers to maintain a multicast routing tree for each multicast address. I propose a lightwight multicast (LWM) designed specifically to alleviate this requirement by listing a full set of destination addresses in each packet header, using a header option.

(more…)

A Theory for Analyzing Addressing Structures

Monday, May 8th, 2006

Years ago, I described the “fundamental principle of routing” that “logical addresses correspond to physical locations”. This implies some kind of relationship between addressing structure and network topology. Using concepts from (mathematical) topology, we can make this relationship more precise, and obtain a theory for analyzing addressing structures. I use this theory particularly to note the inadequacy of CIDR and to establish a framework for analyzing possible extensions or replacements to CIDR.

(more…)

The Location Layer

Friday, April 21st, 2006

Over the past quarter century, stack-based layered architectures have become ubiquitous in networking, most notably the seven-layer OSI model. OSI seperates a Network Layer (responsible primarily for routing) from the Transport Layer and other layers above it. In recent years the Internet’s difficulty in handling mobile devices had suggested flaws in this original design. I propose that a new layer, the “Location Layer”, is needed between the Transport and Network Layers, whose function is to locate network resources, including mobile devices.

(more…)

Standardized caching of dynamic web content

Thursday, August 1st, 2002
Internet Engineering Task Force
INTERNET-DRAFT
Expires March 2003



             Standardized caching of dynamic web content

			   by Brent Baccala
			 baccala@freesoft.org
			     August, 2002

			 Address comments to:
		 dynamic-content-caching@freesoft.org

			Comments archived at:
       http://www.freesoft.org/Essays/dynamic-content-caching/



This document is an Internet-Draft and is subject to all provisions of
Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups.  Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html



			       ABSTRACT

   Summarizes the present state of web caching technology.  Points out
   the need for caching dynamic web sites, and the inadequacy of
   present caching technology for anything but static sites.  Proposes
   the adoption of Java servlets, cryptographically signed Web
   Application Archives (WARs), and LDAP as standards for dynamic web
   caching, using an expanded interpretation of existing DNS standards
   to locate and authenticate cached information.

The World Wide Web (WWW), probably the most successful networking
technology of the 1990s, provides a global graphical user interface
(GUI) that presently dominates the Internet.  The current design of
the web has an overwhelming advantage over older connection-oriented
protocols such as TELNET or X Windows.  The web is data-oriented, not
connection-oriented, or is at least more so than conventional
protocols.  A web page is completely defined by a block of HTML, which
is downloaded in a single operation.  Highlighting of links, typing
into fill-in forms, scrolling - all are handled locally by the client.
Rather than requiring a connection to remain open to communicate mouse
and keyboard events back to the server, the entire behavior of the
page is described in the HTML.

The advent of web caches changes this paradigm subtly, but
significantly.  In a cached environment, the primitive operation in
displaying a web page is no longer an end-to-end connection to the web
server, but the delivery of a named block of data, specifically the
HTML source code of a web page, identified by its URL.  The presence
of a particular DNS name in the URL does not imply that a connection
will be made to that host to complete the request.  If a local cache
has a copy of the URL, typically because it was requested and
retrieved earlier, it will simply be delivered, without any wide area
operations.  Only if the required data is missing from the local
caches will wide area network connections be opened to retrieve the
data.  Generally, caches store content based on the URLs, and
sometimes use inter-cache protocols such as ICP to communicate to
other caches which URLs they possess.  A variant on this scheme is the
web replica, in which an entire web site, or some logical subsection
of one, is duplicated elsewhere.

Experience with web caches demonstrates that they provide several
benefits.  First, the bandwidth requirements of a heavily cached,
data-oriented network is much less than an uncached,
connection-oriented network.  A cached copy of a web page, stored
anywhere on the network, works as well as the original.  As the
network becomes more heavily cached, fewer and more localized
connections are required to carry out various operations, reducing
overall network load.  Furthermore, cached or replicated web sites are
more fault-tolerant, since their data can still be accessed even if
the origin server fails or the network becomes partitioned.  A general
consensus seems to exist that caching improves network performance;
more widespread adoption of web caching has been limited by technical
challenges.

One of the greatest of these challenges is caching dynamic content,
that is, pages generated by software as they are requested, such as
response pages to search requests.  Presently, web caching protocols
provide means for including meta information, in either HTTP or HTML,
that inhibits caching on dynamic pages, and thus forces a connection
back to the origin server.  While this works, it negates the
advantages of caching.  To maintain the flexibility of dynamic content
in a cached network, we need to lose the end-to-end connection
requirement and this seems to imply caching the programs that generate
the dynamic web pages.  While cryptographic techniques for verifying
the integrity of data have been developed and are increasingly widely
deployed, no techniques are known for verifying the integrity of
program execution on an untrusted host, such as a web cache.  Barring a
technological breakthrough, it seems impossible for a cache to
reliably run the programs required to generate dynamic content.  The
only remaining solution is to cache the programs themselves (in the
form of data), and let the clients run the programs and generate the
dynamic content themselves.  Thus, what's needed is a standard for
transporting and storing programs in the form of data.

A closely related problem arises when replicating a web site.  A
significant hurdle for building web replicas is the lack of a standard
to deliver the executable components that underlay dynamic content.
While scripting languages such as Perl and Python are readily
available, installing a web replica almost invariably requires
tweaking configuration files and downloading various additional
packages needed by the scripts.  Without a standard for dynamic
content, there is simply no way to automatically replicate a web site,
unless its content is completely static.  Also, running a Perl script
typically provides little in the way of security.  Either the script
must be carefully reviewed by the installer, or the author must simply
be trusted.

Java "servlets" are a step in the right direction, since they provide
a CGI-type capability that enables a web cache to present dynamic
content without a connection to the origin server.  Since they are
Java-based, they provide solutions to the security issues that
surround something like Perl.  Java's security model provides the
tools to limit servlet access to the host system.  This allows a
cached servlet to reference a collection of Java classes it needs for
proper operation, and have them loaded automatically without the need
of manual intervention.

Part of the Java servlet specification is WAR (Web Application
Archive), an extension to JAR that provides Java servlets, HTML and
JSP pages, and XML meta data all packaged up into a single archive
file to provide a "web application".  In the current implementation,
the server administrator "installs" the WAR at a particular URL by
loading it onto a Java servlet-enabled web server.  If the WAR format
were altered slightly to include, perhaps in the XML meta data, a
"master" URL, and the servlet-enabled web server were to function more
as a proxy, handling requests locally if it possessed a valid WAR,
passing them along otherwise, this would be a big step in the right
direction.  Ultimately, though, to get away from having to trust a
proxy to execute WAR content, the client has to execute the content
itself.  Servers and caches should eventually do nothing but hand out
data, and the responsibly for executing it should fall exclusively to
the client, not the cache.  For the time being, using a local, trusted
cache will enable experimentation with these ideas without changing
client implementations.

Using WARs for application caching, instead of the manual installation
of applications that they were originally designed for, presents some
challenges.  In addition to adding XML entries to the WAR to specify
the base URL, additional entries may be needed to specify a time
interval for which the WAR is valid, as well as whether an outdated
WAR can continue to be used if a more recent one can't be retrieved.
Furthermore, Java servlets typically run with a fairly trusted
security model.  A more restricted security environment should be used
for cached WARs downloaded from foreign web sites.

Also, provisions should be made for incremental updating of the WAR,
since only a portion of a large archive may change in an update.
Although protocols such as rsync have been developed to incrementally
update files, they have limitations.  Rsync depends on changes being
localized within the file.  Files with small changes spread widely
across them, such as search engine indices, don't update well using
rsync, suggesting that something more flexible would be preferred.
Since the WAR is already Java-based, perhaps specifying Java classes,
or pointers to Java classes, in the WAR for performing incremental WAR
updates would provide a powerful mechanism for tailoring the update
mechanism to the type of files contained in the archive.  Perhaps many
of these functions, like deciding the validity of a WAR, should be
specified via Java classes, for maximum flexibility.

Security and authentication are major concerns, especially in a cached
environment.  In this case, some protocols exist to provide
authentication services, yet have many outstanding issues.  Some are
not widely deployed - DNS key services, for example.  The most widely
deployed solution - X.509 certificates - has been priced and managed
into a realm when only e-business sites can realistically justify
their costs.  Web security can't be just for those who can and will
shell out hundreds of dollars for certificates that keep expiring.  In
a heavily cached environment, it's easier than ever to spoof
somebody's URLs, and X.509-based authentication needs to be in place
for 99% of the net's web sites, not 1% of them.  Standards exist
for storing public keys in DNS (KEY and CERT resource records),
which can be used to validate signed JAR/WAR files.

For more rapid response time, the Range: header could be used to
retrieve first the WAR file's table of contents, then the compressed
data of the particular URL, resulting in a retrieval time comparable
to straight HTTP, ignoring the search time required to find the cache
item to begin with and the compilation/startup time of any dynamic
code (both of which may be significant).  Of course, in addition to
such a "partial retrieval", a cache could do a "full retrieval",
obtaining the entire packaged WAR and begin sharing it with other
caches.  The decision of how to choose between partial and full
retrieval is left "for further study", in other words, the user has to
make those decisions manually until we figure it out better.  Napster
has demonstrated that letting the users make caching decisions
manually is workable, so long as the cache items are reasonably sized
(not too large or too small) and well labeled.

A major choice remains, that of the search protocol to find the cached
WARs.  Mainstream caching research tends to largely ignore the most
successful example of a cached network service - Napster and its
various spinoffs, most notably Gnutella, which seem to go by the
buzzword peer-to-peer file sharing, or P2P.  For example, RFC 3040,
"Internet Web Replication and Caching Taxonomy", a January 2001
document discussing "protocols, both open and proprietary, employed in
web replication and caching today," never mentions the word "Napster".
Since peer-to-peer was designed to share music and not HTML documents,
the oversight can be forgiven, but this point needs to be made and
made strongly - Napster, Gnutella, and friends _are_ caching services,
and by far the most successful ones built to date.  Peer-to-peer
seems to be the way to go.

The legal problems of Napster and the highly critical reception of the
technical community to Gnutella suggest against either of these
protocols.  At present, LDAP seems the best choice, due to its
maturity as a protocol, the widespread availability of both client and
server implementations, and its straightforward application to the
problem at hand.  The only serious issue surrounding LDAP is the lack
of a standardized means for server location in a P2P environment, the
critical issue swirling around Gnutella.

I suggest dealing with both the security issues and the P2P server
location issue through a simple solution: assume the correct operation
of DNS even in the face of server failure.  This allows site
administrators to use resource records to specify both a set of LDAP
servers to search for WARs, as well as cryptographic keys to verify
the contents of those WARs once they are retrieved.  Although this
makes proper operation of a cached web site dependent on proper DNS
operation, this should presently be a minor tradeoff, since proper web
site operation is already based on DNS, and DNS had proven to be one
of the most reliable of the Internet technologies.

Thus, to enable dynamic web caching, as outlined in this document, a
web server administrator should add two kinds of additional resource
records to the web server's DNS records.  First, a set of SRV records
should specify a set of LDAP servers, any of which can be searched for
the web site's WARs.  These LDAP servers should form a replicated set,
so that a response from any one of them should be considered a
complete answer by a client.  These servers may also allow arbitrary,
unauthenticated web caches to add entries to the LDAP directory when
they elect to cache one or more of a site's WARs.  Since clients are
expected to cryptographically verify a WAR upon retrieving it,
allowing unauthenticated additions to an LDAP directory should not
allow site spoofing, but a large number of bogus WAR entries could
form the basis for a denial of service attack.  A benefit of this
proposal is that site administrators can select sets of LDAP servers
based on their own policies.  At least one set of publicly updatable,
replicated, highly available LDAP servers should exist for the use of
small web sites without the capability to set up large replicated
installations.

The DNS SRV records in question can simply be the "_ldap._tcp" records
mentioned as a example in RFC 2782.  Thus, to specify LDAP servers for
registering or searching WARs for "www.freesoft.org" URLs, DNS SRV
records should be added for "_ldap._tcp.www.freesoft.org".  In the
case of the publicly available LDAP servers mentioned above, and other
LDAP servers used by multiple web sites, careful consideration should
be given to making the "_ldap._tcp" record a CNAME pointing to a set
of SRV records, allowing the LDAP server administrators to modify the
list of LDAP servers without requiring changes to every web site using
the service.  Furthermore, the use of "_ldap" for this new service may
conflict with existing LDAP applications.  Another name, perhaps
"_webldap" might be a better choice.  Another possibility would be to
use both names, specifying that "_webldap" would take precedence over
"_ldap" for this application, and the "_ldap" records would be used
only if "_webldap" records did not exist.  This would allow the
Internet community to use "_webldap" if needed, expecting that this
name would simply fall into disuse if only "_ldap" is really needed.

Also, the web administrator will need to add at least one KEY record
specifying a public key that must be used by clients to validate the
integrity of a retrieved WAR.  Due to the ease with which a bogus WAR
could be registered with a public LDAP service, this is regarded as a
critical step.  The administrator must provide the KEY record and the
client must validate it before trusting the WAR.  Unsigned WARs are
invalid and so are DNS entries without KEY records - both the SRV and
KEY records must be present.  Perhaps a CERT record would be a better
choice than KEY, also, we need to consider how multiple KEY or CERT
records should be handled by a client.

So, for example, consider the "www.freesoft.org" web site, originally
specified in DNS like this:

   $ORIGIN freesoft.org.

   www		IN  CNAME          sparky.freesoft.org.
   sparky	IN  A	           4.22.66.35

To add WAR-based caching of dynamic web content for this site, records
similar to these should be added:

   www			IN  KEY	           --- public key goes here ---
   _ldap._tcp.www	IN  CNAME          ldapsrv
   ldapsrv		IN  SRV  0 0 389   ldap1.freesoft.org.
   ldapsrv		IN  SRV  0 0 389   ldap2.freesoft.org.
   ldapsrv		IN  SRV  0 0 389   ldap3.freesoft.org.

Retaining the original CNAME record would require moving the KEY
record to "sparky", since CNAME records can't co-exist with other
records.  An alternative to retaining the original server
configuration would be to replace the "www" entries with A records
pointing to a set of web caches.  Thus, any legacy client trying to
retrieve a page from this web site would be automatically directed to
a web cache.  It'd be convenient to specify a CNAME for "www" pointing
to a set of A records for the web caches, but of course this would
preclude a unique KEY record for the server.  Perhaps the KEY record
should appear on a unique name, such as "_key.www", specifically to
permit this feature.  The interaction of CNAME with the other resource
records requires more consideration.

RFC 2535 specifies the structure of KEY records, and recommends the
assignment of new Protocol Octet values for new applications.  If
this proposal is adopted, IANA should assign a new Protocol Octet
value for validation of dynamic web archives.

A typical client request would follow these steps:

1. Client is configured to use a local web cache, or, attempts a
   standard retrieval and gets A records for web caches

2. Client sends request to web cache

3. Web cache does DNS lookup and gets KEY and SRV records

4. Web cache does LDAP search for the URL and gets a list of WAR
   directory entries, placed there by other (remote) web caches

5. Web cache picks an entry, contacts the remote cache using HTTP
   and either retrieves the entire WAR or just the parts it needs
   to serve the requested URL

6. Web cache validates that WAR was signed using the private key
   corresponding to the public key retrieved in the DNS KEY record,
   and recurses to step 5 (using a different remote cache) if not

7. If the cache elected to retrieve the entire WAR, it (subject to
   considerations like being behind a firewall) registers itself with
   one of the site's LDAP servers as possessing the WAR and being
   willing to serve it to other sites

7a LDAP servers replicate this information among themselves

8. Web cache runs the Java in the WAR to generate the dynamic web page
   and returns the result to the client

Several other options present themselves.  Perhaps the LDAP directory
should include entries for web caches willing to run the Java and
serve the dynamic pages themselves, though this would be present a
security risk since those caches might be untrusted by either client
or server.  Perhaps provision could be made for the server to issue
X.509 certificates certifying certain web caches as trusted.  Perhaps
the user should be prompted before embarking on the potentially time
consuming process of retrieving and locally processing a WAR.
Finally, the functionally of a "locally trusted cache" should
ultimately be rolled into the client itself, which should retrieve and
verify the integrity of the WAR before running the Java itself.

In summary, I recommend the following steps:

1. Recognize the importance of data-oriented design, as opposed to
   connection-oriented design.  Break the dependence on special server
   configurations and realize that the client has to do almost all the
   work in a scalable, cached, redundant web architecture.

2. Select standards for the network delivery of executable web
   content, and for packaging the contents of a web server into a
   single compressed archive.  Java/WAR seems the most likely current
   candidate.

3. Develop an LDAP schema for registering WARs, and for searching
   the registrations to find the WARs matching a particular URL.

4. Extend the WAR specification to include root URL, Java classes for
   determining lifespan of WAR, performing incremental updates, and
   other identified needs.  Specify the security environment in which
   these "foreign" WARs will operate.

5. Extend Squid to support the algorithm outlined above.  Alternately,
   extend Apache Tomcat to function as a web cache, with similar
   features.

The caching scheme outlined above is far from perfect.  In my essay
"Data-oriented networking" I discuss more long-term prospects.
However, the current proposal has several key advantages: it can be
deploying using existing technology; it requires no client-side
changes or client-visible protocol updates; it allows web sites to
easily opt in so long as one public set of LDAP servers and/or trusted
caches are available; and it solves a pressing problem.  Ultimately,
the Internet is a work in progress, and its more technically savvy
users are probably ready for a serious attempt at a working caching
scheme for dynamic content.




REFERENCES

Data-oriented networking

   "Data-oriented networking", Brent Baccala, Internet Draft
      http://www.freesoft.org/Essays/data-networking/

Domain Name System (DNS)

   Dozens of RFCs specify various aspects of DNS operation.  Only
   those most pertinent to basic DNS operation, SRV records, and
   KEY/CERT records are listed here.

   RFC 1034 - Domain Names - Concepts and Facilities

   RFC 1035 - Domain Names - Implementation and Specification

   RFC 1912 - Common DNS Operational and Configuration Errors

   RFC 2535 - Domain Name System Security Extensions

   RFC 2536 - DSA KEYs and SIGs in the Domain Name System

   RFC 2538 - Storing Certificates in the Domain Name System

   RFC 2782 - A DNS RR for specifying the location of services (DNS SRV)

   RFC 3110 - RSA/SHA-1 SIGs and RSA KEYs in the Domain Name System

   Paul Vixie's Internet Software Consortium produces BIND, the most
   widely used (and freely available) DNS server
      http://www.isc.org/

Lightweight Directory Access Protocol (LDAP)

   RFC 2251 - LDAP v3 (protocol spec)

   RFC 2252 - LDAP v3 Attribute Syntax Definitions (schema spec)

   OpenLDAP is an actively developed (as of mid-2002) open source LDAP
   server, and C-based client library.  Various client implementations
   exist for other languages, such as Perl
      http://www.openldap.org/

Rsync

   rsync is a program and protocol developed to incrementally update
   files that have only been slightly modified, by first transferring
   a set of MD5 digests that identify which parts of the file have
   been modified and only transferring those parts
      http://rsync.samba.org/

Java

   Java Virtual Machine (JVM) specification
      somewhere on http://java.sun.com/

   Bill Venner's excellent Under the Hood series for JavaWorld
   is a better starting point than the spec for understanding JVM.
   He also has written a book - Inside the Java Virtual Machine
   (McGraw-Hill; ISBN 0-07-913248-0)
      http://www.javaworld.com/columns/jw-hood-index.shtml

   Java 2 language reference
      somewhere on http://java.sun.com/

   Java languages page (other languages that compile to Java VM)
      http://grunge.cs.tu-berlin.de/~tolk/vmlanguages.html

   Criticism of Java
      http://www.jwz.org/doc/java.html

Java Servlets/WARs

   "Tomcat is the servlet container that is used in the official
    Reference Implementation for the Java Servlet and JavaServer Pages
    technologies."
      http://jakarta.apache.org/tomcat/

   Java Servlets - server-side Java API (CGI-inspired; heavily
   HTTP-based) The Java servlet specification includes a chapter
   specifying the WAR (Web Application Archive) file format, an
   extension of ZIP/JAR
      http://java.sun.com/products/servlet/

Caching

   RFC 3040 - Internet Web Replication and Caching Taxonomy
      broad overview of caching technology

   RFC 2186 - Internet Cache Protocol (ICP), version 2

   RFC 2187 - Application of ICP

   Squid software
      http://www.squid-cache.org/

   NLANR web caching project
      http://www.ircache.net/

   Various collections of resources for web caching
      http://www.web-cache.com/
      http://www.web-caching.com/
      http://www.caching.com/

   IETF Web Intermediaries working group (webi)
      http://www.ietf.org/html.charters/OLD/web-charter.html

   IETF Web Replication and Caching working group (wrec)
      http://www.wrec.org/

   RFC 3143 - Known HTTP Proxy/Caching problems

   Cache Array Routing Protocol (CARP) - used by Squid
      http://www.microsoft.com/Proxy/Guide/carpspec.asp
      http://www.microsoft.com/proxy/documents/CarpWP.exe

   RFC 2756 - Hypertext Caching Protocol (HTCP) - use by Squid

Napster and its variants

   Napster, the original peer-to-peer file sharing service, has been
   fraught with legal difficulties, having recently entered bankruptcy
      http://www.napster.com/

   Napster's protocol lives on, even if the service is dead.  It's
   basically a centralized directory with distributed data
      http://opennap.sourceforge.net/
      http://opennap.sourceforge.net/napster.txt

   Gnutella has emerged as the leading post-Napster protocol,
   employing both a distributed directory and distributed data
      http://www.gnutella.com/
      http://www.gnutelladev.com/
      http://www.darkridge.com/~jpr5/doc/gnutella.html

   Several popular clients use the Gnutella network and protocol
      http://www.morpheus-os.com/
      http://www.limewire.org/
      http://www.winmx.com/

   Other proprietary peer-to-peer systems
      http://www.kazaa.com/

   Other free peer-to-peer systems
      http://www.freenetproject.org/