I’ve been thinking for several years about the flaws in the Internet’s (nearly non-existant) caching model, and have reached several conclusions. First, caching policy is very difficult, basically impossible, to specify in some arbitrary protocol. This is one of the biggest problems we’ve got with caching – the cache manager has a lot of settings he can adjust, but the data provider has almost none – basically cache or don’t cache (oh yeah, he can specify a timeout, too). So, I’m led to conclude that data providers need a very flexible way to inform caches of their data’s caching policy, like a remote executable format, i.e. Java. My second conclusion is that what limited caching we’ve got is destroyed when people want to provide dynamic content. The only way I can see to cache dynamic content is to cache not the data, but the program used to create the data, and expect the client to run the program in order to display the data. And we don’t want to do that on the caches (if we can help it) for performance reasons. Again, a remote executable format, this time on the client, i.e. Java. My third major conclusion is that caching is a multicast operation and requires multicast support to be done, but that’s for a diferent paper. Thus, I’m proposing an integrated Java-based caching architecture using what I call “cachelets” on the caches to provide a flexible and usable caching architecture.
Like an “applet” running on a client or a “servlet” running on a server, a “cachelet” runs on a cache. Data items have cachelets (to control their caching) associated with them. When a cache considers caching a data item, it downloads the cachelet, which then manages the cached data associated with it. Multiple cachelets can be running on the same cache, each managing a different set of data. Cachelets are untrusted, running in a Java VM on a foreign host.
Initially, when data is put into a system on a server, a cachelet is provided to manage it. So there’s always at least one copy of the data’s cachelet – the copy managing the original data on the origin server. Such a cachelet would be ‘nailed up’ – it would never get purged from its “cache”. Also, its “cached data” would take the form of files provided by the author containing the data itself. Ideally, we’d like this cachelet to be identical to all the others (i.e, no need for a special servlet), so the cachelet API should be built so that a file in the underlying filesystem can be treated as cached data.
Other caches might want a copy, too. The basic idea is that when a cachelet is going to transmit data it multicasts a notification to any interested caches. So how does a cache get “interested”? Well, the number one way is that it is located in physical (in a network sense) proximity to the path the data is going to take through the network. With this in mind, I can think of two possible models. The first uses a directed, scoped multicast, that could be implemented using a combination of Lightweight Multicast (LWM) and traditional multicast. When a transfer is about to begin, the source transmits a directed multicast to the destination with an annoucement of this. This can be formulated as a packet addressed to an “all caches” multicast address, along with an LWM header option directed at the unicast destination. Caches can then subscribe to this multicast address within some kind of radius or domain. As this a directed multicast passes along the source-to-destination path, any caches along the way subscribed to the “all caches” address will receive a copy of the notification.
Of course, this presupposes a capability (directed multicast) that doesn’t exist (yet). How can we achieve cache rendezvous without working multicast? It’s a hack, but we can expect data requests to be sent with the IP Record Route option, so the transmitting cachelet will have at least an “advisory route” about the path the data will be taking, which can be supplemented with local routing info and something like traceroute. Armed with this information, the transmitting cachelet does a series of DNS lookups, in in-addr.arpa (or part of the DNS tree structured like in-addr.arpa), to find caches that have registered (using dynamic DNS) their interest in a particular CIDR block. Now we have addresses of those caches that have expressed interest in any of the addresses along the path to the destination. We transmit a notification of the upcoming transfer to those caches, giving them the opportunity to subscribe to the (multicast) transfer, and proceed to transmit.
Having received a notification, a cache has to decide whether to cache the data. The first step is to download the cachelet. Presumably, this will almost always happen. Why? Because the cachelets will be designed to be flexible, so a single cachelet can manage many similar data feeds. That means you really only need a small number of cachelets in your entire network. The multicast announcement of the upcoming data transfer can include the URL of the cachelet. So once the cache’s got the cachelet, it has to…
- initialize the cachelet
an ISP might want to influence the caching decisions of its subscribers
the simplest way to do this is for the ISP to provide a cachelet to run on its subscriber’s caches
thus, we’ll need some kind of inter-cachelet communications standard to allow the ISP-provide cachelet to influence its fellows
at the very least, upon initializing, the cachelet should multicast to an all-local-cachelets address, announcing its presence to other cachelets on the same cache
no, our current multicast scheme can’t really handle this. Basically, we need multicast ports instead of multicast host addresses. We can punt on this and let it find out about other local cachelets because they’ve registered themselves (and their port numbers) in DNS
it may also want to subscribe to multicast channels, or register itself using dynamic DNS
consider cryptographic authentication of the cachelet
might want to have it reviewed by a central authority
- considering a new cache item
a notification has been received of a current or near-future transfer
do we want to cache it?
we know: what it is (URL), where it will be (advisory route), and when it will be there (some kind of timestamp)
cachelet may need to do some network chit-chat (i.e, download some information specific to this particular data item) before it can fully advise the host
cachelet finally does advise the host: how badly do we want to cache this item (on a 0-255 scale, perhaps), how much space will probably be required to cache this item
how does the host compare advise from different cachelets?
it can measure how much data is being transfered how often from which cache items and use this as a baseline to compare advise
- Cacheing a data item
The transfer itself, ideally, should be multicast. We’re transmitting to, perhaps, a client as well as several caches. This is inherently multicast, but again, we can’t expect this to work in the current Internet. Probably we’ll transmit to the (closest?) cache that wants a copy, and it will have to turn around and relay the data to the client or other caches. If the client requested the data from the origin server using TCP, we’re got a real problem here. If we’re using HTTP, I guess the origin server can redirect the client to the cache (using “305 Use Proxy”, but this is for a single request only), but what we really want is something like TCP-over-multicast, some kind of reliable stream service which can deliver to multiple destinations – both the client and the cache. Again, beyond the capability of the contemporary Internet. If we redirect to the cache, there’s a denial of service issue here.
We really need to rethink HTTP caching, probably with an eye towards using a local Java cache capable of caching, or at least matching, a entire block of URLs, possibily using regular expressions and a sub-cachelet scheme.
- serving a data item
cachelet gets a request from a data transfer
the request goes to _the_cachelet_, not the cache
the cachelet should ask the cache host if it’s OK to serve the data now (based on policy and things like host/network load) needs to be part of the cache/cachelet API
cachelet might want to know general policy from the cache, so it can decide if it wants to advertise the ability to serve data at all
cachelet might chit-chat over the network, then uses its high-speed API to begin the transfer
the cache host itself does almost nothing other than assist the transfer
- cachelet wants to preemptively cache, i.e, initiate its own transfer
cachelet advises the host: how badly we want to cache, how much space
might want to do multiple advisories – we want to cache X1 MB at priority Y1, and an additional X2 MB at (lower) priority Y2
- cachelet wants to change its advice on previously cached data
- host wants to clean out the cache
maybe host keeps track of advice (priority) on cached data, so it already knows what it wants to clean out
or maybe host uses a 0-255 priority to tell the cachelet how badly it wants to clean up the cache, and the cachelet then replies with the amount of data it’s willing to clean out at that priority
this priority could be the same as the caching priority. That way, when a cachelet advices the host “I want to cache X MB of data at priority Y” and the host doesn’t have X MB free, it asks the other cachelets how much space they are willing to free up for data at priority Y
- host is booting the cachelet off the cache
this can happen at any time, but it’d be nice if the cache advised the cachelet and gave it a chance to clean up
Trying to get this cachelet scheme (or any caching scheme) working with the existing web architecture will be difficult. The web was never designed to cache, and its current design, particularly the way dynamic content is done, is very hostile to caching. Since you can’t cache the programs that produce the dynamic content, you can’t cache the dynamic content at all, and this kills it. The only way to cache the hordes of dynamic content running around the web is to convince web designers to write their dynamic content into cachelets. We can do this without overloading our caches if we specify that HTML dynamic content will only be served to the local host, thus ensuring that the computational cost is bourne not by an ISP’s cache but by the client’s local cache. Of course, there will have to be a significant installed base of Java caches before web designers will do this.
So I propose an end-around. Let’s build a _video_ client (see my other blog posting about this) designed to use cachelet-based caches to cache the video feeds. Nobody’s doing on-demand video on the Internet right now, so we’ve got the architecture to ourselves and can do it however we want. If we can get it working well, then hopefully everybody comes running to it as the next killer app (ohh, ahh, video over the internet), we start get cachelet-based video caches deployed widely, the web guys want to get in on the act, and we say, “why yes, it can cache HTML, you just need to do this…”
SCENARIOS
- client downloading a new (video) item using lightweight multicast (LWM)
- client downloads applet
- applet wants a data transfer
- applet does a DNS lookup to find existing caches for data, and picks one based on nearest routing/closest match
- applet sends foreign cachelet a download request
- foreign cachelet multicasts a notification that download is commencing – possibly using DNS to lookup the IP addresses along the route to the applet and see if any caches have registered an interest in those addresses
- local cachelet picks up notification, decides it wants to cache, and subscribes to the multicast (along with the applet)
- transfer begins – both applet and local cachelet receiving
- local cachelet sends DNS update announcing that it can now serve the item
- applet gets DNS notification of “new” cache available and decides to switch to local cachelet because it’s closer
- RTP handoff
- foreign cachelet sees that no non-cachelets are left subscribed
- foreign cachelet accelerates the download
At this point, since the cachelets are just talking to each other, they can increase the speed past real-time. This lets them probe the connection to see how much bandwidth it can support. If you start losing data, you can throttle back and retransmit(!) since you’re in the “future” as far as the user is concerned. You can dynamically pick a video resolution to suit your link capacity in this way. The applet/cachelet could be designed to anticipate future downloads, i.e, if the user is looking at menu X, then download the first second of each of the five different feeds linked to from that menu. A completely different applet/cachelet pair could be used for realtime feeds that couldn’t do this. A completely different applet/cachelet pair could be used for files and HTML that can’t tolerate data loss like RTP video.
Of course, LWM is pie in the sky, so we need a fall back solution…
- client downloading a new (video) item using unicast
- client downloads applet
- applet wants a data transfer
- applet does a DNS lookup to find existing caches for data, and picks one based on nearest routing/closest match
- applet sends foreign cachelet a download request
- foreign cachelet unicasts notifications that download is commencing – possibly using DNS to lookup the IP addresses along the route to the applet and see if any caches have registered an interest in those addresses …and delays beginning the download for local cache notification…
- local cachelet picks up notification, decides it wants to cache, and notifies the foreign cachelet
- foreign cachelet notifies applet that it’s local cache will have the data
- transfer begins – only local cachelet receiving via unicast
- applet requests transfer from local cachelet
- local cachelet sends DNS update announcing that it can now serve the item (maybe – or maybe policy is that local caches only serve the local host)
- foreign cachelet accelerates the download
Almost all of this can be implemented in Java using the applets and the cachelets. Only step (a) is performed using something like HTTP, and step (j) requires a standard cache-cachelet API. Otherwise, it’s all applet/cachelet and cachelet/cachelet communications (with standard stuff like DNS and RTP thrown in), and both applet and cachelet can be easily changed by the data author.
- client downloading conventional (static) web page
- a standard HTTP cachelet has been distributed with the cache
- client web browser is set to use local cachelet as HTTP cache
- web browser wants a web page, so it queries local cache
- cachelet doesn’t have the page in its local cache
- cachelet does a DNS lookup to find existing caches for this page, and picks one based on nearest routing/closest match
- cachelet sends foreign cachelet a download request
- local cachelet downloads (how? probably TCP to foreign cachelet) a copy of the page
- local cachelet notifies DNS via dynamic update that it’s got a copy of the page
- local cachelet generates the web page for the browser
- client downloading a dynamic HTML item using unicast
- a standard HTTP cachelet has been distributed with the cache
- client web browser is set to use local cachelet as HTTP cache
- web browser wants a web page, so it queries local cache
- cachelet doesn’t have the page in its local cache
- cachelet does a DNS lookup to find existing caches for this page, and picks one based on nearest routing/closest match
- cachelet sends foreign cachelet a download request
- foreign cachelet indicates in reply that a custom cachelet is used to serve content for this URL
- local cachelet downloads (how? probably TCP to foreign cachelet) a copy of the custom cachelet
- custom cachelet (local copy) notifies DNS via dynamic update that it’s running on this new cache
- custom cachelet (local copy) generates the web page for the browser
We now have multiple HTML cachelets ‘slaved’ off a parent cachelet responsible for handling all browser interactions. Does Java provide us an easy way to hand off TCP connections from one cachelet to another?
Also, now we are faced with the problem of cache discovery for web pages over DNS. Doing this right will make sure the rest of our scheme scales. For starters, consider some like cnn.com. Having a single DNS server listing all the caches with a copy of even just the home page would be crazy. What we really want to know is just those copies ‘nearby’ is a network sense. Nearby to what? Well, the DNS request includes the IP source address of the requester, so I’d say nearby the requesting IP address.
For starters, consider the case where the DNS server(s) have complete information about where everything is cached. Except for the largest and most widely cached sites, this might be a perfectly workable model. When the DNS server sees that it can’t return a complete list of resource records in a single response packet, it sorts them according to which are closest to the requesting IP address and returns only the top hundred or so of them. The DNS server thus needs to have some routing information to sort the addresses (or just sort them by longest match IP address prefixes), but this seems fine. Also you’ll want redundant DNS servers, so a distributed database seems to be called for, but I’ve discussed how to do that in another blog.
For the most popular sites, though, the shear number of entries on DNS servers in such a scheme would become prohibitive. But this case can be optimized away behind the scene – a DNS server responsible for only a limited area of the network would only need to know of cache entries roughly within that area. The behavior to the client should be _exactly_the_same_ – a set of resource records sorted to be ‘close’ to the requesting IP address. So we can leave the DNS protocol alone and study ways to build more optimized servers to handle this somewhat special case.
In fact, we’ve already got a way to do this. The Java caches themselves can answer DNS queries. All that’s needed is a new type of cachelet that manages cached DNS resource records and answers DNS queries. A natural extension of the DNS ‘SRV’ resource record would allow DNS servers to operate on arbitrary port numbers, not just 53. So a DNS cachelet could bind to a random UDP port and handle DNS queries. We do need to make sure the cachelets can manage a distributed database as I described in the other blog. The main change required here is to make sure the DNS resolvers (both clients and recursive servers) can handle DNS servers on non-standard ports.
OTHER SCENARIOS
- ISP cache also wants a copy of something (multicast case)
- ISP cache also wants a copy of something (unicast case)
In summary, we started trying to fix up the web’s caching scheme and have ended up with a model that seems like it can cache not only HTML and video, but also dynamic content and DNS. Probably it could do a lot more. Probably if we deployed Java caches widely on the network they’d become quite popular for all kinds of things I haven’t thought of. But if we put together a design that can handle web, video, dynamic content, and DNS without having to kludge any one of them, then it probably will be flexible enough to handle most future applications as well.
OTHER
Cachelets need some high-speed support services from the underlying host in order to perform some operations that can’t be done efficiently in Java. These can be provided by a Java Native Interface (JNI) add-on, and primarily include both sending and receiving RTP streams to/from a disk cache. The cachelet should be able to begin an RTP transfer (send or recv) and just let it proceed in background. For a receive stream, cachelet needs to see timestamps and sequence numbers on the received packets, possibily along with local timestamps for when packet were received, to determine if data is missing as well as calculating network jitter. RTP transmits might only include a subset of data from a received stream (let’s say the applet wants to fast forward through a cached video stream), and might proceed at a different rate from the received stream, so the cachelet needs to be able to set (and change) the base frequency for interpreting RTP timestamps. Also, we might be transmitting just ahead of a receive, so the cachelet should be able to set transmits on data that doesn’t exist yet, in the expectation that it will be there when needed. To allow cache repair, applet might request some out-of-band transfers of individual packets via UDP and can insert them into the cache, so the cachelet probably needs to be able to interact with data in the cache, but this should be discouraged for performance reasons. Also, to allow flexibility in caching, TCP streams should be supported. If you can’t get RTP through a firewall, you might want to receive data on a TCP stream, then transmit it on an RTP stream, so this should be supported.