Archive for January, 2001

On Napster

Monday, January 1st, 2001, a website facilitating the on-line exchange of digital music, has been highly publicized by the mass media. The legal wrangling over copyright issues has overshadowed other, equally legitimate questions. Is Napster for real, or is it just hype? Are the issues it presents purely legal, or are there technical lessons to be learned, too? What does Napster reveal about the future of the Internet?

First, we need to recognize that Napster represents a real technological advance. It is one of the newest and most prominent examples of a directory service. Directory services are based on the realization that centralized data stores tend to generate performance bottlenecks. All the data being served to the clients has to come from a centralized server or a handful of centralized servers. Throwing bandwidth at the problem is sometimes realistic, but a better solution is to design more efficient networks. Distributing data sources across the network is a major, emerging technique for achieving greater efficiency.

For example, one of the reasons we currently lack decent video-on-demand services are the bandwidth requirements of video. It’s simply not feasible to construct a centralized server to feed two hour movies to a million people. The bandwidth requirements are too great; the centralized server becomes too much of a bottleneck. A Napster-esk solution would be to have thousands of video servers, each capable of serving perhaps a dozen video streams, spread all over the network. Due to the current bandwidth demands of video, this is still unrealistic, but similar schemes are immediately plausible for books, software, and websites in general.

In fact, it’s reasonable to suppose that at least 90% of the present Internet’s traffic is unnecessary. The net is young and rapidly evolving. The protocols currently in use are inefficient, some more so than others. As the network continues to mature, it will become more efficient, and the bandwidth requirements of particular applications will decrease. The present boom in bandwidth demand is driven by new users and new applications. At some point, most people will be “connected”, and the uses of the network will stabilize. From that point onward, improvements in network design will begin to drive bandwidth requirements downward. The network will be most inefficient while it is young, so we can expect bandwidth requirements to peak at some point, I’d estimate within the next two decades, and then begin heading down.

Directory services such as Napster will be instrumental in reducing demand for network bandwidth. Other keys to more effectively using bandwidth are compression, caching, and multicast, all of which are in their infancy. Many issues remain to be addressed, for example, server selection. Napster currently presents the user with a list of servers for each song, one of which is manually selected to download the song from. Developing automated techniques for server selection will be an important step forward in making this technology more seamless, and therefore more attractive for other applications.

Security deserves special mention, since distributing data across the net would seem to seriously compromise security, but this is probably not so. Encrypting the data allows it to be distributed even to insecure servers, which could serve the data, but couldn’t read it. Then, the centralized directory would provide a key that could be used to decrypt and read the data. Controlling access to the key would control access to the data. Typically block cipher keys are only a few dozen bytes long, so access to a 100KB file could be granted by a directory server in less than 1KB – a 100-to-1 savings in centralized bandwidth requirements. The authenticity of the data could be verified by X.509 certificates – placed in the directory, of course.

While Napster represents a real advance over older, more centralized, techniques, this doesn’t mean that the current protocol can’t be improved. Let me outline how I’d redesign Napster, if I were given the task:

  1. Use LDAP. The Lightweight Directory Access Protocol (LDAP) has become an accepted standard for directory service. Furthermore, a “pure” directory service, such as Napster, doesn’t require any special handling on the part of the directory server. All the server has to do is register directory entries, then fed them back out again in response to search requests. A standard LDAP server, such as OpenLDAP, could be used unmodified.
  2. Define and publish a standard schema. LDAP, and directory access protocols in general, use “schemas” to define the format of directory entries. In Napster’s case, a standard schema would probably include a “Song” class, defining artist, title, and year, and perhaps an “Album” class, listing all the tracks on a particular album. The “Song” class could then be extended (subclassed) into a “NetSong” class that would also include URLs where the song can be accessed. Using a standard, published schema would clearly define the directory structure, and make it easier to reuse the directory for new applications.
  3. Use HTTP or FTP. Just as there’s no need to create a custom directory service, there’s no need to invent new file transfer methods, either. Specifying a URL in the directory entry, using one of the standard methods, “http:” or “ftp:”, should suffice. Of course, most “clients” aren’t set up to be “servers”. In the present computing environment, Napster would be quite hard to configure if it relied on an external web or FTP server, and much more complex if it included an entire web server within it. The “peer-to-peer” paradigm ultimately implies that a machine can be simultaneously both a client and a server, and must be configured to act as both. This obviously contrasts with Microsoft’s policy of separate “client” and “server” operating system packages (the “server” usually being much more expensive), but free software hasn’t solved this problem completely, either. How exactly does an arbitrary software program go about registering itself with the local web server in order to share files?

Napster isn’t the first directory based system to be deployed on the Internet, but it is one of the newest and most exciting. If the government and economic leaders can be persuaded to surrender a measure of control, its decentralized nature may pave the way to a more distributed and more efficient network.