<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Soapbox</title>
	<atom:link href="http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=1" rel="self" type="application/rss+xml" />
	<link>http://www.freesoft.org/blogs/soapbox</link>
	<description>New ideas in computers, networking, and technology in general</description>
	<lastBuildDate>Wed, 22 Apr 2009 05:46:57 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
		<item>
		<title>Faster Sailboats</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=22</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=22#comments</comments>
		<pubDate>Tue, 21 Apr 2009 06:07:51 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Other Technology]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=22</guid>
		<description><![CDATA[I may never have been on a sailboat in my life, but I am fascinated by the physics behind their operation. They must operate simultaneously in two mediums, as both an airfoil and a hydrofoil. Plus, they are probably one of the &#8220;greenest&#8221; vehicles ever conceived. Which makes you wonder why we don&#8217;t use them [...]]]></description>
			<content:encoded><![CDATA[<p>I may never have been on a sailboat in my life, but I am fascinated by the physics behind their operation.  They must operate simultaneously in two mediums, as both an airfoil and a hydrofoil.  Plus, they are probably one of the &#8220;greenest&#8221; vehicles ever conceived.</p>
<p>Which makes you wonder why we don&#8217;t use them more.</p>
<p>Not only are they dependent on the weather, but they are also considerably slower than an airplane.  I&#8217;ve been thinking about how to make them faster, and my inspiration came from a book about airplanes &#8211; <em>The Deltoid Pumpkin Seed</em> by John McPhee.  This book documents, in popular language, the experiences of a group of entrepreneurs and engineers to build a hybrid airplane/airship (the Aereon) that would have a small engine and use helium to improve its lift characteristics.</p>
<p>Why not do the same thing with a sailboat?  Especially since most of the drag comes from the &#8220;wetted hull&#8221;, it would make sense to lift the hull out of the water as much as possible and leave only the keel submerged.  Ship designers have been doing this for years with cleaver hull designs intended to lift themselves out of the water as they get up to speed, but the Aereon design suggests another way &#8211; helium.</p>
<p>What seems to make sense to me would be to build a trimaran and fill the outriggers with helium.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=22</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cold Fusion</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=21</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=21#comments</comments>
		<pubDate>Wed, 08 Apr 2009 04:37:18 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Other Technology]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=21</guid>
		<description><![CDATA[A friend of mine recently bent my ear for an evening over cold fusion. It struck me as pseudo-science, but not wanted to be prejudiced, I spent some time with a web browser looking into it. Though I still have a hard time believing some of these claims, I have to admit that this technology [...]]]></description>
			<content:encoded><![CDATA[<p>A friend of mine recently bent my ear for an evening over cold fusion.  It struck me as pseudo-science, but not wanted to be prejudiced, I spent some time with a web browser looking into it.</p>
<p>Though I still have a hard time believing some of these claims, I have to admit that this technology shows promise.</p>
<p>The Coulomb barrier that must be overcome to fuse two protons is about 5 MeV &#8211; not something you&#8217;d expect at room temperature, but well within the range of a standard particle accelerator.  The problem is the minute cross section of the nucleus &#8211; protons with enough energy to fuse are far more likely to be scattered away from each other unless they are <em>precisely</em> aligned in a head-on collision.</p>
<p>That&#8217;s where the cold fusion claims start to get interesting.  All of the cold fusion reports that I read involved palladium as a catalyst.  Now palladium has the unusual property that it can absorb significant quantities of hydrogen.  There doesn&#8217;t seem to be a consensus on exactly how this works, but one explanation is that the hydrogen nuclei can move fairly freely within the palladium crystal mesh.  Now if I wanted to line something up at atomic dimensions, a crystal would be the obvious choice, and if I wanted to line something up while its moving, then I would want a crystal that allowed my particle mobility.  So palladium seems like an obvious choice to line up moving protons to precisely collide them.</p>
<p><span id="more-21"></span></p>
<p>The approach I&#8217;m thinking of right now is to use a particle accelerator to achieve the energies needed for fusion, since I still don&#8217;t see it happening with only thermal or low voltage electrical energy, but to fire the accelerated protons into a palladium crystal in order to achieve the alignment necessary for fusion.</p>
<p>We could model this theoretically using Schroedinger&#8217;s equation, but actually solving this equation seems very daunting.  A more promising approach would be to run some experiments to examine how proton beams interact with palladium crystals.  In the limit as the energy approaches zero, the crystal should just absorb the beam, since it absorbs elemental hydrogen, so we can expect some kind of interesting behavior.  I&#8217;d particularly hope to find some special angles and probably special energies at which a proton beam, fired into a palladium crystal, can travel for significant distances within the crystal with only mild dissipation.  Common sense now tells us that such a beam will eventually become aligned with the crystal, and if two such beams are fired into a large enough crystal from opposite ends, they will hopefully be aligned by the time they meet and the middle and we&#8217;ll stand a good chance of getting a nuclear fusion reaction.</p>
<p>Carrying this one step further, we&#8217;d like to minimize the thermal vibrations of that crystal in order to maximize its alignment, so I&#8217;d like to conduct these experiments not only at room temperature, but also at low temperatures, say liquid nitrogen and liquid helium temperatures.  It would be ironic indeed if the primary problem with cold fusion to date is that it isn&#8217;t cold enough.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=21</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dynamic addressing</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=20</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=20#comments</comments>
		<pubDate>Sun, 27 Jul 2008 06:20:48 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Computer Networking]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=20</guid>
		<description><![CDATA[A theorem of Thorup and Zwick (Proposition 5.1 in 2001&#8242;s Approximate Distance Oracles) states that a routing function on a network with n nodes and m edges uses, on average, at least min(m,n^2) bits of storage if the &#8220;route stretch&#8221; (the ratio between actual path length and optimal path length) is less than 3 (i.e, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://citeseer.ist.psu.edu/thorup01approximate.html">A theorem of Thorup and Zwick (Proposition 5.1 in 2001&#8242;s <em>Approximate Distance Oracles</em>)</a> states that a routing function on a network with n nodes and m edges uses, on average, at least min(m,n^2) bits of storage if the &#8220;route stretch&#8221; (the ratio between actual path length and optimal path length) is less than 3 (i.e, if two nodes are two hops apart, the actual route taken between them must be less than six hops).  On the Internet topology, we can expect the n^2 term to dominate, so spreading these n^2 bits out among n nodes yields an average of n bits per node &#8211; i.e, each router&#8217;s routing table has to hold one bit for every device on the network.</p>
<p>Not a very encouraging result for those of us designing routing protocols.</p>
<p>Yet there is hope.  The result is only an <em>average</em>.  We can do better than the average if we allow our routing function to be skewed towards certain network topologies.  And it occurs to me that the Internet doesn&#8217;t change fast enough that we can&#8217;t skew our routing function towards the <em>current</em> network topology.</p>
<p>How can we do this?  With <strong>dynamic addressing</strong>: skewing our address summarization scheme to reflect the current network topology.</p>
<p><span id="more-20"></span></p>
<p>Of course, we&#8217;re already doing this, to an extent.  It&#8217;s what every IP network engineer does when he assigns summarization blocks; they correspond to groupings of devices within the network topology.</p>
<p>But I&#8217;d like to take this quite a few steps farther.  I&#8217;d like the summarization blocks themselves to be automatically allocated.  I&#8217;d like to be able to handle the case where a WiFi link connects devices from two different ISPs, or when a company wants to become multi-homed with two ISPs.  In both cases, we&#8217;re connecting devices together from radically different summarization blocks.  How can we handle this?</p>
<p>By dynamically readdressing the network.  When new links appear, exception routes are placed into routing tables.  Over time, the network readdresses itself so that those exceptions can now be summarized together.</p>
<p>Let&#8217;s take a simple example &#8211; a single autonomous system uniformly running OSPF.  When a new link appears, routers recompute their routing tables and then <em>auto-summarize</em> &#8211; automatically group addresses into routing table blocks.  Then another algorithm runs, which computes, for each router, the compete set of destinations which should be routed to each next hop.  The collection of all of these sets forms a <em>topology</em> on the destinations.  The objective is to now (re)assign addresses so that as many of those subsets as possible can be neatly summarized.  Then we reassign addresses to the network devices and gradually morph our addressing structure toward this new topology.</p>
<p>Obviously, there are performance issues here.  These algorithms can be quite time consuming if they are run on every route flap.  Finding a suitable address assignment might be an NP-complete problem.  How do you handle a link between two autonomous systems?  We probably want to consider possible future link failures, as well, when assigning those addresses.</p>
<p>But let&#8217;s leave all that stuff out for now.  It&#8217;s not too hard to see what changes we need to start making now to head in this direction, and it&#8217;s starting to sound like a routing loop on this blog &#8211; we need to shed our dependence on static IP addresses.</p>
<p>We need dynamic DNS.  Not like we&#8217;ve got it today, so heavily packet filtered that it&#8217;s almost useless, taken on as an afterthought, client-only dynamic DNS.  Dynamic DNS, coupled with DHCP, needs to be our default mode of operation.  A machine boots, be it client, server, or even router.  It DHCPs to get its IP addresses and then uses a cryptographically secured transaction to install them on its DNS name.  It has to be prepared for those addresses to timeout and be replaced with different ones, which will be migrated into its DNS information.  Protocols like TCP have to be prepared for mobility, in the sense that even a server (yikes!) may be changing IP addresses mid-session.</p>
<p>And yes, routers too have to use DHCP.  In fact, routers have to become not only DHCP servers, but they need to participate in a next generation DHCP that assigns entire blocks of addresses.  A router requests a block of x number of addresses for its subnet, and then hands those addresses out to its clients.  And has to be prepared for that entire block to timeout and be replaced by another.</p>
<p>While all that is being implemented (or not), we can start experimenting with protocols and algorithms to actually implement dynamic address allocation (or not).  And defeat Thorup and Zwick&#8217;s n^2 by constantly shifting the network to adjust.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=20</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is Neuromancer closer than you think?</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=19</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=19#comments</comments>
		<pubDate>Tue, 22 Jul 2008 07:41:57 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Other Technology]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=19</guid>
		<description><![CDATA[Could contemporary technology be used to build a Gibson-esque implant that could connect a human brain to a computer? Maybe more practical, what would it take to curve spinal cord injuries like Christopher Reeve&#8217;s using this kind of technology? Basically, you&#8217;d need a computer chip attached to a human nerve. In an age of VLSI, [...]]]></description>
			<content:encoded><![CDATA[<p>Could contemporary technology be used to build a Gibson-esque implant that could connect a human brain to a computer?  Maybe more practical, what would it take to curve spinal cord injuries like Christopher Reeve&#8217;s using this kind of technology?</p>
<p><span id="more-19"></span></p>
<p>Basically, you&#8217;d need a computer chip attached to a human nerve.  In an age of VLSI, I doubt that building the chip would be that hard.  To tackle the spinal cord injury, for example, I&#8217;d imagine a chip built into a titanium vertebrae, and connected to a control unit much like a pacemaker.  Probably two chips built into two titanium vertebrae, one to attach to either side of the spinal cord break.</p>
<p>The surface of the chip would be an dense grid of electrodes, each capable of sending or receiving small electrical impulses.  A microprocessor would be responsible for mapping electrodes on one chip to electrodes on the other, thus insuring that electrical impulses were communicated across the break.  Building a chip with a dense enough grid doesn&#8217;t seem like anything that can&#8217;t be done using contemporary technology.</p>
<p>The hard part, I think, would be attaching the chip to the spinal cord.  I think an anisotropic conductive adhesive, probably an epoxy, should do the trick.  You&#8217;d need an conductive adhesive (obviously) to transmit the nerve impulses, but it couldn&#8217;t conduct in any direction.  Imagining the spinal cord connected vertically to the chip (in the x-y plane), you&#8217;d need an adhesive that would only conduct in the z-direction (that&#8217;s the <em>anisotropic</em> part).</p>
<p>Conductive adhesives are available, and in fact widely used in the electronics industry for connections that can&#8217;t be done with solder, either because of the heat required or because the materials (like silicon) won&#8217;t stick to solder.  They&#8217;re generally made by suspending tiny spheres of silver in the epoxy, which touch each other and conduct after the epoxy dries around them.  That&#8217;s an <em>isotropic</em> adhesive, which conducts in all directions.</p>
<p>That wouldn&#8217;t work for our purpose, because we&#8217;d need an anisotropic adhesive.  Such things exist, mostly in film form, where the film has been designed to conduct only along its narrow width.  I don&#8217;t think a film would do, either.</p>
<p>We&#8217;d need an anisotropic epoxy.  For starters, how would we set the direction of the anisotropism?  My first idea would be to use a magnetic field while the epoxy is setting, much like the way iron shavings line up in a magnetic field.  In fact, this might be almost exactly what is needed.  Instead of tiny spheres of metal, we need to figure a way to manufacture tiny slivers of metal, coated with an insulator except at their tips.  Maybe shape the tips like a ball and socket so they tend to link up in a chain instead of touching laterally.  Or maybe something else is needed, some kind of chemical conductor that would line up at the molecular level.</p>
<p>In any event, it seems like something worthwhile investigating.  I&#8217;m not a doctor, but the adhesive seems to me to be the main obstacle to this.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=19</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DNS-based identification</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=18</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=18#comments</comments>
		<pubDate>Tue, 10 Jun 2008 07:52:09 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Computer Networking]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=18</guid>
		<description><![CDATA[I&#8217;ve written before about how the failure of source routing created the need for NAT, but that post didn&#8217;t address the basic security problem with source routing that caused ISPs to disable it. It allows a man-in-the-middle attack where a machine can totally fabricate a packet that claims to come from a trusted source. There&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve <a href="http://www.freesoft.org/blogs/soapbox/?p=8">written before</a> about how the failure of source routing created the need for NAT, but that post didn&#8217;t address the basic security problem with source routing that caused ISPs to disable it.  It allows a man-in-the-middle attack where a machine can totally fabricate a packet that claims to come from a trusted source.  There&#8217;s no way that the destination machine can distinguish between such a rogue packet and a genuine packet that the source actually requested be source routed through the fabricating machine.  At the time, Internet security was heavily host-based (think rsh), so this loophole became perceived as a fatal security flaw that led to source routing being derogated and abandoned.</p>
<p>A quarter century later, I think we can offer a more balanced perspective.  Host-based authentication in general is now viewed as suspect and has largely been abandoned in favor of cryptographic techniques, particularly public-key cryptosystems (think ssh) which didn&#8217;t exist when TCP/IP was first designed.  We are better able to offer answers to key questions concerning the separation of identity, address, and route.  In particular, we are far less willing (at least in principle) to confuse identity with address, if for no other reason than an improved toolset, and thus perhaps better able to judge source routing, not as a fundamental security loophole, but as a design feature that became exploitable only after we began using addresses to establish identity.</p>
<p>Can we &#8220;fix&#8221; source routing?  Perhaps, if we will completely abandon any pretext of address-based authentication.  What, then, should replace it?  I suggest that we already have our address-less identifiers, and they are <strong>DNS names</strong>.  Furthermore, we already have a basic scheme for attaching cryptographic keys to DNS names (DNSSEC), so can we put all this together and form a largely automated DNS-based authentication system?</p>
<p><span id="more-18"></span></p>
<p>The basic idea is that we authenticate using DNS names (not IP addresses), so it stands to reason that a basic operation has to be obtaining a connection&#8217;s remote DNS name in a secure manner.  Or perhaps I should say, obtain a connection&#8217;s remote DNS <em>names</em>, because it will become clear that we want a connection endpoint to have multiple ones.  How can we do this?  Several options suggest themselves:</p>
<ul>
<li>Modify the transport protocols to communicate the endpoint&#8217;s DNS names</li>
<li>Modify the IDENT protocol so that it can identify remote DNS names</li>
<li>Use IPSEC (somehow)</li>
</ul>
<p>Let&#8217;s assume for the moment that we&#8217;ve solved this problem.  The remote endpoint claims to be one or more DNS names and has communicated that information to us.  Now we need to authenticate it.  The obvious way  is a DNS lookup on the name to obtain a public key.  The remote endpoint now has to back up its claim by using the matching private key in some way.  We want the entire session to be authenticated, not just one little piece of it, so &#8220;some way&#8221; seems to imply <em>signing all the IP packets</em>.  We don&#8217;t want to use the private key from the DNS lookup to sign all the packets, because that would expose too much information about a fairly long-lived key, so probably something like using the key pair from the DNS lookup to establish some per-session keys and then using them to sign the packets would be more agreeable.</p>
<p>Sounds a lot like IPSEC.</p>
<p>The problem, as always, lies in key management.  IPSEC was designed largely to protect host-to-host communications, not application-to-application, and the structure of its key management reflects this.  IKEv2, probably the most widely used IPSEC key management protocol, operates using privileged daemons on reserved port numbers (500 and 4500).  This seems to preclude the possibility of having multiple distinct identities associated with different processes on the same machine.</p>
<p>So maybe simpler would be better.  Just cryptographically sign each packet using an AH header that matches a DNS-published public key.  Use DNS&#8217;s existing mechanisms to periodically time out that public key and replace it (maybe once an hour), thwarting attempts to take advantage of the exposure of the key to crack it.  The need for a transition period seems to dictate that multiple public keys can be associated with a single DNS name, and that signing with one of them is sufficient to establish identity.</p>
<p>Of course, using any kind of public key whose exchange hasn&#8217;t been protected by encryption exposes it, so it would seem that this simple scheme is about the best we can do without some kind of handshake exchange.  In particular, the simple scheme can naturally support multicast.  Also, the use of multiple public keys suggests the possibility of a transition that starts using a &#8220;low security&#8221; key as described in the previous paragraph, then transitions to a &#8220;high security&#8221; key that is used to authenticate and encrypt the exchange of a session key.</p>
<p>In short, while IPSEC&#8217;s basic packet headers seem quite suitable for this purpose, its key management scheme seems cumbersome and overblown.  I&#8217;ll use the simplified variant just described for the rest of this essay.</p>
<p>Returning now to the question of communicating DNS identity, it doesn&#8217;t matter (all that much) since any such communication isn&#8217;t trusted until it is verified.  Probably the most theoretically rigorous approach would be to specify the DNS name as part of the IP packet header, so your packet headers now contain the source location (IP address), source identifier (DNS name), and authentication (AH header).  Other, less elegant but more palatable approaches (from an efficiency standpoint), would be to use an enhanced IDENT protocol, or an extra TCP header during TCP initialization, or some other out-of-band mechanism.</p>
<p>In short, our identifiers are DNS names.  We have signed public keys in DNS.  Hosts or applications possessing the matching private keys are able to act as the entity named by the matching DNS name.  The application uses some kind of API to install a private key that will be used to sign all of its packets.  It also announces its DNS name (in IP header options, or TCP header options, or via an IDENT daemon).  All of this is enough to allow the recipient to validate the identity of its remote peer, independent of its IP address.</p>
<p>Package all of this up into a library, and authentication takes a relatively simple form &#8211; applications need only to check the remote DNS name against a list of valid names.  Regular expressions seem a natural choice.  The kind of DNS wildcard syntax used by NFS exports files or X Windows xauth now look a lot more attractive as a generic authentication framework.</p>
<p>Since DNS is hierarchal, fine-grained authentication seems to fit naturally into this model.  For example, I don&#8217;t want my login sessions authenticated on a per-host basis, but rather on a per-user basis.  This can be accomodated simply by adding DNS names for individual users.  Rather than having a single DNS name for a single host, a host can manage an entire zone underneath its name.  Each machine can run its own DNS server and manage its own DNS space, so &#8220;bigsky.freesoft.org&#8221; can create a DNS name &#8220;baccala.bigsky.freesoft.org&#8221; for one of its users.  Attach a public key to that DNS name, and remote applications can specify &#8220;baccala.bigsky.freesoft.org&#8221; in their authentication list to allow access to this user account.  Since DNS is not tied specifically to any one naming scheme, organizations have a great deal of flexibility in managing their zones.  Specifically, the same authentication scheme, unmodified, could also be used to create a single domain-wide user account &#8220;baccala.freesoft.org&#8221;, or &#8220;baccala.users.freesoft.org&#8221;, and it would all be transparent to remote applications, so long as local applications possessed the correct matching private keys.</p>
<p>Applications would naturally possess multiple identities &#8211; one for the user account they run under, one for the machine they run on, maybe other for user groups or workstation clusters.  How does an application know which one to use?  This could be negotiated using some kind of key exchange protocol, but that wouldn&#8217;t be realistic in a multicast environment.  So probably the best thing is to allow multiple AH headers to stamp multiple identities on a single packet.  Then, it&#8217;s up to the application to decide which set of identities to use for a particular transaction, subject always of course to the proviso that it must possess the correct set of private keys.  And this implies that a remote entity may have multiple identities, any one of which, presumably, can be used for an authentication process.</p>
<p>More can be done.  If we want to encrypt without pre-shared keys, a similar scheme can be used &#8211; obtain a public key from DNS and encrypt packets so that only the matching private key can decrypt.  Again, when you think about a multicast environment, it starts to look a lot more attractive that running a key exchange protocol between every pair of hosts.</p>
<p>But I&#8217;ll stop here.  My point is this: we&#8217;re now able to assemble an authentication scheme based on cryptographically secure DNS identifiers, and we&#8217;ve have enough problems with address-based authentication that we should.  If we finally ditch this idea that you can convert an IP address into a DNS name, and instead insist that that translation only really works in the other direction, we can overcome the <em>real</em> design flaw that doomed source routing.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=18</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Corrupt Education</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=17</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=17#comments</comments>
		<pubDate>Thu, 29 May 2008 06:36:09 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Tirades]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=17</guid>
		<description><![CDATA[I recently toyed with going back to school for graduate study in mathematics, going so far as to apply to a university. I won&#8217;t parade all the details, but I think it was a positive experience. I reached an epiphany, a conclusion that I&#8217;ve been resisting for years, but have finally accepted: A university is [...]]]></description>
			<content:encoded><![CDATA[<p>I recently toyed with going back to school for graduate study in mathematics, going so far as to apply to a university.  I won&#8217;t parade all the details, but I think it was a positive experience.  I reached an epiphany, a conclusion that I&#8217;ve been resisting for years, but have finally accepted:</p>
<p><em>A university is a corrupt institution.</em></p>
<p><span id="more-17"></span></p>
<p>First, universities are <em>successful</em>.  I have nothing against success <em>per se</em>, but what it takes to succeed in this society is some kind of accommodation with capitalism; you can&#8217;t really make it in this world otherwise.  A restaurant is a capitalist organization; a soup kitchen is not, and therein you find the greatest distinction between them.  Both feed people.  One refuses its food to anyone who can not or will not pay; the other offers its food free of charge.  One occupies prime real estate and sports swank decor; the other is relegated to the cheapest digs and plainest furnishings.  One turns a tidy profit for its owners; the other is dependant on hand outs.  One is <em>moral</em>; the other is not.</p>
<p>Likewise with schools.  Walk onto a university campus and you will be immediately impressed by the gleaming architecture, the broad walkways, the teeming student life.  Yet at what cost?  $100 textbooks, multi-thousand-dollar tuitions, credit-card restricted access to journal articles.  A university, like a restaurant, has reached an accommodation with capitalism, and it is this: they are <em>selling education</em>.</p>
<p>As obvious as this may sound, it has taken me years to really accept this.  I turned first against our laws, then against our economic system, then against our leaders, finally against our system of government, but still I clung to our school system.  I applied as a teacher after I had reached a point where I would work for almost no one else.  Why?  Because they are <em>educators</em>, I told myself, remember the sanctity of knowledge.</p>
<p>They&#8217;re educators the way Bill Gates is a programmer.</p>
<p>Don&#8217;t get me wrong &#8211; I have friends who are university professors, just like I have friends who are restauranteurs, who are businessmen, who are soldiers.  If I held everyone I knew to some moral gold standard, there&#8217;d be no one left, including myself.  Yet, by and large, we can judge large institutions by the values they promote, and a core value of modern higher education is this:</p>
<p><em>Education is something to be packaged and sold.</em></p>
<p>Where does the money come from?  Anytime you see success in this society, you have to ask: <em>where does the money come from?</em>  It&#8217;s not a charity, because if so it would look like a homeless shelter, not a 5-star hotel.  And the answer is simple: <em>it comes from the students</em>.  Those few with the absolute best grades (or the greatest skill pitching a baseball) get scholarships, the rest pay.  If they can&#8217;t pay now, then there&#8217;s always a nice student loan package so they can pay later.  After all, what are the students there for?  To learn?  That&#8217;s quaint, but in the real world they&#8217;re there to improve their future job prospects.  So the students will use their degrees to make more money, some of that money goes to paying off their loans, the university gets to build its new library, everybody is happy.  Right?</p>
<p>A friend of mine has a daughter in her first year of college.  After a semester of bad grades, he refused to pay for another, so she took out a student loan.  I have to wonder &#8211; does an 18-year-old really understand how long you have to work for somebody else to pay back $8,000?  I know I didn&#8217;t.  When I think back on my undergraduate years, the first thing that comes to mind is <em>huge waste of my parent&#8217;s money</em>.</p>
<p>Let me qualify that, lest you be left with the notion that I advocate a throw-the-kids-out-on-their-own approach to funding college tuition.  I had a good time at college and I got a solid undergraduate education in physics.  That&#8217;s what my parents were paying for, right?  Never mind that the library card would have been a lot cheaper than the tuition, because my parents would never have paid for my room and board if it came with nothing more than a library card.  After I left college, I then got the full dose of throw-the-kids-out-on-their-own.  After a while, I contemplated suicide.  Maybe I wouldn&#8217;t be here now if my parents hadn&#8217;t paid for that extra five years of growing-up time.</p>
<p>But this isn&#8217;t about the kids.  This about the schools and the teachers.  How can they push multi-thousand dollar loan forms in front of teenagers and tell them to sign?  How can they outlaw on-line public libraries so that Springer Verlag can cut them in on fat publishing deals?  How dare they act like anyone without some paper on the wall is incompetent?</p>
<p>Coming back to my graduate school application, I was really in a moral dilemma.  I wasn&#8217;t going to pay, but would I accept a T.A.&#8217;s position if it was offered?  Wouldn&#8217;t I be then part of the thing that I despise?  At the lowest rung, of course, but still, I&#8217;d be taking money from these kids to teach, right?  I met my rejection with a number of emotions, but one of them was definitely relief.  Like I said, it was a positive experience.</p>
<p>I&#8217;ve had it.  I won&#8217;t pay, and I won&#8217;t participate.  I won&#8217;t let my education become corrupt.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=17</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>iDictionary</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=16</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=16#comments</comments>
		<pubDate>Sat, 13 Jan 2007 05:15:08 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Computer Software]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=16</guid>
		<description><![CDATA[Apple&#8217;s recent announcement of the iPhone has inspired me to reconsider how IT can be used to support foreign language studies. According to Apple, the iPhone will have a microphone (it&#8217;s a phone, after all), run OS X, and have 4 to 8 GB of memory. That should be a sufficient platform to load a [...]]]></description>
			<content:encoded><![CDATA[<p>Apple&#8217;s recent announcement of the iPhone has inspired me to reconsider how IT can be used to support foreign language studies.  According to Apple, the iPhone will have a microphone (it&#8217;s a phone, after all), run OS X, and have 4 to 8 GB of memory.  That should be a sufficient platform to load a voice activated dictionary.  After training a voice recognizer, you could speak a word into the device which it would then lookup in a dictionary and display the dictionary entry on the screen, providing language students with the detail of a full sized dictionary in something that could fit in their pocket.</p>
<p>Could pocket Spanish-English dictionaries be a thing of the past?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=16</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Dynamic DVD</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=15</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=15#comments</comments>
		<pubDate>Fri, 12 Jan 2007 05:40:19 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Computer Software]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/soapbox/?p=15</guid>
		<description><![CDATA[As streaming video has become more commonly available, it is now plausible to discuss offering a video interface to a website. A user could connect to a site using a video client and &#8220;navigate&#8221; on-line video content using a DVD-style user interface &#8211; video menus, highlighted on-screen buttons, fast-forward, subtitles. Alternately, such an interface could [...]]]></description>
			<content:encoded><![CDATA[<p>As streaming video has become more commonly available, it is now plausible to discuss offering a video interface to a website.  A user could connect to a site using a video client and &#8220;navigate&#8221; on-line video content using a DVD-style user interface &#8211; video menus, highlighted on-screen buttons, fast-forward, subtitles.  Alternately, such an interface could augment conventional TV broadcasts, offering a nightly news program with video hyperlinks to hours of detailed coverage of events we currently see only in sound bites.</p>
<p><span id="more-15"></span></p>
<p>The prospect is tantalizingly close.  Anyone with a megabit Internet connection (cable or DSL) and a reasonably modern computer has a client system that can support this.  Yet no Internet video clients (to my knowledge) can provide DVD-style interaction.  And free software video production tools, while available, lag somewhat behind.  Enough groundwork has probably been laid, though, to allow an interactive video client to be built.</p>
<p>Just as fifteen years ago the network provided FTP sites where users could download information piecemeal, so is the Internet&#8217;s current video offerings &#8211; a little bit here, a little bit there.  Just as the World Wide Web melded the graphical user interface with a global data network, we can now contemplate melding the DVD user interface with the Internet to provide hyperlinked on-demand video.</p>
<p>What kind of new content could this support?</p>
<p>Consider, first, news service.  Currently, we have traditional national and local news broadcasts, typically a half hour each in length, along with 24 hour cable news channels that basically repeat the same information over and over again, on roughly a half hour or hourly cycle.  A news broadcast delivered using dynamic DVD, on the other hand, could be many hours in &#8220;length&#8221;, accessible via hyperlinks from a traditional half-hour summary.  A report on the day&#8217;s political events in Washington could linked to full recordings of various speeches.  Excepts from an interview could be linked to the full interview itself.  A 30-second summary spot could be linked to an entire half-hour report on just that one subject.  Weather reports could be hyperlinked together so that travelers could pick a city and see the local report for their destination.</p>
<p>Sporting events could be delivered using DVD&#8217;s largely overlooked &#8216;angle&#8217; feature, which allows viewers to pick from one of up to nine different simultaneous video feeds, allowing viewer selection different camera angles on the event.  Highlight footage to link into a recorded history of the entire game, and several different highlights could be made available for each game</p>
<p>And, of course, conventional DVDs could be delivered in this format.</p>
<p>Could the Internet actually delivery the performance needed to make such a system widely deployable?  Unfortunately, the current answer is probably &#8220;no&#8221;.  As I have discussed on this blog, <a href="/blogs/soapbox/?p=13">caching</a> and <a href="/blogs/soapbox/?p=9">multicast</a> are two critical technologies that the contemporary Internet is unable to support for fundamental technical reasons.  Without caching and multicast, a video web could only work the way the World Wide Web works &#8211; by requiring each server to transmit the same data over and over again.  With text and graphics, this is a nuisance. With video, it will probably be a show stopper.</p>
<p>A stopgap solution might be available by providing digital multicast feeds via cable TV.  This would be a hybrid of conventional cable video broadcasts and cable Internet service.  Using the same basic technology of cable Internet and 10-BROAD-36, a cable provider could broadcast one or more transmit-only digital channels (similar to HDTV).  The difference would be that the digital signal would be in some kind of yet unspecified dynamic DVD format that would contain navigation information and be cached onto a set-top box.</p>
<p>What would be required to view dynamic DVD?  Certainly a computer would work, but you could probably just build a DVD player with a built-in hard drive.  Of course, a DVD player basically is a computer, so maybe the distinction isn&#8217;t that great.  Instead of a computer that comes with a keyboard and a monitor, you could just build a computer with a standard TV video output and remote control input.  In fact, I think I would favor this approach just for reasons of standardization.</p>
<p>So, using either the Internet or a enhanced digital cable TV feed, we could build either a Video Web, or at least provide a far superior NBC Nightly News.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=15</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Building chess tablebases over the Internet</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=14</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=14#comments</comments>
		<pubDate>Wed, 23 Aug 2006 07:19:17 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Computer Software]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/newnet/?p=14</guid>
		<description><![CDATA[I&#8217;ve read some stuff on the Internet about using pre-computed tablebases to solve complex endgames. Apparently you can load a five- or six- piece tablebase into a program like Fritz and it will then know how to mechanically solve any endgame with five (or six) pieces on the board. I starting thinking about this, but [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve read some stuff on the Internet about using pre-computed tablebases to solve complex endgames.  Apparently you can load a five- or six- piece tablebase into a program like Fritz and it will then know how to mechanically solve any endgame with five (or six) pieces on the board.</p>
<p>I starting thinking about this, but more along the lines of building the tables dynamically, using a pool of cooperating computers on the Internet.  The idea would be to direct your search towards certain game positions that you wanted to analyze.  This would work well in static analysis and also in relatively slow correspondence time controls (like Gary Kasparov vs The World, or the upcoming The World vs Arno Nickel).</p>
<p><span id="more-14"></span></p>
<p>A big key here would be to segment your search space so that individual computers can calculate part of it independently and then return the results to a server.  I think we can do this with chess because pawns can&#8217;t move backwards and captures can&#8217;t be reversed, either.  So we can ask each computer to analyze up to a capture or pawn movement.  We treat the pawn positions as static, and all the other pieces (the &#8216;mobile&#8217; pieces) move around them.  As soon as a pawn moves or a capture occurs, we&#8217;re into a different table.  So long as, say, all but four of the pieces are pawns, these files are of reasonable size (a few MB) and can be computed in a reasonable time.</p>
<p><H3>Pruning</H3></p>
<p>We prune if we don&#8217;t want to fully analyze out the tree past the table we&#8217;re now building.  Of course, this will affect the accuracy of the table; the table is a result of BOTH the position it was set up for AND the pruning decisions (and any pruning decisions made on the future tables used to calculate this one).</p>
<p>We specify pruning in a simple way &#8211; by omitting future tables for moves we don&#8217;t want to consider.  This can be dangerous, so we require this feature to be specifically enabled for each move by a command-line switch.  Actually, we use two switches, one to calculate a table for OUR SIDE to move, and another if it is the OTHER SIDE to move.</p>
<p>So, <tt>--prune-our-move e3e4</tt> prunes a pawn move (assuming this is a table with a static pawn on e3) by simply ignoring e3e4 as a possible move.</p>
<p>Pruning an opponent&#8217;s move is more complex because we step a half-move into the future and consider our own next move.  This costs us little, since we can control our own move and therefore don&#8217;t have to consider all possibilities, and improves a lot.  If future tables exist for any of our responses, they are used.  If no such future tables exist, then the move is regarded as a lost game.</p>
<p>So, <tt>--prune-his-move e7e8</tt> prunes a pawn promotion (assuming a static pawn on e7) by considering all possible positions resulting after the pawn promotion (to either Q or N) AND the answering move.  The resulting game is regarded as a win for white unless both Q and N promotions have an answer that leads to another table with a win or draw for black.</p>
<p>For example, let&#8217;s say we&#8217;re looking at a Q-and-P vs. Q-and-P endgame.  There are four mobile pieces (2 Ks and 2 Qs), so we can handle this.  But if one of the pawns queens, then we&#8217;ve got a game with five mobile pieces, and that&#8217;s too complex. But we don&#8217;t want to completely discard all possible enemy promotions, if we can immediately capture the new queen (or the old one).  So we specify something like <tt>--prune-his-move e7e8</tt> and pass in a tablebase for a Q-and-P vs. Q endgame.</p>
<p>We also check for immediate checkmates or stalemates.</p>
<p>Pruning doesn&#8217;t affect the size of the resulting tablebase.  We discard the extra information.  If the pruned move is actually made in the game, then you have to calculate all possible next moves and check your tablebases for them.  This seems reasonable.</p>
<p>In short, intelligent pruning is essential to making this work.  Let&#8217;s say we get into a rook and 3 pawn vs. bishop and 4 pawn endgame (or something like that).  We can&#8217;t just solve the whole thing.  If all the pawns queened, we&#8217;d get 7 queens on the board and that would be impossible.  So we have to prune out these highly unlikely combinations.  We can also prune our own moves aggressively, because we don&#8217;t have to make them!  Pruning makes this whole idea feasible.</p>
<p><H3>Program Design</H3></p>
<p>Let&#8217;s keep the program simple.  It&#8217;s designed to be run on a whole bunch of machines over the Internet, each one calculating a different table, that all then get linked together. Each table goes into a different file. The server assigns a program the task of computing a certain file. The files needed for this task are downloaded.  Then the program runs locally, just calculating the requested table/file.  When done, the file is uploaded back to the server.</p>
<p>File naming syntax: <tt>wPIECES-bPIECES.tbl</tt></p>
<p>PIECES is a list of &#8216;mobile&#8217; pieces (K, Q, R, B, N) followed by pawn positions in algebraic notation.</p>
<p><tt>wKQ-bKQ.tbl</tt> is a (K+Q) vs. (K+Q) endgame</p>
<p><TT>wKQe6-bKQ.tbl</tt> is a (K+Q+pawn-on-e6) vs. (K+Q) endgame</p>
<p>Certain symmetry relations exist.  We don't need both <tt>wKQ-bK.tbl</tt> and <tt>wK-bKQ.tbl</tt>; just one will suffice.  <tt>wKQe6-bKQ.tbl</tt> is basically the same as <tt>wKQd6-bKQ.tbl</tt> or <tt>wKQ-bKQe3.tbl</tt></p>
<p>Need an algorithm to determine symmetry; maybe white should always have numerically superior side and pawns should always skew towards lowest letters and highest numbers.</p>
<p>We need some tables to compute others.  For example, to compute <tt>wKQd6-bKQ.tbl</tt>, we need <tt>wKQd7-bKQ.tbl</tt> (possible pawn advance), <tt>wKd6-bKQ.tbl</tt> (possible x of white queen by black king or queen), <tt>wKQd6-bK.tbl</tt> (possible x of black queen by white king or queen), <tt>wKQd7-bK.tbl</tt> (possible x of black queen by d6xe7) and <tt>wKQc7-bK.tbl</tt> (possible x of black queen by d6xc7).</p>
<p>Some of these can be pruned.  If we're looking for a forced win by white, we can omit the rarer tables and treat any move into them as a forced win for black.  If we can still force a win for white, we know that it can be done without moving into any missing table.  This might result in some obviously won positions being reported as undecided.</p>
<p>File format is designed to be mmap'ed into the program's memory space; it can be compressed with gzip for transport.</p>
<p>File looks like an array [64][64]...[64][2].  There are as many [64]'s as mobile pieces, and correspond to them in the order listed in the filename, so the first [64] is always white's king.  The [2] determines whose move it is - white to play (0) or black to play (1).</p>
<p>Each entry in the array is a four-byte structure:</p>
<pre>
struct {
	unsigned char	movecnt;
	unsigned char	white_wins_moves;
	unsigned char	black_wins_moves;
	unsigned char	draw_moves;
}
</pre>
<p>We init by running through the array, flagging illegal positions (all one bits, maybe), flagging positions that are checkmates for one side or the other as won but 'unpropagated' (how?), flagging stalemates as drawn but 'unpropagated' (how?) and otherwise computing the number of moves away from this position, storing that number in movecnt and setting the other fields to zero.</p>
<p>Now we run through the table, looking for 'unpropagated' positions and propagating them.  This means backtracking one half move.  If the position is black to play and is won for white (or vice versa), then we back up to every position that could have gotten here by one move (white's) and flag that position won for white and 'unpropagated'.  If the position is white to play and is won for white (or vice versa), then we back up to every position that could get here by one move (black's), decrement movecnt by one and increment its white_wins_moves by one.  If its movecnt is now zero, we evaluate it.  It won't have any black_wins_moves (because it would have been flagged immediately won for black).  So its moves are all either draws or wins for white. If there are no draws (only forced wins for white), we flag it as won for white and 'unpropagated'.  Otherwise, we flag it as drawn and 'unpropagated'.  If we're back propagating a draw, we'll be incrementing draw_moves.</p>
<p>And I skipped a step.  The first step after initialization is to run through all the tables this one depends on, and back propagate from those positions (all marked either won for one side or drawn) to the current table.</p>
<p>Now we just keep running through the table, finding entries marked 'unpropagated', propagating them and marking them 'propagated'.</p>
<p>Finally, everything left should be the positions that will be drawn by repetition with best play from both sides, and should be flagged as such.</p>
<p>It would seem that this algorithm will avoid draws by repetition because it will always make forward progress if possible, except in very long winded positions in something like a three knights endgame.</p>
<p>Finally, we'd like the positions to be flagged not only won, lost, or drawn, but also with the number of moves required for a win.  This seems to require a modification to the basic algorithm, since we can't just flag a move as won, but maybe not.  If we process all the won positions first, then everything flagged will be a mate in one.  Then we process all the mates in one; everything flagged will be a mate in two, etc.</p>
<p>So, we really don't need the four-byte structure.  We don't need a 'mover_wins_moves' field, because if we find even a single move where the mover wins, we're done with this position.  We need 'movecnt', 'opponent_wins_moves', and 'draw_moves'.  We could lose one of these if we re-calculate the number of moves available in this position.  I think the quickest way to do this is to scan the table for all possibles movements of the mobile pieces and see if the positions are marked invalid or not.  Looks like a space-vs-speed tradeoff.</p>
<p>So, we could get down to a two-byte structure.</p>
<pre>
struct {
	unsigned char	movecnt;
	unsigned char	opponent_wins_moves;
}
</pre>
<p>movecnt's high-order bit could be reserved for the 'propagated' flag, which leaves room for 127 possible moves.  This seems OK.  3Q+K would have at most 92 moves available to them.</p>
<p>Once 'propagated' is set, the meaning of the structure changes.  Now we have two more bits indicating:</p>
<pre>
	00 - draw
	01 - win for white
	10 - win for black
	11 - invalid move
</pre>
<p>The remaining 13 bits can store a 'mate in' count, up to 8192!  We should add another flag bit to indicate if the 'mate in' count is total moves or just to get to the next table.</p>
<p>Probably best to keep our program flexible by allowing for either four-byte or two-byte format to be used.  Maybe different file extensions, or leave room for a header at the beginning of the file to indicate which format we use.  Let's use a 256 byte header, put a version/format field in the first four bytes (ascii) and leave the remaining 252 bytes (for now) to double-check the file name, indicate that processing has been completed and we don't have corrupted data, text string to describe how the file was built, indicate which file dependencies were satisfied by actual files (and maybe their MD5 checksums) and which were pruned as forced wins or draws, and an MD5 of the rest of the file.</p>
<p>Have to see how gzip does, or might want a more compressed format for transport.</p>
<p>Could dump the move count and get a 2-bit-per-position format, but that's dangerous if we don't store the moves themselves, because then we would be moving in circles.  This would only be good for transport to a 'slave' program computing another table, but we still wouldn't have exact move counts (couldn't tell if we should head for one move or another first in the next table).</p>
<pre>
.tbl for 'full' (maybe 2-byte) format
.tbc for 'compact' (2-bit) format - 1/8 size of full format
</pre>
<p>Compact format also has the disadvantage of not being able to convince people with a full move sequence, but this might not be a factor for any but the strangest cases (because we can usually figure out the move sequence from the compact format)</p>
<p>What tablebases to build?</p>
<p>all possible three- and four-piece tables are fundamental for more complex ones</p>
<p>an uncompressed two-byte four-piece table will be 64MB, so a complete set of all 30 of four-piece-ers will take up about 2GB (uncompressed)</p>
<pre>
wKQ-bK
wKR-bK
wKB-bK (building this and next one both checks program and avoids draws)
wKN-bK
wKP-bK (put this in its own table rather than 64 8K or 16K files?)
wKQ-bKQ
wKQ-bKR
wKQ-bKB
wKQ-bKN
wKQ-bKP (ditto for all these four-piece tables with pawns)
wKR-bKR
wKR-bKB
wKR-bKN
wKR-bKP
wKB-bKB (should handle both same color and different color bishops OK)
wKB-bKN
wKB-bKP
wKN-bKN
wKN-bKP
wKP-bKP
wKQQ-bK (some of these seem silly)
wKQR-bK
wKQB-bK
wKQN-bK
wKQP-bK
wKRR-bK
wKRB-bK
wKRN-bK
wKRP-bK
wKBB-bK (but some are important, for sure!)
wKBN-bK
wKBP-bK
wKNN-bK
wKNP-bK
wKPP-bK
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=14</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Java Caches</title>
		<link>http://www.freesoft.org/blogs/soapbox/?p=13</link>
		<comments>http://www.freesoft.org/blogs/soapbox/?p=13#comments</comments>
		<pubDate>Wed, 23 Aug 2006 07:10:17 +0000</pubDate>
		<dc:creator>cosine</dc:creator>
				<category><![CDATA[Computer Networking]]></category>

		<guid isPermaLink="false">http://www.freesoft.org/blogs/newnet/?p=13</guid>
		<description><![CDATA[I&#8217;ve been thinking for several years about the flaws in the Internet&#8217;s (nearly non-existant) caching model, and have reached several conclusions. First, caching policy is very difficult, basically impossible, to specify in some arbitrary protocol. This is one of the biggest problems we&#8217;ve got with caching &#8211; the cache manager has a lot of settings [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking for several years about the flaws in the Internet&#8217;s (nearly non-existant) caching model, and have reached several conclusions.  First, caching policy is very difficult, basically impossible, to specify in some arbitrary protocol.  This is one of the biggest problems we&#8217;ve got with caching &#8211; the cache manager has a lot of settings he can adjust, but the data provider has almost none &#8211; basically cache or don&#8217;t cache (oh yeah, he can specify a timeout, too).  So, I&#8217;m led to conclude that data providers need a very flexible way to inform caches of their data&#8217;s caching policy, like a remote executable format, i.e. Java.  My second conclusion is that what limited caching we&#8217;ve got is destroyed when people want to provide dynamic content.  The only way I can see to cache dynamic content is to cache not the data, but the program used to create the data, and expect the client to run the program in order to display the data.  And we don&#8217;t want to do that on the caches (if we can help it) for performance reasons.  Again, a remote executable format, this time on the client, i.e. Java.  My third major conclusion is that caching is a multicast operation and requires multicast support to be done, but that&#8217;s for a diferent paper.  Thus, I&#8217;m proposing an integrated Java-based caching architecture using what I call &#8220;cachelets&#8221; on the caches to provide a flexible and usable caching architecture.</p>
<p><span id="more-13"></span></p>
<p>Like an &#8220;applet&#8221; running on a client or a &#8220;servlet&#8221; running on a server, a &#8220;cachelet&#8221; runs on a cache.  Data items have cachelets (to control their caching) associated with them.  When a cache considers caching a data item, it downloads the cachelet, which then manages the cached data associated with it.  Multiple cachelets can be running on the same cache, each managing a different set of data.  Cachelets are untrusted, running in a Java VM on a foreign host.</p>
<p>Initially, when data is put into a system on a server, a cachelet is provided to manage it.  So there&#8217;s always at least one copy of the data&#8217;s cachelet &#8211; the copy managing the original data on the origin server.  Such a cachelet would be &#8216;nailed up&#8217; &#8211; it would never get purged from its &#8220;cache&#8221;.  Also, its &#8220;cached data&#8221; would take the form of files provided by the author containing the data itself.  Ideally, we&#8217;d like this cachelet to be identical to all the others (i.e, no need for a special servlet), so the cachelet API should be built so that a file in the underlying filesystem can be treated as cached data.</p>
<p>Other caches might want a copy, too.  The basic idea is that when a cachelet is going to transmit data it multicasts a notification to any interested caches.  So how does a cache get &#8220;interested&#8221;?  Well, the number one way is that it is located in physical (in a network sense) proximity to the path the data is going to take through the network. With this in mind, I can think of two possible models.  The first uses a directed, scoped multicast, that could be implemented using a combination of Lightweight Multicast (LWM) and traditional multicast. When a transfer is about to begin, the source transmits a directed multicast to the destination with an annoucement of this.  This can be formulated as a packet addressed to an &#8220;all caches&#8221; multicast address, along with an LWM header option directed at the unicast destination. Caches can then subscribe to this multicast address within some kind of radius or domain.  As this a directed multicast passes along the source-to-destination path, any caches along the way subscribed to the &#8220;all caches&#8221; address will receive a copy of the notification.</p>
<p>Of course, this presupposes a capability (directed multicast) that doesn&#8217;t exist (yet).  How can we achieve cache rendezvous without working multicast?  It&#8217;s a hack, but we can expect data requests to be sent with the IP Record Route option, so the transmitting cachelet will have at least an &#8220;advisory route&#8221; about the path the data will be taking, which can be supplemented with local routing info and something like traceroute.  Armed with this information, the transmitting cachelet does a series of DNS lookups, in in-addr.arpa (or part of the DNS tree structured like in-addr.arpa), to find caches that have registered (using dynamic DNS) their interest in a particular CIDR block.  Now we have addresses of those caches that have expressed interest in any of the addresses along the path to the destination.  We transmit a notification of the upcoming transfer to those caches, giving them the opportunity to subscribe to the (multicast) transfer, and proceed to transmit.</p>
<p>Having received a notification, a cache has to decide whether to cache the data.  The first step is to download the cachelet.  Presumably, this will almost always happen.  Why?  Because the cachelets will be designed to be flexible, so a single cachelet can manage many similar data feeds.  That means you really only need a small number of cachelets in your entire network.  The multicast announcement of the upcoming data transfer can include the URL of the cachelet.  So once the cache&#8217;s got the cachelet, it has to&#8230;</p>
<ol>
<li> initialize the cachelet
<p>   an ISP might want to influence the caching decisions of its subscribers</p>
<p>   the simplest way to do this is for the ISP to provide a cachelet    to run on its subscriber&#8217;s caches</p>
<p>   thus, we&#8217;ll need some kind of inter-cachelet communications standard    to allow the ISP-provide cachelet to influence its fellows</p>
<p>   at the very least, upon initializing, the cachelet should multicast    to an all-local-cachelets address, announcing its presence to    other cachelets on the same cache</p>
<p>   no, our current multicast scheme can&#8217;t really handle this.  Basically,    we need multicast ports instead of multicast host addresses.  We can    punt on this and let it find out about other local cachelets because    they&#8217;ve registered themselves (and their port numbers) in DNS</p>
<p>   it may also want to subscribe to multicast channels, or register itself    using dynamic DNS</p>
<p>   consider cryptographic authentication of the cachelet</p>
<p>   might want to have it reviewed by a central authority</p>
<li> considering a new cache item
<p>   a notification has been received of a current or near-future transfer</p>
<p>   do we want to cache it?</p>
<p>   we know: what it is (URL), where it will be (advisory route),      and when it will be there (some kind of timestamp)</p>
<p>   cachelet may need to do some network chit-chat (i.e, download some    information specific to this particular data item) before it    can fully advise the host</p>
<p>   cachelet finally does advise the host: how badly do we want to      cache this item (on a 0-255 scale, perhaps), how much space      will probably be required to cache this item</p>
<p>   how does the host compare advise from different cachelets?</p>
<p>   it can measure how much data is being transfered how often from    which cache items and use this as a baseline to compare advise</p>
<li> Cacheing a data item
<p>The transfer itself, ideally, should be multicast.  We&#8217;re transmitting to, perhaps, a client as well as several caches.  This is inherently multicast, but again, we can&#8217;t expect this to work in the current Internet.  Probably we&#8217;ll transmit to the (closest?) cache that wants a copy, and it will have to turn around and relay the data to the client or other caches.  If the client requested the data from the origin server using TCP, we&#8217;re got a real problem here.  If we&#8217;re using HTTP, I guess the origin server can redirect the client to the cache (using &#8220;305 Use Proxy&#8221;, but this is for a single request only), but what we really want is something like TCP-over-multicast, some kind of reliable stream service which can deliver to multiple destinations &#8211; both the client and the cache.  Again, beyond the capability of the contemporary Internet.  If we redirect to the cache, there&#8217;s a denial of service issue here.</p>
<p>We really need to rethink HTTP caching, probably with an eye towards using a local Java cache capable of caching, or at least matching, a entire block of URLs, possibily using regular expressions and a sub-cachelet scheme.</p>
<li> serving a data item
<p>   cachelet gets a request from a data transfer</p>
<p>   the request goes to _the_cachelet_, not the cache</p>
<p>   the cachelet should ask the cache host if it&#8217;s OK to serve    the data now (based on policy and things like host/network load)      needs to be part of the cache/cachelet API</p>
<p>   cachelet might want to know general policy from the cache, so it      can decide if it wants to advertise the ability to serve data at all</p>
<p>   cachelet might chit-chat over the network, then uses its high-speed    API to begin the transfer</p>
<p>   the cache host itself does almost nothing other than assist the transfer</p>
<li> cachelet wants to preemptively cache, i.e, initiate its    own transfer
<p>   cachelet advises the host: how badly we want to cache, how much space</p>
<p>   might want to do multiple advisories &#8211; we want to cache X1 MB at    priority Y1, and an additional X2 MB at (lower) priority Y2</p>
<li> cachelet wants to change its advice on previously cached data
<li> host wants to clean out the cache
<p>   maybe host keeps track of advice (priority) on cached data, so it    already knows what it wants to clean out</p>
<p>   or maybe host uses a 0-255 priority to tell the cachelet how badly    it wants to clean up the cache, and the cachelet then replies    with the amount of data it&#8217;s willing to clean out at that priority</p>
<p>   this priority could be the same as the caching priority.  That way,    when a cachelet advices the host &#8220;I want to cache X MB of data    at priority Y&#8221; and the host doesn&#8217;t have X MB free, it asks the    other cachelets how much space they are willing to free up for    data at priority Y</p>
<li> host is booting the cachelet off the cache
<p>   this can happen at any time, but it&#8217;d be nice if the cache advised    the cachelet and gave it a chance to clean up</p>
</ol>
<p>Trying to get this cachelet scheme (or any caching scheme) working with the existing web architecture will be difficult.  The web was never designed to cache, and its current design, particularly the way dynamic content is done, is very hostile to caching.  Since you can&#8217;t cache the programs that produce the dynamic content, you can&#8217;t cache the dynamic content at all, and this kills it.  The only way to cache the hordes of dynamic content running around the web is to convince web designers to write their dynamic content into cachelets.  We can do this without overloading our caches if we specify that HTML dynamic content will only be served to the local host, thus ensuring that the computational cost is bourne not by an ISP&#8217;s cache but by the client&#8217;s local cache.  Of course, there will have to be a significant installed base of Java caches before web designers will do this.</p>
<p>So I propose an end-around.  Let&#8217;s build a _video_ client (see my other blog posting about this) designed to use cachelet-based caches to cache the video feeds.  Nobody&#8217;s doing on-demand video on the Internet right now, so we&#8217;ve got the architecture to ourselves and can do it however we want.  If we can get it working well, then hopefully everybody comes running to it as the next killer app (ohh, ahh, video over the internet), we start get cachelet-based video caches deployed widely, the web guys want to get in on the act, and we say, &#8220;why yes, it can cache HTML, you just need to do this&#8230;&#8221;</p>
<p>	<H3>SCENARIOS</H3></p>
<ol>
<li> client downloading a new (video) item using lightweight multicast (LWM)
<ol>
<li> client downloads applet
<li> applet wants a data transfer
<li> applet does a DNS lookup to find existing caches for data,       and picks one based on nearest routing/closest match
<li> applet sends foreign cachelet a download request
<li> foreign cachelet multicasts a notification that download is       commencing &#8211; possibly using DNS to lookup the IP addresses       along the route to the applet and see if any caches have       registered an interest in those addresses
<li> local cachelet picks up notification, decides it wants to cache,       and subscribes to the multicast (along with the applet)
<li> transfer begins &#8211; both applet and local cachelet receiving
<li> local cachelet sends DNS update announcing that it can now serve the item
<li> applet gets DNS notification of &#8220;new&#8221; cache available and       decides to switch to local cachelet because it&#8217;s closer
<li> RTP handoff
<li> foreign cachelet sees that no non-cachelets are left subscribed
<li> foreign cachelet accelerates the download
</ol>
<p>At this point, since the cachelets are just talking to each other, they can increase the speed past real-time.  This lets them probe the connection to see how much bandwidth it can support.  If you start losing data, you can throttle back and retransmit(!) since you&#8217;re in the &#8220;future&#8221; as far as the user is concerned.  You can dynamically pick a video resolution to suit your link capacity in this way.  The applet/cachelet could be designed to anticipate future downloads, i.e, if the user is looking at menu X, then download the first second of each of the five different feeds linked to from that menu.  A completely different applet/cachelet pair could be used for realtime feeds that couldn&#8217;t do this.  A completely different applet/cachelet pair could be used for files and HTML that can&#8217;t tolerate data loss like RTP video.</p>
<p>Of course, LWM is pie in the sky, so we need a fall back solution&#8230;</p>
<li> client downloading a new (video) item using unicast
<ol>
<li> client downloads applet
<li> applet wants a data transfer
<li> applet does a DNS lookup to find existing caches for data,       and picks one based on nearest routing/closest match
<li> applet sends foreign cachelet a download request
<li> foreign cachelet unicasts notifications that download is       commencing &#8211; possibly using DNS to lookup the IP addresses       along the route to the applet and see if any caches have       registered an interest in those addresses       &#8230;and delays beginning the download for local cache notification&#8230;
<li> local cachelet picks up notification, decides it wants to cache,       and notifies the foreign cachelet
<li> foreign cachelet notifies applet that it&#8217;s local cache will have       the data
<li> transfer begins &#8211; only local cachelet receiving via unicast
<li> applet requests transfer from local cachelet
<li> local cachelet sends DNS update announcing that it can now serve the item       (maybe &#8211; or maybe policy is that local caches only serve the local host)
<li> foreign cachelet accelerates the download
</ol>
<p>Almost all of this can be implemented in Java using the applets and the cachelets.  Only step (a) is performed using something like HTTP, and step (j) requires a standard cache-cachelet API.  Otherwise, it&#8217;s all applet/cachelet and cachelet/cachelet communications (with standard stuff like DNS and RTP thrown in), and both applet and cachelet can be easily changed by the data author.</p>
<li> client downloading conventional (static) web page
<ol>
<li> a standard HTTP cachelet has been distributed with the cache
<li> client web browser is set to use local cachelet as HTTP cache
<li> web browser wants a web page, so it queries local cache
<li> cachelet doesn&#8217;t have the page in its local cache
<li> cachelet does a DNS lookup to find existing caches for this page,       and picks one based on nearest routing/closest match
<li> cachelet sends foreign cachelet a download request
<li> local cachelet downloads (how? probably TCP to foreign cachelet)       a copy of the page
<li> local cachelet notifies DNS via dynamic update that it&#8217;s got       a copy of the page
<li> local cachelet generates the web page for the browser
</ol>
<li> client downloading a dynamic HTML item using unicast
<ol>
<li> a standard HTTP cachelet has been distributed with the cache
<li> client web browser is set to use local cachelet as HTTP cache
<li> web browser wants a web page, so it queries local cache
<li> cachelet doesn&#8217;t have the page in its local cache
<li> cachelet does a DNS lookup to find existing caches for this page,       and picks one based on nearest routing/closest match
<li> cachelet sends foreign cachelet a download request
<li> foreign cachelet indicates in reply that a custom cachelet is used       to serve content for this URL
<li> local cachelet downloads (how? probably TCP to foreign cachelet)       a copy of the custom cachelet
<li> custom cachelet (local copy) notifies DNS via dynamic update       that it&#8217;s running on this new cache
<li> custom cachelet (local copy) generates the web page for the browser
</ol>
<p>We now have multiple HTML cachelets &#8216;slaved&#8217; off a parent cachelet responsible for handling all browser interactions.  Does Java provide us an easy way to hand off TCP connections from one cachelet to another?</p>
<p>Also, now we are faced with the problem of cache discovery for web pages over DNS.  Doing this right will make sure the rest of our scheme scales.  For starters, consider some like cnn.com.  Having a single DNS server listing all the caches with a copy of even just the home page would be crazy.  What we really want to know is just those copies &#8216;nearby&#8217; is a network sense.  Nearby to what?  Well, the DNS request includes the IP source address of the requester, so I&#8217;d say nearby the requesting IP address.</p>
<p>For starters, consider the case where the DNS server(s) have complete information about where everything is cached.  Except for the largest and most widely cached sites, this might be a perfectly workable model.  When the DNS server sees that it can&#8217;t return a complete list of resource records in a single response packet, it sorts them according to which are closest to the requesting IP address and returns only the top hundred or so of them.  The DNS server thus needs to have some routing information to sort the addresses (or just sort them by longest match IP address prefixes), but this seems fine.  Also you&#8217;ll want redundant DNS servers, so a distributed database seems to be called for, but I&#8217;ve discussed how to do that in another blog.</p>
<p>For the most popular sites, though, the shear number of entries on DNS servers in such a scheme would become prohibitive.  But this case can be optimized away behind the scene &#8211; a DNS server responsible for only a limited area of the network would only need to know of cache entries roughly within that area.  The behavior to the client should be _exactly_the_same_ &#8211; a set of resource records sorted to be &#8216;close&#8217; to the requesting IP address.  So we can leave the DNS protocol alone and study ways to build more optimized servers to handle this somewhat special case.</p>
<p>In fact, we&#8217;ve already got a way to do this.  The Java caches themselves can answer DNS queries.  All that&#8217;s needed is a new type of cachelet that manages cached DNS resource records and answers DNS queries.  A natural extension of the DNS &#8216;SRV&#8217; resource record would allow DNS servers to operate on arbitrary port numbers, not just 53. So a DNS cachelet could bind to a random UDP port and handle DNS queries.  We do need to make sure the cachelets can manage a distributed database as I described in the other blog.  The main change required here is to make sure the DNS resolvers (both clients and recursive servers) can handle DNS servers on non-standard ports.</p>
<p><H3>OTHER SCENARIOS</H3></p>
<li> ISP cache also wants a copy of something (multicast case)
<li> ISP cache also wants a copy of something (unicast case)
</ol>
<p>In summary, we started trying to fix up the web&#8217;s caching scheme and have ended up with a model that seems like it can cache not only HTML and video, but also dynamic content and DNS.  Probably it could do a lot more.  Probably if we deployed Java caches widely on the network they&#8217;d become quite popular for all kinds of things I haven&#8217;t thought of.  But if we put together a design that can handle web, video, dynamic content, and DNS without having to kludge any one of them, then it probably will be flexible enough to handle most future applications as well.</p>
<p><H3>OTHER</H3></p>
<p>Cachelets need some high-speed support services from the underlying host in order to perform some operations that can&#8217;t be done efficiently in Java.  These can be provided by a Java Native Interface (JNI) add-on, and primarily include both sending and receiving RTP streams to/from a disk cache.  The cachelet should be able to begin an RTP transfer (send or recv) and just let it proceed in background. For a receive stream, cachelet needs to see timestamps and sequence numbers on the received packets, possibily along with local timestamps for when packet were received, to determine if data is missing as well as calculating network jitter.  RTP transmits might only include a subset of data from a received stream (let&#8217;s say the applet wants to fast forward through a cached video stream), and might proceed at a different rate from the received stream, so the cachelet needs to be able to set (and change) the base frequency for interpreting RTP timestamps.  Also, we might be transmitting just ahead of a receive, so the cachelet should be able to set transmits on data that doesn&#8217;t exist yet, in the expectation that it will be there when needed.  To allow cache repair, applet might request some out-of-band transfers of individual packets via UDP and can insert them into the cache, so the cachelet probably needs to be able to interact with data in the cache, but this should be discouraged for performance reasons.  Also, to allow flexibility in caching, TCP streams should be supported.  If you can&#8217;t get RTP through a firewall, you might want to receive data on a TCP stream, then transmit it on an RTP stream, so this should be supported.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.freesoft.org/blogs/soapbox/?feed=rss2&#038;p=13</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
