Cloud Computing

January 3rd, 2014

Cloud computing has become one of the biggest buzzwords in IT. What is it? How does it work? Is it for real?

Cloud computing is an old idea in a new guise. Back in the 70s, users bought time on mainframe computers to run their software. There were no PCs. There was no Internet. You punched your FORTRAN program on cards, wrapped a rubber band around them, and walked them over to the computer center.

Then came the PC revolution, followed hard on its heels by the Internet. Everyone could now have a computer sitting on his desk (or her lap). Sneaker net gave way to Ethernet and then fiber optics. Mainframes became passé.

Well, mainframes are back! Turns out that centralized computing resources still hold some obvious advantages, mostly related to economies of scale. Once an organization's computing needs can no longer be met by the computers sitting on people's desks, a significant investment is required to install a room full of servers. That is especially true if the computing needs are variable, in which case the servers must generally be sized for peak load and then sit idle the rest of the time.

Several enterprising companies have installed large data centers (read: mainframes) and have begun selling these computing resources to the public. Amazon, for example, is estimated to have 450,000 servers in their EC2 data centers. EC2 compute time can be purchased for as little as two cents an hour; a souped-up machine with 32 processors, 244 GB of RAM and 3.2 TB of disk space currently goes for $6.82 an hour. Network bandwidth is extra.

Yet wasn’t the whole point of the PC revolution to get away from centralized hardware? Can you really take a forty-year-old idea, call it by a new name, hype it in the blogosphere, and ride the wave as everyone runs back the other way?

In its day, at least as much hype surrounded the client-server model as now envelops the cloud. Information technology advances rapidly enough that every new development is trumpeted as the next automobile. A more balanced perspective is to realize that there are merits to both centralized and distributed architectures, and that after two decades of R&D effort devoted to client-server, we're now starting to see some neat new tools available in the data center.

One of the nicest features available in the cloud is auto-scaling. Ten years ago, I ran freesoft.org by buying a machine and finding somewhere to plug it into an Ethernet cable. The machine had to be paid for up front, and if it started running slow, the solution was to retire it and buy another one. Now, running in Amazon’s EC2 cloud, I pay two cents an hour for my baseline service, but with the resources of a supercomputer waiting in reserve if it starts trending on social media.
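The decision logic behind auto-scaling is simple enough to sketch; what the cloud provider sells is the ability to actually act on it within minutes. Here is a minimal sketch of such a control loop – the CloudProvider interface, the CPU-load metric, and the thresholds are hypothetical stand-ins of mine, not Amazon's actual EC2 API:

```java
// Minimal auto-scaling sketch. CloudProvider and the load metric are
// hypothetical stand-ins, not Amazon's actual EC2/CloudWatch interface.
interface CloudProvider {
    double averageCpuLoad();          // 0.0 - 1.0, averaged across running instances
    int runningInstances();
    void setDesiredInstances(int n);  // provider launches or terminates machines to match
}

public class AutoScaler {
    private static final double SCALE_UP_THRESHOLD = 0.80;
    private static final double SCALE_DOWN_THRESHOLD = 0.20;
    private static final int MIN_INSTANCES = 1;    // the two-cents-an-hour baseline
    private static final int MAX_INSTANCES = 128;  // the "supercomputer in reserve"

    // Run periodically, e.g. once a minute.
    public static void tick(CloudProvider cloud) {
        double load = cloud.averageCpuLoad();
        int running = cloud.runningInstances();
        if (load > SCALE_UP_THRESHOLD) {
            cloud.setDesiredInstances(Math.min(MAX_INSTANCES, running * 2)); // double under pressure
        } else if (load < SCALE_DOWN_THRESHOLD) {
            cloud.setDesiredInstances(Math.max(MIN_INSTANCES, running / 2)); // shrink back to baseline
        }
        // Otherwise leave the fleet alone; you only pay for what is running.
    }
}
```

The point is less the code than the billing model behind it: the scale-down branch is what keeps the bill at two cents an hour most of the time.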

A supercomputer! That’s what lurks behind all of this! Many of these competing cloud architectures boil down to proposals to build a supercomputer operating system, coupled with an accounting system that provides a predictable price/performance ratio. Virtualization is one of the most popular models for achieving this.

Yet virtualization has been around for decades! IBM’s VM operating system was first released in 1972! Running a guest operating system in a virtual machine has been a standard paradigm in mainframe data centers for over 40 years! IBM’s CMS operating system has evolved to the point where it requires a virtual machine – it can’t run on bare hardware.

I’d like to see an open source supercomputer operating system, capable of running a data center with 100,000 servers and supporting full virtualization, data replication, and process migration. Threads are the way we write applications that span multiple processors, so a supercomputer operating system should be able to run a threaded application across all 100,000 of those processors. GNU’s Hurd might be a viable choice for such an operating system.

Faster Sailboats

April 20th, 2009

I may never have been on a sailboat in my life, but I am fascinated by the physics behind how they operate. A sailboat must work simultaneously in two media, as both an airfoil and a hydrofoil. Plus, sailboats are probably among the “greenest” vehicles ever conceived.

Which makes you wonder why we don’t use them more.

Not only are they dependent on the weather, but they are also considerably slower than an airplane. I’ve been thinking about how to make them faster, and my inspiration came from a book about airplanes – The Deltoid Pumpkin Seed by John McPhee. It documents, in popular language, the efforts of a group of entrepreneurs and engineers to build a hybrid airplane/airship (the Aereon) that would have a small engine and use helium to improve its lift characteristics.

Why not do the same thing with a sailboat? Since most of the drag comes from the “wetted hull”, it would make sense to lift the hull out of the water as much as possible and leave only the keel submerged. Ship designers have been doing this for years with clever hull designs intended to lift themselves out of the water as they get up to speed, but the Aereon design suggests another way – helium.

What seems to make sense to me would be to build a trimaran and fill the outriggers with helium.

Cold Fusion

April 7th, 2009

A friend of mine recently bent my ear for an evening about cold fusion. It struck me as pseudo-science, but not wanting to be prejudiced, I spent some time with a web browser looking into it.

Though I still have a hard time believing some of these claims, I have to admit that this technology shows promise.

The Coulomb barrier that must be overcome to fuse two protons is on the order of an MeV – not something you’d expect at room temperature, but well within the range of a standard particle accelerator. The problem is the minute cross section of the nucleus – protons with enough energy to fuse are far more likely to be scattered away from each other unless they are precisely aligned in a head-on collision.

That’s where the cold fusion claims start to get interesting. All of the cold fusion reports that I read involved palladium as a catalyst. Palladium has the unusual property that it can absorb significant quantities of hydrogen. There doesn’t seem to be a consensus on exactly how this works, but one explanation is that the hydrogen nuclei can move fairly freely within the palladium crystal lattice. Now if I wanted to line something up at atomic dimensions, a crystal would be the obvious choice, and if I wanted to line something up while it’s moving, I would want a crystal that allowed my particles some mobility. So palladium seems like an obvious medium for lining up moving protons for precise head-on collisions.

Dynamic addressing

July 26th, 2008

A theorem of Thorup and Zwick (Proposition 5.1 in their 2001 paper Approximate Distance Oracles) states that a routing function on a network with n nodes and m edges uses, on average, at least min(m, n^2) bits of storage if the “route stretch” (the ratio between actual path length and optimal path length) is less than 3 (i.e., if two nodes are two hops apart, the actual route taken between them must be less than six hops). On the Internet topology, we can expect the n^2 term to dominate, so spreading these n^2 bits out among n nodes yields an average of n bits per node – i.e., each router’s routing table has to hold one bit for every device on the network.

Not a very encouraging result for those of us designing routing protocols.

Yet there is hope. The result is only an average. We can do better than the average if we allow our routing function to be skewed towards certain network topologies. And it occurs to me that the Internet topology changes slowly enough that we can keep our routing function skewed towards the topology as it currently stands.

How can we do this? With dynamic addressing: skewing our address summarization scheme to reflect the current network topology.
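To make the payoff concrete, here is a toy longest-prefix-match table, the mechanism that summarization ultimately feeds. If addresses are assigned so that a topological region shares a prefix, one table entry covers the whole region; dynamic addressing is the business of renumbering so that this stays true as the topology drifts. The bit-string addresses below and the renumbering machinery itself are only illustrative assumptions:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy longest-prefix-match table, to show why summarization pays: when a
// region of the topology shares an address prefix, one entry stands in for
// every destination in that region, keeping the table far below one bit per
// device. Keeping addresses aligned with the topology (renumbering) is the
// "dynamic" part and is not shown here.
public class PrefixTable {
    // prefix (as a bit string) -> next hop
    private final Map<String, String> routes = new TreeMap<>();

    public void addRoute(String prefix, String nextHop) {
        routes.put(prefix, nextHop);
    }

    // Longest matching prefix wins, as in CIDR forwarding.
    public String lookup(String address) {
        String best = null;
        for (Map.Entry<String, String> e : routes.entrySet()) {
            if (address.startsWith(e.getKey())
                    && (best == null || e.getKey().length() > best.length())) {
                best = e.getKey();
            }
        }
        return best == null ? null : routes.get(best);
    }

    public static void main(String[] args) {
        PrefixTable table = new PrefixTable();
        table.addRoute("0101", "link-A");   // one entry covers every 0101... address
        table.addRoute("010111", "link-B"); // a more specific exception
        System.out.println(table.lookup("01011010")); // prints link-A
        System.out.println(table.lookup("01011101")); // prints link-B
    }
}
```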

Is Neuromancer closer than you think?

July 22nd, 2008

Could contemporary technology be used to build a Gibson-esque implant that could connect a human brain to a computer? Perhaps more practically, what would it take to cure spinal cord injuries like Christopher Reeve’s using this kind of technology?

DNS-based identification

June 10th, 2008

I’ve written before about how the failure of source routing created the need for NAT, but that post didn’t address the basic security problem with source routing that caused ISPs to disable it. It allows a man-in-the-middle attack in which a machine can totally fabricate a packet that claims to come from a trusted source. There’s no way for the destination machine to distinguish between such a rogue packet and a genuine packet that the source actually requested be source routed through the fabricating machine. At the time, Internet security was heavily host-based (think rsh), so this loophole came to be perceived as a fatal security flaw, and source routing was deprecated and abandoned.

A quarter century later, I think we can offer a more balanced perspective. Host-based authentication in general is now viewed as suspect and has largely been abandoned in favor of cryptographic techniques, particularly public-key cryptosystems (think ssh) which didn’t exist when TCP/IP was first designed. We are better able to offer answers to key questions concerning the separation of identity, address, and route. In particular, we are far less willing (at least in principle) to confuse identity with address, if for no other reason than an improved toolset, and thus perhaps better able to judge source routing, not as a fundamental security loophole, but as a design feature that became exploitable only after we began using addresses to establish identity.

Can we “fix” source routing? Perhaps, if we completely abandon any pretense of address-based authentication. What, then, should replace it? I suggest that we already have our address-less identifiers, and they are DNS names. Furthermore, we already have a basic scheme for attaching cryptographic keys to DNS names (DNSSEC), so can we put all this together and build a largely automated DNS-based authentication system?
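As a sketch of what the verification half might look like, the fragment below fetches a public key published under a DNS name and uses it to check a signature over a challenge. The record name (_authkey), the TXT/base64 encoding, and the challenge-response protocol around it are all assumptions of mine; real DNSSEC publishes DNSKEY records and validates them up the delegation chain, which is not shown. The point is only that the identifier being authenticated is a DNS name, not an address:

```java
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;
import java.util.Hashtable;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;

// Sketch of DNS-named identity: the peer claims a DNS name, we fetch a public
// key published under that name and check a signature over a challenge.
// Assumptions: the key lives base64-encoded in a TXT record at "_authkey.<name>"
// (a made-up convention), and the DNS answer itself is trusted -- in practice
// that trust would have to come from DNSSEC validation, not shown here.
public class DnsIdentity {

    static PublicKey fetchKey(String dnsName) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        InitialDirContext ctx = new InitialDirContext(env);
        Attributes attrs = ctx.getAttributes("_authkey." + dnsName, new String[] {"TXT"});
        String b64 = (String) attrs.get("TXT").get();           // base64-encoded X.509 key
        byte[] der = Base64.getDecoder().decode(b64.replace("\"", ""));
        return KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(der));
    }

    static boolean verify(String dnsName, byte[] challenge, byte[] signature) throws Exception {
        Signature sig = Signature.getInstance("SHA256withRSA");
        sig.initVerify(fetchKey(dnsName));
        sig.update(challenge);
        return sig.verify(signature);                            // true => peer holds the name's key
    }
}
```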

iDictionary

January 12th, 2007

Apple’s recent announcement of the iPhone has inspired me to reconsider how IT can be used to support foreign language studies. According to Apple, the iPhone will have a microphone (it’s a phone, after all), run OS X, and have 4 to 8 GB of memory. That should be a sufficient platform for a voice-activated dictionary. After training a voice recognizer, you could speak a word into the device, which would then look it up in a dictionary and display the entry on the screen, giving language students the detail of a full-sized dictionary in something that fits in their pocket.

Could pocket Spanish-English dictionaries be a thing of the past?
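The lookup loop itself is almost trivially simple once the recognizer exists. Here is a rough, platform-neutral sketch; the Recognizer interface is a hypothetical stand-in for whatever speech engine the device actually provides (the iPhone certainly won’t be running Java):

```java
import java.util.HashMap;
import java.util.Map;

// Shape of the voice-dictionary pipeline: recognize a spoken word, look it up,
// return the entry for display. Recognizer is a hypothetical stand-in for the
// device's speech engine; the two sample entries are placeholders.
public class VoiceDictionary {
    interface Recognizer {
        String listenForWord();   // blocks until one word is recognized
    }

    private final Map<String, String> entries = new HashMap<>();

    public VoiceDictionary() {
        entries.put("perro", "perro (m.): dog");
        entries.put("gato", "gato (m.): cat");
    }

    public String lookup(Recognizer recognizer) {
        String word = recognizer.listenForWord().toLowerCase();
        return entries.getOrDefault(word, "No entry for \"" + word + "\"");
    }
}
```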

Dynamic DVD

January 11th, 2007

As streaming video has become more widely available, it is now plausible to offer a video interface to a website. A user could connect to a site with a video client and “navigate” on-line video content through a DVD-style user interface – video menus, highlighted on-screen buttons, fast-forward, subtitles. Alternatively, such an interface could augment conventional TV broadcasts, offering a nightly news program with video hyperlinks to hours of detailed coverage of events we currently see only in sound bites.
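The data model behind such an interface could be as simple as a tree of menus whose buttons either start a stream or open another menu. A minimal sketch follows; the stream URLs and the client that would actually render and overlay these menus are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal data model for a DVD-style menu over streaming video: each on-screen
// button either plays a stream or opens another menu (a "video hyperlink").
public class VideoMenu {
    static class Button {
        final String label;
        final String streamUrl;   // non-null if the button plays a stream
        final VideoMenu submenu;  // non-null if the button opens another menu
        Button(String label, String streamUrl, VideoMenu submenu) {
            this.label = label; this.streamUrl = streamUrl; this.submenu = submenu;
        }
    }

    final String title;
    final List<Button> buttons = new ArrayList<>();

    VideoMenu(String title) { this.title = title; }

    public static void main(String[] args) {
        VideoMenu election = new VideoMenu("Election Coverage");
        election.buttons.add(new Button("Full debate", "https://example.org/debate.m3u8", null));

        VideoMenu news = new VideoMenu("Nightly News");
        news.buttons.add(new Button("Top story (2 min)", "https://example.org/top.m3u8", null));
        news.buttons.add(new Button("More election coverage", null, election)); // video hyperlink
        System.out.println(news.buttons.get(1).submenu.title);                  // Election Coverage
    }
}
```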

Building chess tablebases over the Internet

August 23rd, 2006

I’ve read some stuff on the Internet about using pre-computed tablebases to solve complex endgames. Apparently you can load a five- or six-piece tablebase into a program like Fritz and it will then know how to mechanically solve any endgame with five (or six) pieces on the board.

I started thinking about this, but more along the lines of building the tables dynamically, using a pool of cooperating computers on the Internet. The idea would be to direct the search towards certain game positions that you wanted to analyze. This would work well for static analysis and also at relatively slow correspondence time controls (like Garry Kasparov vs. The World, or the upcoming The World vs. Arno Nickel).
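Tablebases are normally built by retrograde analysis, working backwards from checkmated positions. The part that matters for a pool of cooperating machines is deciding which machine owns which positions, so that each position is computed exactly once. One simple scheme is to hash every position to a worker; the worker list and the use of FEN strings as position keys below are my own assumptions, not part of any existing tablebase generator:

```java
// Sketch of how a pool of cooperating machines could split tablebase work:
// every position is owned by exactly one worker, chosen by hashing the
// position's FEN string. A worker that generates a position it does not own
// would ship it to the owner over the network (not shown).
import java.util.List;

public class TablebasePartition {
    private final List<String> workers;   // hostnames of cooperating machines

    public TablebasePartition(List<String> workers) {
        this.workers = workers;
    }

    public String ownerOf(String positionFen) {
        int index = Math.floorMod(positionFen.hashCode(), workers.size()); // stable assignment
        return workers.get(index);
    }

    public static void main(String[] args) {
        TablebasePartition p = new TablebasePartition(
                List.of("node1.example.org", "node2.example.org", "node3.example.org"));
        // A KQ vs K position, white to move, in standard FEN
        System.out.println(p.ownerOf("8/8/8/4k3/8/8/4Q3/4K3 w - - 0 1"));
    }
}
```

Directing the effort toward positions you actually care about then amounts to seeding the work queue with those positions rather than enumerating every possible one.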

Java Caches

August 23rd, 2006

I’ve been thinking for several years about the flaws in the Internet’s (nearly non-existent) caching model, and I have reached several conclusions.

First, caching policy is very difficult, basically impossible, to specify in some arbitrary protocol. This is one of the biggest problems we’ve got with caching: the cache manager has a lot of settings he can adjust, but the data provider has almost none – basically cache or don’t cache (oh yeah, he can specify a timeout, too). So I’m led to conclude that data providers need a very flexible way to inform caches of their data’s caching policy, which means a remote executable format, i.e. Java.

My second conclusion is that what limited caching we’ve got is destroyed when people want to provide dynamic content. The only way I can see to cache dynamic content is to cache not the data, but the program used to create the data, and expect the client to run the program in order to display the data. And we don’t want to do that on the caches (if we can help it) for performance reasons. Again, a remote executable format, this time on the client, i.e. Java.

My third major conclusion is that caching is a multicast operation and requires multicast support to be done properly, but that’s a subject for a different paper.

Thus, I’m proposing an integrated Java-based caching architecture, using what I call “cachelets” on the caches to provide flexible and usable caching.
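What might a cachelet look like? One possible shape is sketched below. The method names are invented for illustration and nothing like this interface actually exists; the analogy is to servlets, which a container loads and consults rather than interpreting a fixed protocol:

```java
import java.time.Duration;

// One possible shape for a "cachelet": provider-supplied code that a cache
// loads (the way a servlet container loads servlets) and consults for policy,
// or even for generating the response itself. The method names are invented
// for illustration; no such interface is standardized.
public interface Cachelet {

    // Policy side: may this URL be cached at all, and for how long?
    boolean isCacheable(String url);
    Duration freshnessLifetime(String url);

    // Dynamic-content side: instead of caching bytes, the cache (or the client)
    // runs the cachelet to produce the representation from cached inputs.
    byte[] generate(String url, byte[] cachedInput);
}
```

Of course, shipping provider code to caches and clients raises the same mobile-code trust and sandboxing questions that applets did; the interface is the easy part.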
