Cloud computing has become one of the biggest buzzwords in IT. What is it? How does it work? Is it for real?
Cloud computing is an old idea in a new guise. Back in the 70s, users bought time on mainframe computers to run their software. There were no PCs. There was no Internet. You punched your FORTRAN program on cards, wrapped a rubber band around them, and walked them over to the computer center.
Then came the PC revolution, with the Internet hard on its heels. Everyone could now have a computer sitting on his desk (or her lap). Sneakernet gave way to Ethernet and then fiber optics. Mainframes became passé.
Well, mainframes are back! Turns out that centralized computing resources still maintain some obvious advantages, mostly related to economies of scale. Once an organization’s computing needs can no longer be met by the computers sitting on people’s desks, a significant investment is required to install a room full of servers. The problem is worse if the computing needs are variable, because the servers must generally be sized to meet maximum load and then sit idle the rest of the time.
Several enterprising companies have installed large data centers (read: mainframes) and have begun selling these computing resources to the public. Amazon, for example, is estimated to have 450,000 servers in their EC2 data centers. EC2 compute time can be purchased for as little as two cents an hour; a souped-up machine with 32 processors, 244 GB of RAM and 3.2 TB of disk space currently goes for $6.82 an hour. Network bandwidth is extra.
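To put those prices in perspective, here is a rough back-of-the-envelope sketch. It uses the two hourly rates quoted above; the 730-hour month, the ten hours of peak load, and the function name are assumptions made up for illustration, and bandwidth charges are ignored.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(baseline_rate, burst_rate, burst_hours, hours=HOURS_PER_MONTH):
    """Dollars per month for a small instance that is swapped for a
    large one during burst_hours of peak load."""
    if not 0 <= burst_hours <= hours:
        raise ValueError("burst_hours must be between 0 and hours")
    return baseline_rate * (hours - burst_hours) + burst_rate * burst_hours


BASELINE_RATE = 0.02  # dollars per hour, the low-end price quoted above
LARGE_RATE = 6.82     # dollars per hour, the 32-processor machine quoted above

# Pay for the big machine only during ten (hypothetical) busy hours.
on_demand = monthly_cost(BASELINE_RATE, LARGE_RATE, burst_hours=10)

# Size for maximum load and leave the big machine running all month.
always_peak = monthly_cost(BASELINE_RATE, LARGE_RATE, burst_hours=HOURS_PER_MONTH)

print(f"burst only when needed: ${on_demand:.2f}")
print(f"sized for peak all month: ${always_peak:.2f}")
```

Under these assumptions the pay-as-you-go bill comes to about $83 a month, against roughly $4,979 for keeping the large machine running around the clock. Real numbers depend entirely on how spiky the load is.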
Yet wasn’t the whole point of the PC revolution to get away from centralized hardware? Can you really take a forty-year-old idea, call it by a new name, hype it in the blogosphere, and ride the wave as everyone runs back the other way?
In its day, at least as much hype surrounded the client-server model as now envelops the cloud. Information technology advances rapidly enough that every new development is trumpeted as the next automobile. A more balanced perspective is to realize that there are merits to both centralized and distributed architectures, and after two decades of R&D effort devoted to client-server, we’re now starting to see some neat new tools available in the data center.
One of the nicest features available in the cloud is auto-scaling. Ten years ago, I ran freesoft.org by buying a machine and finding somewhere to plug it into an Ethernet cable. The machine had to be paid for up front, and if it started running slow, the solution was to retire it and buy another one. Now, running in Amazon’s EC2 cloud, I pay two cents an hour for my baseline service, with the resources of a supercomputer waiting in reserve if the site starts trending on social media.
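The idea behind auto-scaling can be sketched in a few lines. What follows is a generic target-tracking rule, not Amazon’s actual API or algorithm; the function name, the 50% CPU target, and the instance limits are all assumptions chosen for illustration.

```python
import math


def desired_instances(current, cpu_utilization, target=0.5,
                      min_instances=1, max_instances=20):
    """Return how many instances to run so that average CPU utilization
    moves toward the target (hypothetical rule, for illustration only).

    current         -- number of instances running now
    cpu_utilization -- average utilization across them, 0.0 to 1.0 or more
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    # Total work is roughly current * utilization; spread it so that
    # each instance lands at or below the target utilization.
    needed = math.ceil(current * cpu_utilization / target)
    # Never drop below the baseline or exceed the spending cap.
    return max(min_instances, min(max_instances, needed))


# One busy server at 90% CPU: add a second.
print(desired_instances(current=1, cpu_utilization=0.9))

# Four nearly idle servers at 10% CPU: shrink back to the baseline.
print(desired_instances(current=4, cpu_utilization=0.1))
```

A controller would run a rule like this every few minutes against measured load, which is how the bill can stay at the baseline rate most of the time and grow only while traffic spikes. The upper limit doubles as a cap on how much a sudden burst of popularity can cost.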
A supercomputer! That’s what lurks behind all of this! A great number of these competing cloud architectures boil down to competing proposals to build a supercomputer operating system, coupled with an accounting system that provides a predictable price/performance ratio. Virtualization is one of the most popular models to achieve this.
Yet virtualization has been around for decades! IBM’s VM operating system was first released in 1972! Running a guest operating system in a virtual machine has been a standard paradigm in mainframe data centers for over 40 years! IBM’s CMS operating system has evolved to the point where it requires a virtual machine – it can’t run on bare hardware.
I’d like to see an open source supercomputer operating system, capable of running a data center with 100,000 servers and supporting full virtualization, data replication, and process migration. Threads are the way to write applications that run across multiple processors, so we should have a supercomputer operating system that can run on 100,000 processors. GNU’s Hurd might be a viable choice for such an operating system.