Archive for the ‘Computer Software’ Category

/etc: Please stop enabling

Thursday, March 22nd, 2018

One of the most painful legacies of UNIX’s long gestation is the mess of scripts, configuration files, and databases we affectionately call “/etc”. Binaries go in “/bin”, libraries belong in “/lib”, user directories expand out under “/home”, and if something doesn’t fit? Where does it go? “/etc”.

Of course, “/etc” isn’t really the overflow directory that its name would imply. “/config” would be a far better choice, more accurately reflecting the nature of its contents; but like so much of the data contained within it, its name suggests a lack of organization rather than a coherent collection of configuration settings. While the rest of the world has moved on to SQL, the user account database is still stored in colon-separated fields. Cisco routers can snapshot a running network configuration to be restored on reboot, but the best we can do is edit the interfaces file and reboot to take the changes live, or adjust the interface settings by hand and wait for the next reboot to see if the network comes up right. Our name service is configured in /etc/hosts, or /etc/resolv.conf, or /etc/network, or /etc/dnsmasq.d, or /etc/systemd, or wherever the next package maintainer decides to put it. Nor can you simply save a system’s configuration by making a copy of /etc, because there’s plenty of non-configuration information there, mostly system scripts.
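
To make the point concrete, here is a small Python sketch (nothing confd-specific; it just reads the stock account database through the standard pwd module):

    # Illustration: the "database" behind user accounts is still a flat,
    # colon-separated text file, exposed here through Python's pwd module.
    import pwd

    # Each /etc/passwd entry is name:password:uid:gid:gecos:home:shell
    for entry in pwd.getpwall():
        print(entry.pw_name, entry.pw_uid, entry.pw_dir, entry.pw_shell)

    # The raw file itself: no schema, no transactions, just split(":")
    with open("/etc/passwd") as f:
        fields = [line.rstrip("\n").split(":") for line in f if line.strip()]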

What a mess.

The networking community has been experimenting for years with standardized data models for device configuration, initially SNMP MIBs and more recently YANG. While SNMP was widely adopted for monitoring operational statistics, it was never really accepted for device configuration. YANG may change that. At the very least, Cisco seems to be moving decisively towards adopting YANG for configuration tasks, with both NETCONF and RESTCONF supported as configuration protocols. It remains to be seen whether these network programmability tools will supplant command-line configuration to any meaningful degree.

Part of Cisco’s network programmability strategy involved their 2014 acquisition of the Swedish company Tail-f, a leader in YANG development. I recently spent a few hours experimenting with the no-cost version of their confd product, and while it isn’t free software, it certainly left me thinking that we need something like it, and soon.

confd’s basic design is that a single configuration daemon (confd) manages all system-wide configuration tasks. confd provides a command-line interface for interactive configuration, speaks NETCONF and RESTCONF to management software, saves and restores running configurations, and maintains a history of configuration changes to facilitate rollback. Other specialized daemons interact only with confd, via an API that lets them retrieve their configuration information and receive notifications of changes.
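
For a sense of what talking NETCONF to such a daemon looks like, here is a minimal sketch using the ncclient Python library (my choice for illustration; the host, port, and credentials are placeholders for a lab confd instance):

    # Minimal NETCONF client sketch using the ncclient library.
    # Host, port, and credentials are placeholders, not a real deployment.
    from ncclient import manager

    with manager.connect(host="192.0.2.10", port=2022,
                         username="admin", password="admin",
                         hostkey_verify=False) as m:
        # Retrieve the running configuration as XML
        print(m.get_config(source="running"))

        # Stage a change in the candidate datastore, then commit it
        config = """
        <config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
          <!-- device-specific configuration elements go here -->
        </config>
        """
        m.edit_config(target="candidate", config=config)
        m.commit()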

Consider, for example, dhcpd, and allow me to set aside Tail-f’s current implementation of confd and speculate on how a free software confd might operate. dhcpd’s distribution would include a file or files containing a YANG data model that specifies all of dhcpd’s configuration state, plus operational data such as a list of current DHCP leases. dhcpd, when it starts, would connect to confd over D-Bus, using a standard API that allows any YANG-capable client to announce a YANG model to confd, obtain its current configuration, and subscribe to future changes. Administrator configuration of dhcpd would occur by connecting to confd’s CLI and altering the data model there. The dhcpd model would be one part of a comprehensive data model representing the configuration of the entire system, which could be saved and restored in any format (config file, XML, JSON) understood by confd. confd, for its part, would need no DHCP-specific component, as it could parse YANG and interact with any client exporting a YANG data model.
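
No such API exists today, so the following is pure speculation: a Python sketch of the daemon side, using the real dbus-python and GLib bindings, but with an invented bus name, object path, interface, and method names:

    # Hypothetical sketch: how dhcpd might register its YANG model with a
    # free-software confd over D-Bus. The service name, object path,
    # interface, and methods below are invented for illustration only.
    import dbus
    from dbus.mainloop.glib import DBusGMainLoop
    from gi.repository import GLib

    DBusGMainLoop(set_as_default=True)
    bus = dbus.SystemBus()

    proxy = bus.get_object("org.example.Confd", "/org/example/Confd")
    confd = dbus.Interface(proxy, "org.example.Confd1")

    # Announce our data model and fetch our current configuration subtree
    with open("/usr/share/yang/dhcpd.yang") as f:        # hypothetical path
        confd.RegisterModel("dhcpd", f.read())
    current_config = confd.GetConfig("/dhcpd")           # e.g. a JSON-encoded subtree

    def on_config_changed(path, new_value):
        # React when the administrator edits /dhcpd through confd's CLI
        print("configuration changed:", path, new_value)

    bus.add_signal_receiver(on_config_changed,
                            dbus_interface="org.example.Confd1",
                            signal_name="ConfigChanged")

    GLib.MainLoop().run()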

The ultimate goal would be replacing most of /etc with a single configuration store that captures the entire configuration of the system. The running and startup configuration would both be stored in the same format, allowing administrators to adjust the running configuration, then save it with the confidence that it will be correctly restored after the next reboot.

The biggest challenge would be to convince the developer community to embrace such a standard. With so much momentum behind the current hodgepodge structure of /etc, it seems like a tough sell. The easiest thing to do is to keep doing what we’ve always done. Got a new daemon? Whip up a config file in whatever format you please and stick it somewhere in /etc. Which, of course, adds more momentum to a poor design.

A first step in this direction would be a free software version of confd that can parse YANG, present a D-Bus API, offer an interactive configuration mode, and speak NETCONF and/or RESTCONF. Then we could start modifying existing daemons to use the new standard. Some kind of backward compatibility layer to parse and write existing config files would probably be mandatory, even though the whole point of the design is to move away from them.

Whatever you do… Please Stop Enabling!

Cloud Computing

Friday, January 3rd, 2014

Cloud computing has become one of the biggest buzzwords in IT. What is it? How does it work? Is it for real?

Cloud computing is an old idea in a new guise. Back in the 70s, users bought time on mainframe computers to run their software. There were no PCs. There was no Internet. You punched your FORTRAN program on cards, wrapped a rubber band around them, and walked them over to the computer center.

Then came the PC revolution, followed hard on its heels by the Internet. Everyone could now have a computer sitting on his desk (or her lap). Sneaker net gave way to Ethernet and then fiber optics. Mainframes became passé.

Well, mainframes are back! Turns out that centralized computing resources still maintain some obvious advantages, mostly related to economies of scale. Once an organization’s computing needs can no longer be met by the computers sitting on people’s desks, a significant investment is required to install a room full of servers, especially if the computing needs are variable, in which case the servers must generally be scaled to meet maximum load and then sit idle the rest of the time.

Several enterprising companies have installed large data centers (read: mainframes) and have begun selling these computing resources to the public. Amazon, for example, is estimated to have 450,000 servers in their EC2 data centers. EC2 compute time can be purchased for as little as two cents an hour; a souped-up machine with 32 processors, 244 GB of RAM and 3.2 TB of disk space currently goes for $6.82 an hour. Network bandwidth is extra.

Yet wasn’t the whole point of the PC revolution to get away from centralized hardware? Can you really take a forty-year-old idea, call it by a new name, hype it in the blogosphere, and ride the wave as everyone runs back the other way?

In its day, at least as much hype surrounded the client-server model as now envelops the cloud. Information technology advances rapidly enough that every new development is trumpeted as the next automobile. A more balanced perspective is to realize that there are merits to both centralized and distributed architectures, and after two decades of R&D effort devoted to client-server, we’re now starting to see some neat new tools available in the data center.

One of the nicest features available in the cloud is auto-scaling. Ten years ago, I ran freesoft.org by buying a machine and finding somewhere to plug it into an Ethernet cable. The machine had to be paid for up front, and if it started running slow, the solution was to retire it and buy another one. Now, running in Amazon’s EC2 cloud, I pay two cents an hour for my baseline service, but with the resources of a supercomputer waiting in reserve if it starts trending on social media.
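
As a rough illustration of the knobs involved, here is a sketch using Amazon's boto3 Python library to attach a CPU-based target-tracking policy to an Auto Scaling group; the group name, sizes, and target value are placeholders rather than my actual setup:

    # Sketch: scale an existing EC2 Auto Scaling group on average CPU load.
    # Group name, sizes, and target value are placeholders.
    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName="example-web",
        MinSize=1,           # baseline: one cheap instance
        MaxSize=20,          # headroom for a traffic spike
    )

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="example-web",
        PolicyName="cpu-target-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,   # add instances above ~50% average CPU
        },
    )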

A supercomputer! That’s what lurks behind all of this! Many of these competing cloud architectures boil down to proposals to build a supercomputer operating system, coupled with an accounting system that provides a predictable price/performance ratio. Virtualization is one of the most popular models for achieving this.

Yet virtualization has been around for decades! IBM’s VM operating system was first released in 1972! Running a guest operating system in a virtual machine has been a standard paradigm in mainframe data centers for over 40 years! IBM’s CMS operating system has evolved to the point where it requires a virtual machine – it can’t run on bare hardware.

I’d like to see an open source supercomputer operating system, capable of running a data center with 100,000 servers and supporting full virtualization, data replication, and process migration. Threads are the natural way to write applications that span multiple processors, so a supercomputer operating system should present those 100,000 processors as a single system on which a threaded program can run. GNU’s Hurd might be a viable choice for such an operating system.

iDictionary

Friday, January 12th, 2007

Apple’s recent announcement of the iPhone has inspired me to reconsider how IT can be used to support foreign language studies. According to Apple, the iPhone will have a microphone (it’s a phone, after all), run OS X, and offer 4 to 8 GB of storage. That should be a sufficient platform for a voice-activated dictionary. After training a voice recognizer, you could speak a word into the device, which would then look it up and display the dictionary entry on the screen, giving language students the detail of a full-sized dictionary in something that fits in their pocket.
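
The iPhone itself is a closed platform, but the idea is easy to prototype elsewhere. Here is a rough sketch using the third-party SpeechRecognition Python package; the recognizer backend and the dictionary file are stand-ins, not part of any real product:

    # Prototype of a voice-activated dictionary: listen for one spoken word,
    # then look it up. The dictionary file and recognizer are stand-ins.
    import json
    import speech_recognition as sr

    with open("es-en-dictionary.json") as f:    # hypothetical word -> definition map
        dictionary = json.load(f)

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say a word...")
        audio = recognizer.listen(source)

    try:
        word = recognizer.recognize_google(audio, language="es-ES").lower()
        print(word, "->", dictionary.get(word, "not in dictionary"))
    except sr.UnknownValueError:
        print("Could not understand the audio")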

Could pocket Spanish-English dictionaries be a thing of the past?

Dynamic DVD

Thursday, January 11th, 2007

As streaming video has become more commonly available, it is now plausible to discuss offering a video interface to a website. A user could connect to a site using a video client and “navigate” on-line video content using a DVD-style user interface – video menus, highlighted on-screen buttons, fast-forward, subtitles. Alternately, such an interface could augment conventional TV broadcasts, offering a nightly news program with video hyperlinks to hours of detailed coverage of events we currently see only in sound bites.

(more…)

Building chess tablebases over the Internet

Wednesday, August 23rd, 2006

I’ve read some stuff on the Internet about using pre-computed tablebases to solve complex endgames. Apparently you can load a five- or six-piece tablebase into a program like Fritz and it will then know how to mechanically solve any endgame with five (or six) pieces on the board.
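
For context, probing a pre-computed tablebase is already easy to script. Here is a sketch using the python-chess library with a directory of Syzygy files (both the library and the path are my choices for illustration):

    # Probe a pre-computed endgame tablebase with the python-chess library.
    # Assumes Syzygy files have been downloaded to ./syzygy (placeholder path).
    import chess
    import chess.syzygy

    board = chess.Board("8/8/8/8/8/4k3/4P3/4K3 w - - 0 1")   # K+P vs K endgame

    with chess.syzygy.open_tablebase("./syzygy") as tablebase:
        wdl = tablebase.probe_wdl(board)    # win/draw/loss for the side to move
        dtz = tablebase.probe_dtz(board)    # distance to a zeroing move
        print("WDL:", wdl, "DTZ:", dtz)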

I started thinking about this, but more along the lines of building the tables dynamically, using a pool of cooperating computers on the Internet. The idea would be to direct your search towards certain game positions that you wanted to analyze. This would work well in static analysis and also at relatively slow correspondence time controls (like Garry Kasparov vs The World, or the upcoming The World vs Arno Nickel).

(more…)

Software Checkpointing using Networking

Sunday, July 2nd, 2006

If we design our operating system so that processes perform all external operations using networking, in particular IP packets, then we can easily checkpoint processes.

(more…)

Demand TV… we could have it today

Thursday, June 15th, 2006

What would it take to offer a demand video service based on local cable TV? Let’s consider how to build a bank of computers that would record and save the last week of video on each of sixty cable TV channels, then deliver any part of the recorded video on a demand basis.
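
Some back-of-the-envelope arithmetic suggests the storage requirement is large but hardly absurd; the 4 Mbit/s per-channel figure below is an assumed standard-definition MPEG-2 bitrate, not a measurement:

    # Rough storage estimate for one week of sixty channels.
    # The 4 Mbit/s per-channel bitrate is an assumption (typical SD MPEG-2).
    channels = 60
    days = 7
    bitrate_mbps = 4                                      # megabits per second per channel

    seconds = days * 24 * 3600
    per_channel_gb = bitrate_mbps * seconds / 8 / 1000    # decimal GB per channel
    total_tb = per_channel_gb * channels / 1000

    print(f"{per_channel_gb:.0f} GB per channel, {total_tb:.1f} TB total")
    # roughly 300 GB per channel, about 18 TB for all sixty channels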

(more…)

Network distributed databases

Thursday, May 11th, 2006

Most database designs are tied to a single master server, which might get replicated out to various slaves, but what we’re really after is a database distributed and replicated across the network with no distinguished master server.

(more…)