Software Checkpointing using Networking

If we design our operating system so that processes perform all external operations using networking, in particular IP packets, then we can easily checkpoint processes.

All file system accesses, for example, must go through some kind of network file system; we purge the kernel API of calls like open() and read(). Apart from task switching and memory management, the only primitives the operating system provides are packet-oriented network operations, so this is clearly similar to a microkernel approach. Even TCP must be implemented in user space (by the system library), since the kernel doesn’t provide reliable networking services, only the classic unreliable packet delivery of IP.
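
To make the shape of this concrete, here is a toy simulation. Everything in it is hypothetical: the kernel's only I/O calls are send_packet() and recv_packet(), which may silently drop packets, and a user-space read_file() stands in for open() and read() by sending a request to a file server process and retrying if no reply arrives. A real system library would run a proper protocol over this (TCP, or something in the spirit of NFS over UDP); the retry loop is just the smallest thing that works.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Packet:
    src: str
    dst: str
    payload: tuple


@dataclass
class Kernel:
    """Hypothetical kernel whose whole I/O API is unreliable datagrams."""
    drop_indices: set = field(default_factory=set)  # which sends get lost (for testing)
    handlers: dict = field(default_factory=dict)    # addr -> server callback
    queues: dict = field(default_factory=dict)      # addr -> pending packets
    sent: int = 0

    def send_packet(self, pkt: Packet) -> None:
        index, self.sent = self.sent, self.sent + 1
        if index in self.drop_indices:
            return  # lost: no error, no retransmission, just like IP
        if pkt.dst in self.handlers:
            self.handlers[pkt.dst](self, pkt)  # deliver straight to a server
        else:
            self.queues.setdefault(pkt.dst, deque()).append(pkt)

    def recv_packet(self, addr: str):
        """Non-blocking receive; None plays the role of a timeout."""
        q = self.queues.get(addr)
        return q.popleft() if q else None


def make_file_server(files: dict):
    """A user-space file server reachable only by packets."""
    def handle(kernel: Kernel, pkt: Packet) -> None:
        op, path = pkt.payload
        if op == "READ":
            kernel.send_packet(Packet(pkt.dst, pkt.src, ("DATA", files.get(path, b""))))
    return handle


def read_file(kernel: Kernel, me: str, server: str, path: str, retries: int = 3) -> bytes:
    """System-library replacement for open()+read(): request, wait, retry."""
    for _ in range(retries):
        kernel.send_packet(Packet(me, server, ("READ", path)))
        reply = kernel.recv_packet(me)
        if reply is not None:
            return reply.payload[1]
    raise TimeoutError(f"no reply from {server} for {path}")
```

With Kernel(drop_indices={0}), for instance, the first request vanishes and the second attempt succeeds; the caller never learns that a packet was lost.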

What’s the point? Well, now we can implement a simple means of software checkpointing. We queue all outgoing packets from a process and do not deliver them until the O/S has had the opportunity to stop the process and checkpoint it by saving all of its modified memory areas to disk; only then are the packets delivered.
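
A minimal sketch of that ordering, assuming a single-threaded simulation in which the process is already stopped when checkpoint_and_flush() runs. Process, write(), send() and checkpoint_and_flush() are hypothetical names; what matters is the order of the last two steps: modified pages reach the disk before any held packet reaches the wire.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    memory: dict = field(default_factory=dict)  # page number -> bytes
    dirty: set = field(default_factory=set)     # pages written since the last checkpoint
    outbox: list = field(default_factory=list)  # packets held back by the O/S

    def write(self, page, data):
        self.memory[page] = data
        self.dirty.add(page)

    def send(self, packet):
        # The process believes the packet is sent; the O/S only queues it.
        self.outbox.append(packet)


def checkpoint_and_flush(proc, disk, wire):
    """Save modified pages, then (and only then) release queued packets."""
    # Assumed: the process is stopped while this runs.
    for page in sorted(proc.dirty):
        disk[page] = proc.memory[page]  # persist only what changed
    proc.dirty.clear()
    # The checkpoint is on disk, so the packets may now leave the machine.
    wire.extend(proc.outbox)
    proc.outbox.clear()
```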

To recover a checkpointed process after a crash, all the system has to do is identify the last complete checkpoint saved to disk, restore the process state from it, and start the process running from that point. That state should consist of little more than the process’s memory image (remember, there are no open files, because all file I/O is done over the network).
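
One way the system might recognize the last complete checkpoint, sketched under an assumption of my own: checkpoints are appended to a log, and each one ends with a COMMIT record written after all of its pages. Because each checkpoint holds only the pages modified since the previous one, recovery replays every committed checkpoint in order and ignores a trailing one that the crash left torn. The record layout and function names are hypothetical.

```python
def write_checkpoint(log, seq, dirty_pages):
    """Append one incremental checkpoint; it counts only once COMMIT is written."""
    for page, data in sorted(dirty_pages.items()):
        log.append(("PAGE", seq, page, data))
    log.append(("COMMIT", seq))  # written last, so its presence means "complete"


def recover(log):
    """Rebuild the memory image as of the last complete checkpoint."""
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    image = {}
    for rec in log:
        if rec[0] == "PAGE" and rec[1] in committed:
            _, _, page, data = rec
            image[page] = data  # later checkpoints overwrite earlier pages
    last = max(committed) if committed else None
    return last, image
```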

The worst case is that the system dies right after the process checkpoints but before some or all of the queued packets are delivered. This appears to the process (and to all other processes) as nothing worse than packet loss! Since TCP runs inside the process’s own address space, its unacknowledged data is part of the checkpoint, so after recovery a timeout occurs and the packets are retransmitted.
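
The following toy model, again hypothetical, walks through that worst case. The unacknowledged segments sit in the process's own memory, so the restored copy still holds them, and when its retransmission timer fires it simply sends them again. To keep the sketch short, acknowledgements are applied directly instead of arriving as packets, and copy.deepcopy() stands in for writing the memory image to disk.

```python
import copy


class Sender:
    """User-space TCP state, kept in process memory (hypothetical toy)."""

    def __init__(self):
        self.unacked = {}  # seq -> payload awaiting acknowledgement
        self.next_seq = 0

    def send(self, payload, held):
        self.unacked[self.next_seq] = payload
        held.append((self.next_seq, payload))  # queued by the O/S, not yet sent
        self.next_seq += 1

    def on_timeout(self, held):
        for seq, payload in sorted(self.unacked.items()):
            held.append((seq, payload))  # retransmit everything unacknowledged

    def on_ack(self, seq):
        self.unacked.pop(seq, None)


def crash_after_checkpoint():
    proc, held, peer = Sender(), [], {}
    proc.send(b"hello", held)
    on_disk = copy.deepcopy(proc)  # checkpoint completes...
    held = []                      # ...then the machine dies: queued packets lost
    proc = copy.deepcopy(on_disk)  # recovery restores the memory image
    proc.on_timeout(held)          # retransmission timer fires
    on_disk = copy.deepcopy(proc)  # the O/S checkpoints again, then flushes
    for seq, payload in held:
        peer[seq] = payload        # this time the packets arrive
        proc.on_ack(seq)           # toy shortcut: ack applied directly
    return peer, proc.unacked
```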

One might wonder what happens if the program receives and processes some packets, then crashes before a checkpoint occurs. This is exactly why outgoing packets are queued until after a checkpoint: the process cannot “act” on the packets it has received, at least not in the sense of influencing any outside state, because that would require transmitting packets. Again, after recovery this appears as nothing worse than packet loss, this time on the inbound side.
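
A small illustration of the inbound case, using a hypothetical counter server that increments a value and replies with it. The request is processed once before the crash and once more after recovery, yet because the first reply never left the machine, the outside world only ever sees a single reply carrying the value 1.

```python
import copy


class Counter:
    """Toy server: each request increments a counter and replies with it."""

    def __init__(self):
        self.value = 0

    def handle(self, request_id):
        self.value += 1
        return (request_id, self.value)


def crash_before_checkpoint():
    wire = []                        # everything the outside world sees
    on_disk = Counter()              # last complete checkpoint: value 0
    server = copy.deepcopy(on_disk)
    held = [server.handle("req-1")]  # reply queued by the O/S, value 1
    # Crash: `server` and `held` vanish; recovery restores the checkpoint.
    server = copy.deepcopy(on_disk)
    held = [server.handle("req-1")]  # the client timed out and resent req-1
    on_disk = copy.deepcopy(server)  # checkpoint first...
    wire.extend(held)                # ...then release the reply
    return wire
```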

In addition to preserving programs across a system crash, this would also provide a reliable means of checkpointing a program so that it could later be started from a given point without going through a long startup procedure.

It would also provide a means of migrating programs between computers (at least between those with identical CPU architectures). Since all of the process’s interaction is done through networking, all that is needed is a generic mobile networking capability that lets the program cope with being stopped, moved to a different computer (by copying its memory image), and restarted there.
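
Migration can be sketched as a checkpoint followed by a copy, with a location table standing in for the mobile networking layer (roughly the binding a Mobile IP home agent keeps). The Cluster class and its methods are illustrative only. Peers address the process rather than the machine, so once the binding is updated new packets reach the new host, and anything already in flight to the old host is just more packet loss.

```python
import copy


class Cluster:
    """Hypothetical set of machines plus a process-address -> host binding."""

    def __init__(self):
        self.hosts = {}     # host name -> {process address: memory image}
        self.location = {}  # process address -> current host

    def start(self, host, addr, image):
        self.hosts.setdefault(host, {})[addr] = image
        self.location[addr] = host

    def route(self, addr):
        """Senders name the process; the binding picks the machine."""
        return self.hosts[self.location[addr]][addr]

    def migrate(self, addr, new_host):
        old_host = self.location[addr]
        image = self.hosts[old_host].pop(addr)  # stop and checkpoint on the old host
        self.hosts.setdefault(new_host, {})[addr] = copy.deepcopy(image)  # copy the image
        self.location[addr] = new_host          # update the binding; restart there
```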

Of course, the process state will have to include a little more than just its memory image. I’ve already tacitly assumed that a timeout facility is available, and some other facilities are probably needed too, but the big problem, recovering open file and socket state, is neatly dealt with by this scheme.
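
Pending timers are an obvious part of that extra state. A minimal sketch, assuming (my choice, not necessarily the best one) that each timer is saved as the time remaining rather than as an absolute deadline, since a clock reading taken before a reboot or on another machine means little after the restore. CheckpointRecord, take_checkpoint() and restore() are hypothetical names; overdue timers are clamped to zero so they fire immediately.

```python
from dataclasses import dataclass


@dataclass
class CheckpointRecord:
    memory: dict  # page number -> contents
    timers: dict  # timer id -> seconds remaining when the checkpoint was taken


def take_checkpoint(memory, deadlines, now):
    """Save memory plus timers, converting absolute deadlines to time remaining."""
    remaining = {tid: max(0.0, when - now) for tid, when in deadlines.items()}
    return CheckpointRecord(dict(memory), remaining)


def restore(record, now):
    """Re-arm every saved timer relative to the restoring machine's clock."""
    deadlines = {tid: now + left for tid, left in record.timers.items()}
    return dict(record.memory), deadlines
```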

Shared memory presents both a challenge and an opportunity. For performance reasons, we certainly don’t want to ban it! On the one hand, any set of processes sharing memory would either have to be checkpointed as a block, or strict controls would have to ensure that after a process wrote to a region of shared memory, no other process could access that region until the writer had been checkpointed. On the other hand, the ability to migrate processes between computers means that shared memory would no longer be a hit-or-miss proposition: cooperating OSes could migrate processes onto the same computer when they wanted to use shared memory.
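
The second of those options can be sketched as a per-region ownership table. SharedRegionGuard and its methods are hypothetical, and a real kernel would presumably enforce the rule through page protection and fault handling rather than explicit calls; the table only records which process has unsaved writes to each region and releases them when that process checkpoints.

```python
class SharedRegionGuard:
    """Tracks, per shared region, which process has writes not yet checkpointed."""

    def __init__(self):
        self.pending_writer = {}  # region -> pid whose writes are not on disk yet

    def may_access(self, pid, region):
        """Reads and writes are allowed unless another process holds the region."""
        owner = self.pending_writer.get(region)
        return owner is None or owner == pid

    def record_write(self, pid, region):
        if not self.may_access(pid, region):
            owner = self.pending_writer[region]
            raise PermissionError(f"{region!r} is blocked until pid {owner} checkpoints")
        self.pending_writer[region] = pid

    def on_checkpoint(self, pid):
        """The writer's changes are safely on disk: release its regions."""
        for region in [r for r, p in self.pending_writer.items() if p == pid]:
            del self.pending_writer[region]
```

Checkpointing the whole sharing group as a block would need no such table, at the cost of stopping every member together.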

Some processes, especially those which access hardware directly, would need extra support. The file system, for example, would ultimately have to be implemented by actually writing to a disk. The process(es) handling this obviously could not be checkpointed in the usual manner. Indeed, any process directly managing hardware would probably have to be responsible for its own checkpointing.

The disk manager is probably the toughest case, though. The display manager, especially in a windowing system like X, where the server can ask a client to redraw its window at any time, doesn’t seem to require much additional state. Specialty hardware with little-used drivers would probably be the most difficult to handle from a maintenance perspective.
