Building chess tablebases over the Internet

I’ve read some stuff on the Internet about using pre-computed tablebases to solve complex endgames. Apparently you can load a five- or six- piece tablebase into a program like Fritz and it will then know how to mechanically solve any endgame with five (or six) pieces on the board.

I starting thinking about this, but more along the lines of building the tables dynamically, using a pool of cooperating computers on the Internet. The idea would be to direct your search towards certain game positions that you wanted to analyze. This would work well in static analysis and also in relatively slow correspondence time controls (like Gary Kasparov vs The World, or the upcoming The World vs Arno Nickel).

A big key here would be to segment your search space so that individual computers can calculate part of it independently and then return the results to a server. I think we can do this with chess because pawns can’t move backwards and captures can’t be reversed, either. So we can ask each computer to analyze up to a capture or pawn movement. We treat the pawn positions as static, and all the other pieces (the ‘mobile’ pieces) move around them. As soon as a pawn moves or a capture occurs, we’re into a different table. So long as, say, all but four of the pieces are pawns, these files are of reasonable size (a few MB) and can be computed in a reasonable time.

Pruning

We prune if we don’t want to fully analyze out the tree past the table we’re now building. Of course, this will affect the accuracy of the table; the table is a result of BOTH the position it was set up for AND the pruning decisions (and any pruning decisions made on the future tables used to calculate this one).

We specify pruning in a simple way – by omitting future tables for moves we don’t want to consider. This can be dangerous, so we require this feature to be specifically enabled for each move by a command-line switch. Actually, we use two switches, one to calculate a table for OUR SIDE to move, and another if it is the OTHER SIDE to move.

So, --prune-our-move e3e4 prunes a pawn move (assuming this is a table with a static pawn on e3) by simply ignoring e3e4 as a possible move.

Pruning an opponent’s move is more complex because we step a half-move into the future and consider our own next move. This costs us little, since we can control our own move and therefore don’t have to consider all possibilities, and improves a lot. If future tables exist for any of our responses, they are used. If no such future tables exist, then the move is regarded as a lost game.

So, --prune-his-move e7e8 prunes a pawn promotion (assuming a static pawn on e7) by considering all possible positions resulting after the pawn promotion (to either Q or N) AND the answering move. The resulting game is regarded as a win for white unless both Q and N promotions have an answer that leads to another table with a win or draw for black.

For example, let’s say we’re looking at a Q-and-P vs. Q-and-P endgame. There are four mobile pieces (2 Ks and 2 Qs), so we can handle this. But if one of the pawns queens, then we’ve got a game with five mobile pieces, and that’s too complex. But we don’t want to completely discard all possible enemy promotions, if we can immediately capture the new queen (or the old one). So we specify something like --prune-his-move e7e8 and pass in a tablebase for a Q-and-P vs. Q endgame.

We also check for immediate checkmates or stalemates.

Pruning doesn’t affect the size of the resulting tablebase. We discard the extra information. If the pruned move is actually made in the game, then you have to calculate all possible next moves and check your tablebases for them. This seems reasonable.

In short, intelligent pruning is essential to making this work. Let’s say we get into a rook and 3 pawn vs. bishop and 4 pawn endgame (or something like that). We can’t just solve the whole thing. If all the pawns queened, we’d get 7 queens on the board and that would be impossible. So we have to prune out these highly unlikely combinations. We can also prune our own moves aggressively, because we don’t have to make them! Pruning makes this whole idea feasible.

Program Design

Let’s keep the program simple. It’s designed to be run on a whole bunch of machines over the Internet, each one calculating a different table, that all then get linked together. Each table goes into a different file. The server assigns a program the task of computing a certain file. The files needed for this task are downloaded. Then the program runs locally, just calculating the requested table/file. When done, the file is uploaded back to the server.

File naming syntax: wPIECES-bPIECES.tbl

PIECES is a list of ‘mobile’ pieces (K, Q, R, B, N) followed by pawn positions in algebraic notation.

wKQ-bKQ.tbl is a (K+Q) vs. (K+Q) endgame

wKQe6-bKQ.tbl is a (K+Q+pawn-on-e6) vs. (K+Q) endgame

Certain symmetry relations exist. We don't need both wKQ-bK.tbl and wK-bKQ.tbl; just one will suffice. wKQe6-bKQ.tbl is basically the same as wKQd6-bKQ.tbl or wKQ-bKQe3.tbl

Need an algorithm to determine symmetry; maybe white should always have numerically superior side and pawns should always skew towards lowest letters and highest numbers.

We need some tables to compute others. For example, to compute wKQd6-bKQ.tbl, we need wKQd7-bKQ.tbl (possible pawn advance), wKd6-bKQ.tbl (possible x of white queen by black king or queen), wKQd6-bK.tbl (possible x of black queen by white king or queen), wKQd7-bK.tbl (possible x of black queen by d6xe7) and wKQc7-bK.tbl (possible x of black queen by d6xc7).

Some of these can be pruned. If we're looking for a forced win by white, we can omit the rarer tables and treat any move into them as a forced win for black. If we can still force a win for white, we know that it can be done without moving into any missing table. This might result in some obviously won positions being reported as undecided.

File format is designed to be mmap'ed into the program's memory space; it can be compressed with gzip for transport.

File looks like an array [64][64]...[64][2]. There are as many [64]'s as mobile pieces, and correspond to them in the order listed in the filename, so the first [64] is always white's king. The [2] determines whose move it is - white to play (0) or black to play (1).

Each entry in the array is a four-byte structure:

struct {
	unsigned char	movecnt;
	unsigned char	white_wins_moves;
	unsigned char	black_wins_moves;
	unsigned char	draw_moves;
}

We init by running through the array, flagging illegal positions (all one bits, maybe), flagging positions that are checkmates for one side or the other as won but 'unpropagated' (how?), flagging stalemates as drawn but 'unpropagated' (how?) and otherwise computing the number of moves away from this position, storing that number in movecnt and setting the other fields to zero.

Now we run through the table, looking for 'unpropagated' positions and propagating them. This means backtracking one half move. If the position is black to play and is won for white (or vice versa), then we back up to every position that could have gotten here by one move (white's) and flag that position won for white and 'unpropagated'. If the position is white to play and is won for white (or vice versa), then we back up to every position that could get here by one move (black's), decrement movecnt by one and increment its white_wins_moves by one. If its movecnt is now zero, we evaluate it. It won't have any black_wins_moves (because it would have been flagged immediately won for black). So its moves are all either draws or wins for white. If there are no draws (only forced wins for white), we flag it as won for white and 'unpropagated'. Otherwise, we flag it as drawn and 'unpropagated'. If we're back propagating a draw, we'll be incrementing draw_moves.

And I skipped a step. The first step after initialization is to run through all the tables this one depends on, and back propagate from those positions (all marked either won for one side or drawn) to the current table.

Now we just keep running through the table, finding entries marked 'unpropagated', propagating them and marking them 'propagated'.

Finally, everything left should be the positions that will be drawn by repetition with best play from both sides, and should be flagged as such.

It would seem that this algorithm will avoid draws by repetition because it will always make forward progress if possible, except in very long winded positions in something like a three knights endgame.

Finally, we'd like the positions to be flagged not only won, lost, or drawn, but also with the number of moves required for a win. This seems to require a modification to the basic algorithm, since we can't just flag a move as won, but maybe not. If we process all the won positions first, then everything flagged will be a mate in one. Then we process all the mates in one; everything flagged will be a mate in two, etc.

So, we really don't need the four-byte structure. We don't need a 'mover_wins_moves' field, because if we find even a single move where the mover wins, we're done with this position. We need 'movecnt', 'opponent_wins_moves', and 'draw_moves'. We could lose one of these if we re-calculate the number of moves available in this position. I think the quickest way to do this is to scan the table for all possibles movements of the mobile pieces and see if the positions are marked invalid or not. Looks like a space-vs-speed tradeoff.

So, we could get down to a two-byte structure.

struct {
	unsigned char	movecnt;
	unsigned char	opponent_wins_moves;
}

movecnt's high-order bit could be reserved for the 'propagated' flag, which leaves room for 127 possible moves. This seems OK. 3Q+K would have at most 92 moves available to them.

Once 'propagated' is set, the meaning of the structure changes. Now we have two more bits indicating:

	00 - draw
	01 - win for white
	10 - win for black
	11 - invalid move

The remaining 13 bits can store a 'mate in' count, up to 8192! We should add another flag bit to indicate if the 'mate in' count is total moves or just to get to the next table.

Probably best to keep our program flexible by allowing for either four-byte or two-byte format to be used. Maybe different file extensions, or leave room for a header at the beginning of the file to indicate which format we use. Let's use a 256 byte header, put a version/format field in the first four bytes (ascii) and leave the remaining 252 bytes (for now) to double-check the file name, indicate that processing has been completed and we don't have corrupted data, text string to describe how the file was built, indicate which file dependencies were satisfied by actual files (and maybe their MD5 checksums) and which were pruned as forced wins or draws, and an MD5 of the rest of the file.

Have to see how gzip does, or might want a more compressed format for transport.

Could dump the move count and get a 2-bit-per-position format, but that's dangerous if we don't store the moves themselves, because then we would be moving in circles. This would only be good for transport to a 'slave' program computing another table, but we still wouldn't have exact move counts (couldn't tell if we should head for one move or another first in the next table).

.tbl for 'full' (maybe 2-byte) format
.tbc for 'compact' (2-bit) format - 1/8 size of full format

Compact format also has the disadvantage of not being able to convince people with a full move sequence, but this might not be a factor for any but the strangest cases (because we can usually figure out the move sequence from the compact format)

What tablebases to build?

all possible three- and four-piece tables are fundamental for more complex ones

an uncompressed two-byte four-piece table will be 64MB, so a complete set of all 30 of four-piece-ers will take up about 2GB (uncompressed)

wKQ-bK
wKR-bK
wKB-bK (building this and next one both checks program and avoids draws)
wKN-bK
wKP-bK (put this in its own table rather than 64 8K or 16K files?)
wKQ-bKQ
wKQ-bKR
wKQ-bKB
wKQ-bKN
wKQ-bKP (ditto for all these four-piece tables with pawns)
wKR-bKR
wKR-bKB
wKR-bKN
wKR-bKP
wKB-bKB (should handle both same color and different color bishops OK)
wKB-bKN
wKB-bKP
wKN-bKN
wKN-bKP
wKP-bKP
wKQQ-bK (some of these seem silly)
wKQR-bK
wKQB-bK
wKQN-bK
wKQP-bK
wKRR-bK
wKRB-bK
wKRN-bK
wKRP-bK
wKBB-bK (but some are important, for sure!)
wKBN-bK
wKBP-bK
wKNN-bK
wKNP-bK
wKPP-bK

One Response to “Building chess tablebases over the Internet”

  1. cosine Says:

    I ended up putting a lot of work into this idea over the last year and a half, and the result is Hoffman, an open-source tablebase generator.

Leave a Reply

You must be logged in to post a comment.